Archive and purge data

Retention policies for archiving, backing up, and purging data should include your Itential Platform and Itential Automation Gateway (IAG) assets. This page covers best practices and suggested methods for on-premises Itential Platform and IAG installations.

Itential Platform is not a system of record. Itential strongly recommends specifying a retention policy that suits your business needs.

Itential Platform

Itential Platform uses MongoDB to store application and instance data. Itential recommends performing regular backups of the MongoDB database and storing backup data in a separate location for safekeeping and recovery. The specific Itential Platform .bin installation file should also be retained to support re-installation to exact specifications if restoration is necessary.

Itential Automation Gateway

IAG maintains host files, scripts, playbooks, and custom modules on a Linux file system. Back up those file locations periodically using cron jobs.
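As one possible shape for such a job, the sketch below archives a set of IAG directories into a timestamped tarball. The function name, archive naming scheme, and every path shown are illustrative assumptions, not Itential defaults; substitute the locations your IAG installation actually uses.

```shell
# Sketch only: archive IAG asset directories into a dated tarball.
# backup_iag_assets DEST_DIR SRC_DIR [SRC_DIR...]
# Writes DEST_DIR/iag-assets-YYYYMMDD-HHMMSS.tar.gz and prints its path.
backup_iag_assets() {
  local dest_dir="$1"; shift
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest_dir}/iag-assets-${stamp}.tar.gz"
  mkdir -p "$dest_dir" || return 1
  # -c create, -z gzip, -f output file; directories are archived recursively.
  tar -czf "$archive" "$@" || return 1
  printf '%s\n' "$archive"
}

# Hypothetical crontab entry that runs a wrapper script daily at 02:00:
# 0 2 * * * /usr/local/bin/backup_iag.sh >> /var/log/backup_iag.log 2>&1
```

A wrapper script could call, for example, `backup_iag_assets /mnt/backups/iag /path/to/playbooks /path/to/scripts`, where both source paths are placeholders.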

Data storage size

When planning data preservation and backups:

Check your configuration and default directory size. The maximum storage size for backup data is often limited.
Avoid archiving long-running workflows.
Make large job documents modular. MongoDB uses GridFS for storing files larger than 16MB.

Log rotation

Gather log data from a rolling log. As a recommended default, configure log rotation to keep up to 99 rolled files; lower this number if storage is limited. Purge rolled log files when the log directory reaches a set size or when the active log file reaches its file size threshold.
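A size-based purge of this kind could be sketched as follows. The function name, the size limit, and the file name pattern are assumptions for illustration; the sketch also assumes log file names contain no whitespace.

```shell
# Sketch only: delete the oldest rolled log files until the log
# directory is at or below a size limit.
# purge_rolled_logs LOG_DIR MAX_KB PATTERN
#   PATTERN is a shell glob for rolled files only (for example 'app.log.*'),
#   so the active log file is never removed.
purge_rolled_logs() {
  local log_dir="$1" max_kb="$2" pattern="$3"
  local oldest
  # du -sk reports the directory size in kilobytes.
  while [ "$(du -sk "$log_dir" | cut -f1)" -gt "$max_kb" ]; do
    # ls -tr lists by modification time, oldest first.
    # $pattern is left unquoted on purpose so the glob expands.
    oldest="$(ls -1tr "$log_dir"/$pattern 2>/dev/null | head -n 1)"
    [ -n "$oldest" ] || break
    rm -f -- "$oldest"
  done
}
```

For example, `purge_rolled_logs /var/log/myapp 512000 'app.log.*'` would trim a hypothetical log directory to roughly 500 MB.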

Archiving and backup checklist

	Operation	Recommendations
☐	Take a snapshot and dump MongoDB data.	Can be the entire MongoDB database if no CI/CD process is in place to repair workflows, JSTs, or forms. Can also be limited to jobs, tasks, Golden Config compliance collections, and Lifecycle Manager entities (2023.1). All data files should be zipped and stored in a separate location.
☐	Set log rolling to 99 files.	Adjust the rollover log threshold to a lower file number if needed. Configure a time-based policy to purge rollover logs.
☐	Archive IAG assets.	Use a CI/CD process to manage hosts, scripts, playbooks, and similar files. Zip all assets and place them in a storage archive for snapshot purposes.
☐	Back up device inventory.	If using an internal inventory in IAG, back up the SQLite database that holds device data.
☐	Set a frequency for retention and purging.	30 days is the minimum standard for purging stale data. 60–90 days or longer is the minimum standard for retaining data, depending on storage, organizational policy, and business needs.
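
The retention row above could be enforced on the archive location with a time-based prune. This is a minimal sketch: the function name and the archive file extensions are assumptions, and the retention period should come from your own policy.

```shell
# Sketch only: remove archive files older than a retention period.
# prune_old_archives ARCHIVE_DIR DAYS
prune_old_archives() {
  local archive_dir="$1" days="$2"
  # -mtime +N matches files last modified more than N days ago.
  find "$archive_dir" -type f \( -name '*.tar.gz' -o -name '*.zip' \) \
    -mtime +"$days" -exec rm -f {} +
}
```

Calling `prune_old_archives /mnt/backups 90` (a placeholder path) would keep roughly the last 90 days of archives.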

Collections

Itential recommends including the following MongoDB collections in data backups.

Collection	Description
`job_history`	Number of jobs in the system over a period of time.
`job_output`	Output generated after running a workflow.
`jobs`	Job documents.
`wfe_job_metrics`	Workflow Engine metrics data for workflows.
`wfe_task_metrics`	Workflow Engine performance data on tasks within workflows.
`ucm_configs`	Configuration backups.
`ucm_compliance_reports`	Compliance reports.
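
One way to back up exactly these collections is to loop over them with mongodump. The sketch below assumes the MongoDB Database Tools are installed and that connection options (host, credentials) are supplied separately; the function name and the `DRY_RUN` switch are illustrative, not part of any Itential tooling.

```shell
# Sketch only: dump the recommended collections one at a time.
BACKUP_COLLECTIONS="job_history job_output jobs wfe_job_metrics wfe_task_metrics ucm_configs ucm_compliance_reports"

# dump_collections DB_NAME OUT_DIR
# Set DRY_RUN=1 to print the mongodump commands instead of running them.
dump_collections() {
  local db_name="$1" out_dir="$2" coll
  for coll in $BACKUP_COLLECTIONS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'mongodump --db=%s --collection=%s --out=%s\n' \
        "$db_name" "$coll" "$out_dir"
    else
      mongodump --db="$db_name" --collection="$coll" --out="$out_dir" || return 1
    fi
  done
}
```

Running with `DRY_RUN=1` first lets you review the commands before any data is written.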

Additional collections

Lifecycle Manager (LCM) collections can also be archived and purged, though the cadence depends on policy and business need. LCM data may be relevant for several years in some organizations, while others may only need to retain data from the past 12 months. Because LCM collections contain integral state data and resource models for network configuration, Itential strongly recommends defining which LCM collections to store, where to store them, and for how long.

Archive jobs (2023.2)

Job data collections

In the 2023.2 release, all job variable data, including incoming and outgoing task data, has been moved out of the jobs and tasks collections.

Data less than 16MB is stored in the job_data collection.
Data greater than 16MB is stored in the GridFS bucket, which uses the job_data.chunks and job_data.files collections.

Job ID required for archiving

When archiving a job in 2023.2:

The job_id is required to retrieve all job_data documents for archiving.
If job_id is not provided, the overwhelming majority of job-related data will remain in the database.

job_data collection

All data in the job_data collection can be queried by job_id:

{
  "_id": ObjectId("657cc218a4f4b23e8083bff0"),
  "job_id": "f04d7175b9d7452bb718c3c6",
  "data": "example"
}
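
Because job_id is a plain string field, a single job's job_data documents can be selected with the mongodump `--query` option, which requires `--collection` and takes an Extended JSON filter. The sketch below assumes that approach; the function names are hypothetical and connection options are omitted.

```shell
# Sketch only: dump the job_data documents that belong to one job.

# job_data_query JOB_ID
# Prints the Extended JSON filter matching a job's job_data documents.
job_data_query() {
  printf '{"job_id": "%s"}' "$1"
}

# archive_job_data DB_NAME JOB_ID OUT_DIR
archive_job_data() {
  local db_name="$1" job_id="$2" out_dir="$3"
  mongodump --db="$db_name" --collection=job_data \
    --query="$(job_data_query "$job_id")" --out="$out_dir"
}
```

For the document shown above, `job_data_query f04d7175b9d7452bb718c3c6` produces the filter `{"job_id": "f04d7175b9d7452bb718c3c6"}`.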

GridFS bucket

For GridFS, data is split between the job_data.chunks and job_data.files collections. The job_data.files collection contains metadata and all queryable information. The job_id is located in metadata.job for GridFS documents.

Use the MongoDB driver and consult the GridFS documentation for guidance on querying and deleting files.

Example job_data.files document:

{
  "_id": ObjectId("657b6f14c928923a20266cc6"),
  "length": 30000002,
  "chunkSize": 261120,
  "uploadDate": ISODate("2023-12-14T21:09:40.525Z"),
  "filename": "3f0f8f8e-ded1-4c80-8c40-d5eb3657a018",
  "metadata": {
    "job": "24b0ea5ffbe646b181531585"
  }
}
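
To see which GridFS files belong to a job before archiving or deleting them through the driver, the job_data.files metadata can be listed with mongoexport. This is a sketch under the assumption that a read-only listing is all that is needed; the function names are hypothetical, connection options are omitted, and it does not retrieve or delete file contents.

```shell
# Sketch only: list GridFS file metadata for one job.

# gridfs_job_query JOB_ID
# Prints the Extended JSON filter matching a job's job_data.files documents.
gridfs_job_query() {
  printf '{"metadata.job": "%s"}' "$1"
}

# list_job_gridfs_files DB_NAME JOB_ID
# Writes matching documents (selected fields only) to standard output.
list_job_gridfs_files() {
  local db_name="$1" job_id="$2"
  mongoexport --db="$db_name" --collection=job_data.files \
    --query="$(gridfs_job_query "$job_id")" \
    --fields=_id,filename,length
}
```

The `_id` values returned identify the files to pass to the driver's GridFS API when archiving or deleting.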

Sample commands

MongoDB

Run mongodump:

$ mongodump --db=<old_db_name> --collection=<collection_name> --out=data/

Run mongorestore:

$ mongorestore --db=<new_db_name> --collection=<collection_name> data/<old_db_name>/<collection_name>.bson

Linux

Edit a cron job:

$ crontab -e

Back up IAG inventory:

$ sqlite3 iag.sq3 ".backup 'inventory'"

Restore IAG inventory:

$ sqlite3 iag.sq3 ".restore 'inventory'"

Itential pre-builts

The Archive Job Data pre-built allows Itential Platform users to archive the jobs and tasks collections for the 2023.1 and 2022.1 releases.

A forthcoming pre-built will enable job archiving on 2023.2 and later versions. Once available, this page will be updated with the newer pre-built.