Archiving & Purging
- Updated on 15 May 2024
Retention policies for archiving, backing up, and purging files should also cover your IAP and IAG assets. This article provides best practices and suggested methods for complying with such policies in on-premises IAP/IAG installations.
⚠ The Itential platform is not a system of record. Itential strongly recommends that you specify a retention policy that suits your business needs.
Itential Automation Platform (IAP)
IAP uses MongoDB to store application and instance data. Itential recommends performing regular backups of the Mongo database and storing the backup data in a separate location for safekeeping and recovery. The specific IAP `.bin` file used for installation should also be retained so that the platform can be re-installed to exact specifications should restoration be necessary.
Itential Automation Gateway (IAG)
IAG maintains host files, scripts, playbooks, and custom modules on a Linux file system. Back up those file locations on a regular schedule using `cron` jobs.
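As an illustration of the kind of script a cron job could call, the following is a minimal sketch that bundles a set of directories into one timestamped `.tar.gz`. The function name `archive_iag_assets` and the archive naming scheme are assumptions made for this example; they are not part of IAG.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_iag_assets(asset_dirs, dest_dir, now=None):
    """Bundle the given directories into one timestamped .tar.gz.

    Returns the path of the archive that was written.
    """
    now = now or datetime.now(timezone.utc)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / f"iag-assets-{now:%Y%m%dT%H%M%SZ}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        for directory in asset_dirs:
            directory = Path(directory)
            # Skip locations that do not exist on this host.
            if directory.is_dir():
                # Entries are stored under the directory's own name,
                # so the source directories should have distinct names.
                tar.add(directory, arcname=directory.name)
    return archive_path
```

The directories to pass in depend on where your IAG installation keeps its playbooks, scripts, and modules. Writing the archive to a mount that lives on separate storage keeps the backup apart from the host it protects.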
Data Storage Size
In the broader context of data preservation and ensuring accessible data backups:
- Check the configuration and default directory size. The maximum storage size for backup data is often limited.
- Avoid archiving long running workflows.
- Make large job documents modular. MongoDB limits a single BSON document to 16 MB, and GridFS is used for storing files larger than that.
Log Rotation
Log data should be gathered from a rolling (rollover) log. As a recommended default, a rolling log should be distributed across 99 files over time (this number can be adjusted lower). To keep the logging system within specified space limits, rolled log files can be emptied (purged) whenever the log directory reaches a particular size and space is needed, or whenever the active log file generated by debug and trace has reached its file size threshold.
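The size-based purge described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `purge_rolled_logs`, the idea of identifying the active log by file name, and the byte budget are all hypothetical, and the logging system's own rotation settings remain the primary control.

```python
from pathlib import Path


def purge_rolled_logs(log_dir, active_log_name, max_total_bytes):
    """Delete the oldest rolled log files until the directory fits the budget.

    The active log file is never deleted. Returns the removed paths,
    oldest first.
    """
    log_dir = Path(log_dir)
    files = [p for p in log_dir.iterdir() if p.is_file()]
    total = sum(p.stat().st_size for p in files)
    # Rolled files only, sorted oldest first by modification time.
    rolled = sorted(
        (p for p in files if p.name != active_log_name),
        key=lambda p: p.stat().st_mtime,
    )
    removed = []
    for path in rolled:
        if total <= max_total_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        removed.append(path)
    return removed
```

Deleting oldest-first keeps the most recent history available for troubleshooting while still honoring the space limit.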
Checklist
Use the following checklist as an aid to help guide you through the process of archiving and performing backup operations.
Done | Operation | Recommendations |
---|---|---|
☐ | Take a snapshot and dump MongoDB data. | - Can be the entire Mongo database if no CI/CD process is in place to repair workflows, JSTs, or forms. - Can also be only jobs, tasks, Golden Config compliance collections, and Lifecycle Manager entities (2023.1). - All data files should be zipped and stored in a separate location. |
☐ | Set log rolling to 99 files. | - Rollover log threshold can be adjusted to a lower file number. - Configure a time-based policy to purge rollover logs. |
☐ | Archive IAG assets. | - A CI/CD process should be used to manage hosts, scripts, playbooks, etc. - All assets should be zipped and placed in a storage archive for snapshot purposes. |
☐ | Back up device inventory. | ⚠ If using an internal inventory in IAG, back up the SQLite database that holds the device data. |
☐ | Set a frequency for retention and purging. | - 30 days is the minimum standard for purging stale data. - 60-90 days or longer is the minimum standard for retaining data, depending on storage, organizational policy, and business needs. |
Collections
Itential recommends the following list of MongoDB collections be included in a data backup.
Collection Name | Description |
---|---|
job_history | Number of jobs in the system over a period of time. |
job_output | Output generated after running a workflow. |
jobs | Job documents. |
wfe_job_metrics | Workflow Engine metrics data for workflows. |
wfe_task_metrics | Workflow Engine performance data on tasks within workflows. |
ucm_configs | Configuration backups. |
ucm_compliance_reports | Compliance reports. |
Additional Collections
While additional collections from Lifecycle Manager (LCM) can be archived and cleaned (purged), the cadence can vary based on policy and business need. Some will find this type of data valid for several years, while others may only want to maintain LCM data that covers the last 12 months. Since LCM collections comprise integral state data and resource models for network configuration, Itential strongly recommends that you specify which LCM collections to store, where they should be stored, and for how long.
Archiving Jobs (2023.2.0)
This section provides practices and recommendations as they relate to archiving jobs in the 2023.2 IAP release.
Job Data Collections
All job variable data in IAP 2023.2, as well as incoming and outgoing task data, has been moved out of the job and task collections.
- Data smaller than 16 MB is stored in a collection called `job_data`.
- Data larger than 16 MB is stored in a GridFS bucket in MongoDB, which uses two collections named `job_data.chunks` and `job_data.files`.
Job ID Required for Archiving
When archiving a job in the 2023.2 release:
- The `job_id` is now required to retrieve all `job_data` documents for archiving.
- If the `job_id` is not provided, the overwhelming majority of job-related data will remain in the database.
Job Data Collection
All data in the `job_data` collection can be found by querying on `job_id`. Below is an example of how a document in the `job_data` collection is currently structured.
{
"_id" : ObjectId("657cc218a4f4b23e8083bff0"),
"job_id" : "f04d7175b9d7452bb718c3c6",
"data" : "example"
}
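Given that structure, archiving one job's documents could look like the sketch below. It is not an Itential-provided tool: the function name `archive_job_data` and the output file naming are assumptions, and the `job_data` argument is expected to behave like a PyMongo `Collection` (only `find` and `delete_many` are used). Serializing with `default=str` flattens `ObjectId` values to strings; if exact BSON types must be preserved, `bson.json_util` from PyMongo is an alternative.

```python
import json
from pathlib import Path


def archive_job_data(job_data, job_id, out_dir):
    """Write every job_data document for one job to a JSON file, then purge them.

    Returns the number of documents archived.
    """
    docs = list(job_data.find({"job_id": job_id}))
    if not docs:
        return 0
    out_path = Path(out_dir) / f"job_data-{job_id}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # default=str turns ObjectId and datetime values into plain strings.
    out_path.write_text(json.dumps(docs, default=str, indent=2))
    # Delete only the documents that were actually written to disk.
    job_data.delete_many({"_id": {"$in": [d["_id"] for d in docs]}})
    return len(docs)
```

Deleting by the `_id` values that were written, rather than by `job_id` again, avoids removing a document that arrived between the read and the delete.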
GridFS Bucket
For GridFS, the data is split between the `job_data.chunks` and `job_data.files` collections.
Data in `job_data.files` contains the metadata and all queryable information.
You will need to use a MongoDB driver and consult the official GridFS documentation to learn how to query the GridFS bucket and how to delete files.
Below is an example of the `job_data.files` document structure in the 2023.2 release. Note that the `job_id` is located in `metadata.job` for the GridFS document.
{
"_id" : ObjectId("657b6f14c928923a20266cc6"),
"length" : 30000002,
"chunkSize" : 261120,
"uploadDate" : ISODate("2023-12-14T21:09:40.525Z"),
"filename" : "3f0f8f8e-ded1-4c80-8c40-d5eb3657a018",
"metadata" : {
"job" : "24b0ea5ffbe646b181531585"
}
}
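A possible way to archive and then delete the GridFS files for one job is sketched below. The function name `archive_gridfs_job_files` and the output naming are assumptions for illustration. The `bucket` argument is expected to behave like PyMongo's `gridfs.GridFSBucket` opened with `bucket_name="job_data"`; the connection URI and database name in the trailing comment are placeholders.

```python
from pathlib import Path


def archive_gridfs_job_files(bucket, job_id, out_dir):
    """Download, then delete, every GridFS file whose metadata.job is job_id.

    Returns the list of local files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Materialize the cursor first so deletes do not disturb iteration.
    for grid_out in list(bucket.find({"metadata.job": job_id})):
        target = out_dir / f"{job_id}-{grid_out.filename}"
        with open(target, "wb") as fh:
            bucket.download_to_stream(grid_out._id, fh)
        # Removes the files document together with its chunks.
        bucket.delete(grid_out._id)
        written.append(target)
    return written


# Constructing the bucket with PyMongo (placeholder URI and database name):
#   from pymongo import MongoClient
#   from gridfs import GridFSBucket
#   db = MongoClient("mongodb://localhost:27017")["<db_name>"]
#   bucket = GridFSBucket(db, bucket_name="job_data")
```

Going through the bucket API rather than deleting from `job_data.files` directly keeps `job_data.chunks` consistent, since both collections are cleaned up together.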
Sample Commands
The following is a list of frequently used commands to support your backup strategy and preserve data.
MongoDB
To run `mongodump`:
mongodump --db=<old_db_name> --collection=<collection_name> --out=data/
To run `mongorestore`:
mongorestore --db=<new_db_name> --collection=<collection_name> data/<old_db_name>/<collection_name>.bson
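To dump each of the collections recommended in this article without typing the command repeatedly, a small wrapper along these lines may help. It is a sketch only: the helper names are hypothetical, the flags are the same `--db`, `--collection`, and `--out` options shown above, and `mongodump` is assumed to be on the `PATH` with whatever connection options your environment requires added separately.

```python
import subprocess

# Collection names taken from the "Collections" table in this article.
RECOMMENDED_COLLECTIONS = [
    "job_history",
    "job_output",
    "jobs",
    "wfe_job_metrics",
    "wfe_task_metrics",
    "ucm_configs",
    "ucm_compliance_reports",
]


def build_mongodump_commands(db_name, out_dir, collections=RECOMMENDED_COLLECTIONS):
    """Return one mongodump argument list per collection."""
    return [
        ["mongodump", f"--db={db_name}", f"--collection={name}", f"--out={out_dir}"]
        for name in collections
    ]


def run_backup(db_name, out_dir, collections=RECOMMENDED_COLLECTIONS):
    """Run the dumps in order, stopping at the first failure."""
    for cmd in build_mongodump_commands(db_name, out_dir, collections):
        # check=True raises CalledProcessError if mongodump exits non-zero.
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to log or review the exact commands before a scheduled job executes them.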
Linux
To edit the current user's crontab:
crontab -e
To back up the IAG inventory:
sqlite3 iag.sq3 ".backup 'inventory'"
To restore IAG inventory:
sqlite3 iag.sq3 ".restore 'inventory'"
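If the inventory backup needs to run from a scheduled script rather than the `sqlite3` command line, Python's standard library exposes the same online backup mechanism. The sketch below assumes a hypothetical helper name, `backup_sqlite`, and leaves the source and destination paths to the caller.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(source_path, backup_path):
    """Copy a SQLite database to backup_path using SQLite's online backup API."""
    source_path = Path(source_path)
    # sqlite3.connect would silently create an empty file for a wrong path.
    if not source_path.is_file():
        raise FileNotFoundError(source_path)
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()
```

The online backup API copies a consistent snapshot even while the database is in use, which a plain file copy does not guarantee.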
Itential Pre-Builts
This pre-built allows IAP users to archive the jobs and tasks collections for the 2023.1 and 2022.1 releases only: Archive Job Data.
A forthcoming pre-built will enable job archiving on 2023.2 and later versions; however, the target release has yet to be determined. Once it is available, this page will be updated to reflect the newer pre-built for archiving job data in IAP 2023.2 and later.