Clumio Bulk Restore Automation

Important

Copyright 2024, Clumio, a Commvault Company. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

What is a Bulk Restore

Bulk restores are used to restore multiple resources from different originating locations (AWS account and region pairing) and/or times to one or more target locations.

How to use a Bulk Restore

Based on the source definition, Clumio finds the set of appropriate backups for the identified resources and restores those resources using the user-provided target information.

What information do I need to initiate a bulk restore

The inputs, in JSON format, required to run the bulk restore automation can be defined ahead of time or easily crafted/updated when the restore is needed. Inputs fall into two categories: source information and target information.

Source information can include AWS account, AWS Region, AWS resource tags, and a datetime search window.

Target information is the resource-specific AWS infrastructure elements running in the target location that are required to deploy that specific type of AWS resource. The only additional value you will need is a Clumio API token that validates your permissions to run the automation based upon your Clumio login.

What does the Clumio bulk restore use for this solution

To simplify running the recovery, the bulk restore automation uses a serverless architecture (AWS Lambda functions) and a state machine (AWS Step Functions). This scales out the recovery process so that restores are initiated in parallel rather than one at a time.

Limits on the number of concurrent restores, and the performance of those restores, depend on the resource types being restored. See https://help.clumio.com/docs/clumio-service-limits

This solution can be deployed anywhere in AWS and does not need access to either the original AWS source location or the target locations. Beyond the Lambda functions and state machine mentioned above, it uses AWS CloudWatch for logging and an S3 bucket that is needed only temporarily to deploy the solution. The only other AWS resource is an AWS secret, which can optionally be used to store your Clumio API token.
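If the token is kept in an AWS secret, reading it could look like the minimal sketch below. It assumes the secret is either a plain string or a JSON object with a hypothetical `token` key. The function name, the key name, and the secret layout are illustrative and are not taken from this repository. The client is passed in, so any object that exposes `get_secret_value` (for example `boto3.client("secretsmanager")`) can be used.

```python
import json


def load_clumio_token(secrets_client, secret_id, json_key="token"):
    """Return a Clumio API token stored in an AWS Secrets Manager secret.

    `secrets_client` is expected to behave like boto3.client("secretsmanager").
    `json_key` is a hypothetical key name used only for this illustration.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = response["SecretString"]
    try:
        parsed = json.loads(secret)
    except json.JSONDecodeError:
        # Not JSON: treat the whole secret string as the token.
        return secret.strip()
    if isinstance(parsed, dict):
        # JSON object: look up the (assumed) key that holds the token.
        return parsed[json_key]
    # JSON scalar (for example a quoted string): use it as-is.
    return str(parsed)
```

Passing the client as an argument keeps the function easy to exercise with a stub object before it is pointed at a real account.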

Files in this source repository

Note

The files included in this GitHub repository are for information purposes only. This Python code represents the contents of the Lambda functions used by the state machine. This code, along with the non-default Python packages, is bundled in the ZIP file that is required to run the CFT.

Note

JSON file example_step_function_inputs.json is an example of the inputs required to run the step function. These inputs would be modified to reflect your environment.

Note

An IAM role that has permissions to execute the step function and the Lambda functions (and to write to CloudWatch for logging purposes) must be identified or created before you deploy the CFT template. If required, you can modify the permissions of this IAM role after all of the resources have been created, to scope them down to least privilege. If you use the AWS secret to store your Clumio API token, this IAM role also needs read access to the secret.

Examples of both the role's trust relationship and the policy can be found in the examples folder.
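For orientation, a trust relationship that lets both Lambda and Step Functions assume a single role generally has the shape sketched below. This is a generic illustration built from the standard AWS service principals, not a copy of the file in the examples folder; treat the files there as authoritative.

```python
import json


def build_trust_policy():
    """Trust policy allowing Lambda and Step Functions to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    # Standard AWS service principals for the two services.
                    "Service": ["lambda.amazonaws.com", "states.amazonaws.com"]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document in the JSON form that IAM expects.
    print(json.dumps(build_trust_policy(), indent=2))
```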

Note

CloudFormation deployment template: code/clumio_bulk_deploy_cft.yaml. It deploys the full solution — both the bulk restore and bulk list/discovery state machines, plus a shared set of Lambdas. Deploy this template to set up everything.

Build

To build, you will need a Unix-like shell (bash, zsh, ...), Python 3.12, make, and zip.

make build

It will fetch the dependencies and generate a versioned zip (clumio_bulk_restore-<version>.zip, where <version> is read from the VERSION file at the repo root) under the build directory, alongside the rendered CloudFormation template (clumio_bulk_deploy_cft.yaml).

The zip file must be uploaded to an S3 bucket where the CloudFormation template can access it when you deploy the solution. Upload it under its versioned filename, because the rendered CFT references that exact key.

Build version

The build version is read from the VERSION file at the repo root and stamped by make build into:

version.txt packaged inside the Lambda zip
The zip filename itself: clumio_bulk_restore-<version>.zip
Each Lambda's Code.S3Key in the rendered CFT (so each release loads from a unique S3 key and CloudFormation re-pulls the Lambda code on stack update)
The CodeVersion parameter default and the Version stack output, both visible in the CloudFormation console after deploy

To cut a new release, bump VERSION and re-run make build. Upload the new versioned zip to S3 and run a stack update against the new CFT — Lambda code updates happen automatically; no parameter overrides required.
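The naming convention above can be mirrored in a few lines when scripting the upload. The sketch below only reproduces the `<prefix>-<version>.zip` pattern described in this section. The function name is hypothetical, and how `make build` computes the name internally is not shown in this README.

```python
from pathlib import Path


def versioned_zip_key(repo_root, prefix="clumio_bulk_restore"):
    """Build the zip object key from the VERSION file at the repo root."""
    version = Path(repo_root, "VERSION").read_text(encoding="utf-8").strip()
    if not version:
        raise ValueError("VERSION file is empty")
    # Same pattern as the zip filename produced under the build directory.
    return f"{prefix}-{version}.zip"
```

The returned name can then be used as the object key when uploading the zip, for example with the AWS CLI or with boto3's `upload_file`.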

Important

The CFT parameter for the Lambda zip key was renamed from LambdaZipObject (full filename) to LambdaZipObjectPrefix (prefix only, default clumio_bulk_restore). On first stack update against the new template, the old parameter is dropped and the new default is used. Customers who had set a custom LambdaZipObject value should pass a matching LambdaZipObjectPrefix on the upgrade.

Tuning for scale

Two CFT parameters control how the state machine waits on long-running Clumio restores. Defaults are sized for restoring 64TB-class volumes:

Parameter	Default	Description
`PollingIntervalSeconds`	`60`	Seconds between Clumio task-status polls
`PollingMaxAttempts`	`200`	Maximum polling attempts per restore (~48h wall-time at the default interval; each Task Lambda invocation also internally polls Clumio for ~10 min)

To override them for unusually slow or unusually fast workloads, pass --parameter-overrides to aws cloudformation deploy, for example --parameter-overrides PollingMaxAttempts=400 PollingIntervalSeconds=120. The inner per-record / per-asset Maps are Distributed Maps with MaxConcurrency: 100, so concurrent restore / list-asset fanout is bounded by that ceiling and by your account's Lambda concurrency quota.
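A rough way to reason about these two parameters is to multiply them out. The sketch below assumes every attempt consists of one wait of `PollingIntervalSeconds` plus one Task Lambda invocation. That model, the function name, and the `in_lambda_poll_seconds` parameter are assumptions for illustration, and the real ceiling depends on how long each invocation actually polls.

```python
def max_wait_hours(max_attempts=200, interval_seconds=60, in_lambda_poll_seconds=0):
    """Estimate the upper bound, in hours, on how long one restore is waited on.

    Assumes each attempt is one wait of `interval_seconds` plus one Lambda
    invocation that may itself poll for up to `in_lambda_poll_seconds`.
    """
    if max_attempts < 1 or interval_seconds < 0 or in_lambda_poll_seconds < 0:
        raise ValueError("attempts must be >= 1 and durations must be non-negative")
    total_seconds = max_attempts * (interval_seconds + in_lambda_poll_seconds)
    return total_seconds / 3600
```

Comparing `max_wait_hours(200, 60, 600)` with `max_wait_hours(400, 120, 600)` gives a feel for how the two overrides move the ceiling before you change the stack.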

Running the Automation

Warning

FOR EXAMPLE PURPOSES ONLY

Input Definitions

Base Input Parameter	Description
clumio_token	Clumio API bearer token https://help.clumio.com/docs/api-tokens
debug	Set to a non-zero value to debug issues

Source/Search Input Parameter	Description
source_account	AWS account from which the resources were backed up
source_region	AWS region from which the resources were backed up
search_direction	When choosing backups based on a point in time, look for backups "before" or "after" that point in time
"before"	Set a search window from the point in time to the current time
"after"	Set a search window from the point in time to the max search time
end_search_day_offset	If searching "before" a point in time this represents the offset from the current day to the point in time
start_search_day_offset	If searching "before" a point in time this is not used
start_search_day_offset	If searching "after" a point in time this represents the offset from the current day to the point in time
end_search_day_offset	If searching "after" a point in time this represents the offset from the current day to the max search time
search_tag_key	If searching by AWS tag set search key and value
search_tag_value	If searching by AWS tag set search key and value
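The offset rows above can be read as simple date arithmetic. The sketch below assumes that an offset of N means "N days before the current day"; that interpretation, the function name, and the returned dictionary keys are illustrative and not taken from the automation's code.

```python
from datetime import date, timedelta


def resolve_search_dates(search_direction, start_search_day_offset=0,
                         end_search_day_offset=0, today=None):
    """Translate day offsets into calendar dates.

    Assumption: an offset of N means "N days before `today`".
    """
    today = today or date.today()
    if search_direction == "before":
        # start_search_day_offset is not used in this direction.
        point = today - timedelta(days=end_search_day_offset)
        return {"point_in_time": point, "max_search_time": None}
    if search_direction == "after":
        point = today - timedelta(days=start_search_day_offset)
        limit = today - timedelta(days=end_search_day_offset)
        if limit < point:
            raise ValueError(
                "end_search_day_offset must not exceed start_search_day_offset"
            )
        return {"point_in_time": point, "max_search_time": limit}
    raise ValueError('search_direction must be "before" or "after"')
```

Passing `today` explicitly makes it easy to check a planned set of offsets against known dates before running a restore.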

Target Input Parameter EBS	Description
target_account	AWS account where the ebs resource is to be restored
target_region	AWS region where the ebs resource is to be restored
target_aws_az	required, infrastructure value for restore AWS AZ
target_iops	optional, infrastructure value for EBS IOPS setting. Should only be used if target_volume_type is one of gp3, io1, or io2.
target_volume_type	optional, infrastructure value for EBS volume type setting. Required if target_iops is set.
target_kms_key_native_id	optional, infrastructure value for restore AWS KMS key id
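The dependency between target_iops and target_volume_type can be checked before a restore is submitted. The sketch below encodes only the rules stated in the EBS table. The function name is hypothetical, and it assumes that an unset value is represented by a missing key or `None`, which may differ from how the example inputs file represents it.

```python
# Volume types for which the table allows target_iops to be set.
IOPS_VOLUME_TYPES = {"gp3", "io1", "io2"}


def validate_ebs_target(target):
    """Return a list of problems found in an EBS target definition."""
    errors = []
    if not target.get("target_aws_az"):
        errors.append("target_aws_az is required")
    iops = target.get("target_iops")
    volume_type = target.get("target_volume_type")
    if iops is not None:
        if not volume_type:
            errors.append("target_volume_type is required when target_iops is set")
        elif volume_type not in IOPS_VOLUME_TYPES:
            errors.append(
                "target_iops should only be set for gp3, io1, or io2 volumes"
            )
    return errors
```

An empty list means the definition is consistent with the table; it does not guarantee that Clumio or AWS will accept the values.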

Target Input Parameter RDS	Description
target_account	AWS account where the RDS resource is to be restored
target_region	AWS region where the RDS resource is to be restored
target_subnet_group_name	required, infrastructure value for RDS Subnet group name
target_rds_name	required, infrastructure value for RDS instance/cluster name
target_security_group_native_id	optional, infrastructure value for RDS Security Group List
target_kms_key_native_id	optional, infrastructure value for RDS AWS KMS key id

Target Input Parameter EC2	Description
target_account	AWS account where the EC2 resource is to be restored
target_region	AWS region where the EC2 resource is to be restored
target_aws_az	required, infrastructure value for restore AWS AZ
target_vpc_native_id	required, infrastructure value for EC2 VPC id
target_subnet_native_id	required, infrastructure value for EC2 Subnet id
target_kms_key_native_id	optional, infrastructure value for EC2 AWS KMS key id
target_iam_instance_profile_name	optional, infrastructure value for EC2 IAM instance profile name
target_key_pair_name	optional, infrastructure value for EC2 Key pair name
target_security_group_native_id	optional, infrastructure value for EC2 Security Group List

Target Input Parameter DynamoDB	Description
target_account	AWS account where the DynamoDB resource is to be restored
target_region	AWS region where the DynamoDB resource is to be restored
change_set_name	required, infrastructure value for DynamoDB table name component

Target Input Parameter ProtectionGroup	Description
search_pg_name	Required. The name of the protection group to restore from.
target_bucket	Required. The name of the bucket that the resource will be restored to.
search_bucket_names	Optional. The list of bucket names within the protection group to be restored.
search_object_filters	Optional. A specification dict to filter the objects to restore.
target_prefix	Optional. The prefix to add to the restored object.
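The parameters marked as required in the tables above can be collected into one lookup and checked up front. In the sketch below, the resource-type labels are taken from the table headings and the function name is hypothetical; how the automation itself names resource types or validates inputs is not shown in this README.

```python
# Keys explicitly marked "required" in the target tables, per resource type.
REQUIRED_TARGET_KEYS = {
    "EBS": ("target_aws_az",),
    "RDS": ("target_subnet_group_name", "target_rds_name"),
    "EC2": ("target_aws_az", "target_vpc_native_id", "target_subnet_native_id"),
    "DynamoDB": ("change_set_name",),
    "ProtectionGroup": ("search_pg_name", "target_bucket"),
}


def missing_required(resource_type, inputs):
    """Return the required keys that are absent or empty in `inputs`."""
    required = REQUIRED_TARGET_KEYS[resource_type]  # KeyError if type unknown
    return [key for key in required if inputs.get(key) in (None, "")]
```

For example, `missing_required("RDS", {"target_rds_name": "restored-db"})` reports that target_subnet_group_name still needs a value.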

Note

Optional infrastructure target values may still be required, depending on the configuration of the original backed-up resource.
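Once the inputs are assembled, starting the state machine comes down to passing them as a JSON string. The sketch below is FOR EXAMPLE PURPOSES ONLY: the function name is hypothetical, and it does not prescribe how the keys are arranged, so use example_step_function_inputs.json as the reference for the expected layout. The client is passed in, so any object that exposes `start_execution` (for example `boto3.client("stepfunctions")`) can be used.

```python
import json


def start_bulk_restore(sfn_client, state_machine_arn, inputs):
    """Start one execution of the state machine and return its ARN.

    `sfn_client` is expected to behave like boto3.client("stepfunctions").
    `inputs` is a dict that follows the layout of the example inputs file.
    """
    # Step Functions expects the execution input as a JSON string.
    payload = json.dumps(inputs)
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=payload,
    )
    return response["executionArn"]
```

Serializing with `json.dumps` before the call also surfaces values that cannot be represented in JSON before anything is sent to AWS.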
