
Laurent Domb

Chief Technologist AWS


AWS Fault Injection Simulator Cross Account Experiments via AWS StepFunctions

laurent | 2 years ago | 6 mins

Many AWS customers run their workloads across multiple AWS accounts. They therefore want to run chaos experiments across accounts to understand how their workload behaves during a cascading or correlated failure. At the time of writing, AWS Fault Injection Simulator does not yet support targets in different accounts, but this doesn’t prevent us from running such experiments via AWS Step Functions, which integrates well with AWS Fault Injection Simulator.

AWS StepFunctions allows us to create states with the following actions:

If you are interested in having a central place where you create experiments and fan them out to the various accounts via Service Catalog, you can read up on it here. For this blog post, I’ve created an experiment that only executes a chaos-mesh experiment via FIS in account A and reboots an EC2 instance via FIS in account B. You will point the execution steps to your own FIS experiment templates.

Please keep in mind that when running chaos experiments in your environment, you’d want to follow the following workflow before executing the experiment. As the goal of this blog post is to show how to build a Step Functions state machine that can run experiments across accounts, I will skip much of this workflow and focus only on the FIS execution via Step Functions.

For our workload, which consists of an EKS cluster in account A and a database on EC2 in account B, I will build the following state machine.

Before we can start, you need to create an IAM role in account B, the FIS execution role, that account A will be allowed to assume. It needs the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FisExecutionRole",
            "Effect": "Allow",
            "Action": [
                "fis:StartExperiment",
                "fis:TagResource"
            ],
            "Resource": "*"
        }
    ]
}

You will also have to define a trust policy for this role so that account A is authorized to assume the role in account B. Replace YourAccountID with the ID of account A:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::YourAccountID:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {}
        }
    ]
}

Note the ARN of the role in account B, as you will need it in the Step Functions step in account A!
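If you prefer to generate the trust policy rather than edit the placeholder by hand, a minimal sketch could look like the following. The function name `build_trust_policy` and the account ID `111122223333` are made up for illustration; the policy content mirrors the document above, minus the empty Condition block.

```python
import json
import re


def build_trust_policy(account_a_id: str) -> dict:
    """Return a trust policy that lets principals in account A assume the role."""
    # AWS account IDs are 12-digit numbers, so an unreplaced placeholder
    # such as "YourAccountID" is rejected early.
    if not re.fullmatch(r"\d{12}", account_a_id):
        raise ValueError(f"expected a 12-digit account ID, got {account_a_id!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_a_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # 111122223333 is a hypothetical ID standing in for account A.
    print(json.dumps(build_trust_policy("111122223333"), indent=4))
```

The printed JSON can be pasted into the trust relationship of the role in account B.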

Let’s create a state machine in account A and click next.

In the search field on the top left, enter FIS and drag the StartExperiment action into your state machine workflow. Rename it as you like. You should see something like this:

Click twice and notice the banner on the bottom of the page! We will add the permissions once the role is created.

Give the StateMachine a name and click

This will get you to the following page. Click on Edit Role in IAM to add the missing permissions.

Keep in mind that our role also needs AssumeRole permissions for the cross-account access. We are therefore adding the following permissions.

Allow the role to assume any role

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAssumeRole",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "*"
        }
    ]
}

as well as execute the experiment.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FisExecutionRole",
            "Effect": "Allow",
            "Action": [
                "fis:StartExperiment",
                "fis:TagResource"
            ],
            "Resource": "*"
        }
    ]
}
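The two policies above use "Resource": "*" for simplicity. As a sketch of a tighter variant, the AssumeRole statement could be limited to the one role in account B that the state machine needs. The function name `build_state_machine_policy` is hypothetical, and the role ARN is the placeholder used in this post.

```python
import json


def build_state_machine_policy(cross_account_role_arn: str) -> dict:
    """Combine both statements into one policy for the state machine role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAssumeRole",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Scoped to a single role instead of "*".
                "Resource": cross_account_role_arn,
            },
            {
                "Sid": "FisExecutionRole",
                "Effect": "Allow",
                "Action": ["fis:StartExperiment", "fis:TagResource"],
                # Kept as in the post; this covers the experiment in account A.
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN; replace AccountNumberB with the ID of account B.
    arn = "arn:aws:iam::AccountNumberB:role/fisfullaccess"
    print(json.dumps(build_state_machine_policy(arn), indent=4))
```

Whether to scope the resource is a design choice: the wildcard is quicker to set up, while the scoped version means the state machine role can only assume the role you created for this purpose.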

Now go back to your state machine and add a second step as follows

Make sure that for “IAM role for cross-account access – optional” you choose “Provide IAM role ARN” and enter the ARN of the role in account B that you created before: arn:aws:iam::AccountNumberB:role/fisfullaccess

Apply the changes. You are now ready to execute the state machine.
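For readers who would rather define the workflow in code than in the visual editor, here is a rough sketch of what the resulting two-step definition might look like in Amazon States Language, generated from Python. It assumes the AWS SDK service integration for FIS (`arn:aws:states:::aws-sdk:fis:startExperiment`) and the Task state `Credentials` field for the cross-account role. The state names, tag values, and template IDs are made up and need to be replaced with your own.

```python
import json


def fis_task(template_id, tag_name, next_state=None, role_arn=None):
    """Build a Task state that starts one FIS experiment."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:fis:startExperiment",
        "Parameters": {
            # The intrinsic function generates a unique client token per run.
            "ClientToken.$": "States.UUID()",
            "ExperimentTemplateId": template_id,
            "Tags": {"Name": tag_name},
        },
    }
    if role_arn:
        # Cross-account access: the task runs with the role from account B.
        state["Credentials"] = {"RoleArn": role_arn}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition(template_a, template_b, role_arn_b):
    """Chain the account A experiment and the account B experiment."""
    return {
        "Comment": "Run FIS experiments in account A and account B",
        "StartAt": "ExperimentAccountA",
        "States": {
            "ExperimentAccountA": fis_task(
                template_a, "chaos-mesh-account-a", next_state="ExperimentAccountB"
            ),
            "ExperimentAccountB": fis_task(
                template_b, "ec2-reboot-account-b", role_arn=role_arn_b
            ),
        },
    }


if __name__ == "__main__":
    # Hypothetical template IDs; use the IDs of your own FIS experiment templates.
    definition = build_definition(
        "TemplateIdInAccountA",
        "TemplateIdInAccountB",
        "arn:aws:iam::AccountNumberB:role/fisfullaccess",
    )
    print(json.dumps(definition, indent=2))
```

Only the second state carries the `Credentials` block, which matches the setup in this post: the first experiment runs with the state machine's own role in account A, the second with the assumed role in account B.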

You should see both states turning green

You can now verify in the FIS console of both accounts that both experiments were executed with the tag names defined in your Step Functions steps!
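If you would rather check from a terminal than from the console, one option is to save the output of `aws fis list-experiments` in each account and filter it by tag. The helper below is only a sketch: the function name, the sample IDs, and the tag values are invented, and it assumes the output contains an "experiments" list whose entries carry "id", "state", and "tags".

```python
import json


def experiments_with_tag(list_output: str, key: str, value: str) -> list:
    """Return (id, status) pairs of experiments carrying the given tag."""
    experiments = json.loads(list_output).get("experiments", [])
    return [
        (exp["id"], exp.get("state", {}).get("status"))
        for exp in experiments
        if exp.get("tags", {}).get(key) == value
    ]


if __name__ == "__main__":
    # Invented sample standing in for the saved CLI output of one account.
    sample = json.dumps(
        {
            "experiments": [
                {
                    "id": "EXP-EXAMPLE-1",
                    "state": {"status": "completed"},
                    "tags": {"Name": "ec2-reboot-account-b"},
                },
                {
                    "id": "EXP-EXAMPLE-2",
                    "state": {"status": "completed"},
                    "tags": {"Name": "something-else"},
                },
            ]
        }
    )
    print(experiments_with_tag(sample, "Name", "ec2-reboot-account-b"))
```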

For a comprehensive workflow in a single account please review Chaos experiments using AWS Step Functions and AWS Fault Injection Simulator

Tagged: cross account crossaccount Fault Injection Simulator FIS step functions StepFunctions
