Building a Microservice Lightning-Fast with AWS SAM
This post is a follow-up to my previous one, Enforcing IAM least privilege with AWS SAM + Airtable. Here I shift the focus from the microservice itself, which is very simple, to the benefits of using AWS SAM to build it.
I don’t write code regularly anymore. However, I built this service at a production-ready level using AWS SAM in a few days without much effort. Leaving the code aside, I’ll try to walk you through the steps I had to take to define the required supporting infrastructure using SAM and hopefully highlight how easy it was.
Why do I ❤️ SAM?
Before diving deep into the microservice, let’s cover some key features that make me like SAM.
Cloud deployments and sandbox environments
SAM allows you to deploy your full microservice in the cloud in seconds. During my tests with the HelloWorld example, SAM took about 10 seconds to deploy the entire stack (API GW + Lambda) into an AWS account. However, more importantly, you can create dedicated sandbox environments by naming your stack differently:
sam deploy --profile sam-dev --stack-name bob-iam-detective
This approach allows multiple developers to create a dedicated stack for their development and testing without interfering with other developers/stages.
Live sync deployments
SAM relies on CloudFormation to deploy your stack. This means waiting around 10 seconds per deployment, which feels long when you are making small changes. To solve this problem, AWS provides live sync deployments through the following command:
sam sync --profile sam-dev --stack-name bob-iam-detective
The previous command syncs your application with your cloud deployment using direct AWS APIs instead of CloudFormation. This translates into near real-time updates to your stack when making changes to your code.
However, note that SAM only supports this feature on AWS Lambda, Amazon API Gateway, and AWS Step Functions APIs.
Localhost development made easy
The following SAM command launches your stack in a local Docker container for testing. If you pair this environment with LocalStack, you get an excellent environment for local development and testing.
sam local start-api
# For example: you can invoke the hello world API like this
curl http://127.0.0.1:3000/hello
CI/CD pipelines
Last but not least, SAM provides templates for CI/CD for a dual account setup (i.e. development & production). This CI/CD template would start on every commit to Git, run tests and deploy your application to both accounts. To initialize the pipeline, run the following command:
sam pipeline init --bootstrap
However, out of the box the pipeline fails in the build phase unless you remove the line resolve_s3 = true from the samconfig.toml file:
[default.package.parameters]
resolve_s3 = true
If you are new to SAM, I recommend looking at this SAM workshop from AWS.
Enabling testing on the pipeline
It’s important to note that the CI/CD template generated by SAM does not have any testing enabled by default. You must do the following:
- In the file Codepipeline.yaml: Uncomment all the sections related to unit tests. The file has a few lines with the text “# Uncomment and modify the following step for running the unit-tests”.
- In the same file Codepipeline.yaml: This might be specific to my TypeScript setup, but to make it work I had to replace the existing image (Image: aws/codebuild/amazonlinux2-x86_64-standard:3.0) with a later version (Image: aws/codebuild/amazonlinux2-x86_64-standard:5.0). With my setup, the pipeline did not work without this change.
- In the file buildspec_unit_test.yml: I had to add the following code to run the tests for my TypeScript code:
version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 18
  pre_build:
    commands:
      - cd iam-detective/
      - npm install
  build:
    commands:
      # trigger the unit tests here
      - echo 'Running unit tests'
      - npm run test
The architecture
The primary responsibility of the backend is to read data from AWS and push it to Airtable. Concretely, these are the main tasks:
- List all the IAM roles from the current account running our lambda.
- For all the roles, create an Access Advisor report and push only the unused permissions (at Grabyo, we consider a permission “unused” after three months) to Airtable for manual review. We don’t send used permissions as they don’t need reviewing.
Because this logic only needs to run once daily, AWS Lambda was the best option. Consequently, AWS SAM was the obvious choice.
However, because we have hundreds of roles in our AWS accounts, the whole process can’t run in a single lambda (the max execution time for a lambda at the time of writing is 15 minutes).
To solve this challenge, we decided to go for the following architecture:
The main components of this architecture are:
- EventBridge Schedule: Executes the IAM inspector Lambda daily.
- IAM Inspector Lambda: Assumes the cross-account-access role to list all the roles from all the different development or production accounts. For each of these roles, it sends a new job to an SQS queue.
- Role Processor Lambda: For every job, it creates an Access Advisor report and sends the unused permissions results to Airtable.
The IAM inspector lambda
These are the general steps implemented on this lambda:
- List all the roles from all the AWS accounts: The lambda is responsible for listing IAM roles from multiple accounts. We need a cross-account IAM role to allow it to access all the accounts.
- Get all the existing roles in Airtable: We need to query this data to avoid resubmitting roles that are already tracked in Airtable.
- Push all non-existing roles to an SQS queue: We push every role that is not yet in Airtable to an SQS queue for later processing.
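The steps above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the RoleJob shape, the jobKey format, and the injected send function are all assumptions made so the sketch runs without the AWS SDK. In the real lambda, send would wrap an SQS SendMessage call.

```typescript
// Hypothetical job shape: the post does not show the real message format.
interface RoleJob {
  accountId: string;
  roleName: string;
}

// Injected sender (assumption) so the sketch has no SDK dependency.
type SendJob = (job: RoleJob) => Promise<void>;

// Hypothetical key used to compare roles against what Airtable already holds.
const jobKey = (job: RoleJob): string => `${job.accountId}/${job.roleName}`;

// Keep only the roles that Airtable does not know about yet.
function selectNewRoles(allRoles: RoleJob[], knownKeys: Set<string>): RoleJob[] {
  return allRoles.filter((role) => !knownKeys.has(jobKey(role)));
}

// Send one job per new role and return how many were queued.
async function enqueueNewRoles(
  allRoles: RoleJob[],
  knownKeys: Set<string>,
  send: SendJob,
): Promise<number> {
  const newRoles = selectNewRoles(allRoles, knownKeys);
  for (const role of newRoles) {
    await send(role);
  }
  return newRoles.length;
}
```

Sending one message per role keeps each job small enough for a single lambda invocation to handle within its timeout.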
To kickstart the project, I built the service using one of the SAM templates as a solid foundation. Concretely, I went for the Hello World Example (a simple API Gateway endpoint linked to a lambda that responds with a Hello World message) on top of Node.js 18 with TypeScript, packaged as ZIP.
I selected this stack for several reasons. TypeScript is a language I’m well-versed in, and it offers notably quick build and Lambda cold start times. Just for context, building this application in Node.js takes roughly 10 seconds, while its Java equivalent requires about 45 seconds. Furthermore, if you opt for an Image package deployment (i.e., Docker running on Lambda), these times increase significantly.
After renaming the hello-world application, the template.yaml looks like this:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  iam-detective
  Sample SAM Template for iam-detective
# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 60
Resources:
  InvestigateIAMPermissionsFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      Tags:
        Service: iam-detective
      CodeUri: iam-detective
      Handler: InvestigateIAMPermissions.listAllIAMRoles
      Runtime: nodejs18.x
      Architectures:
        - x86_64
      Events:
        IAMDetectiveInvestigateIAMPermissionsAPI:
          Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api
          Properties:
            Path: /iam-detective
            Method: get
    Metadata: # Manage esbuild properties
      BuildMethod: esbuild
      BuildProperties:
        Minify: false
        Target: "es2020"
        Sourcemap: true
        EntryPoints:
          - InvestigateIAMPermissions.ts
Outputs:
  # ServerlessRestApi is an implicit API created out of Events key under Serverless::Function
  # Find out more about other implicit resources you can reference within SAM
  # https://github.com/awslabs/serverless-application-model/blob/master/docs/internals/generated_resources.rst#api
  InvestigateIAMPermissionsFunctionApi:
    Description: "API Gateway endpoint URL for IAM Detective"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/iam-detective/"
  InvestigateIAMPermissionsFunction:
    Description: "Investigate IAM Permissions Lambda Function ARN"
    Value: !GetAtt InvestigateIAMPermissionsFunction.Arn
  InvestigateIAMPermissionsFunctionIamRole:
    Description: "Implicit IAM Role created for Investigate IAM Permissions function"
    Value: !GetAtt InvestigateIAMPermissionsFunctionRole.Arn
The previous template contains a lambda fronted by an API Gateway endpoint. That endpoint should be removed because it has no authentication attached. The template also still lacks a few critical components that the lambda needs to perform its tasks.
I recommend creating the CI/CD pipeline at this stage as it will allow you to regularly merge your changes to the infrastructure and code into Git and deploy the changes automatically in the cloud. I like to do this regularly after I add a new feature to ensure the system works in the cloud and not only locally.
SQS queue
The lambda must list all IAM roles and send them to a queue. Defining this queue together with a dead-letter queue on the template can be achieved like this:
  # This is an SQS queue with all default configuration properties. To learn more about the available options, see
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-sqs-queues.html
  InvestigateIAMPermissionsQueue:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 901 # Max lambda timeout + 1 second
      RedrivePolicy:
        deadLetterTargetArn:
          Fn::GetAtt: [InvestigateIAMPermissionsDeadLetterQueue, Arn]
        maxReceiveCount: 3
  InvestigateIAMPermissionsDeadLetterQueue:
    Type: AWS::SQS::Queue
...
Outputs:
  ...
  InvestigateIAMPermissionsQueue:
    Description: "The SQS queue to communicate the two lambda functions"
    Value: !Ref InvestigateIAMPermissionsQueue
  InvestigateIAMPermissionsDeadLetterQueue:
    Description: "The dead letter SQS queue to send messages that can't be processed"
    Value: !Ref InvestigateIAMPermissionsDeadLetterQueue
We can use environment variables to pass the SQS URL to the lambda. To do this, we need to include the following code inside the Lambda properties definition inside the template:
Properties:
  Environment:
    Variables:
      SQS_URL_TO_PROCESS_IAM_ROLES: !Ref InvestigateIAMPermissionsQueue
This is the TypeScript code required to pull that URL from the environment variables:
const sqsURL = process.env.SQS_URL_TO_PROCESS_IAM_ROLES ?? "ERROR";
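Falling back to the string "ERROR" means a missing variable only surfaces later, when the first SQS call fails. As an alternative, here is a small sketch of a helper that fails fast instead. The requireEnv name and the Env type are hypothetical and not part of the original service.

```typescript
// Minimal map type so the helper accepts process.env or a plain object.
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw so a misconfigured deployment
// fails immediately instead of sending to an invalid queue URL.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In the lambda this would be called as:
//   const sqsURL = requireEnv(process.env, "SQS_URL_TO_PROCESS_IAM_ROLES");
```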
Permissions
For the lambda function to perform its tasks, it requires the following permissions:
Cross-account access role:
At Grabyo, we have different AWS accounts. For this reason, the lambda needs to list all roles in the different AWS accounts. We have decided to create a cross-account access role in all the accounts and allow the lambda to assume all these roles. This is the CloudFormation for this cross-account IAM role:
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  IAMDetectiveCrossAccountAccessRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: "cross-account-access"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              AWS:
                - "arn:aws:iam::XXXXXXXXXXXX:root"
            Action: "sts:AssumeRole"
      Policies:
        - PolicyName: "IAMDetectiveCrossAccountAccessRolePermissions"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action:
                  - "iam:GenerateServiceLastAccessedDetails"
                  - "iam:ListRoles"
                  - "iam:GetServiceLastAccessedDetails"
                Resource: "*"
Outputs:
  IAMDetectiveCrossAccountAccessRoleArn:
    Value: !GetAtt IAMDetectiveCrossAccountAccessRole.Arn
Using the AWS CLI, you can create this IAM role in each account that needs it with this command:
aws cloudformation create-stack \
  --stack-name detective-cross-account-access-role \
  --template-body file://detective-cross-account-access.yaml \
  --capabilities CAPABILITY_NAMED_IAM
Airtable access:
Before it submits new roles for processing, this lambda reads the information stored in Airtable to avoid redundant work. To access Airtable, the lambda needs a personal access token. We store this token in Secrets Manager using the following CloudFormation template:
Parameters:
  AirtableSecret:
    Type: String
    NoEcho: true
Resources:
  IamDetectiveAirtableSecret:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: airtable-token
      Description: Secret for IAM Detective Airtable integration
      SecretString: !Sub '{"airtable_token":"${AirtableSecret}"}'
Outputs:
  IAMDetectiveAirtableTokenARN:
    Description: "The airtable secret for the IAM detective microservice."
    Value: !Ref IamDetectiveAirtableSecret
    Export:
      Name:
        Fn::Sub: "${AWS::StackName}-ARN"
Using the AWS CLI, you can create this Secrets Manager stack and provide the secret at creation time with this command (use update-stack instead of create-stack to change it later):
aws cloudformation create-stack \
  --stack-name airtable-secret \
  --template-body file://secrets-manager-airtable.yaml \
  --parameters ParameterKey=AirtableSecret,ParameterValue=XXXXXX \
  --capabilities CAPABILITY_NAMED_IAM
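Because the template stores the token as a JSON string with an airtable_token key, the lambda has to parse the secret value after fetching it. A minimal sketch of that parsing step, assuming the secret string has already been retrieved from Secrets Manager (the parseAirtableToken name is hypothetical):

```typescript
// Extract the token from a SecretString of the form {"airtable_token":"..."}.
// Throws on malformed input so a bad secret is caught before calling Airtable.
function parseAirtableToken(secretString: string): string {
  const parsed: unknown = JSON.parse(secretString);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Secret is not a JSON object");
  }
  const token = (parsed as Record<string, unknown>)["airtable_token"];
  if (typeof token !== "string" || token.length === 0) {
    throw new Error("Secret has no airtable_token field");
  }
  return token;
}
```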
Opsgenie access:
We use Opsgenie heartbeats in the backend service to check its health. Again, this requires an API token that we store in Secrets Manager. Here is the CloudFormation template:
Parameters:
  OpsgenieSecret:
    Type: String
    NoEcho: true
Resources:
  IamDetectiveOpsgenieSecret:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: opsgenie-token
      Description: Secret for IAM Detective Opsgenie integration
      SecretString: !Sub '{"opsgenie_token":"${OpsgenieSecret}"}'
Outputs:
  IAMDetectiveOpsgenieTokenARN:
    Description: "The opsgenie secret for the IAM detective microservice."
    Value: !Ref IamDetectiveOpsgenieSecret
    Export:
      Name:
        Fn::Sub: "${AWS::StackName}-ARN"
Lambda IAM permissions:
We need to grant access to the lambda to perform all of these actions by adding the following to the lambda policy on the SAM template:
Policies:
  - Statement:
      - Sid: Stmt1679505932243
        Effect: Allow
        Action:
          - sts:AssumeRole
        Resource:
          [
            arn:aws:iam::XXXXXXXXXXXX:role/cross-account-access,
            arn:aws:iam::XXXXXXXXXXXX:role/cross-account-access,
            ...
          ]
      - Sid: Stmt1679384123196
        Effect: Allow
        Action:
          - sqs:SendMessage
        Resource: !GetAtt InvestigateIAMPermissionsQueue.Arn
      - Sid: Stmt1679253132905
        Effect: Allow
        Action:
          - secretsmanager:GetSecretValue
        Resource:
          - Fn::ImportValue:
              Fn::Sub: "${AirtableSecretsStackName}-ARN"
          - Fn::ImportValue:
              Fn::Sub: "${OpsgenieSecretsStackName}-ARN"
This policy grants the lambda permission to assume the cross-account role, send messages to the SQS queue, and read both the Airtable and Opsgenie secrets.
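The role ARNs in the sts:AssumeRole statement all follow the same pattern, so the lambda can derive the ARN to assume from an account ID. A small sketch under that assumption; the crossAccountRoleArn helper is hypothetical, and the default role name is the RoleName from the cross-account template:

```typescript
// Build the ARN of the cross-account role for a given 12-digit account ID.
function crossAccountRoleArn(
  accountId: string,
  roleName: string = "cross-account-access",
): string {
  // AWS account IDs are exactly 12 digits; reject anything else early.
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error(`Invalid AWS account ID: ${accountId}`);
  }
  return `arn:aws:iam::${accountId}:role/${roleName}`;
}
```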
Cronjob
Finally, the service needs a daily cronjob to kickstart the lambda. To define this cronjob in the SAM template, you must add the following code inside the events of the lambda function (next to the API gateway event).
Events:
  ...
  IAMDetectiveInvestigateIAMPermissionsCron:
    Type: Schedule
    Properties:
      Schedule: "cron(0 1 * * ? *)"
      Description: This is the cron job for the IAM Detective - Investigate IAM permissions daily.
      Enabled: True
Please note that multiple events firing the lambda means it can be called from the API gateway or the cronjob independently.
The role processor lambda
Taking into account that this lambda runs once per role in the SQS queue, these are the steps implemented on this lambda:
- Generate a ServiceLastAccessedReport: To find unused permissions, we first need to generate a service last accessed report and then download the generated report.
- Update Airtable: With the report, we need to iterate over all the permissions in the role, identify unused ones (i.e., not used in the last three months), and send them to Airtable.
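The filtering step above can be sketched as follows. This is an illustration rather than the service's actual code: the ServiceLastAccessed interface mirrors only two of the fields returned by IAM's GetServiceLastAccessedDetails, and the function names are hypothetical. A service with no LastAuthenticated value is treated as unused.

```typescript
// Subset of the fields in an Access Advisor report entry; LastAuthenticated
// is absent when the service was not accessed in the tracking period.
interface ServiceLastAccessed {
  ServiceNamespace: string;
  LastAuthenticated?: Date;
}

// Return the date `months` calendar months before `now` (UTC).
// Day overflow follows JavaScript Date rules (e.g. "Feb 31" rolls into March).
function monthsAgo(now: Date, months: number): Date {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - months);
  return cutoff;
}

// List the service namespaces never used, or last used before the cutoff.
function findUnusedServices(
  services: ServiceLastAccessed[],
  now: Date,
  months: number = 3,
): string[] {
  const cutoff = monthsAgo(now, months).getTime();
  return services
    .filter(
      (s) =>
        s.LastAuthenticated === undefined ||
        s.LastAuthenticated.getTime() < cutoff,
    )
    .map((s) => s.ServiceNamespace);
}
```

Passing now as a parameter rather than calling new Date() inside the function keeps the three-month rule easy to unit test.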
This is the code required to include the lambda on the template:
  # This is the Lambda function definition for the role processor. For all available properties, see
  # https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
  SQSIAMProcessor:
    Type: AWS::Serverless::Function
    Properties:
      Tags:
        Service: iam-detective
      CodeUri: iam-detective
      Handler: SQSIAMProcessor.sqsIAMProcessorHandler
      Runtime: nodejs18.x
      Architectures:
        - x86_64
      Description: A Lambda function that processes the IAM role jobs sent to the associated SQS queue.
      # This property associates this Lambda function with the SQS queue defined above, so that whenever the queue
      # receives a message, the Lambda function is invoked
      Events:
        SQSQueueEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt InvestigateIAMPermissionsQueue.Arn
            BatchSize: 10
            Enabled: true
            ScalingConfig:
              MaximumConcurrency: 2
      MemorySize: 128
      Timeout: 900
      Policies:
        - Statement:
            - Sid: Stmt1679505932243
              Effect: Allow
              Action:
                - sts:AssumeRole
              Resource:
                [
                  arn:aws:iam::XXXXXXXXXXXX:role/cross-account-access,
                  ...,
                ]
            - Sid: Stmt1679505932905
              Effect: Allow
              Action:
                - secretsmanager:GetSecretValue
              Resource:
                - Fn::ImportValue:
                    Fn::Sub: "${AirtableSecretsStackName}-ARN"
    Metadata: # Manage esbuild properties
      BuildMethod: esbuild
      BuildProperties:
        Minify: false
        Target: "es2020"
        Sourcemap: true
        EntryPoints:
          - SQSIAMProcessor.ts
Invoking event:
Unlike the previous lambda, which is invoked by the cronjob and API events, the only event invoking this lambda is the SQS queue we defined earlier.
For this lambda, we need to limit the number of concurrent executions to the minimum (2 at the time of writing). This is required because the IAM Inspector lambda generates hundreds of jobs, and processing them all in parallel would trigger API rate-limiting errors from Airtable. This is achieved by adding the field MaximumConcurrency: 2 in the previous template.
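MaximumConcurrency caps how many invocations run in parallel, but each invocation can still receive a batch of up to 10 messages. One way to stay under a rate limit inside a batch is to process the jobs one at a time with a pause between them. The sketch below illustrates that idea; it is not taken from the original service, and processSequentially, the handler type, and the interval value are all assumptions.

```typescript
// Hypothetical per-job handler; in the real lambda this would build the
// Access Advisor report and push the results to Airtable.
type ProcessJob<T> = (job: T) => Promise<void>;
type Sleep = (ms: number) => Promise<void>;

// Default pause based on setTimeout; injectable so tests can skip real waits.
const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Process jobs strictly in order, waiting minIntervalMs between jobs.
async function processSequentially<T>(
  jobs: T[],
  handle: ProcessJob<T>,
  minIntervalMs: number,
  sleep: Sleep = defaultSleep,
): Promise<void> {
  let first = true;
  for (const job of jobs) {
    if (!first) {
      await sleep(minIntervalMs); // pause only between jobs, not before the first
    }
    first = false;
    await handle(job);
  }
}
```

With two concurrent invocations and, say, a 500 ms interval, the service would start at most about four jobs per second overall. The right interval depends on how many Airtable calls each job makes.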
Lambda IAM permissions:
Regarding permissions, the lambda requires the same cross-account access role as it needs to generate the IAM reports and access the same Airtable secret to perform updates of the newly found permissions.
The final template
This is the final template, including all infrastructure:
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: >
  iam-detective
  Sample SAM Template for iam-detective
# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 60
Parameters:
  AirtableSecretsStackName:
    Description: Name of the airtable secrets manager stack.
    Type: String
    Default: "airtable-token"
  OpsgenieSecretsStackName:
    Description: Name of the opsgenie secrets manager stack.
    Type: String
    Default: "opsgenie-token"
Conditions:
  IsDevAccount: !Equals [!Ref AWS::AccountId, "630843564847"]
Resources:
  InvestigateIAMPermissionsFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      Tags:
        Service: iam-detective
      CodeUri: iam-detective
      Handler: InvestigateIAMPermissions.listAllIAMRoles
      Runtime: nodejs18.x
      Environment:
        Variables:
          SQS_URL_TO_PROCESS_IAM_ROLES: !Ref InvestigateIAMPermissionsQueue
      Architectures:
        - x86_64
      Policies:
        - Statement:
            - Sid: Stmt1679505932243
              Effect: Allow
              Action:
                - sts:AssumeRole
              Resource:
                [
                  arn:aws:iam::XXXXXXXXXXXX:role/cross-account-access,
                  ...
                ]
            - Sid: Stmt1679384123196
              Effect: Allow
              Action:
                - sqs:SendMessage
              Resource: !GetAtt InvestigateIAMPermissionsQueue.Arn
            - Sid: Stmt1679253132905
              Effect: Allow
              Action:
                - secretsmanager:GetSecretValue
              Resource:
                - Fn::ImportValue:
                    Fn::Sub: "${AirtableSecretsStackName}-ARN"
                - Fn::ImportValue:
                    Fn::Sub: "${OpsgenieSecretsStackName}-ARN"
      Events:
        IAMDetectiveInvestigateIAMPermissionsCron:
          Type: Schedule
          Properties:
            Schedule: "cron(0 1 * * ? *)"
            Description: This is the cron job for the IAM Detective - Investigate IAM permissions daily.
            Enabled: True
    Metadata: # Manage esbuild properties
      BuildMethod: esbuild
      BuildProperties:
        Minify: false
        Target: "es2020"
        Sourcemap: true
        EntryPoints:
          - InvestigateIAMPermissions.ts
  # This is an SQS queue with all default configuration properties. To learn more about the available options, see
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-sqs-queues.html
  InvestigateIAMPermissionsQueue:
    Type: AWS::SQS::Queue
    Properties:
      VisibilityTimeout: 901 # Max lambda timeout + 1 second
      RedrivePolicy:
        deadLetterTargetArn:
          Fn::GetAtt: [InvestigateIAMPermissionsDeadLetterQueue, Arn]
        maxReceiveCount: 3
  InvestigateIAMPermissionsDeadLetterQueue:
    Type: AWS::SQS::Queue
  # This is the Lambda function definition for the role processor. For all available properties, see
  # https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
  SQSIAMProcessor:
    Type: AWS::Serverless::Function
    Properties:
      Tags:
        Service: iam-detective
      CodeUri: iam-detective
      Handler: SQSIAMProcessor.sqsIAMProcessorHandler
      Runtime: nodejs18.x
      Architectures:
        - x86_64
      Description: A Lambda function that processes the IAM role jobs sent to the associated SQS queue.
      # This property associates this Lambda function with the SQS queue defined above, so that whenever the queue
      # receives a message, the Lambda function is invoked
      Events:
        SQSQueueEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt InvestigateIAMPermissionsQueue.Arn
            BatchSize: 10
            Enabled: true
            ScalingConfig:
              MaximumConcurrency: 2
      MemorySize: 128
      Timeout: 900
      Policies:
        - Statement:
            - Sid: Stmt1679505932243
              Effect: Allow
              Action:
                - sts:AssumeRole
              Resource:
                [
                  arn:aws:iam::XXXXXXXXXXXX:role/cross-account-access,
                  ...
                ]
            - Sid: Stmt1679505932905
              Effect: Allow
              Action:
                - secretsmanager:GetSecretValue
              Resource:
                - Fn::ImportValue:
                    Fn::Sub: "${AirtableSecretsStackName}-ARN"
    Metadata: # Manage esbuild properties
      BuildMethod: esbuild
      BuildProperties:
        Minify: false
        Target: "es2020"
        Sourcemap: true
        EntryPoints:
          - SQSIAMProcessor.ts
Outputs:
  InvestigateIAMPermissionsFunction:
    Description: "Investigate IAM Permissions Lambda Function ARN"
    Value: !GetAtt InvestigateIAMPermissionsFunction.Arn
  InvestigateIAMPermissionsFunctionIamRole:
    Description: "Implicit IAM Role created for Investigate IAM Permissions function"
    Value: !GetAtt InvestigateIAMPermissionsFunctionRole.Arn
  InvestigateIAMPermissionsQueue:
    Description: "The SQS queue to communicate the two lambda functions"
    Value: !Ref InvestigateIAMPermissionsQueue
  InvestigateIAMPermissionsDeadLetterQueue:
    Description: "The dead letter SQS queue to send messages that can't be processed"
    Value: !Ref InvestigateIAMPermissionsDeadLetterQueue
Final thoughts
In this journey of creating a robust microservice, we’ve uncovered the remarkable capabilities of AWS SAM. The ability to deploy, test, and maintain cloud-native applications at lightning speed is a game-changer. The power of AWS SAM, combined with your expertise, opens the door to endless possibilities for innovation and efficiency in your development process.
As you embark on your own AWS SAM adventures, remember that the cloud is your playground, and AWS SAM is your ultimate tool. Keep experimenting, keep building, and enjoy the speed and agility that AWS SAM brings to your development projects. Happy coding!
We’re hiring!
We’re looking for talented engineers in all areas to join our team and help us to build the future of broadcast and media production.