This post was co-written by Andrew Alaniz , Technical Information Security Officer, and Brady Pratt, Cloud Security Enginner, both at BBVA USA.
Data Loss Prevention (DLP) is a common topic among companies that work with any type of sensitive data. One of the challenges is that many people either don’t fully understand what DLP is, or rather, have their own definition of what it is. Regardless of one’s interpretation of DLP, one thing is certain: before you can control data loss, you need to locate find the data sources.
If an organization can’t identify its data, it can’t protect it. BBVA USA, a bank holding company, turned to AWS for advice, and decided to use Amazon Macie to accomplish this in Amazon Simple Storage Service (Amazon S3). Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. This blog post will share some of the design and architecture we used to deploy Macie using services, such as AWS Lambda and Amazon CloudWatch.
Data challenges in Amazon S3
Although all S3 buckets are private by default, everyone is aware of the challenges of unsecured S3 buckets exposing data publicly. Amazon has provided a way to prevent that by removing the ability to make buckets public. As with other data storage mechanisms, this doesn’t stop anyone from storing sensitive data within AWS and exposing it another way. With Macie, one can classify the data stored in S3, centrally, and through AWS Organizations.
We can break the Macie architecture into two main parts: S3 discovery and evaluation, and S3 sensitive data discovery:
The setup of discovery and evaluation is simple and straightforward, and should be enabled through Amazon Organizations and across all accounts. The cost of this piece is minimal, and it provides valuable insights into the compliance state of S3 buckets.
Once setup of discovery and evaluation is completed, we are a ready to move to the next step and configure discovery jobs for our S3 buckets. The architecture includes the use of S3, Amazon CloudWatch Events, Amazon EventBridge, and Lambda. All of the execution should happen in a centralized account, but the event triggers should come from each individual account.
When determining the architectural design of the solution, consider a few main components:
Utilize AWS Organizations: Macie allows native integration with AWS Organizations. This is a significant advantage for Macie. Additionally, within AWS Organizations, it allows the delegation of the Macie master account to a subordinate account. The benefit of this is that it allows centralized management while allowing for the compartmentalization of roles.
Ease of management
One of the most challenging things to manage is non-conforming configurations. It’s much easier to manage a standard way to create, name, and configure settings. Once the classification jobs were ready to be created, we had to take into consideration the following when deploying Macie for our use case:
- Macie classifies content in a single job across one account.
- If you submit multiple jobs that contain the same bucket, Macie will scan the objects multiple times.
- Macie jobs are immutable.
Due to these considerations, we decided to create one job per S3 bucket. This allows administrators to search more easily for jobs related to findings.
Macie plays an essential role, not only in identifying data and improving data collection, but also in compliance. We needed to make a decision about how to determine if an S3 bucket would be included in a classification job. Initially, we considered including all buckets no matter what. The logic here was that even if we make an assumption that a bucket would never have sensitive data in it, an entity with the right role could always add something at a later date.
Finally, we implemented a solution to tag specific buckets that were known to have immutable properties and which would never allow sensitive data to be added. We could do this because we knew exactly what data was in the bucket, who or what created the bucket, and exactly who or what had access to the bucket.
An example of this type of bucket is the S3 bucket used to store VPC Flow Logs. We know that this bucket is only created by provisioning scripts and is only going to store VPC flow logs that contain no sensitive data based on data classification standards. Also, only VPC services and specific security services can access this bucket for anything other than READ. This is controlled organizationally and can be tagged with a simple ignore key/value pair upon creation.
Deploying Macie at BBVA USA
BBVA USA developed an approach to working within AWS that allows guardrails to be applied as accounts are created. One of those guardrails identifies if developers have stored sensitive data in an account. BBVA needed to be able to do this, and do it at scale. If there is a roadblock or a challenge with AWS services, the first place BBVA looks is to support, but the second place is the Technical Account Manager.
After initiating conversations with its account team, BBVA determined that AWS Macie was the tool to help them with this challenge.
With the help of its technical account manager (TAM), BBVA was able to meet with the Macie Product team and discuss the best options for deploying at scale. Through these conversations, they were even able to influence the Macie product roadmap.
Getting Macie ready to deploy at scale was actually quite simple once the architectural pattern was designed.
Initial job creation
In order to set up jobs for each existing bucket in the organization, it’s a matter of scripting the job creation and adding each bucket from each account into its own job, which is pretty straightforward.
Job creation for new buckets
The recommended architecture and implementation for existing buckets:
- Whenever a new S3 bucket is added to Organization accounts, trigger a CloudWatch Event in the target account.
- Set up a cross account EventBridge to consume the Event. Using the EventBridge allows for a simpler configuration and centralized management of both Events and Lambda.
- Trigger a Lambda function in a delegated Macie admin account, which creates classification jobs to apply Macie to all the newly created S3 buckets.
- Repeat the same process when a bucket is deleted by triggering a cancel job.
Evaluate the state of S3 buckets
To evaluate the S3 accounts, turn on Macie at the organization master account and delegate administration to a subordinate account used for Macie. This enables management consolidation of security features into a centralized security account. This helps further restrict access from those that may need access to the master billing account. Finally, enable Macie by default on all organization accounts.
BBVA USA worked directly with the Macie product team by leveraging its relationship with the AWS account team and Enterprise Support. This allowed the company to eventually deploy Macie quickly and at scale. Through Macie, the company is able to track any changes to configurations on buckets that allow a bucket to be public, shared, or replicated with external accounts and if the encryption policies are disabled. Using Macie, BBVA was able to identify buckets that contained sensitive information and put in another control to bolster its AWS governance profile.