This is a guest post by David Ting, VP of Engineering at Nylas. In their own words, Nylas is a pioneer and leading provider of universal communications APIs that allow developers to quickly connect their applications to every email, calendar, or contacts provider in the world. Over 26,000 developers around the globe use the Nylas communications platform to handle over 1 billion API requests per day to providers such as Gmail, Microsoft Exchange, Outlook.com, Yahoo! and more.
Recently, the Nylas Engineering team had the opportunity to participate in AWS Data Lab, a 4-day intensive collaboration between experts at AWS and the Nylas team to create tangible deliverables that accelerate data and analytics modernization initiatives. Although participating in the AWS Data Lab during this unprecedented time of the pandemic presented some challenges, the Nylas team was laser-focused and thoroughly prepared with clear objectives, which resulted in exciting developments we’d like to share in this post.
In this blog post, we share the following:
- Nylas’s goals heading into AWS Data Lab
- How we coordinated multiple databases in 4 short days
- What we plan to build going forward
Goals heading into AWS Data Lab
The Nylas team had an overarching goal: increase storage tenfold in the next 12 months while reducing cost and operational complexity and improving scalability and reliability. To that end, the team set four specific objectives to build the following:
- A transaction table in Amazon DynamoDB
- A streaming prototype for consuming data in DynamoDB
- A streaming solution for consuming data in DynamoDB
- A messages table in DynamoDB (as a stretch goal)
These objectives were directly informed by the reliability challenges the team currently faces. The tables that caused the most problems, all stemming from fundamental issues in our MySQL database architecture, were message, event, contact, and transaction.
The current configuration of these four sharded MySQL tables makes it difficult to iterate quickly on new services on the existing architecture. With the guidance of Data Lab Solutions Architects and Database and Analytics service experts, the idea was to use Amazon Kinesis Data Streams, serverless functions on AWS Lambda, and a NoSQL database to tackle the issue.
Coordinating databases in 4 days
In 4 short days, the Nylas team accomplished an astonishing amount—achievements that required the team to learn Lambda and coordinate multiple databases.
Building the transaction table and establishing confidence in DynamoDB writes
The AWS Data Lab experts began offering prescriptive architectural guidance and sharing best practices in weekly pre-lab sessions that started a month before the lab. Before the 4-day intensive, the team wrote the code for DynamoDB writes so that the feature flag could be turned on at the lab. With this preparation, the team achieved its first goal of building a transaction table in DynamoDB and set the foundation for replacing our highest-traffic MySQL table with DynamoDB.
Currently, we write every transaction to MySQL and, if the DynamoDB feature flag is turned on, double write it to DynamoDB as well. Additionally, we sample a percentage of transactions based on the resource’s ID before writing to DynamoDB. But we also wanted to put in checks to verify the data’s integrity and be sure that the transition could roll out without issue. This involved using Lambda, a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you, essentially enabling us to write code that runs on the AWS side.
The Lambda code checks that a transaction exists both in the MySQL database and in our new DynamoDB database for the same record. This Lambda function runs checks on a percentage of the transactions entered into the DynamoDB database, which increases confidence about the switch by making sure that the two databases hold identical records and no data is lost.
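As a rough illustration of the ideas above (sampling by resource ID, double writing behind a feature flag, and comparing the two stores), here is a minimal sketch. It is not the Nylas code: the function names, the `SAMPLE_PERCENT` value, and the use of plain dictionaries as stand-ins for MySQL and DynamoDB are all assumptions made so the example runs on its own.

```python
import hashlib

SAMPLE_PERCENT = 10  # hypothetical rollout percentage, not a Nylas setting


def in_sample(resource_id: str, percent: int = SAMPLE_PERCENT) -> bool:
    """Deterministically select a stable subset of resources by hashing the ID."""
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < percent


def double_write(txn: dict, mysql: dict, dynamo: dict, flag_on: bool) -> None:
    """Always write to MySQL; also write to DynamoDB when flagged and sampled.

    The two dicts are in-memory stand-ins for the real databases.
    """
    mysql[txn["id"]] = dict(txn)
    if flag_on and in_sample(txn["resource_id"]):
        dynamo[txn["id"]] = dict(txn)


def find_mismatches(mysql: dict, dynamo: dict) -> list:
    """Return IDs of DynamoDB records that are missing from or differ in MySQL."""
    return sorted(tid for tid, item in dynamo.items() if mysql.get(tid) != item)
```

Hashing the resource ID, rather than picking transactions at random, keeps every transaction for a given resource on the same side of the rollout, which makes mismatches easier to trace. An empty list from `find_mismatches` is the signal that the sampled records agree.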
Consuming DynamoDB data in our application
To build a streaming prototype and streaming solution to consume data in DynamoDB, we used our delta endpoints. The delta endpoints are how our customers get informed of changes they care about, such as new or deleted messages. To set the groundwork, the team refactored the delta endpoints to read from either MySQL or DynamoDB. Practically speaking, this means that a new data layer is added without disrupting the current data flowing to delta endpoints.
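To make the "read from either MySQL or DynamoDB" idea concrete, the following is a small sketch of a data layer that hides the backend from the delta endpoint. The class and function names (`InMemoryStore`, `pick_store`, `get_deltas`) and the per-account opt-in set are hypothetical; the real refactor may be organized quite differently.

```python
class InMemoryStore:
    """Stand-in for a MySQL-backed or DynamoDB-backed store (hypothetical)."""

    def __init__(self, name: str, rows: list):
        self.name = name
        self.rows = rows  # each row: {"account_id", "cursor", "event"}

    def changes_since(self, account_id: str, cursor: int) -> list:
        """Return this account's changes that are newer than the given cursor."""
        return [
            row for row in self.rows
            if row["account_id"] == account_id and row["cursor"] > cursor
        ]


def pick_store(account_id, mysql_store, dynamo_store, dynamo_accounts):
    """Route an account to DynamoDB only if it has been opted in."""
    return dynamo_store if account_id in dynamo_accounts else mysql_store


def get_deltas(account_id, cursor, mysql_store, dynamo_store, dynamo_accounts):
    """Delta endpoint logic: identical for callers whichever backend serves it."""
    store = pick_store(account_id, mysql_store, dynamo_store, dynamo_accounts)
    return store.changes_since(account_id, cursor)
```

Because both stores expose the same `changes_since` method, the endpoint code does not change as more accounts are moved over.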
The team successfully showed how our delta stream, a very high-traffic API endpoint, can guarantee better reliability and work without disruption. Backing the current system with DynamoDB takes a significant load off the system and points to the future of a more stable, low-latency, real-time streaming service.
Improved schema page for messages table
The messages table is even more impactful than the transaction table. Incredibly, the team architected a new and improved schema for the messages table, which works in a non-relational database such as DynamoDB. The following screenshot shows the design.
Moving from a relational database such as MySQL to a non-relational data store such as DynamoDB has many challenges, especially if you rely on relational features such as joins. With the guidance and assistance of the AWS Data Lab team, we came up with several designs that we think will work effectively with our dataset when hosted in DynamoDB.
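One common way to replace a join in DynamoDB is to store related items under the same partition key and encode the relationship in the sort key, so a single query returns them together. The sketch below illustrates that general pattern only. It is not the schema we designed: the `pk`/`sk` attribute names and key formats are assumptions, and the query is emulated over a Python list rather than run against DynamoDB.

```python
def message_item(account_id, thread_id, message_id, received_at, subject):
    """Build an item whose keys group a thread's messages together.

    received_at is an ISO 8601 string, so sort keys order chronologically.
    """
    return {
        "pk": f"ACCOUNT#{account_id}",
        "sk": f"THREAD#{thread_id}#MSG#{received_at}#{message_id}",
        "subject": subject,
    }


def query_thread(items, account_id, thread_id):
    """Emulate a DynamoDB Query: exact partition key plus a sort key prefix."""
    pk = f"ACCOUNT#{account_id}"
    prefix = f"THREAD#{thread_id}#MSG#"
    hits = [i for i in items if i["pk"] == pk and i["sk"].startswith(prefix)]
    return sorted(hits, key=lambda i: i["sk"])
```

In a relational schema, listing a thread's messages would typically join a thread table to a message table. Here the prefix match on the sort key plays that role.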
Currently, DynamoDB handles a small percentage of traffic. As we increase that percentage, we anticipate that the MySQL databases will no longer be overloaded, which should improve stability and reliability. This also creates more headroom for the MySQL databases, resulting in long-term cost savings. Additionally, because DynamoDB is designed to scale with demand, the team can scale transactions and webhooks with greater ease and confidence. Finally, because we can run microservices on Lambda, this sets a foundation for building microservices on Nylas.
What’s next: High-throughput streaming endpoints
The foundation has been set for building microservices on Nylas, and it also allows for faster iteration on data and streaming products. The successful prototype and solution for data consumption point to a future of new ultra-high-throughput streaming endpoints, such as an ultra-high-bandwidth, low-latency streaming pipe for current customers or new prospects that need to connect directly to our firehose.
In the new architecture diagram, the team added Amazon Kinesis Data Streams. This represents a stream of all changes in our system, every single “You’ve got mail” message or calendar update. Significantly, it represents the backbone of what will eventually become a new suite of Nylas streaming products.
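For readers who have not worked with Kinesis and Lambda together, here is a minimal sketch of a Lambda handler consuming a batch of change records from a Kinesis data stream. Lambda delivers each record's payload base64-encoded under `Records[].kinesis.data`; everything else here, including the shape of the change payload (`object`, `event`) and the `make_record` test helper, is an assumption for illustration and not our production format.

```python
import base64
import json


def handler(event, context=None):
    """Count the changes in one Kinesis batch, grouped by object type."""
    counts = {}
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        change = json.loads(payload)
        kind = change.get("object", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def make_record(change: dict) -> dict:
    """Build a Kinesis-style record for local testing (hypothetical helper)."""
    data = base64.b64encode(json.dumps(change).encode("utf-8")).decode("ascii")
    return {
        "eventSource": "aws:kinesis",
        "kinesis": {"partitionKey": change.get("account_id", "demo"), "data": data},
    }
```

A downstream service would replace the counting with its own logic, such as fanning changes out to webhooks or a streaming endpoint.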
This decoupling of streaming data, held in a first-class and robust home of its own, means that we can use that stream to power many other services, such as the data team’s products and other exciting future AI applications.
With AWS Data Lab, Nylas took a huge first step toward its goal of increasing storage tenfold within 12 months while reducing costs. The team used Lambda and DynamoDB to set the foundation for a strong streaming and data consumption product, which significantly improves the experience for Nylas customers.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
About the author
David Ting is the VP of Engineering at Nylas, responsible for product, research, and development. Nylas is the leading communications platform in the world, powering thousands of applications with connectivity APIs that enable automated communications workflows. David brings more than 20 years of experience to Nylas, having developed scalable systems including AI/ML, eCommerce, logistics, gaming, music, and video.