One of our favorite things about working on Amazon Redshift, the cloud data warehouse service at AWS, is the inspiring stories from customers about how they’re using data to gain business insights. Many of our recent engagements have been with customers upgrading to the new instance type, Amazon Redshift RA3 with managed storage. In this post, we share experiences from customers using Amazon Redshift for the first time, and existing customers upgrading from DS2 (Dense Storage 2) and DC2 (Dense Compute 2) instances to gain improvements in performance and storage capacity for the same or lower costs.
From startups to global quick service restaurants and major financial institutions, Amazon Redshift customers span all industries and sizes, including many Fortune 500 companies. We’re proud that Amazon Redshift breaks down the cost and accessibility barriers of a data warehouse, so startups and non-profits can realize the same benefits as established enterprises from running analytics at scale with Amazon Redshift.
The diverse customer base also allows the Amazon Redshift team to continue to innovate with new features and capabilities that deliver the best price performance for any use case. Use cases range from analytics that help Britain’s railways run smoothly, to providing insight into the behavior of millions of people learning a new language, playing online games, learning to code, and much more. As the world around us responds to changes in every aspect of personal and business life, Amazon Redshift is helping tens of thousands of customers respond with fast and powerful analytics.
More and more customers have gravitated to Amazon Redshift because of continued innovation, including the new generation of Amazon Redshift nodes, RA3 with managed storage. This latest generation of Amazon Redshift is unique because it introduces the ability to independently scale compute and storage with Redshift managed storage (RMS). You can scale cost-effectively because you can add more data without increasing compute costs, or add more compute without increasing storage costs. This makes RA3 a cost-effective option for both steady and diverse data warehouse workloads, gives you room to grow, and maximizes performance.
New customers like Poloniex and OpenVault benefit from the flexibility of Amazon Redshift RA3
Many customers are growing and looking for a cloud data warehouse that can scale with them, easily integrate with other AWS services, and deliver great value. For customers like Poloniex and OpenVault, who are just getting started with Amazon Redshift, we recommend using the new RA3 nodes with managed storage. New customers like RA3 because they can size their data warehouse for their core workload and easily scale for spikes in users and data to balance performance and costs. For example, you can use concurrency scaling to automatically scale out when the number of queries suddenly spikes, or use elastic resize to add nodes and make queries run faster. If you’re using clusters intermittently, you can pause and resume them on a schedule or manually. You can further reduce costs on steady state clusters by investing in reserved instances with a 1- or 3-year commitment.
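The elastic resize and pause/resume operations described above can also be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and a Redshift client created with `boto3.client("redshift")`. The helper names and the cluster identifier are hypothetical, and concurrency scaling is not shown because it is configured through workload management rather than through these calls.

```python
def scale_for_spike(client, cluster_id, node_count):
    """Request an elastic resize to the given number of nodes."""
    return client.resize_cluster(
        ClusterIdentifier=cluster_id,
        NumberOfNodes=node_count,
        Classic=False,  # False asks for an elastic (not classic) resize
    )


def pause_when_idle(client, cluster_id):
    """Pause a cluster that is only used intermittently."""
    return client.pause_cluster(ClusterIdentifier=cluster_id)


def resume_for_work(client, cluster_id):
    """Resume a previously paused cluster."""
    return client.resume_cluster(ClusterIdentifier=cluster_id)


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   redshift = boto3.client("redshift")
#   scale_for_spike(redshift, "my-ra3-cluster", 4)
#   pause_when_idle(redshift, "my-ra3-cluster")
```

Passing the client in as an argument keeps the helpers easy to test and lets you reuse one client across several clusters.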
Poloniex, one of the longest standing cryptocurrency trading platforms in the world, distributing hundreds of billions of dollars in cryptoassets, uses AWS to gain insights into how users interact with their platform and how they can improve the customer experience in trading, lending, storing, and distribution. They evaluated multiple data warehousing options and chose to work with AWS to design a lake house approach by querying and joining data across their Amazon Redshift data warehouse and Amazon Simple Storage Service (Amazon S3) data lake with the Amazon Redshift Spectrum feature.
“When we were evaluating data warehouses, we chose Amazon Redshift over Snowflake because of the transparent and predictable pricing,” says Peter Jamieson, Director of Analytics and Data Science at Poloniex. “The scalability and flexibility have been enormously valuable as we scale our analytics capability with a lean team and infrastructure. We benefit from the separation of compute and storage in the Amazon Redshift RA3 nodes because we have workflows that create a significant spike in our compute needs, especially when aggregating historical transaction data.”
Organizations are often looking to share the data and insight gained through analytics with their end-users as part of their product or service. The Software as a Service (SaaS) model enables this, and we work closely with SaaS customers to understand the value their data provides so they can use Amazon Redshift to unlock additional business value. Amazon Redshift is well positioned to build a scalable, multi-tenant SaaS solution with features that deliver consistent performance with multiple tenants sharing the same Amazon Redshift cluster.
OpenVault, a full-service technology solutions and data analytics company, enables cable, fiber, and mobile operators around the world to unlock the power of the data in their network to optimize and monetize their businesses. They shared a similar story:
“Amazon Redshift powers analytics in our SaaS solutions to provide insight that can be used to anticipate residential and business broadband trends,” says Tony Costa, EVP and CTO at OpenVault. “This makes it possible to use fast-growing broadband data to make decisions that result in revenue growth, new revenue streams, reduced operational/capital expenses, and improved quality of service for broadband operators. We chose Amazon Redshift RA3 because it is a cost-effective analytics and managed storage solution. It empowers OpenVault’s data scientists and operator customers to perform near real-time analysis of billions of rows of records and seamlessly evolve with the growing analytics needs and ad-hoc inquiries of our customers.”
If you’re new to Amazon Redshift, many resources are available to help you ramp up, including AWS employees and partners. For more information, see Getting Started with Amazon Redshift and Request Support for your Amazon Redshift Proof-of-Concept.
Duolingo, Social Standards, Yelp, Codecademy, and Nielsen get better performance and double the storage capacity at the same price by moving from Amazon Redshift DS2 to RA3
For years, customers with large data storage needs chose Amazon Redshift DS2 (Dense Storage 2) for its price-performance value. Customers such as NTT Docomo and Amazon.com ran petabyte-scale workloads in a single cluster on DS2 node types. However, as data size kept increasing exponentially, the amount of data actively being queried became an ever smaller fraction of the total data size. You had to either keep adding nodes to store more data in the data warehouse, or retire data to Amazon S3 in a data lake. Both options created operational overhead. With Amazon Redshift RA3, after the data is ingested in the cluster, it’s automatically moved to managed storage. RA3 nodes keep track of the frequency of access for each data block and cache the hottest blocks. If the blocks aren’t cached, the large networking bandwidth and precise storing techniques return the data in under a second.
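To make the idea of tracking access frequency and caching the hottest blocks concrete, here is a toy sketch. It is purely illustrative and is not how Amazon Redshift implements managed storage: the `HotBlockCache` class, its eviction rule, and the `fetch_remote` callback are all assumptions made for this example.

```python
from collections import Counter


class HotBlockCache:
    """Toy cache that keeps the most frequently read blocks local."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity          # number of blocks held locally
        self.fetch_remote = fetch_remote  # stand-in for a managed storage read
        self.hits = Counter()             # access count per block
        self.local = {}                   # block_id -> data held locally

    def read(self, block_id):
        self.hits[block_id] += 1
        if block_id in self.local:
            return self.local[block_id]
        data = self.fetch_remote(block_id)
        self._maybe_cache(block_id, data)
        return data

    def _maybe_cache(self, block_id, data):
        if len(self.local) < self.capacity:
            self.local[block_id] = data
            return
        # Replace the coldest local block only if the new block is hotter.
        coldest = min(self.local, key=lambda b: self.hits[b])
        if self.hits[block_id] > self.hits[coldest]:
            del self.local[coldest]
            self.local[block_id] = data
```

In this sketch a block that is read often stays local, while a block that is read once is served from remote storage and does not displace hotter data.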
For customers like Duolingo, Social Standards, Yelp, and Codecademy, who are among the tens of thousands of customers already using Amazon Redshift, it’s easy to upgrade to RA3.
Duolingo is the most popular language-learning platform and the most downloaded education app in the world, with more than 300 million users. The company’s mission is to make education free, fun, and accessible to all. They upgraded from Amazon Redshift DS2 instances to the largest instance of RA3 to support their growing data.
“We use Amazon Redshift to analyze the events from our app to gain insight into how users learn with Duolingo,” says Jonathan Burket, a Senior Software Engineer at Duolingo. “We load billions of events each day into Amazon Redshift, have hundreds of terabytes of data, and that is expected to double every year. While we store and process all of our data, most of the analysis only uses a subset of that data. The new Amazon Redshift RA3 instances with managed storage deliver two times the performance for most of our queries compared to our previous DS2 instance-based Amazon Redshift clusters. The Amazon Redshift managed storage automatically adapts to our usage patterns. This means we don’t need to manually maintain hot and cold data tiers, and we can keep our costs flat when we process more data.”
For more information about how Duolingo uses Amazon Redshift, watch the session from AWS re:Invent 2019, How to scale data analytics with Amazon Redshift.
Amazon Redshift is designed to handle these high volumes of data that collectively uncover trends and opportunities. At Social Standards, a fast-growing market analytics firm, Amazon Redshift powers the analytics that help enterprises gain insights into collective social intelligence. The comparative analytics platform transforms billions of social data points into benchmarked insights about the brands, products, features, and trends that consumers are talking about.
“At Social Standards, we are creating the next generation of consumer analytics tools to discover and deliver actionable business insights with complete and authentic analysis of social data for strategic decision making, product innovation, financial analytics, and much more,” says Vladimir Bogdanov, CTO at Social Standards. “We use Amazon Redshift for near real-time analysis and storage of massive amounts of data. Each month we add around 600 million new social interactions and 1.2 TB of new data. As we look forward and continue to introduce new ways to analyze the growing data, the new Amazon Redshift RA3 instances proved to be a game changer. We moved from the Amazon Redshift DS2 instance type to RA3 with a quick and easy upgrade, and were able to increase our storage capacity by eight times, increase performance by two times, and keep costs the same.”
These performance and cost benefits also attracted the popular online reviews and marketplace company, Yelp, to upgrade from DS2 to RA3. Yelp’s mission is to connect people with great local businesses, and data mining and efficient data analysis are important in order to build the best user experience.
“We continue to adopt new Amazon Redshift features and are thrilled with the new RA3 instance type,” says Steven Moy, a Software Engineer at Yelp. “We have observed a 1.9 times performance improvement over DS2 while keeping the same costs and providing scalable managed storage. This allows us to keep pace with explosive data growth and have the necessary fuel to train our machine learning systems.”
For more information about how Yelp uses Amazon Redshift, watch the session from AWS re:Invent 2019, What’s new with Amazon Redshift, featuring Yelp.
As current health conditions shine a spotlight on online learning, many organizations are scaling and using data to guide decision-making. Codecademy uses Amazon Redshift to store all the growing data generated through customers’ use of their web application, including high-volume events such as page visits and button clicks. Their data science team uses this data to develop various statistical models, and by analyzing these models, improve the app based on how customers use it.
“Codecademy is an education company committed to teaching modern skills within technology and code, as well as a catalyst in the shift toward online learning,” says Doug Grove, Director of Infrastructure and Platform at Codecademy. “We were leveraging DS2.xls for our Amazon Redshift cluster and moved to RA3.4xls for performance gains. Moving to the RA3s resulted in a two times performance increase and cut data loading times in half. The separation of compute and storage allows us to scale independently, and allows for easier cluster maintenance.”
For many customers that started using a data warehouse on-premises and migrated to AWS, the scale and value of the cloud continue to pay off. Nielsen, the global measurement and data analytics company, provides the most complete and trusted view of consumers and markets worldwide with operations in over 100 countries. A recent upgrade from DS2 to RA3 was the next step in their analytics journey, and helped them save costs, increase performance, and prepare for continued growth.
“We migrated from an on-premises data warehouse to Amazon Redshift in 2017 to optimize costs and to scale our solution to meet the growing demand,” says Sri Subramanian, Senior Manager of Technology at Nielsen. “Our data warehouse workloads run 24/7 at a scale of 1 billion rows per day. We recently migrated our Amazon Redshift cluster from DS2.8x to the new RA3.4x instance type. We have seen a performance gain of up to 40–50% on most of our workloads at a similar price point. Since the RA3 instance types separate compute and storage, disk utilization is no longer a concern. The upgrade was straightforward, and we went from proof of concept to solving complex business challenges quickly.”
These performance gains and productivity improvements are consistent themes from the feedback we’re getting from customers moving from DS2 to RA3. For more information about upgrading your workloads, see Overview of RA3 node types.
Rail Delivery Group, FiNC, and Playrix move from Amazon Redshift DC2 to RA3 to scale compute and storage independently for improved query performance and lower costs
Customers often chose DC2 (Dense Compute 2) for its superior query performance and low price. However, as the data sizes grew, clusters became bigger without the need for additional compute power. Many customers like Rail Delivery Group, FiNC, and Playrix are finding that by upgrading to RA3, they can get significantly more storage space and the same superior performance without increasing costs. For some use cases that need a large amount of raw computational power at the cheapest price and don’t require over 1 TB of data, DC2 provides industry-beating performance. However, if data is likely to grow to over 1 TB compressed, choosing RA3 node types and sizing for compute requirements is a much simpler and cheaper solution in the long run.
One company that found their storage needs growing faster than compute is Rail Delivery Group, a non-profit organization that brings together the companies that run Britain’s railway. They use Amazon Redshift to analyze rail industry data such as timetables, ticket sales, and smartcard usage.
“Since we started using Amazon Redshift for analytics in 2017, we have grown from 1 node to 10 nodes,” says Toby Ayre, Head of Data & Analytics at Rail Delivery Group. “Our data storage needs grew much faster than compute needs, and we had to keep unloading the data out of the data warehouse to Amazon S3. Now, with RA3.4xl nodes with managed storage, we can size for query performance and not worry about storage needs. Since we upgraded from a 10 node DC2.large cluster to a two node RA3.4xl cluster, our queries typically run 30% faster.”
Optimizing costs and preparing for future growth are consistent requirements for our customers. For FiNC Technologies, the developer of the number one healthcare and fitness app in Japan, data drives a cycle of continuous improvement and enables them to deliver on their mission to provide personalized AI for everyone’s wellness. The personalized diet tutor, private gym, and wellness tracker app helps users make informed decisions about their health and well-being based on real-time metrics about their behavior.
“At FiNC Technologies, we rely on Amazon Redshift to manage KPIs to continuously improve our web services and apps,” says Komiyama, Kohei, a Data Scientist at FiNC. “We upgraded to the Amazon Redshift RA3 from DC2 because our storage needs were growing faster than our compute. We found it easy to upgrade, and like that our new data warehouse scales storage capacity automatically without any manual effort. Since upgrading, we’ve reduced operational costs by 70%, and feel prepared for future data growth.”
While FiNC optimized for growing storage, Playrix, one of the leading mobile game developers in the world, optimized for compute. With over $1 billion annual revenue and more than 2,000 global employees, Playrix builds popular games like Township, Fishdom, Gardenscapes, Homescapes, Wildscapes, and Manor Matters. They use data to better understand the customer journey.
“We rely on data from multiple internal and external sources to gain insight into user acquisition and make marketing decisions,” says Mikhail Artyugin, Technical Director at Playrix. “We moved our Amazon Redshift data warehouse from 20 nodes of DC2.xlarge to three nodes of RA3.4xl to future proof our system. We’re thrilled with the increase in computing power that makes it faster to deliver insight on the marketing data, and we have almost infinite storage space with managed storage, all for a reasonable price. The friendly and productive collaboration with AWS enterprise support and product team was an extra bonus.”
The Amazon Redshift RA3 nodes with managed storage deliver value to new customers like Poloniex and OpenVault, and to existing customers upgrading from DC2 and DS2 instances like Duolingo, Social Standards, Yelp, Codecademy, Nielsen, Rail Delivery Group, FiNC, and Playrix.
If you’re new to Amazon Redshift, check out our RA3 recommendation tool available on the AWS Management Console when you create a cluster. If you’re already an Amazon Redshift customer and you haven’t tried out RA3 yet, it’s easy to upgrade in minutes with a cross-instance restore or an elastic resize. If you have existing Amazon Redshift DC2 or DS2 Reserved Instances, you can contact us to get support with the upgrade. For more information about recommended RA3 node types and cluster sizes when upgrading from DC2 and DS2, see Overview of RA3 node types.
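As a rough illustration of the two upgrade paths, the following sketch uses the AWS SDK for Python (boto3) Redshift client. The helper names, cluster identifiers, and snapshot identifier are hypothetical, and the target size of two ra3.4xlarge nodes is only an example. Whether an elastic resize is available for a given source and target configuration depends on your cluster, so check the recommended sizes in Overview of RA3 node types before running anything like this.

```python
def upgrade_with_elastic_resize(client, cluster_id, node_type, node_count):
    """Change node type and count in place with an elastic resize."""
    return client.resize_cluster(
        ClusterIdentifier=cluster_id,
        NodeType=node_type,
        NumberOfNodes=node_count,
        Classic=False,  # False asks for an elastic (not classic) resize
    )


def upgrade_with_snapshot_restore(client, source_cluster_id, snapshot_id,
                                  new_cluster_id, node_type, node_count):
    """Snapshot the existing cluster, then restore it onto RA3 nodes."""
    client.create_cluster_snapshot(
        SnapshotIdentifier=snapshot_id,
        ClusterIdentifier=source_cluster_id,
    )
    # Block until the snapshot is ready to be restored.
    client.get_waiter("snapshot_available").wait(SnapshotIdentifier=snapshot_id)
    return client.restore_from_cluster_snapshot(
        ClusterIdentifier=new_cluster_id,
        SnapshotIdentifier=snapshot_id,
        NodeType=node_type,
        NumberOfNodes=node_count,
    )


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   redshift = boto3.client("redshift")
#   upgrade_with_snapshot_restore(
#       redshift, "my-dc2-cluster", "pre-ra3-upgrade",
#       "my-ra3-cluster", "ra3.4xlarge", 2)
```

The restore path leaves the original cluster untouched, which makes it a convenient way to test workloads on RA3 before switching over.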
New features and capabilities for Amazon Redshift are released rapidly, and RA3 is built for the new scale of data: performance will continue to improve with AQUA (Advanced Query Accelerator) for Amazon Redshift. You can sign up for the preview of this innovative new hardware-accelerated cache, and clusters running on RA3 will automatically benefit from AQUA when it’s released. We continue to innovate based on what we hear from our customers, so keep an eye on What’s New in Amazon Redshift to learn about our new releases.
About the authors
Corina Radovanovich leads product marketing for cloud data warehousing at AWS. She’s worked in marketing and communications for the biggest tech companies worldwide and specializes in cloud data services.
Himanshu Raja is a Principal Product Manager for Amazon Redshift. Himanshu loves solving hard problems with data and cherishes moments when data goes against intuition. In his spare time, Himanshu enjoys cooking Indian food and watching action movies.