Amazon Web Services has announced the general availability of AWS Lake Formation, a fully managed service that makes it much easier for customers to build, secure, and manage data lakes.
AWS Lake Formation simplifies and automates many of the complex manual steps usually required to create a data lake, including collecting, cleaning, and cataloging data, and securely making that data available for analytics.
Customers can easily bring their data into a data lake from a variety of sources using pre-defined templates, automatically classify and prepare the data, and centrally define granular data access policies to govern access by the different groups within an organization.
Customers can then analyze this data using their choice of AWS analytics and machine learning services, including Amazon Redshift, Amazon Athena, and AWS Glue, with Amazon EMR, Amazon QuickSight, and Amazon SageMaker following in the next few months. There is no additional charge to use AWS Lake Formation; customers pay only for the underlying AWS services used.
Customers want to be able to perform analytics and machine learning across all of their data, regardless of the format or where the data lives. A data lake removes data silos and allows data to reside in a central place so customers can more easily apply different types of analytics and machine learning across all of their data.
Amazon Simple Storage Service (Amazon S3) has become a very popular place for customers to build data lakes because of its scale, cost-effectiveness, durability, and easy integration with AWS’s analytics and machine learning services. However, even with those significant benefits, building and managing a data lake can still be a complex and time-consuming process.
Customers need to provision and configure storage, move data from disparate sources into the data lake, extract the schema, and add metadata tags so the data is accessible from a searchable data catalog. They must also clean and prepare the data, including partitioning, indexing, and transforming it, to optimize the performance and cost of running analytics on it. Then, they have to set up data access roles and enforce security policies across their storage and each of their analytics engines, and update those policies whenever permissions change or new end users are added. Finally, customers have to make the data available securely to their data analysts so that they can analyze and process it using any of the available analytics engines. These steps involve a lot of manual work, and as a result, for most customers it can take up to several months to set up a data lake.
AWS Lake Formation significantly simplifies the process and removes the heavy lifting from setting up a data lake. AWS Lake Formation automates manual, time-consuming steps, like provisioning and configuring storage, crawling the data to extract schema and metadata tags, automatically optimizing the partitioning of the data, and transforming the data into formats like Apache Parquet and ORC that are ideal for analytics. AWS Lake Formation cleans and deduplicates data using machine learning to improve data consistency and quality. To simplify data access and security, AWS Lake Formation provides a single, centralized place to set up and manage data access policies, governance, and auditing across Amazon S3 and multiple analytics engines. To reduce the time analysts and data scientists spend hunting down the right data set for their needs, AWS Lake Formation provides a central, searchable catalog that describes the available data sets and their appropriate business use. Customers can now easily access data from a single place and integrate with their choice of AWS analytics and machine learning services, including Amazon Redshift, Amazon Athena, and AWS Glue, with Amazon EMR, Amazon QuickSight, and Amazon SageMaker following in the next few months. With AWS Lake Formation, customers can set up and begin using a data lake in days instead of months.
“Our customers tell us that Amazon S3 is the ideal place to house their data lakes, which is why AWS hosts more data lakes than anyone else – with tens of thousands and growing every day. They’ve also told us that they want it to be easier and faster to set up and manage their data lakes,” said Raju Gulabani, Vice President, Databases, Analytics, and Machine Learning, AWS. “That’s why we built AWS Lake Formation, so customers can spend more time learning from their data and innovating, rather than wrestling that data into functioning data lakes. AWS Lake Formation is available today and we’re excited to see how customers use it as one of the building blocks for growing and transforming their businesses and customer experiences.”
AWS Lake Formation is available today in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland), with additional regions coming soon.
Panasonic Avionics Corporation is the world’s leading supplier of in-flight entertainment and communication systems. “We wanted to create a data platform with the ability to manage the security settings for all the different applications in our environment. With AWS Lake Formation, we can now define policies once and enforce them in the same way, everywhere, for multiple services we use, including AWS Glue and Amazon Athena,” said Anand Desikan, Director of Cloud and Data Services, Panasonic Avionics. “The enhanced level of control gives us secure access to data and meta-data for columns and tables, not just for bulk objects, which is an important part of our data security and governance standard.”
Amgen is the world’s largest independent biotechnology company. “At Amgen we’ve been heavy users of Amazon Redshift, Amazon EMR, and Databricks clusters for over three years. Setting up security and access controls for each AWS account, service, user, and data set at the level of detail that was required could be cumbersome,” said Kerby Johnson, Enterprise Data Lake Product Owner, Amgen. “AWS Lake Formation streamlines the process with a central point of control while also enabling us to manage who is using our data, and how, with more detail. AWS Lake Formation allows us to manage permissions on Amazon S3 objects like we would manage permissions on data in a database. Our users will be able to find, access, and analyze the data they need with the tools they prefer. This new workflow can make everyone more productive when using Amgen’s data.”
Alcon is a leader in innovation and development of life-changing vision and eye care products. “Like a lot of companies, we started our data lake initiative to get away from having inaccessible silos of data,” said Srinivas Ravilisetty, IT Analytics Lead, Alcon. “With AWS Lake Formation we can quickly add access to existing Amazon S3 buckets and define what’s in them and how it can be used. The data remains in place in S3, but we have full control over it for other uses.”
Life360 is the world’s leading peace of mind service for families. The Life360 app brings families closer with smart features designed to protect and connect the people who matter most. “We wanted to use AWS Lake Formation to build our data lake for supporting location-based time-series data, and make it much easier to load data. The pre-fabricated blueprints helped get data into the data lake without our data engineering team having to write code from scratch, so they could focus on operationalizing ingest, not reinventing the wheel,” said Richard Chennault, Head of Cloud and Data Services, Life360, Inc. “With AWS Lake Formation we were able to quickly unlock data available in Amazon S3 and make it available to analyze across a broad spectrum of AWS data services. The data remains in place in Amazon S3, we can analyze it in many different ways, and we maintain full control over it.”
Zalando is Europe’s leading online platform for fashion and lifestyle. “As Europe’s most fashionable tech company, we work hard to find digital solutions for every aspect of the fashion journey,” said Alberto Miorin, Engineering Lead, Zalando SE. “AWS Lake Formation gave us a scalable central point of control for data access through Amazon Redshift that not only simplified the process, but improved it through granular control over how our data is being used. Now we can discover, access, and analyze data in our data lake with our preferred tools, and leverage it for business intelligence and data science. This streamlined workflow helps our executives make the right decisions on time, and fosters innovation through machine learning.”
Accenture is a leading global professional services company, providing a broad range of services and solutions in strategy, consulting, digital, technology, and operations. “I focus on helping clients in their ‘Data on Cloud’ journey. Specific to that, we have seen that organizations are dealing with a lack of trusted data when they need to perform analytics on data coming from multiple sources,” said Namrata Maheshwary, Senior Architect for the Data Business Group, Accenture. “Data cleansing is a critical step in data analytics and can greatly impact the business outcome and decision making. The new features in AWS Lake Formation have been hugely beneficial to address the challenge of data veracity and securing access to the data lake. We found it tremendously useful to make use of the advanced machine learning techniques for data preparation to find matching records, clean, and deduplicate data from different data sources. This will help reduce the time, effort, and cost, while improving the quality and accuracy of the data in a customer’s data lakes.”
Quantiphi is an Artificial Intelligence and Big Data software and services company driven by the desire to solve complex business problems. Quantiphi specializes in building data lakes and AI solutions for customers to deliver quantifiable value. “AWS Lake Formation allows us to deliver a secure data lake with access to relevant data in days,” said Arnav Gupta, AWS Practice Lead, Quantiphi. “We now have the ability to deliver the best of both worlds for our customers – full security, plus simplified access to relevant data for their users to make decisions easily. Our customers can focus on making smarter, analysis-driven business decisions by tapping into a powerful, centralized data source.”