Data Lake - HG Insights

A central location for storing the various types of data a business needs. A data lake differs from a data warehouse in that data can remain in its raw form, without being transformed first. This lets developers apply a "schema on read": the application that processes the data decides how to interpret and store it at the time it reads it, rather than when the data is first written. This makes big data analysis easier to start, because less up-front work is required to cleanse and organize data before it lands in the data lake. AWS, Azure and Google Cloud offer data lake solutions built on their blob storage, such as S3 in the case of AWS. All data in a data lake can be stored in an S3 bucket, and applications can then read from that bucket and determine how to process what they find. Data lake tools like AWS Glue can crawl buckets and determine what type of data is in them, which makes querying that data much simpler.
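As a rough illustration of "schema on read", the sketch below keeps raw records untouched and applies a schema only when a reader asks for them. The sample records, the `SALES_SCHEMA` mapping and the `read_with_schema` helper are all hypothetical names made up for this example; in a real data lake the raw lines would come from objects in a bucket rather than an in-memory list.

```python
import json
from datetime import datetime

# Raw records as they might sit in the lake (hypothetical sample data).
# Nothing was cleaned or reshaped when these were written.
RAW_LINES = [
    '{"user": "alice", "amount": "19.99", "ts": "2024-01-05T10:00:00"}',
    '{"user": "bob", "amount": 5, "ts": "2024-01-06T12:30:00", "coupon": "X1"}',
    'not valid json',
]

# The schema is chosen by the reader: field name -> conversion function.
SALES_SCHEMA = {
    "user": str,
    "amount": float,
    "ts": datetime.fromisoformat,
}


def read_with_schema(lines, schema):
    """Parse raw lines, keeping only the schema's fields and casting each one."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (ValueError, KeyError, TypeError):
            # Skip records that do not fit this particular reader's schema.
            continue
    return rows


rows = read_with_schema(RAW_LINES, SALES_SCHEMA)
```

A different application could read the same raw lines with a different schema (for example, one that keeps `coupon`), which is the flexibility the raw form is meant to preserve.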

What do we mean by this?

The library archives: all your data in its oldest, purest form, waiting for you to find it.