Apache Hadoop - HG Insights

“An open-source software framework for distributed storage and processing of large sets of data. When working with extremely large data sets it is important to be able to process information quickly so that users are not waiting for data to appear on their application or website. A common problem with large data sets is that the more data that is ingested, the longer it takes for infrastructure to process the data. Hadoop was invented to be able to run data intensive workloads at a more cost effective price. Hadoop essentially takes large data sets and breaks them up into smaller chunks of data for workers to process. Hadoop can manage millions of workers that are processing data in parallel such that large data sets can be quickly retrieved. Hadoop is responsible for scheduling the workers tasks, and coalescing all of the results so that an application can deliver the correct data back to the end user. Hadoop is a technology that is ultimately implemented by a developer. Hadoop also runs on its own dedicated infrastructure which needs to be setup. Public cloud providers typically offer Hadoop as a service, such as AWS EMR or Google Map Reduce for App Engine.”

What do we mean by this?

A data processing software that’s free and used by nearly everyone.