Showing posts from February 23, 2020

Extract Translate Load

So, what is Extract Translate Load like? ETL (more commonly expanded as Extract, Transform, Load) was a solution for getting analytics at scale. Once we have data on the scale of hundreds of terabytes or even petabytes, we may need high-performance computing (HPC) to ask questions of it. Scaling horizontally on commodity compute is cost-effective in most business cases. Initially Hadoop lent a helping hand in the process; however, when Spark could do it efficiently, the world said "Why not?". To get analytics on huge data that is largely unstructured and comes from heterogeneous sources, we did what we do with every other engineering problem: we divided the problem so we could conquer it with ease. We made a layer to Extract; this layer simply abstracts away the different data sources and gets us the data. The Translate layer structures the data for us so that our logical questions fit into the arena. Load comes in where we need to distribute the compute task at hand across large commodity clusters. Here's where a big data framework would be a friend at hel