Hadoop is a large-scale distributed batch processing infrastructure. It's 100% open source, and it pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and the cluster can scale simply by adding more servers. With Hadoop, no data is too big.
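To make the idea of distributed parallel processing concrete, here is the classic word-count job written against Hadoop's MapReduce Java API. This is a minimal sketch: the input and output paths passed on the command line are placeholders for directories in your own cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on the nodes that hold each input split,
  // emitting (word, 1) for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would package this as a jar and launch it with hadoop jar wordcount.jar WordCount /input /output; Hadoop splits the input across the cluster, runs the mapper on the servers that store each block, and merges the results in the reducers.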
Hadoop can handle all types of data from disparate systems: structured and unstructured data, log files, pictures, audio files, communications records, email, just about anything you can think of, regardless of its native format. Even when different types of data have been stored in unrelated systems, you can dump it all into your Hadoop cluster with no prior need for a schema. In other words, you don't need to know how you intend to query your data before you store it; you can keep it all online for interactive querying, business intelligence, analysis, and visualization.
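As a sketch of what schema-free ingestion looks like in practice, the snippet below uses Hadoop's FileSystem API to copy raw files into the cluster exactly as they are; the local and HDFS paths shown are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadRawData {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy raw files into the cluster as-is: HDFS stores bytes,
    // so logs, images, and mail archives need no schema up front.
    fs.copyFromLocalFile(new Path("/var/log/app/server.log"),
                         new Path("/data/raw/logs/server.log"));
    fs.copyFromLocalFile(new Path("/home/user/mail-archive.mbox"),
                         new Path("/data/raw/mail/mail-archive.mbox"));

    fs.close();
  }
}

How the bytes are interpreted is decided later, by whatever job reads them, not at load time.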
The objective of this course is to give you a detailed understanding of the system's architecture and to enable you to use the system effectively for storing and processing huge data sets.