About Hadoop

Hadoop is a large-scale distributed batch processing infrastructure. It's 100% open source,
and it pioneered a fundamentally new way of storing and processing data. Instead of relying
on expensive, proprietary hardware and separate systems to store and process data,
Hadoop enables distributed parallel processing of huge amounts of data across
inexpensive, industry-standard servers that both store and process the data, and it can scale
out simply by adding more servers. With Hadoop, no data set is too big.
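
To make that processing model concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The class name and the input/output paths passed on the command line are placeholders for illustration. Hadoop splits the input files into blocks, runs the map function in parallel on the servers that hold each block, and then aggregates the per-word counts in the reduce phase:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel on each block of input, on the server storing that block.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);  // emit (word, 1) for every token seen
      }
    }
  }

  // Reducer: receives every count emitted for one word and sums them.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir; must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a JAR, a job like this is launched with `hadoop jar wordcount.jar WordCount /input /output`, assuming the input directory already exists in HDFS.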
Hadoop can handle all types of data from disparate systems: structured, unstructured, log
files, pictures, audio files, communications records, email, and just about anything else you
can think of, regardless of its native format. Even when different types of data have been
stored in unrelated systems, you can load them all into your Hadoop cluster with no prior
need for a schema. In other words, you don't need to know how you intend to query your
data before you store it; Hadoop lets you keep it all online for interactive querying, business
intelligence, analysis, and visualization.
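
As a small illustration of that schema-free ingestion, the following sketch uses Hadoop's Java FileSystem API to copy a raw log file into HDFS; the file and directory paths here are hypothetical. The cluster stores the bytes verbatim, and any structure is imposed later, at processing time:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RawIngest {
  public static void main(String[] args) throws Exception {
    // Connect to the cluster's default file system (as configured in core-site.xml).
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS exactly as-is: bytes in, bytes out.
    // HDFS never asks what the data "means" -- no schema, no types, no columns.
    // Both paths below are placeholders for illustration.
    fs.copyFromLocalFile(new Path("/var/log/app/server.log"),
                         new Path("/data/raw/logs/server.log"));

    fs.close();
  }
}
```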
The objective of this course is to give you a detailed understanding of the system's
architecture and to enable you to use the system effectively for storing and processing
huge amounts of data.
