Hadoop is a large-scale distributed batch processing infrastructure. It's 100% open source, and it pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and the cluster can scale simply by adding more servers. With Hadoop, no data is too big.
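To make the idea of distributed parallel processing concrete, here is the classic word-count job written against Hadoop's MapReduce Java API. This is a minimal sketch: the input and output paths passed on the command line are placeholders for directories in your own cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on the nodes that hold each input split,
  // emitting (word, 1) for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would package this as a jar and launch it with hadoop jar wordcount.jar WordCount /input /output; Hadoop splits the input across the cluster, runs the mapper on the servers that store each block, and merges the results in the reducers.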
Hadoop can handle all types of data from disparate systems: structured and unstructured data, log files, pictures, audio files, communications records, email, just about anything you can think of, regardless of its native format. Even when different types of data have been stored in unrelated systems, you can dump it all into your Hadoop cluster with no prior need for a schema. In other words, you don't need to know how you intend to query your data before you store it; you can keep it all online for interactive querying, business intelligence, analysis, and visualization.
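As a sketch of what schema-free ingestion looks like in practice, the snippet below uses Hadoop's FileSystem API to copy raw files into the cluster exactly as they are; the local and HDFS paths shown are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadRawData {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy raw files into the cluster as-is: HDFS stores bytes,
    // so logs, images, and mail archives need no schema up front.
    fs.copyFromLocalFile(new Path("/var/log/app/server.log"),
                         new Path("/data/raw/logs/server.log"));
    fs.copyFromLocalFile(new Path("/home/user/mail-archive.mbox"),
                         new Path("/data/raw/mail/mail-archive.mbox"));

    fs.close();
  }
}

How the bytes are interpreted is decided later, by whatever job reads them, not at load time.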
The objective of this course is to give you a detailed understanding of the system's architecture and to enable you to use the system effectively for storing and processing huge data sets.