What is Big Data?
Big Data refers to datasets so large, fast-growing, or varied that traditional single-machine
tools, such as a relational database on one server, cannot store or process them efficiently.
It is commonly characterized by the three Vs: Volume, Variety, and Velocity.
Explain the three Vs of Big Data.
Volume: The sheer amount of data generated and stored, often measured in terabytes or petabytes.
Variety: The different types and sources of data, including structured, semi-structured,
and unstructured data.
Velocity: The speed at which data is generated and must be processed, as with streaming or
real-time feeds.
What is Hadoop?
Hadoop is an open-source framework for distributed storage and processing of large datasets
across clusters of computers using simple programming models.
What are the core components of Hadoop?
Hadoop Distributed File System (HDFS) for distributed storage and data replication.
Yet Another Resource Negotiator (YARN) for resource management and job scheduling.
MapReduce for distributed processing of large datasets.
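To make the storage component more concrete, here is a toy sketch of how a file might be split into blocks and each block copied to several nodes. It is not HDFS code: the function place_blocks, the node names, and the round-robin placement are assumptions made for illustration, and real HDFS uses a rack-aware placement policy. The defaults of 128 MB blocks and a replication factor of 3 mirror common HDFS defaults.

```python
import math

def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Split a file into fixed-size blocks and pick `replication` distinct nodes per block.

    Hypothetical helper: simple round-robin placement, not the real HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block, then take the next ones in order.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement

# A 300 MB file on a four-node cluster needs three blocks (128 + 128 + 44 MB).
layout = place_blocks(300, ["node1", "node2", "node3", "node4"])
for block_id, replicas in layout.items():
    print(block_id, replicas)
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block available.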
What is MapReduce?
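MapReduce is a programming model, and the Hadoop processing engine that implements it, for
processing large datasets in parallel across a distributed cluster. A map function turns each
input record into intermediate key-value pairs, the framework groups those pairs by key (the
shuffle), and a reduce function combines the values for each key into the final output.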
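As a rough single-process illustration of the model, the sketch below counts words with a map step that emits (word, 1) pairs, a shuffle step that groups the pairs by word, and a reduce step that sums each group. The function names are hypothetical and this is plain Python, not the Hadoop API. In a real cluster the map and reduce calls would run on many machines at once.

```python
from collections import defaultdict

def map_phase(line):
    """Emit one (word, 1) pair for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```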
What is Apache Spark?
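Apache Spark is an open-source distributed computing engine that provides an interface for
programming entire clusters with implicit data parallelism and fault tolerance. It keeps
intermediate data in memory where possible, which often makes iterative and interactive
workloads faster than disk-based MapReduce.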
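The ideas of implicit data parallelism and fault tolerance can be sketched in plain Python. This is not the Spark API: ToyDataset and its methods are hypothetical, and threads on one machine stand in for cluster workers. The sketch assumes a simplified model in which transformations are only recorded, each partition is computed independently when results are requested, and a lost partition can be rebuilt by replaying the recorded steps.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyDataset:
    """Hypothetical stand-in for a partitioned, lazily evaluated dataset."""

    def __init__(self, partitions, lineage=()):
        self.partitions = partitions  # source data, split into chunks
        self.lineage = lineage        # recorded transformations, not yet applied

    @classmethod
    def from_list(cls, items, num_partitions):
        # Deal items out to partitions in round-robin order.
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        return ToyDataset(self.partitions, self.lineage + (("map", fn),))

    def filter(self, pred):
        return ToyDataset(self.partitions, self.lineage + (("filter", pred),))

    def compute_partition(self, index):
        """Replay the recorded steps on one partition; also usable to rebuild a lost one."""
        rows = self.partitions[index]
        for kind, fn in self.lineage:
            if kind == "map":
                rows = [fn(row) for row in rows]
            else:
                rows = [row for row in rows if fn(row)]
        return rows

    def collect(self):
        # The caller never writes a loop over partitions: the parallelism is implicit.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self.compute_partition, range(len(self.partitions))))
        return [row for part in parts for row in part]

numbers = ToyDataset.from_list(list(range(10)), num_partitions=3)
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
result = sorted(squares_of_evens.collect())
print(result)  # [0, 4, 16, 36, 64]
```

Nothing is computed until collect() is called, and calling compute_partition(0) again reproduces that partition from the source data and the recorded steps, which is the intuition behind recovering lost work without replicating every intermediate result.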