Hadoop
Hadoop is open-source software. The Apache Hadoop software library is a framework that enables the distributed storage and distributed processing of large data sets (i.e. Big Data) across clusters of commodity servers using simple programming models.
Commodity computing
Commodity computing refers to low-budget cluster computing, in which multiple computers, multiple storage devices, and redundant interconnections compose the user equivalent of a single highly available system. Large numbers of computing components are used in parallel to get the greatest amount of computation at very low cost.
Commodity computing is preferred here for its inexpensive, modestly performing hardware components that work in parallel; such systems are conceived, designed, and built to minimize the cost per unit of performance.
Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage, with a very high degree of fault tolerance. Rather than relying on hardware to deliver high availability, the Hadoop software itself detects and handles failures at the application layer, making the computer cluster extremely resilient.
It delivers a highly available service on top of a cluster of computers, each of which may be prone to failure. Hadoop can easily handle high volumes of complex data, up into the petabytes.
The base Apache Hadoop framework is composed of the following modules:
- Hadoop Common
contains libraries and utilities needed by other Hadoop modules.
- Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is a distributed file system that stores data on commodity machines and spans all the nodes in a Hadoop cluster. It links together the file systems on many local nodes to make them into one big file system.
- Hadoop YARN
Yet Another Resource Negotiator (YARN) assigns CPU, memory, and storage to applications running on a Hadoop cluster. It is a resource-management platform that manages compute resources in the cluster and uses them to schedule users' applications. YARN enables other application frameworks (such as Spark) to run on Hadoop as well.
- Hadoop MapReduce
a programming model for large-scale data processing. A job is expressed as map and reduce tasks, which the framework schedules and runs in parallel across the cluster (see the word-count sketch after this list).
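As a concrete illustration, here is a minimal sketch of the classic word-count mapper and reducer, written against the org.apache.hadoop.mapreduce API. It follows the standard WordCount example that ships with Hadoop; the class names themselves are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits (word, 1) for every token in its input split.
public class WordCountMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, ONE);
    }
  }
}

// Reducer: sums the 1s emitted for each word.
class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
```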
Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) splits files into large blocks (64 MB by default in older releases, 128 MB in Hadoop 2 and later) and distributes the blocks among the nodes in the cluster. To process the data, Hadoop MapReduce then ships code (JAR files) to the nodes that hold the required data, and those nodes process the data in parallel.
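To make the storage side concrete, the sketch below uses the HDFS Java FileSystem API to copy a local file into the cluster and print its block size, replication factor, and the hosts holding each block; the file and directory names are hypothetical. The per-block host list is exactly what lets MapReduce ship code to the nodes that already hold the data.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockInfo {
  public static void main(String[] args) throws Exception {
    // Reads fs.defaultFS etc. from core-site.xml / hdfs-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical paths, for illustration only.
    Path local = new Path("input.txt");
    Path remote = new Path("/user/demo/input.txt");
    fs.copyFromLocalFile(local, remote);

    FileStatus status = fs.getFileStatus(remote);
    System.out.println("Block size:  " + status.getBlockSize());
    System.out.println("Replication: " + status.getReplication());

    // Each block reports the datanodes holding a replica, which is what
    // allows the scheduler to place tasks close to the data.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset " + b.getOffset() + " -> "
          + Arrays.toString(b.getHosts()));
    }
  }
}
```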
Additional software packages can be installed on top of or alongside Hadoop, such as:
- Apache Pig
- Apache Hive
- Apache HBase
- Apache Spark
YARN
YARN stands for "Yet Another Resource Negotiator". YARN factors out Hadoop's resource-management capabilities so that they can be used by new processing engines, while still supporting MapReduce data processing. With YARN, you can run multiple applications in Hadoop, all sharing a common resource management layer.
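As a sketch of how a job lands on YARN: a standard MapReduce driver like the one below (reusing the hypothetical WordCountMapper and WordCountReducer classes from the earlier sketch) submits the job to the YARN ResourceManager when the cluster is configured with mapreduce.framework.name=yarn, and the ResourceManager then allocates containers for the map and reduce tasks.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class); // safe here: summing is associative
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
    // waitForCompletion submits the job (to YARN when so configured) and blocks.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```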
MapReduce code is commonly written in Java, but any programming language can be used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's program.
The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.
Hadoop has changed the dynamics of large-scale computing with four salient characteristics.
Hadoop enables a computing solution that is:
- Cost effective
Hadoop brings parallel computing to commodity servers. The result is a considerable decrease in the cost per terabyte of storage.
- Scalable
New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
- Flexible
Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses.
- Fault tolerant
When you lose a node, the system redirects work to another location of the data and continues processing.