So what is Hadoop, after all? Is it a database? I have often heard people say "that company uses the Hadoop database", but when I started studying Big Data I saw that this is not quite right.
So if it's not a database, what is it?
Hadoop is an ecosystem for distributed computing, designed to process large amounts of data (up to petabytes) at high speed. This ecosystem is made up of several systems and technologies.
The idea behind Hadoop is to perform heavy processing by splitting a task across several nodes (a cluster) in order to increase computational power. To make this possible, the nodes of the cluster share a file system called HDFS (Hadoop Distributed File System), which stores files holding large amounts of data, and the processing is performed using a programming model called MapReduce.
Below are some of the systems that can be part of this ecosystem, with a brief explanation of each.
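To make the MapReduce idea more concrete, here is a minimal sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in memory on one machine. On a real cluster the map and reduce steps would run in parallel on different nodes, and the framework would take care of the grouping step in between.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line stands in for a chunk of input that a different node could map.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop handles big data"]
    print(word_count(sample))
```

The point of the split is that `map_phase` only ever looks at one line and `reduce_phase` only at one key, so neither needs to see the whole data set. That independence is what lets the work be spread over many machines.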
HDFS - Hadoop's file system. It works in a distributed way, storing files as large blocks of data spread across the nodes of the cluster.
MapReduce - Programming model for large-scale processing, based on a mapping step (map) and a reduction step (reduce).
YARN - Resource management platform responsible for managing the cluster's computing resources and for scheduling the jobs that use them.
Hive - Converts SQL queries into MapReduce jobs.
Pig - Language for creating MapReduce jobs.
HBase - A column-oriented NoSQL database that can run on top of HDFS. Provides high-speed access to large amounts of data.
Flume - System for collecting large volumes of log data and exporting them to HDFS.
Ambari - Hadoop cluster monitoring.
Sqoop - Tool for exporting data from relational databases (RDBMSs) to Hadoop. It uses JDBC and generates a Java data-export class for each table in the relational schema.
Oozie / Control-M - Schedulers / task managers for Hadoop jobs and workflows.
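Going back to the HDFS entry in the list above, the sketch below illustrates what "large blocks spread across the nodes" means. It is only a toy model, not HDFS code. The 128 MB block size and the replication factor of 3 are assumptions that match the usual defaults, and the round-robin placement and the `plan_blocks` helper are invented for this example. Real HDFS decides placement in the NameNode, with rack awareness.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size (the usual default in recent Hadoop versions)
REPLICATION = 3      # assumed replication factor (the usual default)


def plan_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a list of (block_index, block_size_mb, nodes_holding_a_replica)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    plan = []
    for i in range(n_blocks):
        # Every block is full-sized except possibly the last one.
        size = min(block_size_mb, file_size_mb - i * block_size_mb)
        # Hypothetical round-robin placement: each replica goes to a different node.
        holders = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((i, size, holders))
    return plan


if __name__ == "__main__":
    for block in plan_blocks(300, ["node1", "node2", "node3", "node4"]):
        print(block)
```

Because each block lives on several nodes, a map task can usually run on a machine that already holds the data it needs, and the loss of a single node does not lose the file.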
Today Hadoop is maintained by the Apache Software Foundation, and it has well-known enterprise distributions such as Cloudera and Hortonworks.
Open source software architecture that allows the execution of applications using thousands of machines
Provides storage, management and distributed data processing capabilities
Designed for batch processing of large data sets
One of the pioneers of the Big Data generation of technologies