Hello. I'm having a hard time with MapReduce. Whenever I run the application I can't seem to get results: the map phase apparently executes, but the reduce phase stays at 0%.
When I check the files generated on the server where Hadoop is installed, the input is fine, a .txt exactly as defined. The reduce step, however, generates a folder with the name of the file I defined (file_name.txt); inside that folder there are a logs folder and the file "part-00000", which is empty.
What should I do to get the result of the operation?
Here is the driver that connects the application to Hadoop:
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public static void main(String args[]) throws IOException {
    System.out.println("Olá");
    ControleBD conBD = new ControleBD();
    ControleArq conAR = new ControleArq();
    conAR.gravar(conBD.pesquisa());
    JobConf conf = new JobConf(Principal.class); // we define which class the job takes as its main class
    conf.setJobName("TestePrincipal"); // name of the job that will run on the virtual machine
    FileInputFormat.addInputPath(conf, new Path("/user/hadoop-user/input/DadosBancarios.txt")); // we define the input file
    FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop-user/output/saidaDadosBancarios.txt")); // we define the output path
    conf.setMapperClass(ClasseMapper.class); // we set the mapper class
    conf.setReducerClass(ClasseReducer.class); // we set the reducer class
    conf.setOutputKeyClass(Text.class); // expected output key type for the map and reduce operations, Text in this case
    conf.setOutputValueClass(IntWritable.class); // expected output value type for the map and reduce operations, IntWritable in this case
    JobClient.runJob(conf); // runs the job with the given configuration
}
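One note on the driver: the path passed to FileOutputFormat.setOutputPath is treated by Hadoop as a directory, which matches the folder named after the file that shows up on the server with part-00000 inside it. Since a job fails if that directory already exists, here is a minimal sketch of clearing it before re-running (the FileSystem calls are the standard HDFS API; the path is the one from the driver above):

// extra import needed: org.apache.hadoop.fs.FileSystem
Path saida = new Path("/user/hadoop-user/output/saidaDadosBancarios.txt");
FileSystem fs = FileSystem.get(conf); // the JobConf created above
if (fs.exists(saida)) {
    fs.delete(saida, true); // recursively remove the previous run's output directory
}
FileOutputFormat.setOutputPath(conf, saida);
JobClient.runJob(conf);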
The Mapper class follows:
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class ClasseMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable chave, Text valor, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String linha = valor.toString();
        System.out.println(linha);
        String ano = "";
        int valorIndice = 0;
        if (linha.contains("year:")) {
            String[] divisor = linha.split(":");
            ano = divisor[1];
        }
        if (linha.contains("value:")) {
            String[] divisor = linha.split(":");
            valorIndice = Integer.parseInt(divisor[1]);
        }
        // one pair is collected for every input line: "year:" lines emit
        // (year, 0), "value:" lines emit ("", value), anything else ("", 0)
        output.collect(new Text(ano), new IntWritable(valorIndice));
    }
}
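To make it easier to see what the mapper emits, here is a minimal standalone sketch of the same parsing logic run on two hypothetical lines (the real layout of DadosBancarios.txt is an assumption, since the file is not shown):

public class TesteMapperLocal {
    public static void main(String[] args) {
        String[] linhas = { "year:2014", "value:350" }; // hypothetical input lines
        for (String linha : linhas) {
            String ano = "";
            int valorIndice = 0;
            if (linha.contains("year:")) {
                ano = linha.split(":")[1];
            }
            if (linha.contains("value:")) {
                valorIndice = Integer.parseInt(linha.split(":")[1]);
            }
            // the same pair the mapper would collect for this line
            System.out.println("(" + ano + ", " + valorIndice + ")");
        }
        // prints (2014, 0) for the year line and (, 350) for the value line
    }
}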
Here is the Reducer class:
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class ClasseReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text chave, Iterator<IntWritable> valor, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // start below any possible value; the original initialization to
        // 99999999 would win every Math.max comparison against smaller inputs
        int maxValue = Integer.MIN_VALUE;
        while (valor.hasNext()) {
            maxValue = Math.max(maxValue, valor.next().get());
        }
        output.collect(chave, new IntWritable(maxValue)); // one (key, max) pair per group
    }
}
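The max logic itself can be checked outside Hadoop; a quick standalone sketch on a hypothetical list of values for one key:

public class TesteReducerLocal {
    public static void main(String[] args) {
        int[] valores = { 350, 410, 120 }; // hypothetical values for one key
        int maxValue = Integer.MIN_VALUE; // same initialization as the reducer
        for (int v : valores) {
            maxValue = Math.max(maxValue, v);
        }
        System.out.println(maxValue); // prints 410
    }
}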
Here is the log generated by the Eclipse plugin:
15/09/17 17:13:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/09/17 17:13:38 INFO mapred.FileInputFormat: Total input paths to process: 1
15/09/17 17:13:38 INFO mapred.FileInputFormat: Total input paths to process: 1
15/09/17 17:13:39 INFO mapred.JobClient: Running job: job_201509170444_0002
15/09/17 17:13:40 INFO mapred.JobClient: map 0% reduce 0%
15/09/17 17:13:48 INFO mapred.JobClient: map 100% reduce 0%
15/09/17 17:13:53 INFO mapred.JobClient: Job complete: job_201509170444_0002
15/09/17 17:13:53 INFO mapred.JobClient: Counters: 16
15/09/17 17:13:53 INFO mapred.JobClient: File Systems
15/09/17 17:13:53 INFO mapred.JobClient: HDFS bytes read = 152753
15/09/17 17:13:53 INFO mapred.JobClient: HDFS bytes written = 10
15/09/17 17:13:53 INFO mapred.JobClient: Local bytes read = 44044
15/09/17 17:13:53 INFO mapred.JobClient: Local bytes written = 88160
15/09/17 17:13:53 INFO mapred.JobClient: Job Counters
15/09/17 17:13:53 INFO mapred.JobClient: Launched reduce tasks = 1
15/09/17 17:13:53 INFO mapred.JobClient: Launched map tasks = 2
15/09/17 17:13:53 INFO mapred.JobClient: Data-local map tasks = 2
15/09/17 17:13:53 INFO mapred.JobClient: Map-Reduce Framework
15/09/17 17:13:53 INFO mapred.JobClient: Reduce input groups = 1
15/09/17 17:13:53 INFO mapred.JobClient: Combine output records = 0
15/09/17 17:13:53 INFO mapred.JobClient: Map input records = 6240
15/09/17 17:13:53 INFO mapred.JobClient: Reduce output records = 1
15/09/17 17:13:53 INFO mapred.JobClient: Map output bytes = 31200
15/09/17 17:13:53 INFO mapred.JobClient: Map input bytes = 149856
15/09/17 17:13:53 INFO mapred.JobClient: Combine input records = 0
15/09/17 17:13:53 INFO mapred.JobClient: Map output records = 6240
15/09/17 17:13:53 INFO mapred.JobClient: Reduce input records = 6240
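For reference, a minimal sketch for reading part-00000 back through the HDFS FileSystem API (the path is the output directory set in the driver; the Configuration is assumed to pick up the cluster's fs.default.name):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LerSaida {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // must point at the cluster's HDFS
        FileSystem fs = FileSystem.get(conf);
        Path parte = new Path("/user/hadoop-user/output/saidaDadosBancarios.txt/part-00000");
        BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(parte)));
        String linha;
        while ((linha = br.readLine()) != null) {
            System.out.println(linha); // each output line is "key<TAB>value"
        }
        br.close();
    }
}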
PS: I'm not running Hadoop directly on my computer; it runs in a virtual machine (link).
Thanks,
Rafael Muniz