Help extracting information from an HTML page with JS / Node

0

I'm having trouble writing some code. I've had help here before and now I need it again. The goal is a script that extracts the teachers' names and the link to each teacher's Lattes CV from an HTML page, and saves them somehow so I can process the data later. I'm a beginner and don't yet know how to work with jQuery, but I'm happy to use it if someone guides me. Looking at the page's HTML, I saw that the <h2> tag is used only for the teachers' names, so I grabbed the contents of all the <h2> elements and managed to save them. I also noticed that the first "href" after each <h2> is the Lattes link for that teacher, and that is exactly where I'm stuck. Sorry for the long post, but I hope I've been clear. Thanks, everyone.

const url = 'http://www.ppg-educacao.uff.br/novo/index.php/corpo-docente'
const axios = require('axios')
const cheerio = require('cheerio')


axios.get(url).then(response =>{
    const funcionarios = response.data
    const $ = cheerio.load(response.data)
    const professores = $('h2').text()
    console.log($('h2').text())
    //const lattes = $('a href="http://lattes.cnpq.br/"' ).text()
    //console.log(lattes)
    //const informacoes = []
    //informacoes.push({'nome ': professores, 'lattes ': lattes})
    //console.log (informacoes)

})

    
asked by anonymous 09.08.2018 / 21:12

2 answers

0

Looking at the page's DOM, each teacher sits in its own <div> block with the classes item and column-1. Every block has the same structure, which makes the data easy to read.

The only h2 element in each block is the teacher's name, as you found. The second p element contains a single a element, which is the link to the CV.

To select those divs with jQuery (cheerio accepts the same selector syntax), we use $('.item.column-1')

The code would look like this, producing an array of objects with each teacher's name and link.

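// Runs inside the axios callback, after const $ = cheerio.load(response.data)
var professores = []
$('.item.column-1').each(function () {
  // The block's only h2 holds the teacher's name
  var nome = $(this).children('h2').text().trim()
  // .attr('href') works in both jQuery and cheerio;
  // cheerio nodes do not expose a DOM-style .href property
  var link = $(this).find('p a').first().attr('href')
  professores.push({
    nome: nome,
    link: link
  })
})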
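
Since you also asked how to save the data for later, one option is to write the array to a JSON file. This is only a minimal sketch using Node's built-in fs module: the helper names salvarProfessores and carregarProfessores, the file name professores.json and the temp-directory location are all assumptions made for illustration, and the two sample entries stand in for the scraped array.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: writes the array to disk as pretty-printed JSON.
function salvarProfessores(arquivo, professores) {
  fs.writeFileSync(arquivo, JSON.stringify(professores, null, 2), 'utf8')
}

// Hypothetical helper: reads the array back for later processing.
function carregarProfessores(arquivo) {
  return JSON.parse(fs.readFileSync(arquivo, 'utf8'))
}

// Sample entries standing in for the scraped result.
const exemplo = [
  { nome: 'Carmen Lúcia Vidal Pérez', link: 'http://lattes.cnpq.br/0646181238100482' },
  { nome: 'Waldeck Carneiro', link: 'http://lattes.cnpq.br/4129978776761994' }
]

// Assumed location: any writable path would do.
const arquivo = path.join(os.tmpdir(), 'professores.json')
salvarProfessores(arquivo, exemplo)
console.log(carregarProfessores(arquivo).length) // prints 2
```

In the real script you would call salvarProfessores(arquivo, professores) at the end of the axios callback, once the array has been filled.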
    
09.08.2018 / 22:36
0

I've changed your code so that it collects the names and the links and combines them into a single array:

const url = 'http://www.ppg-educacao.uff.br/novo/index.php/corpo-docente'
const axios = require('axios')
const cheerio = require('cheerio')

let objs = []
let nomes = []
let urls = []

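axios.get(url).then(response => {
    const $ = cheerio.load(response.data)
    // Collect every teacher name (the h2 elements), trimming stray whitespace
    $('h2').each((i, e) => {
        nomes.push($(e).text().trim());
    });
    // Collect every link found inside a paragraph, in document order
    $('p a').each((i, e) => {
        urls.push($(e).attr('href'));
    });

    // Pair names and links by position
    nomes.forEach((nome, i) => {
        objs.push({nome: nome, lattes: urls[i]});
    });

    console.log(objs);
}).catch(err => console.error(err.message))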

Result:

[ { nome: 'Adriano Vargas Freitas',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4276752U6' },
  { nome: 'Alessandra Frota Schueler',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4799322D6' },
  { nome: 'Bruno Alves Dassie',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4707912H5' },
  { nome: 'Carlos Eduardo Zaleski Rebuá',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4260783J4' },
  { nome: 'Carmen Lúcia Vidal Pérez',
    lattes: 'http://lattes.cnpq.br/0646181238100482' },
  { nome: 'Cecilia Maria Aldigueri Goulart',
    lattes: 'http://lattes.cnpq.br/7281306371405447' },

  ...
  { nome: 'Valdelúcia Alves da Costa',
    lattes: 'http://lattes.cnpq.br/3766561922402070' },
  { nome: 'Waldeck Carneiro',
    lattes: 'http://lattes.cnpq.br/4129978776761994' },
  { nome: 'Zoia Ribeiro Prestes',
    lattes: 'http://lattes.cnpq.br/1927800358488148' },
  { nome: 'Zuleide Simas da Silveira',
    lattes: 'http://lattes.cnpq.br/8037763146233564' } ]
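
One caveat: the code above pairs names and links purely by position, so it relies on the page having exactly one 'p a' link per h2, in the same order. If a teacher had no link, or a paragraph held an extra one, every later entry would shift silently. A small guard can at least make that visible. This is only a sketch: the helper name parear is hypothetical, and the names and URLs in the usage lines are made-up placeholders, not data from the page.

```javascript
// Hypothetical helper: pairs names and links by position, but refuses to
// guess when the two lists have different lengths.
function parear(nomes, urls) {
  if (nomes.length !== urls.length) {
    throw new Error(`Found ${nomes.length} names but ${urls.length} links`)
  }
  return nomes.map((nome, i) => ({ nome: nome, lattes: urls[i] }))
}

// Placeholder data for illustration only.
const pares = parear(
  ['Ana', 'Bruno'],
  ['http://lattes.cnpq.br/1', 'http://lattes.cnpq.br/2']
)
console.log(pares)
```

If the guard throws, reading the name and the link from the same enclosing block for each teacher avoids the problem, because each link is then looked up next to its own name.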
    
09.08.2018 / 22:56