How to get all the hrefs from an external page


I'd like to access an external page (Google, for example) and run a script against it to capture all the hrefs. I have read that browsers do not allow a GET request to fetch the HTML of another site, but I believe there must be a way to do this. I also read about Googlebot and would like to try something similar in JS. This is the code I have so far:

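$.ajax({
    url: 'http://google.com',
    type: 'GET',
    success: function(res) {
        // Parse the returned HTML and search it, not the current page.
        var page = $('<div>').append($.parseHTML(res));
        page.find('a').each(function() {
            // "this" is the current <a> element; read its raw href attribute.
            alert($(this).attr('href'));
        });
    }
});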
    
asked by anonymous 08.10.2017 / 20:59

1 answer


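You cannot do this from the frontend alone, for the reasons explained in the comments on your post. In short, the browser's same-origin policy prevents a script from reading the response of a request to another site unless that site explicitly allows it through CORS. With Node.js you can do it, because the request is made outside the browser. The code below selects all the a tags from a site and then reads the href attribute of each one.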

var request = require('request');
var cheerio = require('cheerio');

// URL of the site to scrape
var url = 'http://www.meusite.com.br';

request(url, function(err, resp, body) {
  // Stop if the request failed; body is not available in that case.
  if (err) {
    console.error(err);
    return;
  }

  // Load the HTML
  var $ = cheerio.load(body);
  // Select all the a tags, exactly as in jQuery
  var links = $('a');

  // Loop over every tag selected above.
  links.each(function(i, link) {
    // Print the link text and the value of its href attribute.
    console.log($(link).text() + ':\n  ' + $(link).attr('href'));
  });
});
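
One thing to keep in mind: attr('href') returns the attribute exactly as written in the page, so relative links such as /contato come back unresolved, and a tags without an href give undefined. A minimal sketch of how those values could be turned into absolute URLs with Node's built-in URL class is shown here. The helper name toAbsoluteLinks, the sample hrefs and the /loja/ base path are made up for illustration and are not part of the original code.

```javascript
// Hypothetical helper: converts raw href values into absolute URLs.
function toAbsoluteLinks(hrefs, baseUrl) {
  var result = [];
  hrefs.forEach(function(href) {
    // Skip a tags that had no href attribute or an empty one.
    if (typeof href !== 'string' || href.trim() === '') {
      return;
    }
    try {
      // new URL(relative, base) resolves the href against the page URL.
      result.push(new URL(href, baseUrl).href);
    } catch (e) {
      // Ignore values that cannot be parsed as a URL.
    }
  });
  return result;
}

// Sample values standing in for what $(link).attr('href') might return.
var sample = ['/contato', 'produtos.html', undefined, '#topo'];
console.log(toAbsoluteLinks(sample, 'http://www.meusite.com.br/loja/'));
```

Inside the request callback you would collect the hrefs into an array first (for example by pushing $(link).attr('href') in the each loop) and pass that array together with the page URL.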
    
09.10.2017 / 14:22