Get news via URL

0

I recently saw a video on YouTube in which someone took the URL of a news story from Globo and similar sites and retrieved the story's title, body, and images, along with the formatting used. The application was written with the Laravel framework.

What kind of resources are used to build an application like this? Is there a Laravel library that makes it easier?

    
asked by anonymous 03.12.2016 / 00:56

1 answer

2

To do this, you will inevitably need to parse the page's HTML, which means using a DOM parser.

The DOM parser takes the HTML you downloaded and turns it into a DOM object that you can navigate to get the information you need.
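To make that concrete, here is a minimal sketch in Python using only the standard library. It is not the library from the video or from my own project: the `Node` and `TreeBuilder` classes and the sample HTML are made up for illustration, and a real DOM parser gives you far more than this. The idea is the same, though: the markup becomes a tree, and you walk the tree to pull out the title, the paragraphs, and the image URLs.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal element node (hypothetical helper, not a real DOM API)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # mix of child Node objects and text strings

    def find_all(self, tag):
        """Yield every descendant element with the given tag name."""
        for child in self.children:
            if isinstance(child, Node):
                if child.tag == tag:
                    yield child
                yield from child.find_all(tag)

    def text(self):
        """Return the text of this element and its descendants, in order."""
        return "".join(
            c if isinstance(c, str) else c.text() for c in self.children
        )


class TreeBuilder(HTMLParser):
    """Turns the parser's start/end/data events into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Made-up page standing in for a downloaded news story.
SAMPLE = """
<html><head><title>Example headline</title></head>
<body><article>
<h1>Example headline</h1>
<p>First paragraph.</p>
<p>Second <b>bold</b> paragraph.</p>
<img src="photo.jpg" alt="A photo">
</article></body></html>
"""

doc = parse(SAMPLE)
title = next(doc.find_all("h1")).text()
paragraphs = [p.text() for p in doc.find_all("p")]
images = [img.attrs.get("src") for img in doc.find_all("img")]
```

In a real project you would fetch the page over HTTP first and use a full parser library with selector support instead of walking the tree by hand.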

I have done a few projects of this kind, and the biggest problems you'll encounter are basically two:

1) Each site (and sometimes different sections or stories on the same site) has a different HTML structure, so you have to write a different mapping for each section/site.

2) Sites (even large ones such as UOL and Terra) have badly formed, error-ridden HTML. This can cause errors when parsing the DOM, which will complicate your life.
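For the first problem, one way to organize the per-site mappings is a lookup table keyed by hostname and section path. The sketch below is in Python and everything in it is an assumption for illustration: the hostnames, the path prefixes, and the CSS-style selector strings are invented, and the selectors are just data that you would hand to whichever parser you end up using.

```python
from urllib.parse import urlparse

# Hypothetical rules: (hostname, path prefix) -> where each field lives.
SITE_MAPS = {
    ("news.example.com", "/"): {
        "title": "h1.headline",
        "body": "div.article-body p",
        "images": "div.article-body img",
    },
    # A section of the same site with a different page layout.
    ("news.example.com", "/sports/"): {
        "title": "h2.match-title",
        "body": "section.report p",
        "images": "figure img",
    },
    ("blog.example.org", "/"): {
        "title": "header h1",
        "body": "article p",
        "images": "article img",
    },
}


def rules_for(url):
    """Return the most specific rule set for a URL, or None if unmapped."""
    parts = urlparse(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    candidates = [
        (prefix, rules)
        for (mapped_host, prefix), rules in SITE_MAPS.items()
        if mapped_host == host and path.startswith(prefix)
    ]
    if not candidates:
        return None
    # The longest matching prefix is the most specific section.
    return max(candidates, key=lambda item: len(item[0]))[1]
```

Keeping the rules as data means that supporting a new site or section is a matter of adding an entry, and a URL with no entry can be reported as unsupported instead of being parsed with the wrong mapping.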

The key is to find a parser that either preprocesses the HTML to correct errors or is error-tolerant.
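To show what error tolerance buys you, here is a small Python comparison using only the standard library. The broken markup, the `ImageCollector` class, and the helper function names are invented for this example. A strict XML parser rejects the snippet outright, while the lenient `html.parser` module still reports the image in it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Made-up fragment with typical real-world problems: an unclosed <p>,
# a <br> without a slash, and an unquoted attribute value.
BROKEN = '<div><p>Unclosed paragraph<br><img src=photo.jpg alt="x"></div>'


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> the parser encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def strict_parse_ok(markup):
    """Return True only if the markup is well-formed enough for an XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tolerant_image_sources(markup):
    """Extract image URLs with the lenient HTML parser."""
    collector = ImageCollector()
    collector.feed(markup)
    collector.close()
    return collector.sources
```

With a strict parser, a single malformed story would abort the whole extraction; with a tolerant one you still get usable data, although you should check the result because the parser has to guess at the intended structure.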

The last time I worked on a project like this, I wrote a small bot in Java, because there is a ready-made Java library that is perfect for this: it lets you pick data out of the HTML the way you would with jQuery. It's really cool!

link

Hugs and good luck!

    
03.12.2016 / 04:06