C# - How to do simple web scraping


I want to read information from the HTML page of an online radio station. I tried to do this with HtmlAgilityPack, but without success, because the page I am working with does not use element IDs. I imagine that is not a real obstacle, but I do not know how to use the API, and the examples I found all relied on GetElementbyId().

I need to get two pieces of information from this page (Playing Now, Playing Next) and assign each one to its own variable. I would prefer to use native C# functionality, but I have no problem using a library such as HtmlAgilityPack, especially if it simplifies the procedure.

Here is a screenshot of the page I want to scrape.

The code I currently have looks like this:

using System;
using System.Net;

namespace Web_Scraping
{
    class SimplesWebScraping
    {
        static void Main()
        {
            // Download the page as a string.
            var webClient = new WebClient();
            string pagina = webClient.DownloadString("http://hts01.painelstream.net:9074/index.html?sid=1");

            // Declare string variables to hold the data extracted from the website.
            string playingNow = string.Empty;
            string playingNext = string.Empty;

            // Perform the web scraping.
            // How do I do this?

            // Write the extracted data.
            Console.Write("Playing Now: " + playingNow);
            Console.Write("Playing Next: " + playingNext);
        }
    }
}

What is the best way to do this? Could someone show me some example code? Thanks in advance for the help!

    
asked by anonymous 07.11.2017 / 06:35

1 answer


Regular Expressions

This is still the best way to handle strings like the ones you are looking for.

With RegEx (regular expressions) you create a pattern and match it against a text, and every match found in the text is returned.

Example:

Pattern ([A-Z])\w+ : a sequence that starts with one uppercase character from A to Z, followed by one or more (+) word characters (\w).

If you run this pattern against the text below:

Welcome to Stack Overflow!

The return will be: [ "Welcome", "Stack", "Overflow" ], because each of these sequences begins with a capital letter and is followed by one or more word characters (letters, digits, or underscore).
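The same pattern can be tried directly in C#. This is only a minimal sketch: the sample sentence is "Welcome to Stack Overflow!", and the names RegexDemo and FindCapitalizedWords are made up for illustration. Regex.Matches is the standard .NET call that returns every match of a pattern in a text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Hypothetical helper: returns every substring of `text`
    // that matches the pattern ([A-Z])\w+.
    public static string[] FindCapitalizedWords(string text)
    {
        return Regex.Matches(text, @"([A-Z])\w+")
                    .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Value)   // keep only the matched text
                    .ToArray();
    }

    static void Main()
    {
        string[] words = FindCapitalizedWords("Welcome to Stack Overflow!");

        // Prints: Welcome, Stack, Overflow
        Console.WriteLine(string.Join(", ", words));
    }
}
```

"to" is skipped because it has no uppercase letter, and "!" is skipped because it is not a word character.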

The tester I recommend is RegExr, which is very good for testing and learning regular expressions.
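Applied to the question, the idea could look roughly like the sketch below. The real markup of the radio page is not shown in the question, so this assumes (purely for illustration) that each value follows its label in the next table cell, as in `<td>Playing Now: </td><td><b>Artist - Title</b></td>`. The names RadioScraper and ExtractField are hypothetical, and the pattern would have to be adapted after inspecting the actual page source.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class RadioScraper
{
    // Hypothetical helper: finds `label`, skips an optional colon and any
    // HTML tags that follow it, and returns the first run of plain text.
    // Returns an empty string when the label is not found.
    public static string ExtractField(string html, string label)
    {
        string pattern = Regex.Escape(label) + @"\s*:?\s*(?:<[^>]+>\s*)*([^<]+)";
        Match match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);

        return match.Success
            ? WebUtility.HtmlDecode(match.Groups[1].Value).Trim()
            : string.Empty;
    }

    static void Main()
    {
        // ASSUMPTION: sample HTML standing in for the downloaded page.
        // In the question's code this string would come from
        // webClient.DownloadString(...).
        string pagina =
            "<table>" +
            "<tr><td>Playing Now: </td><td><b>Artist A - Song 1</b></td></tr>" +
            "<tr><td>Playing Next: </td><td><b>Artist B - Song 2</b></td></tr>" +
            "</table>";

        string playingNow = ExtractField(pagina, "Playing Now");
        string playingNext = ExtractField(pagina, "Playing Next");

        Console.WriteLine("Playing Now: " + playingNow);   // Playing Now: Artist A - Song 1
        Console.WriteLine("Playing Next: " + playingNext); // Playing Next: Artist B - Song 2
    }
}
```

The capture group `([^<]+)` stops at the next `<`, so only the text of the first element after the label is returned.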

    
07.11.2017 / 09:47