Substring (string.IndexOf) is returning unwanted parts

I'm scraping a music site and would like to extract only two pieces of information: the artist and the song. They are in this HTML snippet:

<div class="nowOnAir">
            <a href="http://www.radioitalia.it/artista/edoardo_bennato/1.php" onclick="javascript:loadUrl(this.href);return false;" class="autore" title="Scopri tutto su edoardo bennato">
                edoardo bennato            </a><br />
            <span>le ragazze fanno grandi sogni</span>

        </div>

Artist = edoardo bennato

music = le ragazze fanno grandi sogni

I'm trying to extract them like this:

string musica = resposta.Substring(resposta.IndexOf("<span>"), resposta.IndexOf("</span>"));
string artista = resposta.Substring(resposta.IndexOf("autore"), resposta.IndexOf("</a><br />"));

For the artist, fine, I know there is more markup to deal with. But I expected the song to come out 100% correct, and instead the song variable contains the following:

"<span>le ragazze fanno grandi sogni</span>\n            \n        </div>\n     \t\n        \n        \n                \n        \n        \n        \n        \n        \n        \n        \n        \n        \n        \n        \n        \n        \n        <div class=\"iTunes\">\n        \n                \n           <a href=\"http://www.amazon.it/gp/redirect.html?camp=2025&creative=165953&location=http%3A%2F%2Fwww.amazon.it%2Fgp%2Fsearch%3Fkeywords%3Dsolo%252Cclaudio%2Bbaglioni%26url%3Dsearch-alias%253Ddigital-music&linkCode=xm2&tag=radiital-21&SubscriptionId=AKIAINZG7TF6TOXSKWSQ\" target=\"_blank\">\n           <img src=\"http://static.ritalia.nohup.it/img/2014/acquista_amazon.jpg\" title=\"Acquista su Amazon\"  alt=\"Acquista su Amazon\" />\n           </a>\n        \n        \n        \n\t\t       <!--http://clk.tradedoubler.com/click?p=24373&a=1945182&url= -->\n        \t<a style=\"background:none;\" href=\"https://itunes.apple.com/it/album/solo/id956867691?i=956867694&uo=4\" target=\"_blank\"><img src=\"http://static.ritalia.nohup.it/img/2013/Download_on_iTunes_Badge_IT_110x40_0824.png\" title=\"Scarica su itunes\"  alt=\"scarica\"/></a>\n       \n           \t\t</div>\n\t\t\n\t\t<script>\n        $(document).ready(function(){\n            var mostra=0;\n            $(\".last5\").mousedown(function(){\n                if(mostra==0){\n                    $(\".songs\").fadeIn(\"fast\");\t\n                    mostra=1;\t\n                }else{\n                    $(\".songs\").fadeOut(\"fast\");\t\n                    mostra=0;\t\n                }\n            });\n        \n        \n        });\n\t\t\n        </script>\n       \n        \n        \n        \n     \n    <div class=\"fotoArtista\">    \n       \n    \t<a href=\"http://www.radioitalia.it/multimedia/galleria/artista/1/claudio_baglioni/684.php\" onclick=\"javascript:loadUrl(this.href);return false;\" title=\"Guarda tutte le foto di claudio baglioni\">Foto: 53</a>\n   \n        \n    \t<a 
href=\"http://www.radioitalia.it/multimedia/video/artista/1/claudio_baglioni/1999.php\"  onclick=\"javascript:loadUrl(this.href);return false;\" title=\"Guarda tutte i video di claudio baglioni\">Video: 35</a>\n\t\n    \t\n    </div>\n        <div class=\"newsArtista\">\n    \t<a href=\"http://www.radioitalia.it/news/1/index.php\"  onclick=\"javascript:loadUrl(this.href);return false;\">\n    \t    Tutte le news\n        </a>\n\t</div>\n        \n        \n        \n        \n        \n        <div class=\"correlati\">\n            <h3>Artisti consigliati</h3>\n            <ul>\n                                <li><a href=\"http://www.radioitalia.it/artista/emma/1.php\"  onclick=\"javascript:loadUrl(this.href);return false;\" title=\"Emma\"><img src=\"http://static.ritalia.nohup.it/img/icons/artista/55827c6812054.jpg\" border=\"0\" ></a></li>\n                                <li><a href=\"http://www.radioitalia.it/artista/marco_mengoni/1.php\"  onclick=\"javascript:loadUrl(this.href);return false;\" title=\"Marco Mengoni\"><img src=\"http://sta"

What's wrong?

    
asked by anonymous 03.08.2015 / 03:51

1 answer

You're really getting the wrong positions.

The start position does not account for the length of the string you are searching for. IndexOf returns the index where the match begins, so if you search for <span> you have to move 6 characters ahead; otherwise the result includes the search string itself.

The second parameter of Substring expects the number of characters you want to take, not an end position. So you should find the position of the string that marks the end of the match and subtract from it what has already been skipped, that is, the value of the first parameter. This way you get a number of characters and not a position.
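To make the difference between a position and a length concrete, here is a small sketch on a made-up string. The sample text and the names SubstringDemo, Wrong and Right are illustrative assumptions, not part of the original page or code.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical sample text, chosen only so the indexes are easy to follow:
    // "<span>" starts at index 10 and "</span>" starts at index 20.
    public const string Sample = "0123456789<span>song</span> and the rest of the page";

    // Mirrors the call from the question: it starts AT "<span>" and passes
    // the position of "</span>" where Substring expects a length.
    public static string Wrong(string text)
    {
        return text.Substring(text.IndexOf("<span>"), text.IndexOf("</span>"));
    }

    // Skips the opening tag and converts the end position into a length.
    public static string Right(string text)
    {
        int start = text.IndexOf("<span>") + "<span>".Length;
        int length = text.IndexOf("</span>") - start;
        return text.Substring(start, length);
    }

    public static void Main()
    {
        Console.WriteLine(Wrong(Sample)); // <span>song</span> an
        Console.WriteLine(Right(Sample)); // song
    }
}
```

The further into the page the match sits, the larger the position of "</span>" becomes, and the more trailing text the wrong call drags along. That is consistent with the long output shown in the question.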

So:

using static System.Console;

public class Program {
    public static void Main() {
        var resposta = @"<div class=""nowOnAir"">
            <a href=""http://www.radioitalia.it/artista/edoardo_bennato/1.php"" onclick=""javascript:loadUrl(this.href);return false;"" class=""autore"" title=""Scopri tutto su edoardo bennato"">
                edoardo bennato            </a><br />
            <span>le ragazze fanno grandi sogni</span>

        </div>";
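        // IndexOf gives the position where "<span>" starts; skip its 6 characters.
        var inicio = resposta.IndexOf("<span>") + 6;
        // The second argument of Substring is a length: end position minus start position.
        var musica = resposta.Substring(inicio, resposta.IndexOf("</span>") - inicio);
        // "autore" is also 6 characters long; the rest of the opening tag still ends up in the result.
        inicio = resposta.IndexOf("autore") + 6;
        var artista = resposta.Substring(inicio, resposta.IndexOf("</a><br />") - inicio);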
        WriteLine(musica);
        WriteLine(artista);
    }
}

See it working on dotNetFiddle.

Note that the artist result is still wrong, as you acknowledged yourself. Adapt it to what you need; you now know where the mistake was.
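One possible way to adapt the artist part, as a sketch only: look for the class="autore" attribute, skip to the '>' that closes the opening tag, take everything up to the next </a>, and trim the surrounding whitespace. The names ArtistExtractor and ExtractArtist are hypothetical. The sketch assumes the attribute is written exactly as class="autore" and that no attribute value after it contains a '>' character.

```csharp
using System;

public static class ArtistExtractor
{
    // Returns the trimmed text of the first <a ... class="autore" ...> element,
    // or null when one of the expected markers is missing.
    public static string ExtractArtist(string html)
    {
        int marker = html.IndexOf("class=\"autore\"", StringComparison.Ordinal);
        if (marker < 0) return null;

        // Skip the remainder of the opening tag (title attribute etc.).
        int tagEnd = html.IndexOf('>', marker);
        if (tagEnd < 0) return null;

        int start = tagEnd + 1;
        int end = html.IndexOf("</a>", start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Substring takes a length, so convert the end position; Trim drops the padding.
        return html.Substring(start, end - start).Trim();
    }

    public static void Main()
    {
        var html = "<a href=\"#\" class=\"autore\" title=\"Scopri tutto su edoardo bennato\">\n"
                 + "                edoardo bennato            </a><br />";
        Console.WriteLine(ExtractArtist(html)); // edoardo bennato
    }
}
```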

One detail: parsing third-party pages is asking for trouble, unless the page's author guarantees that it will never change. Do it only as a last resort.
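If you go ahead anyway, it may help to fail gracefully when the markup changes, since IndexOf returns -1 when nothing is found and Substring would then throw or return garbage. A minimal sketch, assuming a hypothetical helper named TryBetween (not something from the original code):

```csharp
using System;

public static class Scrape
{
    // Tries to return the text between the first 'open' marker and the next
    // 'close' marker after it. Returns false if either marker is missing.
    public static bool TryBetween(string text, string open, string close, out string value)
    {
        value = null;

        int openAt = text.IndexOf(open, StringComparison.Ordinal);
        if (openAt < 0) return false;

        // Start after the opening marker, and search for the closing one from there.
        int start = openAt + open.Length;
        int end = text.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) return false;

        value = text.Substring(start, end - start);
        return true;
    }

    public static void Main()
    {
        string song;
        if (TryBetween("<span>le ragazze fanno grandi sogni</span>", "<span>", "</span>", out song))
            Console.WriteLine(song);
        else
            Console.WriteLine("Expected markup not found; the page may have changed.");
    }
}
```

Searching for the closing marker from start onward also avoids matching a closing tag that happens to appear earlier in the page.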

    
03.08.2015 / 04:07