You can use a regex with preg_match_all, like this:
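<?php
// Download the sitemap (reading a URL this way requires allow_url_fopen to be enabled)
$dados = file_get_contents('https://www.site.com.br/sitemap.xml');
// $matches[1] receives everything captured by ([^/]+)
if (preg_match_all('#www\.site\.com\.br/numero/([^/]+)/#', $dados, $matches)) {
    $matches = $matches[1];
    foreach ($matches as $value) {
        echo $value, '<br>', PHP_EOL;
    }
}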
The regex is #www\.site\.com\.br/numero/([^/]+)/#. The dots are preceded by \ to escape them, because an unescaped dot matches any character (except a line break). In ([^/]+), the class [^/] tells preg_match_all to take any character except /, and + means one or more of them, so it extracts everything that comes after www.site.com.br/numero/ and before the next slash.
Example on IDEONE
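To see the pattern in isolation, here is a minimal sketch that runs preg_match_all against a hard-coded sample string (hypothetical, not downloaded from the real site) and also shows why the dots need escaping:

```php
<?php
// Hypothetical sample input, used instead of fetching the real sitemap.
$dados = 'www.site.com.br/numero/123/www.site.com.br/numero/124/www.site.com.br/numero/125/';

// Literal dots are escaped (\.); ([^/]+) captures one or more characters that are not "/".
$pattern = '#www\.site\.com\.br/numero/([^/]+)/#';

$numeros = [];
if (preg_match_all($pattern, $dados, $matches)) {
    // $matches[0] holds the full matches, $matches[1] the text captured by ([^/]+)
    $numeros = $matches[1];
}
print_r($numeros);

// Why the dots are escaped: an unescaped "." matches any character except a newline.
var_dump(preg_match('#www.site#', 'wwwXsite'));  // int(1)
var_dump(preg_match('#www\.site#', 'wwwXsite')); // int(0)
```

With this input, $numeros ends up holding 123, 124 and 125, because each match consumes the trailing slash and the next search starts at the following www.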
XML
Now, if you are working with XML and what you see is this:
www.site.com.br/numero/123/www.site.com.br/numero/124/www.site.com.br/numero/125/
then that is actually just your browser's view, which did not render the "XML" markup, so neither preg_match nor substr applied to that text will work. Assume your XML (if it is XML at all) looks more or less like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.site.com.br/numero/123/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.site.com.br/numero/124/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.site.com.br/numero/125/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.site.com.br/numero/126/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
In that case you can use DOM or simplexml_load_file (or simplexml_load_string). Here is an example using SimpleXML:
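<?php
// Use simplexml_load_string() instead if the XML is already in a string
$urlset = simplexml_load_file('sitemap.xml');
if ($urlset === false) {
    exit('Could not parse sitemap.xml');
}
$numeros = [];
foreach ($urlset->url as $url) {
    // (string) turns the <loc> element into its text content
    if (preg_match('#www\.site\.com\.br/numero/([^/]+)/#', (string) $url->loc, $match)) {
        $numeros[] = $match[1];
    }
}
foreach ($numeros as $value) {
    echo $value, '<br>', PHP_EOL;
}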
With $url->loc you get the value of the <loc> tag. If your XML has a different format, just replace ->loc with the name of the tag you use.
Example on IDEONE
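The DOM route mentioned above can be sketched as follows. This is an assumption-laden sketch, not the answer's own code: it assumes the sitemap uses the standard sitemaps.org namespace shown in the sample, loads a trimmed inline copy with DOMDocument::loadXML (DOMDocument::load would read a file instead), and extractNumbers is a hypothetical helper whose $tag parameter plays the same role as changing ->loc.

```php
<?php
// Trimmed copy of the sample sitemap, kept inline so the sketch runs on its own.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.site.com.br/numero/123/</loc></url>
<url><loc>http://www.site.com.br/numero/124/</loc></url>
</urlset>
XML;

// Hypothetical helper: collect the part after /numero/ from every element named $tag.
function extractNumbers(string $xml, string $tag = 'loc'): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return []; // not well-formed XML
    }
    $numeros = [];
    // Sitemap elements live in the sitemaps.org default namespace.
    $nodes = $doc->getElementsByTagNameNS('http://www.sitemaps.org/schemas/sitemap/0.9', $tag);
    foreach ($nodes as $node) {
        if (preg_match('#www\.site\.com\.br/numero/([^/]+)/#', $node->textContent, $match)) {
            $numeros[] = $match[1];
        }
    }
    return $numeros;
}

foreach (extractNumbers($xml) as $value) {
    echo $value, '<br>', PHP_EOL;
}
```

If your XML declares no namespace, getElementsByTagName($tag) would be the equivalent call.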