Extract part of a URL with cURL

1

Dear all, I was looking at the topic "Get a value inside html" and understood that parse_url($href, PHP_URL_QUERY) returns the query string of a URL, which can then be saved in a variable.
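For reference, a minimal sketch of what that call returns, next to the PHP_URL_PATH component that is more relevant to the question below. The URL and its parameters are made up for illustration and do not come from the page being crawled.

```php
<?php
// Hypothetical URL, used only for illustration.
$href = "http://www.example.com/pasta/subpasta/pagina.php?id=10&cat=2";

// PHP_URL_QUERY returns only the query string (without the leading "?").
$query = parse_url($href, PHP_URL_QUERY);   // "id=10&cat=2"

// PHP_URL_PATH returns only the path portion.
$path = parse_url($href, PHP_URL_PATH);     // "/pasta/subpasta/pagina.php"

// parse_str splits the query string into an associative array.
parse_str($query, $params);

echo $query, "\n", $path, "\n", $params["id"], "\n";
```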

My question is as follows: I want to go through a page (URL) and find all the links, that is, the href values, that contain a certain portion of a URL, for example "/folder/subfolder/" ("/pasta/subpasta/" in the code below). In other words, I want only the links that have that section in the URL.

I looked at some examples on the web, but my code is not working correctly: it prints all the URLs.

Here's what I'm trying to solve:

<?php
  $url = "http://www.minhaurl.com.br";
  $ch = curl_init();
  $timeout = 5;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $html = curl_exec($ch);
  curl_close($ch);

  $string = '';
  $nlinks = 0;
  $slinks = 0;
  $meuslinks = array();
  $x=1;
  $dom = new DOMDocument();
  @$dom->loadHTML($html);

  foreach($dom->getElementsByTagName("a") as $link) {
    $string = $link->getAttribute("href");
    // strpos returns 0 for a match at the start of the string, so compare strictly with false
    if(strpos($string,"/pasta/subpasta/") === false){
        $slinks++;
    }else{
        $exibe = "<a href='".$string."'>".$string."</a>";
        echo $exibe."<br>";
        $nlinks++;
        $meuslinks[$nlinks] = $exibe;
    }
  }
  $tam = sizeof($meuslinks);
  while($x <= $tam){
    echo $meuslinks[$x]."<br>";
    $x++;
  }
  echo "<br> ".$nlinks." matching links were found!<br>";
  echo "<br> ".$slinks." links did not match!<br>";
  ?>
    
asked by anonymous 17.05.2018 / 15:40

1 answer

0

I did it this way and it worked:

$conteudo = file_get_contents("https://www.uol.com.br/");
// [A-Za-z] instead of [A-z], which would also match the characters [ \ ] ^ _ `
$regexLinks = "/[w]*[A-Za-z0-9]{1,}[\.]{1}[A-Za-z0-9]*[A-Za-z\.0-9_\/-]*/";
preg_match_all($regexLinks, $conteudo, $links, PREG_UNMATCHED_AS_NULL);
$hrefs =  $links[0];
$busca = "/mercado/2018/"; // links that contain the directories "mercado" and "2018"
$cont = 0;
$urls = array();

foreach($hrefs as $link){

    // strpos returns 0 for a match at position 0, so test against false explicitly
    if(strpos($link, $busca) !== false){
        $cont++;
        $urls[] = $link;
    }

}
if($cont == 0){
    echo "no links found";
    die();
}
echo "Total links found: ";
echo $cont."<br>";

foreach($urls as $url){
    echo "<a href='$url'>".$url."</a><br>";
}
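As an alternative to matching URLs with a regular expression, the same filter can be sketched with DOMDocument, reading only the href attributes of the a tags. This is a rough sketch under assumptions, not the method used above: filterLinksByPath is a hypothetical helper name, and the sample markup stands in for the downloaded page so the snippet runs without a network request.

```php
<?php
/**
 * Returns every href in $html whose URL path contains $needle.
 * Hypothetical helper, written only for this sketch.
 */
function filterLinksByPath(string $html, string $needle): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($dom->getElementsByTagName("a") as $link) {
        $href = $link->getAttribute("href");
        // Look only at the path, so the host and the query string are ignored.
        $path = parse_url($href, PHP_URL_PATH);
        // strpos returns 0 for a match at the start, so compare with !== false.
        if (is_string($path) && strpos($path, $needle) !== false) {
            $matches[] = $href;
        }
    }
    return $matches;
}

// Sample markup standing in for the downloaded page (made up for this sketch).
$html = '<html><body>'
      . '<a href="/pasta/subpasta/a.html">A</a>'
      . '<a href="http://www.example.com/pasta/subpasta/b.html?x=1">B</a>'
      . '<a href="/outra/c.html">C</a>'
      . '</body></html>';

$found = filterLinksByPath($html, "/pasta/subpasta/");
foreach ($found as $href) {
    echo htmlspecialchars($href), "\n";
}
echo count($found), " links found\n";
```

With a real page, $html would come from curl_exec or file_get_contents. Escaping each href with htmlspecialchars before printing keeps a hostile link from injecting markup into the output.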
    
07.06.2018 / 01:36