Check if URL exists

1

I would like to know how to check whether a member's URLs exist. I am using AngularJS, AJAX, and HTTP requests. I can get the status of a URL that I created in a mock API, but I cannot check an external URL.

$http({
  method: 'GET',
  // url: 'http://private-e5528d-alugueme.apiary-mock.com/api/v1/categories/1'
  // url: 'http://pt.stackoverflow.com/'
  // url: 'https://twitter.com/pmargreff'
}).then(function successCallback(response) {
  console.log(response);
}, function errorCallback(response) {
  console.log(response);
});

When I use the first URL, the response logged to the console is:

Object { data: Object, status: 200, headers: fd/<(), config: Object, statusText: "OK" }

When I try a public URL such as Stack Overflow or my own Twitter profile, the response is:

Object { data: null, status: -1, headers: fd/<(), config: Object, statusText: "" }

However, if I check the Network tab of my browser, the request was made and its status there is 200 when the URL exists, or 404 when it is invalid. At first I thought AngularJS itself was blocking the response, so I tried the check with jQuery AJAX as follows:

$.ajax({
  // url: 'http://private-e5528d-alugueme.apiary-mock.com/api/v1/categories/1',
  // url: 'http://pt.stackoverflow.com/',
  // url: 'https://twitter.com/pmargreff',
  type:'HEAD',
  error: function()
  {
    alert('does not exist');
  },
  success: function()
  {
    alert('exists');
  }
});

I got the same kind of response: I can validate my own mock URL, but not an external URL, and the Network tab still shows the correct results.

I tried with promises and the result was the same:

$.get(url)
    .done(function() {
      alert('exists');
    }).fail(function() {
      alert('does not exist');
    })

Am I making a mistake in the code or are the errors caused by the sites blocking the request itself? And if it's the second option, can I get around this?

I'm trying not to use the Facebook and Twitter APIs, so I'd like an answer that does not rely on them.

    
asked by anonymous 17.04.2016 / 16:08

2 answers

3

You cannot do this directly.

In modern browsers, a script can only read responses from its own origin (the same-origin policy). For a page on another site to read responses from yours, your server must send the Access-Control-Allow-Origin header.
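
The effect on the calls in the question can be sketched as follows. When the browser withholds a cross-origin response, the script sees no usable status (AngularJS reported -1 in the question; jQuery's jqXHR typically reports 0), so the only honest outcome is "unknown". The function name and the returned labels are illustrative assumptions, not part of either library.

```javascript
// Hypothetical helper: map the status seen by a success/error callback
// to one of three outcomes.
function interpretStatus(status) {
  if (status >= 200 && status < 400) {
    return 'exists';
  }
  if (status >= 400) {
    return 'missing or not accessible';
  }
  // -1 (AngularJS) or 0 (jQuery): the browser gave the script no response,
  // for example because the other site did not allow cross-origin reads.
  return 'unknown';
}
```

With this reading, a status of -1 says nothing about whether the URL exists; it only says the page was not allowed to see the answer.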

What is the solution?

The solution would be to add the required header, for example Access-Control-Allow-Origin: *.

  

PHP:

header('Access-Control-Allow-Origin: *');  

But, as you may have noticed, the header has to be sent by the site being requested, so you would have to change Twitter's code to add Access-Control-Allow-Origin. That is not possible!

Game over?

Not exactly. This limitation applies only on the client side, i.e. a page on your site cannot read a response from another site. But nothing stops your site's server from connecting to another server.

So you can do this:

  

PHP:

function verificarURL($url) {

    // Initialize cURL
    $curl = curl_init($url);
    curl_setopt_array($curl, [
            // Return the response instead of printing it:
            CURLOPT_RETURNTRANSFER => 1,

            // Make curl_exec return false when the HTTP code is >= 400:
            CURLOPT_FAILONERROR => 1,

            // Follow 'Location:' redirects:
            CURLOPT_FOLLOWLOCATION => 1,

            // Limit the number of 'Location:' redirects to follow:
            CURLOPT_MAXREDIRS => 2,

            // Set the 'Referer' automatically when following a redirect:
            CURLOPT_AUTOREFERER => 1,

            // Verify the site's SSL certificate (protects against MITM):
            CURLOPT_SSL_VERIFYPEER => 1,
            CURLOPT_SSL_VERIFYHOST => 2,

            // Path to the CA bundle (the trusted authorities; can be downloaded from https://curl.haxx.se/ca/cacert-2017-06-07.pem):
            CURLOPT_CAINFO => __DIR__ . DIRECTORY_SEPARATOR . 'cacert-2017-06-07.pem',

            // Restrict to HTTP/HTTPS (blocks other protocols such as 'file://', including on redirects):
            CURLOPT_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
            CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,

            // Require TLSv1.2:
            CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_2,

            // Timeouts in seconds (against Slow HTTP attacks and the like):
            CURLOPT_TIMEOUT => 4,
            CURLOPT_CONNECTTIMEOUT => 2,
            //CURLOPT_LOW_SPEED_LIMIT =>
            //CURLOPT_LOW_SPEED_TIME =>
        ]
    );

    // Execute the request:
    $dados = curl_exec($curl);
    // Close cURL
    curl_close($curl);

    // curl_exec returns false when the request fails or the HTTP code is >= 400:
    return $dados !== false;
}

verificarURL('http://seusite.com');
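
For a server that runs Node.js rather than PHP, the same idea might look like the sketch below. It is a rough equivalent, not a port: it sends a single HEAD request, does not follow redirects (any 2xx or 3xx status counts as "exists"), and uses one timeout. The name checkUrl and the 4000 ms default are assumptions chosen for illustration.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical server-side check: resolves to true for a 2xx/3xx answer,
// and to false for anything else (bad URL, other protocol, error, timeout).
function checkUrl(target, timeoutMs = 4000) {
  return new Promise((resolve) => {
    let parsed;
    try {
      parsed = new URL(target);
    } catch (err) {
      resolve(false); // not a valid absolute URL
      return;
    }
    // Only allow HTTP and HTTPS, as the PHP version does.
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      resolve(false);
      return;
    }
    const client = parsed.protocol === 'https:' ? https : http;
    const req = client.request(parsed, { method: 'HEAD', timeout: timeoutMs }, (res) => {
      res.resume(); // discard any body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 400);
    });
    req.on('timeout', () => {
      req.destroy();
      resolve(false);
    });
    req.on('error', () => resolve(false));
    req.end();
  });
}
```

A call such as checkUrl('http://seusite.com').then(console.log) would print true or false. Some servers answer HEAD differently from GET, so a stricter check might fall back to GET.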
  

/!\ SECURITY:

Most cURL security issues are addressed by the options above, so the function is minimally safe for public use, where the user supplies $url.

However, some problems remain. Your server's IP will be exposed to the target of the cURL request, which can be a problem if you use CloudFlare or a similar service to hide your server's IP. Another problem is that a redirect (or the domain itself) can point to another device on the local network. For example, https://malicioso.com sends Location: 192.0.0.1, your code follows it and reports that "192.0.0.1" exists, which may reveal information about your network.
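
One way to reduce the second risk is to refuse targets that are internal addresses before making the request. The sketch below is a partial guard under stated assumptions: isInternalIPv4 is a hypothetical name, it only inspects IPv4 literals, and it returns false for host names, so a real check would also have to resolve the host name and test every redirect target.

```javascript
// Hypothetical guard: true when host is an IPv4 literal in a loopback,
// private, link-local or other special-purpose range.
function isInternalIPv4(host) {
  const parts = host.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    return false; // not an IPv4 literal (e.g. a host name)
  }
  const [a, b, c, d] = parts.map(Number);
  if (a > 255 || b > 255 || c > 255 || d > 255) {
    return false; // not a valid IPv4 address
  }
  return (
    a === 0 ||                            // 0.0.0.0/8 "this network"
    a === 10 ||                           // 10.0.0.0/8 private
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||           // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12 private
    (a === 192 && b === 168) ||           // 192.168.0.0/16 private
    (a === 192 && b === 0 && c === 0)     // 192.0.0.0/24, covers the 192.0.0.1 example
  );
}
```

A server could call this on the host of the submitted URL and on the host of each Location header, and stop as soon as it returns true.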

Is there another alternative?

Unfortunately, this request has to be made on the server side; you cannot get the client to do it.

But you can "outsource" the service using Yahoo!

Yahoo has a service, YQL, that lets you run XPath queries against a page (at least that is what I found about it). XPath is not Yahoo's own; in short, it is a language for selecting parts of an XML document.

In this case you can make a request using the following query:

  

SQL / YQL:

     

Note: this API has been deprecated; you should use htmlstring instead, but it is very unstable.

select * from html where url="http://seusite.com"

This will then return (because seusite.com exists!):

{"query":{"count":1,"created":"2016-04-18T12:16:44Z","lang":"pt-BR","results":{"body":{"script":{"language":"JavaScript","src":"js/redirect-min.js","type":"text/javascript"}}}}}

The results field tells you whether the URL exists or not.

Therefore:

$(':button').click(function() {
  var url = $(':input').val();

  $.ajax({
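    // Encode the query so that characters in the user's URL (&, #, spaces) do not break the request
    url: 'https://query.yahooapis.com/v1/public/yql?q=' + encodeURIComponent('select * from html where url="' + url + '"') + '&format=json',
    type: "get",
    dataType: "json",
    success: function(data) {
      alert(data.query.results != null ? 'Exists' : 'Does not exist');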
    }
  });
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" value="http://stackexchange.com">
<button>CHECK</button>

This runs the query shown above and checks the result: if results is null, the URL does not exist.

However, this produces false negatives, such as https://facebook.com, which registers as non-existent. This would not happen with the first solution.

    
18.04.2016 / 14:29
0

For security reasons, the browser does not let a script read responses from other servers, and by default servers do not grant that access either.

1) If the site you want to check belongs to you, you could enable CORS on it.

2) You could use JSONP, which gives access to external resources, provided the target server supports it.

3) These limitations apply only on the client side. You could use AJAX to call a resource on your own domain, passing the URL you want to check as a parameter; that server-side resource would check the URL and return true or false, for example:

$.ajax({
  url: 'http://seudominio.com.br/ValidadorSite/',
  method: 'get',
  data: {
    urlParam1: "sitequevocequerverificar.org.br"
  },
  success: function(data) {
    if (data.url1) {
      alert("Site 1 exists");
    }
  }
});

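Java (server side):

import java.net.HttpURLConnection;
import java.net.URL;

// urlParam1 is the URL string received as the request parameter
URL url = new URL(urlParam1);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
// Prints the HTTP status code, e.g. 200 or 404
System.out.println(con.getResponseCode());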

The advantage of this last approach is that you could send a set of URLs to be checked and perform a specific action for each result (exists or does not exist).
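
Checking a set of URLs could be sketched like this, assuming a Node.js back end. checkAll is a hypothetical name, and checkOne stands for whatever function performs the actual server-side check for a single URL and returns (a promise of) true or false; neither comes from the answer above.

```javascript
// Hypothetical batch check: runs checkOne for every URL in parallel and
// returns an object mapping each URL to true (exists) or false.
async function checkAll(urls, checkOne) {
  const flags = await Promise.all(
    urls.map((u) =>
      Promise.resolve()
        .then(() => checkOne(u))
        .catch(() => false) // a failing check counts as "does not exist"
    )
  );
  const results = {};
  urls.forEach((u, i) => {
    results[u] = Boolean(flags[i]);
  });
  return results;
}
```

The resulting object can be sent back to the page as JSON, so the success callback can act on each URL separately.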

    
25.08.2017 / 15:01