Urllib2, exception handling


I'm a beginner at programming, and I'm learning Python from this book:

Learn to Program: The Art of Teaching the Computer (Cesar Brod - Novatec Editora)

In one of the exercises, I need to use the urllib2 library to fetch a particular web page and check whether it contains a particular word or phrase. The idea is to use this in a verb conjugator, checking an online dictionary to see whether the verb entered by the user is regular.

Basically, this is what should happen:

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> verbo = 'amar'
>>> pagina = urllib2.urlopen('http://pt.wiktionary.org/wiki/' + verbo)
>>> pagina = pagina.read()
>>> "Verbo regular" in pagina
True
>>>

So far so good. However, if there is no page corresponding to the word entered by the user, the following error appears:

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> verbo = '123ar'
>>> pagina = urllib2.urlopen('http://pt.wiktionary.org/wiki/' + verbo)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
>>> pagina = pagina.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'pagina' is not defined
>>>

Since the page in question (link) still has source code, I expected the program to read it and run the check in the same way, but that is not what happens. Could someone suggest a solution?

Note: If I have not been clear, or have left out any important information, please let me know. Also, I usually program on Linux but am using Windows at the moment; the error occurs on both systems.

Note 2: Forgive any conceptual errors. As I said earlier, I am still a beginner at programming. Speaking of which, I'm open to tips too =)

    
asked by anonymous 16.04.2015 / 01:42

1 answer


You should handle this error in a try...except block. When urlopen cannot open a page because the server answers with an error status, it raises an HTTPError exception (a subclass of URLError). To handle it, do the following:

 
import urllib2

try:
    verbo = '123ar'
    pagina = urllib2.urlopen('http://pt.wiktionary.org/wiki/{0}'.format(verbo)).read()
    print ("Verbo regular" in pagina)
except urllib2.HTTPError as e:
    print ("Nao foi possivel abrir a pagina. Erro {0}".format(e.code))
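
Because HTTPError is a subclass of URLError, the order in which the two are checked matters. The sketch below illustrates this; the helper name describe_failure is hypothetical, and the Python 3 fallback import is an assumption added only so the snippet also runs outside Python 2.

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    # Assumption: on Python 3, urllib.request exposes the same names.
    import urllib.request as urllib2


def describe_failure(exc):
    """Return a short message for an exception raised by urlopen.

    Hypothetical helper, for illustration only.
    """
    # HTTPError has to be tested first: every HTTPError is also a URLError.
    if isinstance(exc, urllib2.HTTPError):
        return "server answered with HTTP status {0}".format(exc.code)
    # A plain URLError means no HTTP response was received at all.
    if isinstance(exc, urllib2.URLError):
        return "could not reach the server: {0}".format(exc.reason)
    return "unexpected error: {0}".format(exc)
```

The same ordering applies to except clauses: an except urllib2.URLError placed before except urllib2.HTTPError would catch the 404 as well.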
  

"Since the page in question (link) still has source code, I expected the program to read it and run the check in the same way, but that is not what happens. Could someone suggest a solution?"

This happens because urllib2 behaves differently from urllib. The documentation says the following:

  

For non-200 error codes, this simply passes the job on to the protocol_error_code() handler methods, via OpenerDirector.error(). Eventually, urllib2.HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.
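
To make the handler chain described in that passage more concrete, here is a minimal sketch of an opener whose error processor returns every response unchanged, so a 404 page comes back as a normal response instead of an exception. This is not one of the two approaches given below; the names KeepErrorResponses and build_lenient_opener are hypothetical, and the Python 3 fallback import is an assumption.

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    # Assumption: on Python 3, urllib.request exposes the same names.
    import urllib.request as urllib2


class KeepErrorResponses(urllib2.HTTPErrorProcessor):
    """Hypothetical processor that hands every response back unchanged."""

    def http_response(self, request, response):
        # The stock processor forwards error statuses to
        # OpenerDirector.error(); returning the response skips that step.
        return response

    https_response = http_response


def build_lenient_opener():
    # build_opener leaves out its default HTTPErrorProcessor when it is
    # given a subclass of it.
    return urllib2.build_opener(KeepErrorResponses)
```

With this opener, build_lenient_opener().open(url) should return a response whose getcode() reports 404 for a missing page. One trade-off to keep in mind: redirects are dispatched through the same error path, so this opener would presumably not follow them automatically.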

There are two ways around this. The first is to read the page source inside the except block:

try:
    verbo = '123ar'
    pagina = urllib2.urlopen('http://pt.wiktionary.org/wiki/{0}'.format(verbo)).read()
except urllib2.HTTPError as e:
    pagina = e.fp.read()
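
The same idea can be wrapped in a small function that returns the status code together with the body, so the caller can tell a real page from an error page. This is only a sketch: the name fetch_body and the opener parameter (there so the function can be exercised without network access) are hypothetical, it calls e.read() where the snippet above uses e.fp.read(), and the UTF-8 decoding step is an assumption about the page encoding.

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    # Assumption: on Python 3, urllib.request exposes the same names.
    import urllib.request as urllib2


def fetch_body(url, opener=urllib2.urlopen):
    """Return (status, text) for url, keeping the body of error pages.

    Hypothetical helper; opener is any callable that behaves like urlopen.
    """
    try:
        response = opener(url)
        status = response.getcode()
    except urllib2.HTTPError as e:
        # The exception doubles as a file-like response object.
        response = e
        status = e.code
    body = response.read()
    if isinstance(body, bytes):
        # Assumed encoding: UTF-8; undecodable bytes are replaced.
        body = body.decode('utf-8', 'replace')
    return status, body
```

A call such as status, pagina = fetch_body('http://pt.wiktionary.org/wiki/' + verbo) would then allow a check like status == 200 and "Verbo regular" in pagina.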

The second is to use urllib, whose urlopen returns the error page instead of raising an exception for a 404:

import urllib

verbo = '123ar'
pagina = urllib.urlopen('http://pt.wiktionary.org/wiki/{0}'.format(verbo)).read()
print ("Verbo regular" in pagina)
    
16.04.2015 / 01:55