Problem with the encoding of A\u015bvagho\u1e63a (Python 3)?


I'm working with Wikipedia and I'm having some encoding problems. When I enter a link like this one in my browser, everything works fine:

link

It points to the article with the following name:

link

It's the same article, but the URL is displayed differently. In other words, some encoding is taking place.
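The difference is percent-encoding: the browser displays the decoded title, while the underlying URL carries the UTF-8 bytes of each non-ASCII character as %XX escapes. A quick way to check that both forms name the same article, assuming the encoded form is A%C5%9Bvagho%E1%B9%A3a, is to round-trip it through urllib.parse:

```python
from urllib.parse import quote, unquote

# Percent-encoded form of the title, as it appears in the raw URL.
encoded = 'A%C5%9Bvagho%E1%B9%A3a'

# unquote() decodes the %XX sequences as UTF-8 bytes by default.
decoded = unquote(encoded)
print(decoded)  # Aśvaghoṣa

# quote() reverses the process, so both forms identify the same title.
print(quote(decoded) == encoded)  # True
```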

But I'm mining Wikipedia's topviews:

link

For this article, the API returned the same title in the following form:

link

```json
"A\u015bvagho\u1e63a": {
  "assessment": "Stub",
  "num_users": 1,
  "assessment_img": "f/f5/Symbol_stub_class.svg",
  "num_edits": 1
},
```
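In the API response, \u015b and \u1e63 are JSON escape sequences, not part of the title itself; any JSON parser turns them back into the characters ś and ṣ. A minimal sketch with Python's json module, assuming the response body is available as a string (here the snippet above, wrapped in an outer object so it is valid JSON on its own):

```python
import json

# Raw response text (raw string, so the \uXXXX escapes stay literal
# and are decoded by the JSON parser, not by Python).
raw = r'''{
  "A\u015bvagho\u1e63a": {
    "assessment": "Stub",
    "num_users": 1,
    "assessment_img": "f/f5/Symbol_stub_class.svg",
    "num_edits": 1
  }
}'''

data = json.loads(raw)
title = next(iter(data))  # first (and only) key
print(title)  # Aśvaghoṣa

# In a normal Python string literal, '\u015b' is also decoded, so this
# compares real characters with real characters.
print(title == 'A\u015bvagho\u1e63a')  # True
```

If the data is fetched with the third-party requests library, response.json() performs the same decoding.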

But when I try to build the following URL:

https://en.wikipedia.org/wiki/A\u015bvagho\u1e63a

it doesn't work, of course.
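One likely cause, assuming the title was copied from the raw response text rather than taken from the parsed JSON, is that the string still contains the literal six-character sequences \u015b and \u1e63 (a backslash, a u, and four hex digits). Percent-encoding that text turns the backslash into %5C, which yields a URL for a title that does not exist. The sketch below contrasts the two cases; decoding the escapes with json.loads is just one option:

```python
import json
from urllib.parse import quote

# Literal backslash escapes, e.g. copied from the raw JSON text.
literal = r'A\u015bvagho\u1e63a'
# The actual title, with the escapes already decoded.
real = 'A\u015bvagho\u1e63a'

print(len(literal), len(real))  # 19 9
print(quote(literal))  # A%5Cu015bvagho%5Cu1e63a  (wrong title)
print(quote(real))     # A%C5%9Bvagho%E1%B9%A3a   (correct)

# Wrapping the text in double quotes makes it a JSON string literal,
# so json.loads() decodes the \uXXXX escapes into real characters.
# (Assumes the text contains no unescaped double quotes.)
fixed = json.loads('"' + literal + '"')
print(fixed == real)  # True
```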

What I want to know is how to encode the title this way in Python 3:

A\u015bvagho\u1e63a -> A%C5%9Bvagho%E1%B9%A3a
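The target form is UTF-8 percent-encoding: each non-ASCII character is converted to its UTF-8 bytes (ś is C5 9B, ṣ is E1 B9 A3), and each byte is written as %XX. The hand-written function below only illustrates that mechanism (the name percent_encode is hypothetical); in practice urllib.parse.quote does the same job:

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Illustrative encoder: keep unreserved ASCII characters and write
    every other UTF-8 byte as %XX with uppercase hex digits."""
    unreserved = '-._~'
    out = []
    for byte in text.encode('utf-8'):
        ch = chr(byte)
        if byte < 128 and (ch.isalnum() or ch in unreserved):
            out.append(ch)
        else:
            out.append('%{:02X}'.format(byte))
    return ''.join(out)


title = 'A\u015bvagho\u1e63a'
print(title.encode('utf-8'))   # b'A\xc5\x9bvagho\xe1\xb9\xa3a'
print(percent_encode(title))   # A%C5%9Bvagho%E1%B9%A3a
# quote() with safe='' keeps exactly the same character set unescaped.
print(percent_encode(title) == quote(title, safe=''))  # True
```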

asked by anonymous 14.09.2018 / 19:38

1 answer


Some standard-library functions related to this kind of encoding (only the urllib.parse one actually produces the percent-encoded URL form):

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""HTML escaping vs. URL (percent) encoding"""
import html
from urllib.parse import quote_plus

string = 'A\u015bvagho\u1e63a'

# cgi.escape(): use it only with Python 2. It was deprecated in Python 3.2
# and removed in 3.8 (the whole cgi module was removed in 3.13), so it is
# not called here.

# html.escape() only escapes &, <, > and quotes for HTML; it leaves
# non-ASCII characters such as 'ś' and 'ṣ' unchanged.
print('HTML escape:', html.escape(string))
# quote_plus() percent-encodes the UTF-8 bytes: A%C5%9Bvagho%E1%B9%A3a
print('Quote plus:', quote_plus(string))

URL = 'https://en.wikipedia.org/wiki/'

print('HTML escape:', URL + html.escape(string))
print('Quote plus:', URL + quote_plus(string))
```

There are probably other ways as well; which one fits best depends on how you fetch and store the data.
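For Wikipedia article URLs specifically, quote is arguably a better fit than quote_plus: quote_plus is meant for form values in query strings and turns spaces into +, whereas Wikipedia article URLs use underscores in place of spaces. A minimal sketch, assuming the titles arrive already decoded from the API and using a hypothetical helper name, wikipedia_url:

```python
from urllib.parse import quote, quote_plus

BASE = 'https://en.wikipedia.org/wiki/'


def wikipedia_url(title: str) -> str:
    """Hypothetical helper: build an article URL from a decoded title,
    assuming the Wikipedia convention of '_' in place of spaces."""
    # quote() leaves '/' unescaped by default, so titles such as
    # 'AC/DC' keep their slash.
    return BASE + quote(title.replace(' ', '_'))


print(wikipedia_url('A\u015bvagho\u1e63a'))
# https://en.wikipedia.org/wiki/A%C5%9Bvagho%E1%B9%A3a
print(wikipedia_url('Buddhist art'))
# https://en.wikipedia.org/wiki/Buddhist_art

# For comparison, quote_plus() turns the space into '+' instead.
print(BASE + quote_plus('Buddhist art'))
# https://en.wikipedia.org/wiki/Buddhist+art
```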

15.09.2018 / 01:34