Feed parsing in Python

1

I am reading a feed whose layout looks like this:

<horoscope>
	<date>20170627</date>
	<listItem>
		<item>
		<signTitle begin="21/03" end="19/04">Áries</signTitle>
		<content>
			BNononononononononononon
		</content>
		<linktexto>
			<![CDATA[ 
			 <a href='' target='blank'></a> ]]>
		</linktexto>
		<textosaida>
			<![CDATA[ 
			 ]]>
		</textosaida>
		<linksaida>
			<![CDATA[ 
			 <a href='' target='blank'></a> ]]>
		</linksaida>
		</item>
	</listItem>
</horoscope>

When parsing it with the feedparser library, I want to extract the text of the signTitle tag (in this case, "Áries"), but instead I get the following output:

{'begin': '21/03', 'end': '19/04'}

These are the "begin" and "end" attributes of the tag, but its inner text is missing. My code is below:

import feedparser
d = feedparser.parse(caminho_do_xml)
for post in d.entries:
  print(post.signtitle)

How can I access the content of the tag, rather than just the attributes? Thank you.

    
asked by anonymous 27.06.2017 / 15:09

2 answers

1

What about:

import feedparser

rssfeed = """
<horoscope>
    <date>20170627</date>
    <listItem>
        <item>
        <signTitle begin="21/03" end="19/04">Aries</signTitle>
        <content>
            BNononononononononononon
        </content>
        <linktexto>
            <![CDATA[
             <a href='' target='blank'></a> ]]>
        </linktexto>
        <textosaida>
            <![CDATA[
             ]]>
        </textosaida>
        <linksaida>
            <![CDATA[
             <a href='' target='blank'></a> ]]>
        </linksaida>
        </item>
    </listItem>
</horoscope>"""

d = feedparser.parse(rssfeed)

for e in d.entries:
    print(e['content'][0].value)

Output:

BNononononononononononon
    
27.06.2017 / 16:43
0

If I understand what you need, you do not need a third-party library (which I believe feedparser is). Python's standard library already includes a module for working with XML. See:

import xml.etree.ElementTree

# Root element of the XML:
root = xml.etree.ElementTree.parse("feed.xml").getroot()

# Iterate over all listItem elements:
for listItem in root.iter("listItem"):

    # Iterate over all item elements:
    for item in listItem.iter("item"):

        # Look up the signTitle element:
        signTitle = item.find("signTitle")

        # Print its text content:
        print(signTitle.text)
  

See it working on Repl.it.
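If you also want the "begin" and "end" attributes next to the tag's text, the same module exposes them through the element's get method. Below is a minimal sketch, not a definitive solution: it parses the XML from a string instead of feed.xml, and the trimmed sample string, the signs list, and the dictionary keys are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumption: the real feed
# has the same structure, possibly with more <item> elements).
xml_text = """
<horoscope>
    <date>20170627</date>
    <listItem>
        <item>
            <signTitle begin="21/03" end="19/04">Áries</signTitle>
            <content>
                BNononononononononononon
            </content>
        </item>
    </listItem>
</horoscope>"""

# fromstring returns the root element directly.
root = ET.fromstring(xml_text)

signs = []
for item in root.iter("item"):
    sign_title = item.find("signTitle")
    signs.append({
        # .text is the inner text of the tag.
        "name": sign_title.text,
        # .get reads an attribute (returns None if it is absent).
        "begin": sign_title.get("begin"),
        "end": sign_title.get("end"),
        # findtext returns the child's text; strip removes the
        # surrounding whitespace from the indented content.
        "content": item.findtext("content", default="").strip(),
    })

for sign in signs:
    print(sign["name"], sign["begin"], sign["end"])
```

Collecting the values into a list of dictionaries keeps the text and the attributes of each sign together, which is handy if the feed contains one item per sign.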

    
27.06.2017 / 15:57