Python: get a substring from formatted text

Hello.

I have a string formatted with several attributes, and I need to get all the "text" fields using Python. In this example, I need to get "Gmail" and "Youtube" and discard everything else.

<node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="1" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/hided_by_cover_group2" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="" 
text="" 
><node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="2" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/msim_panel_holder" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="Gmail" 
text="Gmail" 
></node>

<node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="1" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/hided_by_cover_group2" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="" 
text="" 
><node bounds="{0, 0, 540, 36}" 
checked="false" 
class="android.widget.FrameLayout" 
clickable="false" 
enabled="true" 
id="" 
index="2" 
indexId="-1" 
long-clickable="false" 
package="com.android.systemui" 
password="false" 
resource-id="com.android.systemui:id/msim_panel_holder" 
scrollable="false" 
selected="false" 
stringname="" 
talkback="Youtube" 
text="Youtube" 
></node>

Thank you

    
asked by anonymous 14.06.2016 / 22:21

1 answer

Although this looks like a simple case of filtering lines or using regular expressions, the text is XML, so the recommendation is to use XML tools to get the desired values.

That way, if the "external" formatting of the file changes at some point, or if you need other fields, the code remains valid. XML also has features that may legitimately appear in the input data and that reading it as plain text would simply ignore (entities such as named characters, etc.).
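As a small illustration of the point about entities, here is a sketch comparing a regular expression with the XML parser. The one-node sample string and its "Maps &amp; Navigation" value are made up for the example and are not from the question.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical one-node document: the "text" attribute holds an escaped ampersand.
sample = '<node text="Maps &amp; Navigation" talkback=""></node>'

# Plain-text approach: the regex returns the raw, still-escaped value.
raw_values = re.findall(r'\btext="([^"]*)"', sample)

# XML approach: the parser decodes the entity while reading the attribute.
parsed_value = ET.fromstring(sample).get("text")

print(raw_values)    # ['Maps &amp; Navigation']
print(parsed_value)  # Maps & Navigation
```

With the regex you would have to unescape the value yourself; the parser already does it.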

However, your XML is malformed, either because you did not paste the whole file or because you edited it by hand: in the excerpt there is no "root" element that is the parent of all the others, and some of the node elements have no closing tag. With well-formed XML stored in a variable named "texto", you can do:

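# ElementTree lives in the xml.etree package
from xml.etree import ElementTree as ET
# texto holds the well-formed XML as a string
xml = ET.fromstring(texto.strip())
# keep the "text" attribute of every <node> element where it is non-empty
atributos = [element.get("text") for element in xml.iter() if element.tag == "node" and element.get("text", None)]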

The import line loads Python's ElementTree module for working with XML. The next statement builds a "live" XML object from the formatted string (to read a file directly, use "parse" instead of "fromstring"). The last statement uses a list comprehension, Python's inline form of a for loop, to visit every element of the XML: if the tag is "node" and the "text" attribute has some content, that content becomes part of the list. In the end, the variable "atributos" holds every non-empty "text" attribute in the structure.
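For reference, a minimal end-to-end sketch of the same idea. The input is a trimmed-down, hypothetical version of your data: the "hierarchy" root name is an assumption, most attributes are omitted, and every node is explicitly closed. The helper name text_attributes is also just for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical, trimmed-down input: a single <hierarchy> root (an assumed name)
# and every <node> explicitly closed.
texto = """
<hierarchy>
  <node index="1" text="">
    <node index="2" talkback="Gmail" text="Gmail"></node>
  </node>
  <node index="1" text="">
    <node index="2" talkback="Youtube" text="Youtube"></node>
  </node>
</hierarchy>
"""

def text_attributes(source):
    """Return the non-empty "text" attribute of every <node> element."""
    root = ET.fromstring(source.strip())
    return [node.get("text") for node in root.iter("node") if node.get("text")]

print(text_attributes(texto))  # ['Gmail', 'Youtube']

# Input with more than one top-level element is not well-formed,
# so the parser raises an error instead of returning partial data.
try:
    text_attributes('<node text="Gmail"></node><node text="Youtube"></node>')
except ET.ParseError as error:
    print("not well-formed:", error)
```

Catching ET.ParseError is a quick way to check whether the file you actually have is well-formed before trying to extract anything from it.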

    
15.06.2016 / 17:44