Extracting a value using XPath

I'm scraping a site using Python (Scrapy) and XPath.

How can I extract only 232.990 from the markup below?

<div class="price-advantages-container">
    <div class="price-comparison">
        <div itemprop="price" class="price">
               <div>
                    <span>R$</span> 232.990
               </div>
        </div>
    </div>
</div>
I tried response.xpath('//div[contains(@class, "price")]/div/text()'), but besides the price it returned whitespace-only text nodes:

[<Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data=' 232.990\r\n\t\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t\t'>,
 <Selector xpath='//div[contains(@class, "price")]/div/text()' data='\r\n\t\t\t\t\t'>]
    
asked by anonymous 28.08.2018 / 20:34

1 answer

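You can filter on the element's itemprop attribute instead of matching every div whose class name contains "price". The expression contains(@class, "price") also matches price-advantages-container and price-comparison, so your query returns the whitespace-only text nodes that sit between their child elements. I'm using extract_first() to return only the first match and then strip() to remove the surrounding whitespace from the text.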

from scrapy import Selector

source = '''<div class="price-advantages-container">
    <div class="price-comparison">
        <div itemprop="price" class="price">
               <div>
                    <span>R$</span> 232.990
               </div>
        </div>
    </div>
</div>'''

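selector = Selector(text=source)

# Select the first node that follows the <span> inside the price div
# (the text node holding the value), then trim the surrounding whitespace.
price = selector.xpath('//div[@itemprop="price"]/div/span/following-sibling::node()').extract_first().strip()

print("[*] Price: {}".format(price))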

Result:
[*] Price: 232.990

    
31.08.2018 / 19:22