Mechanize with Nokogiri: trying to fetch information in divs


Hello,

I am building a crawler to collect product information, using Mechanize and, through it, Nokogiri. I have a URL (link) that returns only one product, but I cannot get a regular expression to match the price of that item. Example HTML snippet:

HTML

                <div class="pager top" id="PagerTop_66064345"></div><div id="ResultItems_66064345" class="prateleira vitrine"><div class="prateleira vitrine n1colunas"><ul><li layout="45e718bf-51b0-49c4-8882-725649af0594" class="informatica--teclado-notebook-tablet-pen-drive-|-megamamute last">

    <input type="hidden" class="x-id" value="55492" />

    <div class="x-product">

        <div class="x-selos">
            <p class="flag desconto-10--off-no-boleto">Desconto 10% off no boleto</p>

            <p class="flag Informática" style="display:none;">Informática</p>
        </div>

        <div class="x-get-skuId x-hide"><div class="buy-button-normal" id="55492" name="55492"><a class="buy-button-normal-a55492" href="https://www.megamamute.com.br/checkout/cart/add?sku=55492&qty=1&seller=1&sc=1&price=224900&cv=254ca7d1b9d7fb34e47ca55ceec1b2c0_geral:0F62E16B17B76A6FE17EC7C23A655D8B&sc=1" title="Comprar">Comprar</a><input type="hidden" value="cart" class="buy-button-normal-go-to-cart-55492" /></div></div>

        <div class="x-departamento">
            Multifuncional Laser Monocromática
        </div>

        <div class="x-image">
            <a class="x-productImage" title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p">
                <img src="http://megamamute.vteximg.com.br/arquivos/ids/6658677-500-500/55492_original.jpg"width="500" height="500" alt="55492_original" id="" />
            </a>
        </div>

        <h2 class="product-name">
            <a title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p">
                Impressora Multifuncional Brother DCP-L5652DN Laser Mono
            </a>
        </h2>

        <div data-trustvox-product-code="55492"></div>

                    <div class="x-price">
                <a title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p">

                                            <span class="oldPrice">
                             R$ 2.899,00
                        </span> 

                        <span class="x-bestPrice">
                            R$ 2.249,00 
                        </span>

                    <span class="x-installment">
                                                     10X de <strong>R$ 224,90</strong> sem juros
                                            </em> 
                </a>

            </div>

            <!--<div class="x-opiniao">-->
            <!--    <span class="rating-produto avaliacao0">0</span> <span class="navaliacao">(0)</span>-->
            <!--</div>-->



            <div class="x-info-product">
                <ul>
                    <li class="x-info"><a href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"></a></li>
                    <li class="x-favorite"><a href="#"></a></li>
                    <li class="x-move"><a href="#"></a></li>
                    <li class="x-add"><a href="#"></a></li>
                </ul>

            </div>

            <div class="x-hover">
                <div class="x-buy"> <a class="x-productImage" title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"> Comprar </a></div>
                <a class="x-hoverHref" title="Impressora Multifuncional Brother DCP-L5652DN Laser Mono" href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"></a>
                <ul>
                    <li class="x-info"><a href="http://www.megamamute.com.br/impressora-multifuncional-brother-dcp-l5652dn-laser-mono/p"></a></li>
                    <li class="x-favorite"><a href="#"></a></li>
                    <li class="x-move"><a href="#"></a></li>
                    <li class="x-add"><a href="#"></a></li>
                </ul>

            </div>


        <div class="x-brand"><p class="texto brand brother">brother</p></div>

</div>

>

Thank you!

    
asked by anonymous 26.07.2017 / 23:02

1 answer


If you are already using Nokogiri to extract information from HTML tags, I do not see why you would need a regular expression. A CSS selector can target the element directly.

Here is an example using HTTParty (adapt it to your situation):

require 'httparty'
require 'nokogiri'

link = "https://pt.stackoverflow.com/questions/tagged/python"
response = HTTParty.get(link)
# Parse the response body (a String) into a Nokogiri document
content = Nokogiri::HTML(response.body)

# Select every <a> tag that has the class "question-hyperlink"
result = content.css('a.question-hyperlink')

# Loop over the matches and print the text of each one
result.each do |question|
  puts(question.text)
end

Output

  

Import module into python
ValueError type error in Learn Python the Hard Way exercises
Error passing parameter to user.set_password function
Use the set and for function in the same structure
Know years, months, days, hours, etc ... That have passed since a certain date
Find duplicate element in time O (n) and space O (1) [pending]
How to coverter string for timestamp object?
Python says the name of my function does not exist [pending]
Error retrieving JSON and using API in Python
Python does not return files within a directory
[...]

The same approach should also work for collecting data from multiple 'x-product' divs.

Let me know whether I understood your problem correctly and whether this solution helped. I'll be waiting for your reply.

    
29.07.2017 / 06:00