How do I use "for" and "while" to capture the cell tags from multiple tables in an HTML file?


I have several HTML files whose table data I need to capture and load into a database, but I can't navigate the HTML tree to find the tags that hold the cells. The HTML looks like this:

<div class="details">
   <div class="title-table"><h2> BEAUNE</h2>
   <div class="table-responsive">
      <div class="table-towers">
        <div id="table472dc5e9b46304cf95865f7db6c459aa" class="collapse in table-content">
           <div class="table-towers">
                 <div class="table-row">
                    <div class="table-cell build_type">Apartamento</div>
                    <div class="table-cell area_useful">220m²</div>
                    <div class="table-cell rooms">3</div>
                    <div class="table-cell garage">4</div>
                    <div class="table-cell bird_estimate_average">R$ 2.816.344,33*
            <p><small>(R$ 2.393.892,68 a R$ 3.238.795,98)</small></p>
        </div>
                 <div class="table-row">
        <div class="table-cell build_type">Cobertura</div>
                    <div class="table-cell area_useful">396m²</div>
                    <div class="table-cell rooms">3</div>
                    <div class="table-cell garage">5</div>
                    <div class="table-cell bird_estimate_average">R$ 5.069.419,80*
                             <p><small>(R$ 4.309.006,83 a R$ 5.829.832,77)</small></p>
                     </div>
   <div class="title-table"><h2>BERGERAC</h2>
      <div class="table-responsive">
          <div class="table-towers">
               <div id="table0b60c9a0a450b921186c91102da447d9" class="collapse table-content">
                   <div class="table-towers">
                       <div class="table-row">
                            <div class="table-cell build_type">Apartamento</div>
                    <div class="table-cell area_useful">220m²</div>
                    <div class="table-cell rooms">3</div>
                    <div class="table-cell garage">4</div>
                    <div class="table-cell bird_estimate_average">R$ 2.816.344,33*
                               <p><small>(R$ 2.393.892,68 a R$ 3.238.795,98)</small></p>
                 </div>
                                            <!-- asdasd -->
                </div>
                        </div>

The file contains 10 more tables that follow the same structure, so I thought I'd use a "for" loop to get each title-table tag, which holds the table's name, like this:

for id_torre in soup.find("div",{"class":"details"}).findAll("div",{"class":"title-table"}):#.findAll("h2"):
    nm = id_torre.find("h2")
    print(nm)
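A minimal sketch of the same "for" loop, assuming BeautifulSoup 4 with the built-in "html.parser" and a trimmed-down, well-formed copy of the markup (the html string below is illustrative, not the real file). Calling get_text(strip=True) returns just the table name instead of the whole <h2> tag, and find_all is the BeautifulSoup 4 name for findAll.

```python
from bs4 import BeautifulSoup

# Illustrative, trimmed-down markup that reuses the class names from the question.
html = """
<div class="details">
  <div class="title-table"><h2> BEAUNE</h2></div>
  <div class="title-table"><h2>BERGERAC</h2></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

table_names = []
for title_div in soup.find("div", class_="details").find_all("div", class_="title-table"):
    # get_text(strip=True) drops the tags and the leading space in " BEAUNE".
    table_names.append(title_div.find("h2").get_text(strip=True))

print(table_names)  # ['BEAUNE', 'BERGERAC']
```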

With the list of table titles, I thought of using a "while" loop to find the table under each title and capture the cell data in each row, to load into the database later:

while len(id_torre) >0:
    nm = id_torre
    print(nm)

    tipo = soup.find("div",{"class":id_torre}).find("div",{"class":"table-cell build_type"})
    print(tipo)

    m2_util = soup.find("div",{"class":id_torre}).find("div",{"class":"table-cell area_useful"})
    print(m2_util)

    dt = soup.find("div",{"class":id_torre}).find("div",{"class":"table-cell rooms"})
    print(dt)

But it returns None for every field and loops endlessly. What's wrong with the code? I'm new to programming, and Python is the first language I'm learning.
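The loop never ends because "while len(id_torre) > 0" tests a value that nothing in the body changes: id_torre is still the last tag left over from the "for" loop, so the condition stays true forever. Inside the body, {"class": id_torre} passes a whole tag where BeautifulSoup expects a class name, so that search is not looking for the table you meant. Below is a minimal sketch of a "while" loop that does terminate, assuming BeautifulSoup 4 and well-formed markup (the html string is a trimmed, illustrative sample; the second answer points out that the real file has unclosed tags). Each pass reads one title and the cells of its rows, then find_next moves to the following title-table and returns None after the last one, which ends the loop. The FIELDS list and the record layout are hypothetical choices for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative, well-formed sample; the real file has more rows and tables.
html = """
<div class="details">
  <div class="title-table"><h2> BEAUNE</h2>
    <div class="table-row">
      <div class="table-cell build_type">Apartamento</div>
      <div class="table-cell area_useful">220m²</div>
      <div class="table-cell rooms">3</div>
      <div class="table-cell garage">4</div>
      <div class="table-cell bird_estimate_average">R$ 2.816.344,33*
        <p><small>(R$ 2.393.892,68 a R$ 3.238.795,98)</small></p>
      </div>
    </div>
    <div class="table-row">
      <div class="table-cell build_type">Cobertura</div>
      <div class="table-cell area_useful">396m²</div>
      <div class="table-cell rooms">3</div>
      <div class="table-cell garage">5</div>
      <div class="table-cell bird_estimate_average">R$ 5.069.419,80*
        <p><small>(R$ 4.309.006,83 a R$ 5.829.832,77)</small></p>
      </div>
    </div>
  </div>
  <div class="title-table"><h2>BERGERAC</h2>
    <div class="table-row">
      <div class="table-cell build_type">Apartamento</div>
      <div class="table-cell area_useful">220m²</div>
      <div class="table-cell rooms">3</div>
      <div class="table-cell garage">4</div>
      <div class="table-cell bird_estimate_average">R$ 2.816.344,33*
        <p><small>(R$ 2.393.892,68 a R$ 3.238.795,98)</small></p>
      </div>
    </div>
  </div>
</div>
"""

# Hypothetical list of the cell classes to capture from each row.
FIELDS = ["build_type", "area_useful", "rooms", "garage", "bird_estimate_average"]

soup = BeautifulSoup(html, "html.parser")
records = []

title_div = soup.find("div", class_="title-table")
while title_div is not None:
    tower = title_div.find("h2").get_text(strip=True)
    for row in title_div.find_all("div", class_="table-row"):
        record = {"tower": tower}
        for field in FIELDS:
            cell = row.find("div", class_=field)
            # First text fragment only: "R$ 2.816.344,33*" without the (min a max) range.
            record[field] = next(cell.stripped_strings) if cell is not None else None
        records.append(record)
    # Advance to the next table; find_next returns None after the last one.
    title_div = title_div.find_next("div", class_="title-table")

for record in records:
    print(record)
```

Each dictionary in records holds one row and could then be inserted into the database.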

    
asked by anonymous 03.11.2016 / 22:10

2 answers

Open the file with the open function (for example, file = open('filename.type')). You can then use the file in a for loop, which moves through it line by line and cannot loop forever. If you know regular expressions, the re library may be more useful than find: with re you can search each line for, say, the build_type cell, extract the word 'Apartamento', and save it to the database.
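A minimal sketch of this line-by-line idea with the standard re and sqlite3 modules. The pattern, the in-memory database, and the units table name are illustrative assumptions, and the lines list stands in for an open file object (iterating over a hypothetical open("filename.html", encoding="utf-8") yields lines the same way). A regular expression like this only works while each cell sits on a single line, so it would not handle the multi-line bird_estimate_average cells; for anything more involved, a parser such as BeautifulSoup is the safer choice.

```python
import re
import sqlite3

# Matches one build_type cell written on a single line and captures its text.
BUILD_TYPE_RE = re.compile(r'<div class="table-cell build_type">([^<]*)</div>')


def extract_build_types(lines):
    """Return the text of every build_type cell found in an iterable of lines."""
    found = []
    for line in lines:  # a file object is iterated line by line, so this loop ends
        match = BUILD_TYPE_RE.search(line)
        if match:
            found.append(match.group(1).strip())
    return found


# Stand-in for: with open("filename.html", encoding="utf-8") as f: lines = list(f)
lines = [
    '<div class="table-row">\n',
    '    <div class="table-cell build_type">Apartamento</div>\n',
    '    <div class="table-cell area_useful">220m²</div>\n',
    '<div class="table-cell build_type">Cobertura</div>\n',
]

build_types = extract_build_types(lines)

# Hypothetical table in an in-memory SQLite database, just to show the insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (build_type TEXT)")
conn.executemany("INSERT INTO units (build_type) VALUES (?)",
                 [(value,) for value in build_types])
conn.commit()

print(conn.execute("SELECT build_type FROM units").fetchall())
```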

    
10.01.2017 / 16:31

This HTML has several unclosed tags, and that's why the BeautifulSoup parser gets lost. You can check for the errors on this site: link
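To see what "lost" means in practice, here is a small sketch with BeautifulSoup 4 and the built-in "html.parser" (the broken string is an illustrative, cut-down version of the markup, not the real page). Because the first row's <div> is never closed, the parser nests the second title-table inside it, so searching for rows under the first title also returns the row that belongs to the second one. Other parsers, such as lxml, may repair the markup differently.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the first table-row and the first title-table are never closed.
broken = """<div class="title-table"><h2>BEAUNE</h2>
<div class="table-row"><div class="table-cell rooms">3</div>
<div class="title-table"><h2>BERGERAC</h2>
<div class="table-row"><div class="table-cell rooms">4</div></div>
</div>
"""

soup = BeautifulSoup(broken, "html.parser")
first_title = soup.find("div", class_="title-table")
rows = first_title.find_all("div", class_="table-row")

# Both rows are descendants of the first title, including BERGERAC's row.
print(len(rows))  # 2

# The second title-table ended up nested inside the first one.
second_title = first_title.find("div", class_="title-table")
print(second_title.find("h2").get_text())  # BERGERAC
```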

The following tags do not seem to be closed:

Line 1: <div class="details">
Line 2: <div class="title-table">
Line 3: <div class="table-responsive">
Line 4: <div class="table-towers">
Line 5: <div id="table472dc5e9b46304cf95865f7db6c459aa" class="collapse in table-content">
Line 6: <div class="table-towers">
Line 7: <div class="table-row">
Line 15: <div class="table-row">
Line 23: <div class="title-table">
Line 24: <div class="table-responsive">
Line 25: <div class="table-towers">
Line 26: <div id="table0b60c9a0a450b921186c91102da447d9" class="collapse table-content">
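The same check can be run locally. Below is a minimal sketch with the standard library's html.parser.HTMLParser (a swapped-in tool, not the site linked above): it keeps a stack of open tags with their line numbers, removes the most recent matching entry on each closing tag, and whatever is left at the end was never closed. The UnclosedTagFinder and find_unclosed names and the snippet sample are hypothetical, and badly mismatched nesting is handled only loosely.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Hypothetical helper: records (line, tag, class) for tags that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (line, tag, class attribute)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        line, _offset = self.getpos()  # position of the start tag being handled
        self.open_tags.append((line, tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Close the most recently opened tag with the same name, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][1] == tag:
                del self.open_tags[i]
                break


def find_unclosed(markup):
    finder = UnclosedTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.open_tags


snippet = """<div class="details">
  <div class="table-row">
    <div class="table-cell rooms">3</div>
  </div>
  <div class="title-table"><h2>BERGERAC</h2>
"""

for line, tag, css_class in find_unclosed(snippet):
    print(f'Line {line}: <{tag} class="{css_class}">')
```

For this sample it prints Line 1: <div class="details"> and Line 5: <div class="title-table">, the same kind of report as the list above.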

If you haven't pasted the complete code, please do so. If the page really contains these errors, let us know so we can help.

    
20.01.2017 / 03:25