Web scraping with Selenium + Python on a site with content generated dynamically by JS: difficulty locating elements

Question


Good afternoon. I'm developing a script that:

logs in to a system;

finds certain information within that environment;

generates a kind of report;

creates a spreadsheet with the data.

My problem comes before the parsing stage. I can reach the environment that contains the information, but I cannot get the Selenium webdriver to locate the elements it needs to click to reach the data that will appear in the report.

My impression is that the JavaScript is causing the confusion: the frame that "fires" the JavaScript is accessible to the script, but the resulting page, which I can see in the browser, does not seem to be visible to the script.

How can I work around the JavaScript?

How can I make the webdriver "see" the final page the same way I see it?

(EDITED: Code below:)

import os
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchFrameException
from selenium.webdriver.common.by import By

# Raw strings keep the backslashes literal; in a normal string the
# '\r' in '\relatorio.csv' would be read as a carriage return.
if not os.path.exists(r'c:\projudi'):
    os.makedirs(r'c:\projudi')

try:
    planilha = open(r'c:\projudi\relatorio.csv', 'r+')
except FileNotFoundError:
    planilha = open(r'c:\projudi\relatorio.csv', 'w+')

browser = webdriver.Chrome()
browser.get('https://projudi.tjpr.jus.br/projudi')
time.sleep(20)
browser.switch_to.frame('mainFrame')
browser.switch_to.frame('userMainFrame')
links = browser.find_elements(By.CLASS_NAME, 'link')
n = len(links)

for x in range(0, n, 2):
    if links[x].text != '0':
        links[x].click()
        time.sleep(2)
        try:
            browser.switch_to.frame('mainFrame')
            browser.switch_to.frame('userMainFrame')
            a = browser.find_elements(By.CLASS_NAME, 'link')
        except NoSuchFrameException:
            a = browser.find_elements(By.CLASS_NAME, 'link')
        if a:
            q = browser.find_elements(By.CLASS_NAME, 'resultTable')
            dados = q[0].text.split('\n')
            # writelines() adds no separators, so append the newline here.
            planilha.writelines(linha + '\n' for linha in dados)
            for i in range(len(a)):
                a[i].click()
                time.sleep(2)
                browser.back()
                time.sleep(2)
                browser.switch_to.frame('mainFrame')
                browser.switch_to.frame('userMainFrame')
                # Re-read the links after navigating back.
                a = browser.find_elements(By.CLASS_NAME, 'link')
            browser.back()
            time.sleep(2)
        else:
            browser.back()
            time.sleep(2)
        browser.switch_to.frame('mainFrame')
        browser.switch_to.frame('userMainFrame')
        links = browser.find_elements(By.CLASS_NAME, 'link')

planilha.close()
browser.close()

My question: when I reach the screen that contains the information I need (resultTable), I capture all of it as a single string. I split that string and get a list of strings. So far so good: I write everything to the report file for later processing. Now, how do I control the FLOW? I already know I will have to use a regex on the strings in the list that contain the DATE, since I only need the entries dated from today back to 2 days ago. But how do I use that information as a REFERENCE in Python? Example: the script captures the table and puts it into a list like this:

list = ['0004434-48.2010', 'UNITY', '(30 working days) 07/03/2017', '13 / 07/2017 ', '0008767-77.2013', '2017', '(10 business days) 07/03/2017', '13 / 07/2017 ']

The first item in the list is the first cell of the table (row 1, column 1), and it contains the link. The control date is in the THIRD item (row 1, column 3). Item 5 already belongs to the next row (row 2, column 1). I'm not sure I explained that well! =/

What I need: check the date. If it is today or yesterday, click the first item on that row. If it is not, move on to the next row.

python web-scraping selenium

asked by anonymous 28.06.2017 / 21:23

1 answer


score 1 · Answer 1

I'm not sure I understood exactly what you want to do, but Selenium has several modules for this kind of task. The catch is that you need to inspect the page's HTML and work out which element is which before you can capture it with Selenium.

from selenium.webdriver.common.keys import Keys         # keyboard keys to send as input (e.g. Keys.ENTER)
from selenium.webdriver.support.ui import Select        # helper for <select> drop-down boxes
from selenium.webdriver.common.by import By             # locator strategies (By.ID, By.NAME, ...)
from selenium.webdriver.support.ui import WebDriverWait # lets you set the browser's wait time
from selenium.webdriver.support import expected_conditions as EC # library of expected conditions

Those are some useful Selenium imports. To check the date and see whether it falls on the day you want, I would recommend looking up the element's ID or name in the HTML and using the command:

variavel = driver.find_element(By.NAME, 'elemento')

If you have already captured the information and saved it in a file or a variable, I suggest using Pandas to organize it as DataFrames.
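
As a rough illustration of the Pandas suggestion, the flat list from the question can be reshaped into one DataFrame row per table row. The column names and sample values below are invented for the example; only the four-cells-per-row layout comes from the question:

```python
import pandas as pd

# Made-up sample in the same shape as the question's list:
# four cells per table row, flattened into one list.
cells = [
    '0000001-11.2010', 'PARTY A', '(30 working days) 27/06/2017', '13/07/2017',
    '0000002-22.2013', 'PARTY B', '(10 business days) 20/06/2017', '13/07/2017',
]

COLUMNS = ['case_number', 'party', 'deadline', 'due_date']  # hypothetical names

# Slice the flat list into chunks of len(COLUMNS) cells, one chunk per row.
rows = [cells[i:i + len(COLUMNS)] for i in range(0, len(cells), len(COLUMNS))]
df = pd.DataFrame(rows, columns=COLUMNS)

# Pull the dd/mm/yyyy part out of the third column and parse it as a date.
df['control_date'] = pd.to_datetime(
    df['deadline'].str.extract(r'(\d{2}/\d{2}/\d{4})')[0],
    format='%d/%m/%Y',
)

print(df[['case_number', 'control_date']])
```

With the date in its own datetime column, filtering by day becomes an ordinary comparison on df['control_date'].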

To check the date in a link, I would get the link with find_element, work out the character positions where the date starts and ends in its text (link[n:m]), and then use datetime to compare that date with the current date.
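
The slicing idea can be sketched without hard-coded positions by letting a regular expression find the date inside the cell text. This assumes the dates are written as dd/mm/yyyy, as in the question's table; the function name is_recent and the sample strings are made up for the example:

```python
import re
from datetime import date, datetime, timedelta

DATE_RE = re.compile(r'\d{2}/\d{2}/\d{4}')


def is_recent(cell_text, today, max_days_back=1):
    """Return True if the date in cell_text is within max_days_back days of today.

    Hypothetical helper; returns False when no dd/mm/yyyy date is found.
    """
    match = DATE_RE.search(cell_text)
    if match is None:
        return False
    # match.start() and match.end() play the role of n and m in text[n:m].
    found = datetime.strptime(match.group(), '%d/%m/%Y').date()
    # Future dates give a negative difference and are rejected too.
    return timedelta(0) <= today - found <= timedelta(days=max_days_back)


today = date(2017, 6, 28)  # fixed here so the example is repeatable
print(is_recent('(30 working days) 28/06/2017', today))  # dated today
print(is_recent('(30 working days) 27/06/2017', today))  # dated yesterday
print(is_recent('(30 working days) 20/06/2017', today))  # eight days old
```

Passing max_days_back=2 would cover the "today back to 2 days ago" window mentioned in the question.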

To get the current date:

import datetime
from datetime import timedelta

n = 2  # example value: number of days to look back
# "%d/%m/%Y" matches the dd/mm/yyyy dates in the question's table
data_hoje = (datetime.datetime.now()).strftime("%d/%m/%Y")
data_ontem = (datetime.datetime.now() - timedelta(days=1)).strftime("%d/%m/%Y")
data_um_dia_n_dias_atras = (datetime.datetime.now() - timedelta(days=n)).strftime("%d/%m/%Y")
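
To connect date strings like these to the flow the question asks about, one option is to walk the flat list four cells at a time and keep the index of each row whose third cell contains today's or yesterday's date. This is a sketch under assumptions: the strings are formatted with slashes (dd/mm/yyyy) like the table cells, and the helper name rows_to_open and the sample data are invented:

```python
from datetime import datetime, timedelta

CELLS_PER_ROW = 4  # the question's table has four columns


def rows_to_open(cells, now):
    """Return the indices of table rows dated today or yesterday.

    Hypothetical helper: cells is the flat list of table texts, and the
    control date is assumed to sit in the third cell of each row.
    """
    wanted = {
        now.strftime('%d/%m/%Y'),
        (now - timedelta(days=1)).strftime('%d/%m/%Y'),
    }
    selected = []
    for start in range(0, len(cells), CELLS_PER_ROW):
        row = cells[start:start + CELLS_PER_ROW]
        if len(row) < 3:
            continue  # incomplete row, no date cell to check
        # Keep the row when either wanted date appears in the third cell.
        if any(day in row[2] for day in wanted):
            selected.append(start // CELLS_PER_ROW)
    return selected


cells = [
    '0000001-11.2010', 'PARTY A', '(30 working days) 28/06/2017', '13/07/2017',
    '0000002-22.2013', 'PARTY B', '(10 business days) 20/06/2017', '13/07/2017',
    '0000003-33.2015', 'PARTY C', '(5 business days) 27/06/2017', '13/07/2017',
]
print(rows_to_open(cells, datetime(2017, 6, 28)))
```

Each returned index identifies a row whose first item should be clicked; how that index maps to a clickable element depends on the page's HTML, so the elements should be looked up again after every browser.back().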