Import CSV and convert date in dd / mm / yyyy format to yyyy-mm-dd

2

I'm totally new to programming, both in Python and Sql, so I have some questions that may seem basic. I get a .CSV file that comes with lots of information not compatible with Sql so I always have to manually change and import using the following command:

BULK INSERT clientes
FROM 'C:\import\clientes.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\import\clientesErro.csv',
TABLOCK
)

(this example I took from the internet but that's what I use) I happen to have some columns with date format and would like to convert the dates that comes in dd / mm / yyyy to aaa-mm-dd. I would also need to convert numbers where the decimal places are separated with "," example: 1001.10 for 1001.10 but I believe that solving the problem of the dates I can resolve this too.

    
asked by anonymous 19.05.2017 / 17:38

2 answers

2

I was able to resolve this:

(at this point I'm assuming you've already been able to connect to your database).

Import these faces:

import os
#esse __init__é o arquivo onde salvei as conexões com o banco de dados
from __init__ import mycursor, connection
# csv é o modulo que é usado para ler o arquivo csv
import csv
# modulo datetime é usado para tratar strings como formato de data
from datetime import datetime, date, timedelta, datetime

After importing, I used this command: Note: this command is not required, I just used it because my code needed to read the same file several times, so better save it to a variable.

caminho_arquivo = './arquivos/seu_arquivo.txt'

Now let's read the file: Warning for the encoding I used latin-1 because my file was saved in this pattern, it may be that your is in another encoding and this may generate errors in the accents and 'ç'.

def ler_e_inserir_arquivo():
    with open(caminho_arquivo, newline='', encoding="latin-1") as arquivo:
        conteudo = csv.reader(arquivo, delimiter=';')
        for coluna in conteudo:
            query_com_insert = (
                "INSERT INTO nome_tabela(cd_cred,operacao,sigla_assessoria,data_atual)" "VALUES (%(cd_cred)s, %(operacao)s,%(sigla_assessoria)s,%(data_atual)s) "
            )
            tratamento_dos_dados = {
                'cd_cred': coluna[0],
                'operacao': coluna[1],
                'sigla_assessoria': coluna[2],
                'data_atual': datetime.strptime(coluna[3], "%d/%m/%Y").strftime("%Y-%m-%d"),
            }
            mycursor.execute(query_com_insert, tratamento_dos_dados)
            connection.commit()

Let's now break it down:

At this point I am informing ande I want to insert the data (query equal to sql)

INSERT INTO nome_tabela(cd_cred,operacao,sigla_assessoria,data_atual)

At this point I'm passing where the values are, note that there is a% (name) s. Within these relatives is the field name that I defined in the data dictionary.

"VALUES (%(cd_cred)s, %(operacao)s,%(sigla_assessoria)s,%(data_atual)s)

"data_process" is my dictionary in it I assign a key and then its value:

When the query searches the dictionary for the word 'cd_cred' I will pass the information that is in column [0] of my file (remembering that in python the lists start at 0).

Now it comes apart that everyone was looking for, like transforming the date:

'data_atual': datetime.strptime(coluna[3], "%d/%m/%Y").strftime("%Y-%m-%d")

I used the datetime module and "told" payton that the string that was in column [3] should be treated as a date format dd / mm / yyyy (12/20/2017 for example) and then in use sequence the .strftime is asking for python to convert my date to the default aaa-mm-dd (2017-12-20 for example).

After that, I move to the cursor that was defined in the init file to execute the commands:

        mycursor.execute(query_com_insert, tratamento_dos_dados)
        connection.commit()

Ready. Now just put at the end of your file already out of the repeat loop the command:

read_and_read_file ()

So you're 'calling' the function that was defined in the previous steps. I hope I have helped.

    
29.12.2017 / 18:25
0

It's important to be familiar with regular expressions :

import re

The sentence below changes the date format:

re.sub('(\d{2})\/(\d{2})\/(\d{4})', r"--", '19/05/2017')

Result:

>>> re.sub('(\d{2})\/(\d{2})\/(\d{4})', r"--", '19/05/2017')
'2017-05-19'

In the case of financial values, it would look something like this:

re.sub('(\d+),(\d{2})', r".", '1001,10')

Result:

>>> re.sub('(\d+),(\d{2})', r".", '1001,10')
'1001.10'
    
19.05.2017 / 18:16