At which phase should scraped data be edited?

2

I am currently extracting data from a website, with the data in English, through web scraping.

For example, if I want to translate the names or values of the fields into Portuguese, or expand abbreviations, which approach is more appropriate:

  • Make the change during the web scraping phase?
  • Or make the change only after the raw data has been saved to a file or database?
asked by anonymous 22.01.2017 / 21:35

1 answer

1

After a more exhaustive search, I found that the process I referred to is called data munging (also known as data wrangling), which involves cleaning the extracted data and converting it into a format that is more convenient for tasks such as aggregation, visualization and training statistical models, among others.

The clearest and most accessible approach is to keep data acquisition separate from data munging, that is, to store the raw data first and make changes such as translation afterwards.
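As a rough illustration of that separation, here is a minimal Python sketch. It assumes the scraper yields a list of dictionaries and that the raw data is kept in a JSON file; the field names, the Portuguese translations and the abbreviation table are all hypothetical and would depend on the actual site.

```python
import json
from pathlib import Path

# Hypothetical mappings; the real field names and abbreviations depend on the site.
FIELD_NAMES_PT = {"name": "nome", "country": "pais", "price": "preco"}
ABBREVIATIONS = {"US": "United States", "BR": "Brazil"}


def save_raw(records, path):
    """Acquisition phase: store the scraped records exactly as extracted."""
    Path(path).write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")


def load_raw(path):
    """Reload the untouched raw records for the munging phase."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def munge(record):
    """Munging phase: rename fields and expand abbreviations on a new dict."""
    cleaned = {}
    for key, value in record.items():
        new_key = FIELD_NAMES_PT.get(key, key)
        # Only string values are looked up in the abbreviation table.
        if isinstance(value, str):
            value = ABBREVIATIONS.get(value, value)
        cleaned[new_key] = value
    return cleaned
```

Because `munge` builds a new dictionary and the raw file is never modified, the translation or abbreviation rules can be changed and re-run later without scraping the site again.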

23.01.2017 / 12:04