I am analysing the transparency portal database, which can be downloaded from this link. The problem is that I only want to work with part of the database: my analysis is only about the data on teachers. I could clean the data in Excel, but I would like to learn how to do it in R. To read the data I am using the following code:
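library(readr)
library(dplyr)    # provides %>% and select(), used below
library(stringr)  # provides str_to_upper(), used below
df <- read_delim("~/GitHub/Servidores/Setembro/20160930_Cadastro.csv",
                 ";", escape_double = FALSE, locale = locale(encoding = "ASCII"),
                 trim_ws = TRUE)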
# The only columns that matter in the remuneration spreadsheet are
# the 3rd (civil servant ID) and the 6th (gross remuneration).
# Rename the ID and basic gross remuneration columns, then
# merge into the data frame to add the salary
# of each civil servant.
salarios <-
  read_delim("~/GitHub/Servidores/Setembro/20160930_Remuneracao.csv", ";",
             escape_double = FALSE, locale = locale(encoding = "ASCII"),
             trim_ws = TRUE) %>%
  select(3, 6)
head(salarios)
names(salarios) <- c("ID_SERVIDOR_PORTAL", "SALARIO")
names(df) <- str_to_upper(names(df))
df <- merge(df, salarios, by="ID_SERVIDOR_PORTAL")
df$x <- 1
Once this is done, I would like to know how to select only the part of the database related to teachers, so that I can study just that subset.
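
To make the goal concrete, here is a minimal sketch of the kind of row filtering I have in mind. It runs on a small made-up data frame rather than the real files. The column name `DESCRICAO_CARGO` and its values are assumptions for illustration only (I have not confirmed which column of the merged data identifies teachers), and `df_toy` and `professores` are hypothetical names.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the merged data frame. DESCRICAO_CARGO and its
# values are assumed for illustration, not taken from the real data.
df_toy <- data.frame(
  ID_SERVIDOR_PORTAL = c(1, 2, 3, 4),
  DESCRICAO_CARGO = c("PROFESSOR DO MAGISTERIO SUPERIOR",
                      "ANALISTA",
                      "PROFESSOR ENS BASICO TECN TECNOLOGICO",
                      NA),
  SALARIO = c(9000, 7000, 6500, 5000),
  stringsAsFactors = FALSE
)

# Keep only rows whose job description contains the literal text
# "PROFESSOR". filter() drops rows where the condition is NA.
professores <- df_toy %>%
  filter(str_detect(DESCRICAO_CARGO, fixed("PROFESSOR")))

# Base R equivalent, without dplyr or stringr.
# grepl() returns FALSE for NA, so the NA row is dropped here too.
professores_base <- df_toy[grepl("PROFESSOR", df_toy$DESCRICAO_CARGO,
                                 fixed = TRUE), ]

print(professores)
```

On the real data the same pattern would presumably apply to `df` after the merge, with whatever column and text actually identify teachers.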