Select a part of the database in R

Question

I am evaluating the Portal da Transparência database, which can be obtained at this link. The problem is that I only want to work with part of it, because my evaluation covers teachers only. I could clean the data in Excel, but I would like to learn how to do it in R. To read the data I am using the following code:

library(readr)
library(dplyr)    # provides %>% and select()
library(stringr)  # provides str_to_upper()

df <- read_delim("~/GitHub/Servidores/Setembro/20160930_Cadastro.csv",
                 ";", escape_double = FALSE, locale = locale(encoding = "ASCII"),
                 trim_ws = TRUE)

# The only columns that matter in the salary sheet are the 3rd
# (employee ID) and the 6th (gross pay)

# Rename the ID and gross base pay columns, then merge them into
# the data frame to add each employee's salary

salarios <- read_delim("~/GitHub/Servidores/Setembro/20160930_Remuneracao.csv", ";",
                       escape_double = FALSE, locale = locale(encoding = "ASCII"),
                       trim_ws = TRUE) %>% select(3, 6)
head(salarios)

names(salarios) <- c("ID_SERVIDOR_PORTAL", "SALARIO")

names(df) <- str_to_upper(names(df))
df <- merge(df, salarios, by = "ID_SERVIDOR_PORTAL")
df$x <- 1
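One thing worth checking in the merge step: by default, `merge()` performs an inner join, so any employee whose `ID_SERVIDOR_PORTAL` is missing from the salary table is silently dropped, and an ID that appears more than once in the salary table produces one row per match. The sketch below uses small hypothetical tables (the IDs, job titles and salaries are invented, not taken from the portal files) to compare the default with `all.x = TRUE`:

```r
# Hypothetical toy tables standing in for 20160930_Cadastro.csv and
# 20160930_Remuneracao.csv; all values are invented for illustration
cadastro <- data.frame(
  ID_SERVIDOR_PORTAL = c(1, 2, 3),
  DESCRICAO_CARGO = c("PROFESSOR DO MAGISTERIO SUPERIOR", "ANALISTA",
                      "PROFESSOR ENS BASICO TECN TECNOLOGICO"),
  stringsAsFactors = FALSE
)
salarios <- data.frame(
  ID_SERVIDOR_PORTAL = c(1, 3, 4),
  SALARIO = c(9000, 7000, 5000)
)

# Default merge(): inner join, only IDs present in both tables survive
inner <- merge(cadastro, salarios, by = "ID_SERVIDOR_PORTAL")
print(nrow(inner))                # [1] 2  (IDs 1 and 3)

# all.x = TRUE: left join, every cadastro row is kept and
# employees without a salary row get NA in SALARIO
left <- merge(cadastro, salarios, by = "ID_SERVIDOR_PORTAL", all.x = TRUE)
print(nrow(left))                 # [1] 3
print(sum(is.na(left$SALARIO)))   # [1] 1  (ID 2 has no salary row)
```

Comparing `nrow(df)` before and after the real merge is a quick way to see whether any employees were lost.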

With the data loaded and merged, I would like to know how to select only the part of the database related to teachers, so that I can study those records alone.

asked by anonymous 28.10.2016 / 20:49

score 4 · Accepted Answer

I could not read the data with your original commands, so I changed them until they worked on my computer (a tab as the separator and Latin1 encoding). If your original commands read the files correctly, ignore this part of my code.
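Because the question reads the files with `";"` while the commands below use `"\t"`, the separator of the downloaded files is worth confirming before calling `read_delim()`. A minimal heuristic sketch: `guess_delim()` is a hypothetical helper (not part of readr) that counts candidate separators in the file's first line and returns the most frequent one; the temporary file and its contents are invented for the example.

```r
# Hypothetical helper: count each candidate delimiter in the first line
# of a file and return the most frequent one (a heuristic, not a parser)
guess_delim <- function(path, candidates = c(";", "\t", ",")) {
  header <- readLines(path, n = 1, warn = FALSE)
  counts <- vapply(candidates, function(d) {
    hits <- gregexpr(d, header, fixed = TRUE)[[1]]
    sum(hits > 0)  # gregexpr() returns -1 when there is no match
  }, integer(1))
  candidates[which.max(counts)]
}

# Usage with a small temporary tab-separated file
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID_SERVIDOR_PORTAL\tNOME\tDESCRICAO_CARGO",
             "1\tFULANO\tPROFESSOR"), tmp)
print(guess_delim(tmp))  # [1] "\t"
```

The returned value can then be passed as the `delim` argument of `read_delim()`.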

setwd("~/GitHub/Servidores/Setembro/")

library(readr)
library(dplyr)    # provides %>% and select()
library(stringr)  # provides str_to_upper()

# Base-R alternative for reading the file; this object is not used below
cadastro <- read.table(file = "20160930_Cadastro.csv", header = TRUE, sep = "\t")

df <- read_delim("20160930_Cadastro.csv", "\t", escape_double = FALSE,
                 locale = locale(encoding = "Latin1"), trim_ws = TRUE)

# The only columns that matter in the salary sheet are the 3rd
# (employee ID) and the 6th (gross pay)

# Rename the ID and gross base pay columns, then merge them into
# the data frame to add each employee's salary

salarios <- read_delim("20160930_Remuneracao.csv", "\t", escape_double = FALSE,
                       locale = locale(encoding = "Latin1"), trim_ws = TRUE) %>% select(3, 6)

names(salarios) <- c("ID_SERVIDOR_PORTAL", "SALARIO")

names(df) <- str_to_upper(names(df))
df <- merge(df, salarios, by = "ID_SERVIDOR_PORTAL")
df$x <- 1

# Find the row positions in df whose job description
# contains the string 'PROFESSOR' anywhere
# (this may need refining depending on
# the goal of this work)

professores <- grep("PROFESSOR", df$DESCRICAO_CARGO)

# New data frame with only the teachers' rows
# (or rather, the rows of employees whose job
# description contains 'PROFESSOR' somewhere)

df.professores <- df[professores, ]
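As the comment in the code notes, matching on 'PROFESSOR' may need refining. A minimal sketch on a hypothetical five-row data frame (the job titles are invented, including a mixed-case one that may not occur in the real files) comparing `grep()` with `grepl()`, a case-insensitive match, a pattern anchored at the start of the title, and a safe way to exclude the matches:

```r
# Hypothetical toy data; job titles are invented for illustration
df <- data.frame(
  ID_SERVIDOR_PORTAL = 1:5,
  DESCRICAO_CARGO = c("PROFESSOR DO MAGISTERIO SUPERIOR",
                      "Professor Substituto",
                      "ANALISTA DE TI",
                      NA,
                      "PROFESSOR ENS BASICO TECN TECNOLOGICO"),
  stringsAsFactors = FALSE
)

# grep() returns row positions; grepl() returns a logical vector
# (an NA description simply does not match in either case)
idx  <- grep("PROFESSOR", df$DESCRICAO_CARGO)
mask <- grepl("PROFESSOR", df$DESCRICAO_CARGO)
stopifnot(identical(idx, which(mask)))  # same rows: 1 and 5
df.professores <- df[mask, ]

# Case-insensitive match also catches "Professor Substituto"
mask_ci <- grepl("professor", df$DESCRICAO_CARGO, ignore.case = TRUE)
print(sum(mask_ci))      # [1] 3

# Stricter: only titles that start with PROFESSOR
mask_start <- grepl("^PROFESSOR", df$DESCRICAO_CARGO)
print(sum(mask_start))   # [1] 2

# To exclude teachers, prefer df[!mask, ] over df[-idx, ]:
# if grep() finds nothing, -idx is integer(0) and df[-idx, ] has zero rows
nao_professores <- df[!mask, ]
print(nrow(nao_professores))  # [1] 3
```

With dplyr loaded, `filter(df, grepl("PROFESSOR", DESCRICAO_CARGO))` selects the same rows as `df[mask, ]`.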