How can I assign a variable to a table? [closed]

1

The whole table ends up in a single column! I need to assign the Age variable so that I can compute its mean, variance, and so on, but I cannot do this because all the values are in the same column.

ID, Name, Sex, Age, Height, Weight, Team, NOC, Games, Year, Season, City, Sport "," Event "," Medal " 1, "A Dijiang", "M", 24,180.80, "China", "CHN", "1992 Summer", 1992, "Summer", "Barcelona", "Basketball", "Basketball Men's Basketball", NA Judo Men's Extra-Lightweight "," Judo Men "," Judo Men "," Judo Men "," Summer "," Summer " AT 3, "Gunnar Nielsen Aaby", "M", 24, NA, "Denmark", "DEN", "1920 Summer", 1920, "Summer", "Antwerpen", "Football", "Football Men's Football" ,AT 4, "Edgar Lindenau Aabye", "M", 34, NA, NA, "Denmark / Sweden", "DEN", "1900 Summer", 1900, "Summer", "Paris" , "Tug-Of-War Men's Tug-Of-War", "Gold" 5, "Christine Jacoba Aaftink", "F", 21,185.82, "Netherlands", "NED", "1988 Winter", 1988, "Winter", "Calgary", "Speed Skating", "Speed Skating Women's 500 meters ",AT 5, "Christine Jacoba Aaftink", "F", 21,185.82, "Netherlands", "NED", "1988 Winter", 1988, "Winter", "Calgary", "Speed Skating", "Speed Skating Women's 1,000 meters ",AT 5, "Christine Jacoba Aaftink", "F", 25,185.82, "Netherlands", "NED", "1992 Winter", 1992, "Winter", "Albertville", "Speed Skating", "Speed Skating Women's 500 meters ",AT 5, "Christine Jacoba Aaftink", "F", 25,185.82, "Netherlands", "NED", "1992 Winter", 1992, "Winter", "Albertville", "Speed Skating", "Speed Skating Women's 1,000 meters ",AT 5, "Christine Jacoba Aaftink", "F", 27,185.82, "Netherlands", "NED", "1994 Winter", 1994, "Winter", "Lillehammer", "Speed Skating", "Speed Skating Women's 500 meters ",AT 5, "Christine Jacoba Aaftink", "F", 27,185.82, "Netherlands", "NED", "1994 Winter", 1994, "Winter", "Lillehammer", "Speed Skating", "Speed Skating Women's 1,000 meters ",AT "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Winter Knot" Men's 10 kilometers ", NA "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Winter Knot" Men's 50 kilometers ", NA "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", 
"Cross Country Skiing", "Winter Knot" Men's 10/15 kilometers Pursuit ", NA "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Winter Knot" Men's 4 x 10 km Relay ", NA "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Winter Knot" Men's 10 kilometers ", NA "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Winter Knot" Men's 30 kilometers ", NA "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Winter Knot" Men's 10/15 kilometers Pursuit ", NA "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Cross Country Skiing", "Winter Knot" Men's 4 x 10 km Relay ", NA 7, "John Aalberg", "M", 31,183.72, "United States", "USA", "1992 Winter", 1992, "Winter", "Albertville", "Cross Country Skiing", "Cross Country Skiing Men's 10 kilometers ", NA 7, "John Aalberg", "M", 31,183.72, "United States", "USA", "1992 Winter", 1992, "Winter", "Albertville", "Cross Country Skiing", "Cross Country Skiing Men's 50 kilometers ", NA 7, "John Aalberg", "M", 31,183.72, "United States", "USA", "1992 Winter", 1992, "Winter", "Albertville", "Cross Country Skiing", "Cross Country Skiing Men's 10/15 kilometers Pursuit ", NA 7, "John Aalberg", "M", 31,183.72, "United States", "USA", "1992 Winter", 1992, "Winter", "Albertville", "Cross Country Skiing", "Cross Country Skiing Men's 4 x 10 km Relay ", NA

    
asked by anonymous 02.09.2018 / 18:11

1 answer

2

Your data is in comma-separated (CSV) form, but with the values wrapped in quotation marks. I believe a plain read.csv will not work here (other suggestions are welcome).
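That belief can be checked quickly before trying anything more involved. The sketch below is only an illustration: `csv_text` and `athletes` are hypothetical names, and the sample assumes an ordinary CSV layout in which only the text values are quoted, which may not match the real file.

```r
# Hypothetical sample in ordinary quoted-CSV form (not the asker's actual file)
csv_text <- '"ID","Name","Sex","Age","Height"
1,"A Dijiang","M",24,180
3,"Gunnar Nielsen Aaby","M",24,NA
5,"Christine Jacoba Aaftink","F",21,185'

# read.csv treats the double quote as the quoting character by default
athletes <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the structure: several columns means the quotes were handled
str(athletes)

# The Age column can then be assigned to a variable directly
age <- athletes$Age
```

If `ncol()` on the real file returns 1 instead, the lines are probably quoted as a whole, and a manual split is needed.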

One solution is to copy the data into a .txt file and read it with read.table. First, read the file using a space as the separator, so that each element holds one entire line of the data.

# read the file (the warning is due to the document's lack of structure)
dados <- read.table('data.txt', sep = ' ')

# split into header and data
cab <- dados[1]
dados <- dados[-1]

Now split each element on commas with the function strsplit():

# split each element on commas, one data frame row per record
df <- data.frame(matrix(NA, nrow = 1, ncol = 15))
for(i in seq_along(dados)) {
  df[i, ] <- unlist(strsplit(as.character(dados[, i]), ','))
}

# add the header as column names
names(df) <- unlist(strsplit(as.character(cab[, 1]), ','))

# convert Age, Height, Weight and Year (columns 4, 5, 6, 10) to numeric
for(i in c(4, 5, 6, 10)) {
  df[, i] <- as.numeric(df[, i])
}
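With the columns numeric, the Age variable can be assigned and summarised, which is what the question asks for. Here is a minimal sketch that uses a small stand-in data frame instead of the real `df`; `sample_df` and the summary variable names are hypothetical, and the rows are only modelled on the sample in the question.

```r
# Stand-in for the `df` built above: a few rows modelled on the question's sample
sample_df <- data.frame(
  Name   = c("A Dijiang", "Gunnar Nielsen Aaby",
             "Edgar Lindenau Aabye", "Christine Jacoba Aaftink"),
  Age    = c("24", "24", "34", "21"),    # still character, as after strsplit()
  Height = c("180", "NA", "NA", "185"),  # "NA" is text at this point
  stringsAsFactors = FALSE
)

# Convert to numeric; the string "NA" becomes a real NA (coercion warning silenced)
sample_df$Age <- as.numeric(sample_df$Age)
sample_df$Height <- suppressWarnings(as.numeric(sample_df$Height))

# Assign the column to a variable and compute the statistics
age <- sample_df$Age
age_mean <- mean(age, na.rm = TRUE)
age_var <- var(age, na.rm = TRUE)

# na.rm = TRUE skips missing values, which matters for columns such as Height
height_mean <- mean(sample_df$Height, na.rm = TRUE)
```

Note that splitting on every comma assumes that no field contains a comma itself; an event name such as "1,000 meters" would be split into two fields and would need separate handling.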
    
02.09.2018 / 23:09