Degree of class grouping

I have several classes, each with dozens of students, and every year the students are redistributed among new classes. I would like to calculate automatically how much of each class stays together from one year to the next. For example, in 2015 a school has two 1st grade classes, as below:

turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')   
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')

In 2016 these students moved on to the second grade and were split at random into two new classes, as below:

turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')  
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

From this, I would like to determine that 60% of turma1a (6 of 10 students) went to turma2a and that about 63.6% of turma1b (7 of 11 students) went to turma2b.

I tried intersect in R to find out which classes are the most similar, but I would have to repeat this for dozens of classes compared against each other.

   turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
   turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
   turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
   turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')
   intersect(turma1a, turma2a)
   [1] "A" "B" "C" "D" "E" "F"

This script finds the students in common, but I need the process to be automatic, since I have dozens of classes to analyze.

    
asked by anonymous 11.09.2017 / 21:59

1 answer

One possible solution is to use two nested lapply calls to build a list of the students in common and their proportions. For this, it is best to keep the classes together in lists.

lista_t1 <- list(turma1a, turma1b)
names(lista_t1) <- c("turma1a", "turma1b")
lista_t2 <- list(turma2a, turma2b)
names(lista_t2) <- c("turma2a", "turma2b")

Now we apply the nested lapply calls to these lists.

resultado <- lapply(lista_t1, function(x)
                lapply(lista_t2, function(y) {
                    int <- intersect(x, y)
                    list(comuns = int, prop = length(int)/length(x))
                })
            )

Of course, there are many other ways to solve this problem. This is just one of them, and the nested-list structure of resultado may not be the most convenient, since lists of sub-lists are always tricky to work with.
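As one possible alternative to the nested list, the proportions alone can be collected into a matrix, which is easier to scan when there are dozens of classes. This is only a sketch under the assumptions of the question's data. The names prop and destino are made up for illustration, and the code assumes there are at least two classes in each year (with a single class, sapply would return a vector rather than a matrix).

```r
# Class rosters from the question
turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

lista_t1 <- list(turma1a = turma1a, turma1b = turma1b)
lista_t2 <- list(turma2a = turma2a, turma2b = turma2b)

# prop[i, j] = share of first-year class i that ended up in second-year class j.
# The outer sapply puts first-year classes in columns, so t() flips them to rows.
prop <- t(sapply(lista_t1, function(x)
    sapply(lista_t2, function(y) length(intersect(x, y)) / length(x))))

# For each first-year class, the second-year class that received most of it
# (hypothetical helper vector, not part of the original answer)
destino <- colnames(prop)[apply(prop, 1, which.max)]
names(destino) <- rownames(prop)

print(round(prop, 4))
print(destino)
```

Each row of prop sums to 1 as long as every student of the first-year class appears in exactly one second-year class, which gives a quick sanity check on the rosters.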

Note:
It is probably also worth automating some of the steps that create the class lists. For example, the class names can be assigned with

names(lista_t1) <- ls()[grep("turma1", ls())]
names(lista_t2) <- ls()[grep("turma2", ls())]

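This avoids having to type all the names by hand. Note that ls() returns the names in alphabetical order, so the lists must have been built in that same order for the names to match.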
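A variation on the same idea, sketched here as a suggestion rather than as part of the original answer, is to build each list and its names in one step with mget(), which returns a named list of the objects whose names it is given. It assumes the class vectors live in the current environment and that no other objects there have names starting with turma1 or turma2.

```r
# Class rosters from the question
turma1a <- c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
turma1b <- c('K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U')
turma2a <- c('A', 'B', 'C', 'D', 'E', 'F', 'K', 'L', 'M', 'N')
turma2b <- c('O', 'P', 'Q', 'R', 'S', 'T', 'U', 'G', 'H', 'I', 'J')

# ls(pattern = ...) lists the matching object names;
# mget() fetches those objects into a list named after them,
# so names and contents cannot get out of sync.
lista_t1 <- mget(ls(pattern = "^turma1"))
lista_t2 <- mget(ls(pattern = "^turma2"))

print(names(lista_t1))
print(names(lista_t2))
```

Because the names come from the same call that collects the objects, there is no dependence on the order in which the lists were typed.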

    
12.09.2017 / 01:02