Contingency table in R

Question

I have a table that boils down to the following:

Destino; Proposito; Custo;
Chicago; Negocios; 35;
Nova York; Negocios; 30;
Miami; Turismo; 25;
Chicago; Estudo; 50;
Nova York; Turismo; 40;
Miami; Estudo; 90;
Miami; Estudo; 110;
Chicago; Turismo; 30;
Miami; Negocios; 20;
Chicago; Turismo; 35;
Nova York; Negocios; 40;
Chicago; Estudo; 150;
Nova York; Turismo; 40;
Miami; Negocios; 30;
Nova York; Estudo; 140;
Chicago; Turismo; 35;
Nova York; Turismo; 40;

I'm trying to write a script in R that transforms this data into a contingency table with the following configuration:

Destino; Negocios; Turismo; Estudo; Total;
Miami; 50; 25; 200; 275;
Chicago; 35; 100; 150; 285;
Nova York; 70; 120; 140;
Total; 155; 130; 545;

The idea is to have a matrix with the sum of costs by destination ("Destino") and purpose ("Proposito") simultaneously.

The most I got, after much help from my colleagues with SQL experience, was:

library(sqldf)

# One column per purpose, summing Custo within each destination
sqldf("
  select Destino,
         sum(case when Proposito = 'Estudo'   then Custo else 0 end) as P_Estudo,
         sum(case when Proposito = 'Turismo'  then Custo else 0 end) as P_Turismo,
         sum(case when Proposito = 'Negocios' then Custo else 0 end) as P_Negocios,
         sum(Custo) as Total
  from df
  group by Destino
")

The task would be easier if the goal were only the total cost by destination or by purpose alone. There are several ways to do that, such as group_by from the dplyr package, aggregate, or xtabs.
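To illustrate those one-dimensional totals, here is a minimal sketch in base R. It assumes the data above has been loaded into a data frame named df with columns Destino, Proposito and Custo; the read.table call is only one possible way of getting there.

```r
# Assumption: the question's data, loaded into a data frame called df.
df <- read.table(text = "
Chicago;Negocios;35
Nova York;Negocios;30
Miami;Turismo;25
Chicago;Estudo;50
Nova York;Turismo;40
Miami;Estudo;90
Miami;Estudo;110
Chicago;Turismo;30
Miami;Negocios;20
Chicago;Turismo;35
Nova York;Negocios;40
Chicago;Estudo;150
Nova York;Turismo;40
Miami;Negocios;30
Nova York;Estudo;140
Chicago;Turismo;35
Nova York;Turismo;40
", sep = ";", col.names = c("Destino", "Proposito", "Custo"),
   stringsAsFactors = FALSE)

# Total cost per destination, as a data frame (aggregate)
by_destino <- aggregate(Custo ~ Destino, data = df, FUN = sum)

# Total cost per purpose, as a one-dimensional table (xtabs)
by_proposito <- xtabs(Custo ~ Proposito, data = df)

print(by_destino)
print(by_proposito)
```

Each call collapses the data along a single variable, which is why neither one alone produces the two-way layout asked for above.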

Thanks for any suggestions that might help me solve the problem.

r reshape

asked by anonymous 19.12.2016 / 22:44

1 answer

score 2 · Answer 1

Try running the commands below (the lines that start with >), assuming your data is in a data frame called df:

> library(reshape2)
> acast(df, Destino ~ Proposito, fun.aggregate=sum)

Using Custo as value column: use value.var to override.
           Estudo  Negocios  Turismo
Chicago       200        35      100
Miami         200        50       25
Nova York     140        70      120

Now just add the margins to the totals:

> addmargins(acast(df, Destino ~ Proposito, fun.aggregate=sum))

Using Custo as value column: use value.var to override.
           Estudo  Negocios  Turismo  Sum
Chicago       200        35      100  335
Miami         200        50       25  275
Nova York     140        70      120  330
Sum           540       155      245  940

Notice that the second block repeats the acast call. Splitting this into two steps is not actually necessary; I did it only to show the process step by step.
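For readers without reshape2 installed, the same kind of table can probably be built with base R alone. The sketch below swaps acast for xtabs (a different function, not the method used above) and rebuilds the question's data by hand as df, which is an assumption about how the data is stored.

```r
# Assumption: the question's data, typed in as a data frame called df.
df <- data.frame(
  Destino = c("Chicago", "Nova York", "Miami", "Chicago", "Nova York",
              "Miami", "Miami", "Chicago", "Miami", "Chicago",
              "Nova York", "Chicago", "Nova York", "Miami", "Nova York",
              "Chicago", "Nova York"),
  Proposito = c("Negocios", "Negocios", "Turismo", "Estudo", "Turismo",
                "Estudo", "Estudo", "Turismo", "Negocios", "Turismo",
                "Negocios", "Estudo", "Turismo", "Negocios", "Estudo",
                "Turismo", "Turismo"),
  Custo = c(35, 30, 25, 50, 40, 90, 110, 30, 20, 35, 40, 150, 40, 30,
            140, 35, 40)
)

# Two-way table: sum of Custo for every Destino x Proposito combination
tab <- xtabs(Custo ~ Destino + Proposito, data = df)

# Append a "Sum" row and a "Sum" column with the marginal totals
tab_total <- addmargins(tab)

print(tab_total)
```

If you stay with reshape2, passing value.var = "Custo" to acast names the value column explicitly and should silence the "Using Custo as value column" message.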