I have a table that boils down to the following:
Destino; Proposito; Custo;
Chicago; Negocios; 35;
Nova York; Negocios; 30;
Miami; Turismo; 25;
Chicago; Estudo; 50;
Nova York; Turismo; 40;
Miami; Estudo; 90;
Miami; Estudo; 110;
Chicago; Turismo; 30;
Miami; Negocios; 20;
Chicago; Turismo; 35;
Nova York; Negocios; 40;
Chicago; Estudo; 150;
Nova York; Turismo; 40;
Miami; Negocios; 30;
Nova York; Estudo; 140;
Chicago; Turismo; 35;
Nova York; Turismo; 40;
I'm trying to write a script in R that transforms this data into a contingency table with the following configuration:
Destino; Negocios; Turismo; Estudo; Total;
Miami; 50; 25; 200; 275;
Chicago; 35; 100; 150; 285;
Nova York; 70; 120; 140;
Total; 155; 130; 545;
The idea is to have a matrix holding the sum of costs broken down by "Destino" (destination) and "Proposito" (purpose) simultaneously.
The furthest I got, with a lot of help from colleagues who have SQL experience, was:
library(sqldf)
# Assumes the data frame is named df and the purpose column is Proposito, as in the table above
sqldf("
  select Destino,
         sum(case when Proposito = 'Estudo' then Custo else 0 end) as P_Estudo,
         sum(case when Proposito = 'Turismo' then Custo else 0 end) as P_Turismo,
         sum(case when Proposito = 'Negocios' then Custo else 0 end) as P_Negocios,
         sum(Custo) as Total
  from df
  group by Destino
")
The task would be easier if the goal were only to get the total costs by destination or by purpose alone. There are several ways to do that, for example with group_by from the dplyr package, or with aggregate or xtabs.
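To make clear what I mean by the one-way totals, here is a minimal base R sketch. It assumes the data frame is called df and has the columns Destino, Proposito and Custo shown in the first table; the names by_destino and by_proposito are just illustrative.

```r
# Sample data copied from the first table; `df` and the column names are assumptions.
df <- data.frame(
  Destino = c("Chicago", "Nova York", "Miami", "Chicago", "Nova York", "Miami",
              "Miami", "Chicago", "Miami", "Chicago", "Nova York", "Chicago",
              "Nova York", "Miami", "Nova York", "Chicago", "Nova York"),
  Proposito = c("Negocios", "Negocios", "Turismo", "Estudo", "Turismo", "Estudo",
                "Estudo", "Turismo", "Negocios", "Turismo", "Negocios", "Estudo",
                "Turismo", "Negocios", "Estudo", "Turismo", "Turismo"),
  Custo = c(35, 30, 25, 50, 40, 90, 110, 30, 20, 35, 40, 150, 40, 30, 140, 35, 40)
)

# Total cost per destination: a data frame with one row per Destino.
by_destino <- aggregate(Custo ~ Destino, data = df, FUN = sum)

# Total cost per purpose: a one-dimensional table named by Proposito.
by_proposito <- xtabs(Custo ~ Proposito, data = df)

print(by_destino)
print(by_proposito)
```

Each of these gives only one margin of the table I am after; what I cannot work out is how to get both dimensions, plus the totals, in a single result.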
Thanks for any suggestions that might help me solve the problem.