As I said, this is a question more related to statistics, but since I do not have a statexchange in Portuguese, I'll help you with that.
The correlation method you are trying will only work for numeric variables, if you want to create view relationships between categorical variables with continuous variables what I recommend most would be boxplots or histograms / density.
I will demonstrate some examples in R of these analyzes. For this I'm using the dataset iris
found in the default package of datasets
and the package ggplot2
to plot the graphics. Within the dataset we will compare the different sizes of the%% of the different species that we have iris$Sepal.Length
.
BOXPLOT
require(datasets)
require(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_boxplot()
DENSITY
require(datasets)require(ggplot2)ggplot(iris,aes(x=Sepal.Length,fill=Species))+geom_density(alpha=0.3)
Butifyoureallywanta"number" to guide yourself, an ANOVA test can give you this, basically it will tell you if the differences in the means (the test can be applied to other attributes) of your variable continues for each category are "statistically significant".
ANOVA
require(datasets)
anova <- aov(Sepal.Length ~ Species, iris)
summary(anova)
output:
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 63.21 31.606 119.3 <2e-16 ***
Residuals 147 38.96 0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In this case, the null hypothesis that the sepals have an average of equal size is rejected by a p-value
28.03.2016 / 15:10