I'm trying to compare two distributions, but when I apply ks.test to each of them the D statistics differ while the p-value is identical for both: '< 2.2e-16'. I tried removing the values equal to zero, and ks.test then returned sensible-looking results. Unfortunately, this analysis requires me to keep the zeros.
Has anyone run into this before, or does anyone have an idea how to proceed? I need an actual p-value so that I can accept or reject the null hypothesis.
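One property that may explain why the zeros dominate the gamma test: `pgamma(0, shape, scale)` is 0 for any positive shape, while the empirical CDF of the data already jumps to the share of zeros at x = 0, so the gamma D statistic cannot be smaller than that share. The sketch below checks this on a simulated sample, a hypothetical stand-in for `d` (150 zeros plus 350 gamma draws; the counts and parameters are arbitrary assumptions), and contrasts it with the same moment-matched fit on the positive values only. The helper `mom_gamma` is a name introduced here for illustration.

```r
# Hypothetical sample: 150 exact zeros plus 350 positive gamma draws
set.seed(42)
x <- c(rep(0, 150), rgamma(350, shape = 0.8, scale = 10))

# Hypothetical helper: method-of-moments gamma parameters
# (divide-by-n variance, as in the question)
mom_gamma <- function(v) {
  n <- length(v)
  m <- mean(v)
  s2 <- var(v) * (n - 1) / n
  c(shape = m^2 / s2, scale = s2 / m)
}

share_zero <- mean(x == 0)
p_all <- mom_gamma(x)
p_pos <- mom_gamma(x[x > 0])

# The gamma CDF puts no probability mass at exactly 0
print(pgamma(0, shape = p_all[["shape"]], scale = p_all[["scale"]]))

# With the zeros kept, D is at least the share of zeros
# (suppressWarnings() hides the warning about ties caused by the zeros)
d_all <- suppressWarnings(
  ks.test(x, "pgamma", shape = p_all[["shape"]], scale = p_all[["scale"]])
)$statistic
# Same kind of fit on the positive values only
d_pos <- ks.test(x[x > 0], "pgamma",
                 shape = p_pos[["shape"]], scale = p_pos[["scale"]])$statistic

print(c(share_zero = share_zero,
        D_with_zeros = unname(d_all),
        D_without_zeros = unname(d_pos)))
```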
My data set is large, so I hadn't included it at first. Here it is:
d<-c(4.1,3.7,11.1,15.0,5.1,12.3,0.1,0.2,0.0,0.4,0.0,23.2,0.0,0.0,13.2,0.0,0.0,0.0,0.0,18.6,3.3,0.2,4.2,0.1,0.0,0.7,11.6,1.0,28.9,0.0,0.0,0.0,2.3,10.5,9.7,1.7,0.0,0.5,0.0,1.9,16.7,26.4,9.2,1.2,1.4,9.0,35.3,8.6,0.6,0.0,0.0,0.1,0.5,2.9,27.2,0.0,0.0,0.0,0.0,15.4,0.0,0.0,5.3,1.3,2.1,0.3,22.1,0.0,0.0,5.7,4.2,68.5,1.7,8.7,0.0,9.6,0.0,15.6,0.0,1.9,14.8,0.1,2.4,0.0,0.0,1.1,22.0,1.8,39.4,0.0,0.1,29.5,14.0,0.0,4.5,0.0,37.2,0.0,0.0,21.6,0.0,21.6,1.3,24.5,1.9,1.8,14.1,12.1,0.0,0.1,0.0,0.0,0.2,15.4,1.2,0.4,0.0,0.0,0.0,0.0,0.1,18.9,0.2,0.7,0.8,0.6,17.2,0.0,0.0,0.1,0.1,0.0,0.0,0.1,0.0,0.7,21.2,35.7,0.0,0.0,.8,1.7,10.4,0.0,4.9,0.0,0.9,0.6,6.2,2.2,0.0,0.7,7.6,0.1,1.8,29.4,5.4,0.0,0.0,0.0,0.1,34.4,0.6,11.2,0.0,0.6,1.7,0.3,0.0,8.4,2.6,0.2,27.6,2.6,0.4,0.0,18.5,0.0,25.5,0.9,0.0,0.0,0.2,0.1,0.1,0.0,1.1,0.0,0.0,0.0,0.0,0.1,0.3,0.0,0.0,1.1,0.0,0.9,0.8,1.2,2.6,0.0,6.6,0.0,0.8,15.1,2.6,2.1,4.0,2.2,0.0,15.5,15.0,0.1,1.9,12.8,31.6,0.0,0.0,0.0,25.9,0.0,0.0,1.3,0.0,0.3,0.0,0.0,0.1,0.0,0.1,10.9,1.3,0.0,0.0,1.8,4.4,0.0,2.1,20.2,0.0,12.5,0.1,0.0,0.7,0.0,4.0,46.8,27.1,0.0,0.0,0.0,16.9,0.0,23.7,29.8,0.0,0.0,5.5,0.0,23.8,0.0,0.1,4.4,0.1,43.2,15.4,9.5,0.9,0.0,1.2,7.0,15.9,0.0,9.9,3.5,12.0,0.0,0.5,0.0,0.1,1.1,2.6,0.1,0.0,0.0,0.0,0.0,1.4,18.4,4.5,5.2,4.1,4.3,0.0,3.5,0.0,0.0,0.2,0.0,0.0,2.2,0.0,0.7,0.0,0.0,0.0,14.5,3.1,0.0,0.0,0.1,5.7,0.5,0.1,0.2,0.0,0.0,6.8,0.0,0.2,18.3,0.0,0.2,0.0,0.0,2.5,40.9,4.4,0.0,0.0,0.8,1.0,4.5,0.1,0.0,0.0,0.0,0.0,0.0,0.3,0.4,11.9,0.0,0.0,0.6,12.2,0.0,0.0,0.3,9.3,9.3,1.6,6.1,0.0,19.0,0.0,0.0,0.0,1.4,0.0,0.1,0.0,8.2,5.3,0.0,0.0,3.4,0.0,0.0,0.0,24.1,0.2,15.7,0.0,0.0,12.1,4.1,5.8,13.2,1.0,64.2,0.0,0.5,10.6,0.0,7.0,4.3,0.0,0.0,16.7,29.8,49.3,57.8,4.3,1.2,0.0,0.0,0.0,0.0,6.8,10.6,3.7,2.2,0.0,0.1,5.1,0.0,0.0,1.0,4.3,0.0,43.5,5.6,0.0,7.7,0.0,0.0,18.7,0.3,0.2,0.4,0.0,0.0,23.0,0.0,0.0,0.2,9.5,0.0,5.1,6.4,0.0,28.0,0.0,0.0,3.2,0.0,0.5,1.2,2.3,42.3,0.0,0.0,1.8,0.0,0.2,5.8,30.8,3.1,2.7)
This is what I did:
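```r
# Drop missing values once and use the cleaned vector throughout
vetor <- d[!is.na(d)]
n <- length(vetor)
media <- mean(vetor)
desvio <- sd(vetor)
# Population (divide-by-n) variance, used for the moment estimates
variancia <- var(vetor) * (n - 1) / n
# Method-of-moments gamma parameters: shape = mean^2 / var, scale = var / mean
alfa <- media^2 / variancia
beta <- variancia / media
ks.test(vetor, "pgamma", shape = alfa, scale = beta)
```

```
D = 0.3792, p-value < 2.2e-16
alternative hypothesis: two-sided
```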
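The two parameter lines (`alfa`, `beta`) are the method-of-moments estimates for a gamma distribution in R's shape/scale parametrization: a gamma with shape $\alpha$ and scale $\beta$ has mean $\alpha\beta$ and variance $\alpha\beta^2$, and solving those two equations with the sample mean and the divide-by-n variance (`variancia`) gives:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_n^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2

\alpha\beta = \bar{x}, \quad \alpha\beta^2 = s_n^2
\;\Longrightarrow\;
\hat{\beta} = \frac{s_n^2}{\bar{x}}, \qquad
\hat{\alpha} = \frac{\bar{x}}{\hat{\beta}} = \frac{\bar{x}^2}{s_n^2}
```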
Comparing with a normal distribution:

```r
ks.test(vetor, "pnorm", mean = media, sd = desvio)
```

```
D = 0.3002, p-value < 2.2e-16
alternative hypothesis: two-sided
```
I ran both tests because I wanted to compare the gamma and normal distributions, so that in the end I could compare the two p-values and see which one fits my data better. But both p-values still come out as < 2.2e-16.
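For context, `< 2.2e-16` is how R's printed test summary displays any p-value smaller than `.Machine$double.eps` (about 2.2e-16); the object returned by `ks.test` keeps the statistic and the p-value as ordinary numbers in `$statistic` and `$p.value`. The sketch below pulls both out for a gamma fit and a normal fit so they can be compared side by side. It uses a simulated zero-inflated sample as a hypothetical stand-in for `d` (200 zeros plus 300 rounded gamma draws; these numbers are arbitrary assumptions), and `suppressWarnings()` only silences the warning `ks.test` gives about tied values.

```r
# Hypothetical zero-inflated sample standing in for the posted vector `d`
set.seed(1)
x <- c(rep(0, 200), round(rgamma(300, shape = 0.7, scale = 12), 1))
n <- length(x)
m <- mean(x)
v <- var(x) * (n - 1) / n  # divide-by-n variance, as in the question

# Same two kinds of fit as in the question; suppressWarnings() hides the ties warning
fit_gamma <- suppressWarnings(ks.test(x, "pgamma", shape = m^2 / v, scale = v / m))
fit_norm <- suppressWarnings(ks.test(x, "pnorm", mean = m, sd = sd(x)))

# The printed "< 2.2e-16" is a display threshold; the stored values are plain numbers
res <- data.frame(
  fit = c("gamma", "normal"),
  D = c(unname(fit_gamma$statistic), unname(fit_norm$statistic)),
  p.value = c(fit_gamma$p.value, fit_norm$p.value)
)
print(res)
print(.Machine$double.eps)  # threshold below which the summary prints "< 2.2e-16"
```

With this many zeros both stored p-values are likely to be far below 2.2e-16 (possibly exactly 0), in which case comparing the D statistics may be more informative than comparing the p-values.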