Calculating Student's t probabilities in R

Question

I have the mean and standard deviation of my distribution:

mean = -0.49 ; sd=3.029041

How do I calculate the probability of y being one standard deviation below the mean using Student's t distribution with 85 degrees of freedom? Would it be P(y < mean(y) - sd(y))?

1-pt(mean-sd, df = 85, lower.tail = FALSE)

Is it correct or should I do this:

1-pt(((mean-sd-mean)/sd), df = 85, lower.tail = FALSE)

Edited:

I want to calculate the probability P(x < mean(x) - sd(x)) using the Student's t table. Since the mean and standard deviation are sample estimates, I should use the Student's t distribution, and for that I must standardize: t = (X - mean(x))/sd(x) ~ Student's t. Correct?

Since it is not the sample mean, I do not need to use the square root of n. So in my case X would be: X = mean(x) - sd(x)

Standardizing for Student's t:

t = (X - mean(x))/sd(x) = (mean(x) - sd(x) - mean(x))/sd(x)

So what I want to calculate is:

P(x < mean(x) - sd(x)) = P((x - mean(x))/sd(x) < (mean(x) - sd(x) - mean(x))/sd(x)) = P(t < (mean(x) - sd(x) - mean(x))/sd(x))

Is this correct? How do I do this in R?

r

asked by anonymous 30.10.2016 / 20:21

1 answer

score 5 · Accepted Answer

You should not do this calculation in either of these ways. As it is formulated, the question does not make much sense to me. Student's t distribution is always centered at zero (unless it is a noncentral t distribution, which does not appear to be the case here). So in your problem you will always be calculating a probability that is not tied to the estimated mean of your sample. This may not be apparent with a mean as small as the one in this example, but increase the mean to 100, say, and you will see what I am talking about.
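A minimal sketch of this point, using the mean and standard deviation from the question. The value `big_mean` and all variable names are hypothetical, chosen only for illustration:

```r
# Values from the question; `big_mean` is a hypothetical larger mean for contrast.
m <- -0.49
s <- 3.029041
big_mean <- 100

# pt() evaluates the CDF of a t distribution centered at zero,
# so a raw cutoff (mean - sd) is read on that zero-centered scale.
p_small <- pt(m - s, df = 85)         # cutoff is about -3.52
p_big   <- pt(big_mean - s, df = 85)  # cutoff is about 96.97

# A standardized cutoff of -1 does not depend on the mean at all.
p_std <- pt(-1, df = 85)

print(c(p_small = p_small, p_big = p_big, p_std = p_std))
```

With the raw cutoff, the result changes drastically as the mean moves, which is the symptom described above.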

The sample mean is asymptotically normally distributed, with mean μ and variance σ^2/n, where μ is the population mean, σ^2 is the population variance, and n is the sample size. We can therefore use the normal distribution to calculate the probability of a value falling more than one standard deviation below the mean or more than one standard deviation above it (for the sample mean, that standard deviation is the standard error).

set.seed(1234)
x <- rnorm(86, mean=.5, sd=.3) # random sample

media <- mean(x) # point estimate of the mean
erro_padrao <- sd(x)/sqrt(length(x)) # estimate of the standard error

media-erro_padrao # mean - standard error
[1] 0.4683514
media+erro_padrao # mean + standard error
[1] 0.5321069

pnorm(media-erro_padrao, mean=media, sd=erro_padrao, lower.tail=TRUE)
[1] 0.1586553
pnorm(media+erro_padrao, mean=media, sd=erro_padrao, lower.tail=FALSE)
[1] 0.1586553

The question is not very detailed, so I cannot be sure what your real reason for calculating these probabilities is. If you give more details about your actual purpose, people here on the forum may be able to help you a bit more.

Addendum after the question was edited: to me, this problem still does not make sense. I may just be having a hard time understanding it, but I will try to explain, point by point, why I do not think it can be solved this way.

Where did the analyzed data come from? Saying that a distribution has a mean of -0.49 and a standard deviation of 3.029041 does not tell us much. Is it symmetric about the mean, for example? Does it have many outliers? Is it bell-shaped? U-shaped?

Why use the t distribution? Even if your data came from a sample, I would only use it if I suspected heavy tails in your distribution. In addition, standardizing a variable is only defined for variables with an approximately normal distribution. Even if your data do follow a t distribution, its heavy tails will affect this calculation because, well, your variable has a Student's t distribution and standardization is not defined in that case.
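To give a sense of what "heavy tails" means here, the following sketch compares the lower-tail probability at a cutoff of -3 under t distributions with a few degrees of freedom and under the standard normal. The cutoff and the set of degrees of freedom (other than 85, which comes from the question) are arbitrary choices for illustration:

```r
# Degrees of freedom to compare; 85 is from the question, 3 and 10 are illustrative.
dfs <- c(3, 10, 85)
cutoff <- -3  # arbitrary cutoff far out in the lower tail

# Lower-tail probabilities P(T < cutoff) for each t distribution.
p_t <- pt(cutoff, df = dfs)

# Same probability under the standard normal.
p_norm <- pnorm(cutoff)

print(c(setNames(p_t, paste0("t_df", dfs)), normal = p_norm))
```

The tail probability shrinks toward the normal value as the degrees of freedom grow, so with 85 degrees of freedom the t and normal distributions are already very close.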

The formula (x - mean(x))/sd(x) only works if x is approximately normally distributed, because of the Central Limit Theorem. This theorem applies only to random variables with an asymptotically normal distribution. That is why I solved the problem the way I presented earlier: the sample mean is asymptotically normally distributed, regardless of the distribution of the underlying random variables.
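The claim about the sample mean can be checked with a small simulation. This is only a sketch under assumed settings: the exponential population, the seed, and the number of replications are all hypothetical choices, and only the sample size of 86 mirrors the example above.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 86       # same sample size as in the example above

# Hypothetical skewed population: exponential with rate 1 (mean 1, sd 1).
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

# Standardize each sample mean with the known population mean
# and the standard error sigma / sqrt(n).
z <- (sample_means - 1) / (1 / sqrt(n))

# Fraction of sample means more than one standard error below the mean,
# next to the value the normal distribution predicts.
frac_below <- mean(z < -1)
print(c(simulated = frac_below, normal = pnorm(-1)))
```

Even though the individual observations are far from normal, the simulated fraction should land close to the normal value.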

Is it possible to do it the way you are doing? Yes, but it will not be correct. The proposed standardization does not exist for the t distribution. You will get something like a z-value, but it has no real meaning. After all, what does (x - mean(x))/sd(x) mean under the t distribution? What is the distribution of this transformation? I do not know whether it is t. I only know the case where x is normal and the case where we use the sample mean.

If your data are normal, use the normal cumulative distribution function directly. It is not even necessary to standardize the variable, since these probabilities can be calculated directly. Unless, of course, you want to look the values up in a table; then you can apply this transformation without problems.
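As a sketch of that last point, assuming the data are normal with the mean and standard deviation given in the question (the variable names are illustrative), the direct calculation and the standardized, table-style calculation give the same answer:

```r
# Mean and standard deviation from the question; normality is an assumption here.
m <- -0.49
s <- 3.029041

# Direct calculation: pass the mean and sd straight to pnorm().
p_direct <- pnorm(m - s, mean = m, sd = s)

# Table-style calculation: standardize first, then use the standard normal.
z <- ((m - s) - m) / s  # equals -1 up to floating-point rounding
p_table <- pnorm(z)

print(c(direct = p_direct, standardized = p_table))
```

Both agree with the 0.1586553 shown in the earlier output, because the cutoff is exactly one standard deviation below the mean in each case.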