Stats help - Variance

RizzoTheRat

Original Poster:

25,165 posts

192 months

Thursday 19th February 2015
If I have 2 sets of statistics from 2 identical tests on different populations (but with the same number of samples in each), how do I get a combined variance for the total data set?

E.g. if I had the variance in height of a group of 100 males and of a group of 100 females, can I calculate the variance of the combined group of 200?

This link seems to suggest I can just add the variances, but can I really do that if the two populations have different means? http://onlinestatbook.com/2/summarizing_distributi...
The other problem I have with that approach: if instead of joining the male and female groups I accidentally picked the male group twice, I'd have 2 measurements for each individual, so my gut feeling is the variance wouldn't change. Why on earth would it double?

I am however aware that statistics rarely follow the "gut feeling" logic and I'm now getting even more confused. Any suggestions?

popeyewhite

19,875 posts

120 months

Thursday 19th February 2015
Define your dependent and independent variables.

Which are height and gender.

Run an independent-samples (Student's) t-test, or just use a calculator if your knowledge of statistics software isn't up to scratch. Might take some time though.

Add a variable, say children's height, and you'd use ANOVA or MANOVA, assuming parametricity.

If you want the means for the two combined, then just do that. Why would you accidentally pick the male group twice? I'd suggest if the numbers are that close you might want to check for normal distribution (parametricity) of data.

nammynake

2,590 posts

173 months

Thursday 19th February 2015
Rather than worrying about how to combine variances (doesn't sound a sensible thing to do if they have different means/variances), just calculate the variance of the combined dataset.
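
As a rough illustration of why the two intuitions differ, here is a small sketch using invented heights (none of these numbers come from the thread). The add-the-variances rule applies to the variance of a sum or difference of independent variables, which is a different thing from pooling two groups into one data set. With the raw data available, pooling the same group twice leaves the population variance unchanged, and pooling two equal-sized groups gives the average of the two variances plus a term for the gap between the means.

```python
from statistics import mean, pvariance

# Invented heights in cm, purely for illustration.
males = [170.0, 175.0, 180.0, 185.0]
females = [158.0, 162.0, 166.0, 170.0]

# Pooling the same group twice leaves the population variance unchanged.
doubled = pvariance(males + males)

# Pooling two equal-sized groups: average of the two variances plus the
# squared half-difference of the means, not the sum of the variances.
pooled = pvariance(males + females)
from_parts = (pvariance(males) + pvariance(females)) / 2 + (
    (mean(males) - mean(females)) / 2
) ** 2

print(pvariance(males), doubled)
print(pooled, from_parts, pvariance(males) + pvariance(females))
```

This uses population variances (divide by n). With sample variances (divide by n - 1) the duplicated group's variance shifts slightly, because the denominator changes while the spread does not.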

RizzoTheRat

Original Poster:

25,165 posts

192 months

Friday 20th February 2015
Trouble is I don't have the individual data points. Tried to simplify by the heights analogy but maybe that's not helpful.

Actual problem is looking at a computer model with a lot of inputs and a lot of outputs. I have a program someone has written to vary inputs by Morris and by Sobol methods to allow us to look at which inputs have the greatest effect on the outputs. The only problem is that it's limited in the number of runs it can perform. With 37 inputs it can do 420 runs, which seems to be enough to get a stable result using Morris (two runs show the same rank order for the variance of each input even if the numbers aren't identical), but Sobol seems to need way more. It's spitting out a variance for each input against the individual outputs, and I'm wondering if I can legitimately combine the variances from 2 sets of runs to effectively have a data set twice the size.

V8LM

5,174 posts

209 months

Saturday 21st February 2015
If you don't know the means then you can't.
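
If the means and sample sizes are available, the combined variance can be rebuilt without the individual data points. A minimal sketch, assuming the reported figures are ordinary sample variances (n - 1 denominator) of each set of runs; whether the variances the Morris/Sobol program reports are of that kind is not something the thread confirms. The function name `combine_summaries` and the test numbers are made up for illustration.

```python
from statistics import mean, variance


def combine_summaries(n1, mean1, var1, n2, mean2, var2):
    """Pool two groups' summary statistics into (n, mean, variance).

    Assumes var1 and var2 are sample variances (n - 1 denominator).
    """
    n = n1 + n2
    pooled_mean = (n1 * mean1 + n2 * mean2) / n
    # Sum of squared deviations within each group...
    within = (n1 - 1) * var1 + (n2 - 1) * var2
    # ...plus the part due to each group mean sitting away from the pooled mean.
    between = n1 * (mean1 - pooled_mean) ** 2 + n2 * (mean2 - pooled_mean) ** 2
    return n, pooled_mean, (within + between) / (n - 1)


# Invented numbers, only to check the formula against the raw-data answer.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 12.0, 14.0, 16.0]
n, m, v = combine_summaries(len(a), mean(a), variance(a), len(b), mean(b), variance(b))
print(v, variance(a + b))  # the two values should agree up to rounding
```

The `between` term is why the means matter: if the two groups have the same mean it vanishes, and the result is close to a weighted average of the two variances.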