2020/06/26 10:12:20

Statistical metrics (sample, dispersion)



Sample

A sample (selection) is the set of data that makes it into a study. It can be fully representative, partly representative, or not representative at all. For example, suppose we want to estimate the average salary in a city. If the demographic proportions of our sample match the city's statistics, the sample is representative.
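The comparison of demographic proportions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the age groups, the city shares, the 5-point tolerance, and the helper name `is_representative` are all made up for the example and do not come from any real survey method.

```python
from collections import Counter


def is_representative(sample_groups, population_shares, tolerance=0.05):
    """Return True if each group's share in the sample is within
    `tolerance` of its share in the population (hypothetical criterion)."""
    total = len(sample_groups)
    if total == 0:
        return False  # an empty sample represents nothing
    counts = Counter(sample_groups)
    return all(
        abs(counts.get(group, 0) / total - share) <= tolerance
        for group, share in population_shares.items()
    )


# Made-up city statistics: share of each age group among residents.
city_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

# Two made-up samples of 100 respondents each.
balanced = ["18-34"] * 30 + ["35-54"] * 40 + ["55+"] * 30
skewed = ["18-34"] * 80 + ["35-54"] * 15 + ["55+"] * 5

print(is_representative(balanced, city_shares))
print(is_representative(skewed, city_shares))
```

The first sample mirrors the city's age structure, so an average salary computed from it is a reasonable estimate for the city; the second is dominated by one age group and is not.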

Dispersion

Dispersion is the variability of the data in our study; it tells us how the sample can be handled at all. Say we want to compute the average temperature across a hospital. The values will range from +34 to +42 degrees Celsius, a spread small enough for the arithmetic mean to be meaningful. But if we add a room-temperature corpse to the sample, the dispersion becomes too large for the sample to be representative[1].

Dispersion is a measure of how far a random variable scatters from its most probable value. Pupils' grades can range from 2 to 5. If we take 3.5 as the most probable grade, then no grade is more than 1.5 away from it, so the dispersion is 1.5. That is a small dispersion. It lets us say that a class's arithmetic mean is fairly indicative if we want to compare which class knows mathematics better. With this kind of argument it is much easier to explain a 3 to your mother than to prove that everyone else got a 2. You must agree that "Mom, I have concluded that my 3+ is above the class average, which means I deserve encouragement rather than punishment" sounds far more convincing than "Mom! Everyone else got a 2!".
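The grade example can be checked with a short Python sketch. The list of grades is invented for illustration, and `max_deviation` is a hypothetical helper that computes the figure used in the text (the largest distance from the assumed centre of 3.5). For comparison, the sketch also prints the textbook population variance, which is the mean squared deviation from the mean and is a different number.

```python
from statistics import mean, pvariance


def max_deviation(values, center):
    """Largest absolute distance of any value from `center`."""
    return max(abs(v - center) for v in values)


# Made-up grades for one class on the 2-to-5 scale.
grades = [2, 3, 3, 4, 4, 5]
center = 3.5  # the "most probable" grade assumed in the text

spread = max_deviation(grades, center)  # the text's informal "dispersion"
class_mean = mean(grades)               # arithmetic mean of the class
class_variance = pvariance(grades)      # mean squared deviation from the mean

print("max deviation from 3.5:", spread)
print("class mean:", class_mean)
print("population variance:", class_variance)
```

Both measures are small relative to the width of the grading scale, which is the property the argument relies on when it calls the class average indicative.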

With the average temperature across a hospital, things get more interesting. The dispersion of a living person's temperature is not that large: from about +34 to +42 °C, with a most likely value of +36.6 °C. That lets us say the arithmetic mean is fairly indicative for assessing the situation. We can say, for instance, that patients in the infectious diseases department are on average warmer than patients in the trauma department. Everything changes, however, if we add a room-temperature corpse. It increases the dispersion and makes the average completely unrepresentative.
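A small Python sketch makes the effect of the outlier visible. The temperature readings are invented, 20 °C is an assumed room temperature, and `summarize` is a hypothetical helper written for this example.

```python
from statistics import mean, pvariance


def summarize(temps):
    """Return the mean, the range, and the population variance of readings."""
    return {
        "mean": mean(temps),
        "range": max(temps) - min(temps),
        "variance": pvariance(temps),
    }


# Made-up readings for living patients, in degrees Celsius.
patients = [36.4, 36.6, 36.8, 37.2, 38.5]

# The same ward with one body at an assumed room temperature of 20 °C.
with_corpse = patients + [20.0]

living = summarize(patients)
mixed = summarize(with_corpse)

print("living only:", living)
print("with corpse:", mixed)
```

With these numbers the mean of the mixed sample falls below the temperature of every living patient, so it describes nobody in the ward, while the range and the variance grow many times over.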

Statistics on the average age at which a woman has her first, second, or third child can be viewed the same way. Why are women counted and not men? Aggregating data on men raises many problems: a different dispersion compared with women (the period during which women can have children is much shorter than for men), a fundamentally different number of children that can be born over a lifetime, and the difficulty of reliably proving paternity.

Notes