Statistics: Definitions, Types, Formulas, Applications & Importance

Edited By Komal Miglani | Updated on Dec 19, 2024 03:10 PM IST

Every day we come across a lot of information in the form of facts, news, social media posts, etc. This information is gathered by newspapers, television, magazines, social media and other means of communication. It may be about cricket averages, business insights, the climate, election results, budget plans, etc.

The world has become more and more information-oriented. Everything around us is data. Huge amounts of data are generated every day in the form of facts, numbers, tables, graphs, etc. So, what is data? Data is a collection of information.

This article covers the Statistics topics of Class 9, Class 10 and Class 11. Let's go through the basic concepts of statistics.

What is Statistics?

Statistics is a branch of mathematics that deals with data and its interpretation. It covers counting, averages, estimation and probability. Statistics has important applications in numerous fields such as weather forecasting, education, finance, social media, astronomy, etc. Sir Ronald Aylmer Fisher is often credited as the Father of Statistics.

Statistics Meaning

Statistics can be defined as the branch of mathematics that deals with the collection, analysis, and interpretation of numerical data. It is used to summarize data and draw conclusions from it. Statistics is of two types, namely, descriptive statistics and inferential statistics.

Descriptive Statistics

Descriptive statistics is about organizing and summarizing the collected data, for example with tables, graphs and averages, so that its main features can be understood.

Inferential Statistics

Inferential statistics is about drawing conclusions and making predictions from the organized data, typically about a larger group than the one actually measured.

For example, for a set of marks obtained by a group of students, descriptive statistics includes finding the average mark and the range of marks from highest to lowest, while inferential statistics draws conclusions, such as comparing this set of marks with the previous test's marks to judge whether the students have improved.

Statistics Formulas

The interpretation of data using statistics is mainly based on statistics formulas. The statistics formulas for the basic concepts like mean, median, mode for different representations of data are given below.


Statistics Class 9 Concept and Formulae

Data

Any piece of information is data. For example, the average marks you obtained in your exams are data.

Data is a collection of information. Once collected, data must be arranged or organized so that inferences or conclusions can be drawn from it.

Types of Data: Data can be classified into two types, namely,

1. Quantitative Data: It is numeric data that can be counted or measured.

2. Qualitative Data: It is non-numeric data that describes qualities or characteristics.

Data Collection Methods: Data can be collected by various methods, such as

1. Surveys: Data are collected by asking individuals questions.

2. Experiments: Data are collected under conditions that are controlled by the investigator.

3. Observational Studies: Data are collected by observing subjects without controlling or changing the conditions.

Representation of Data


Data can be represented in the following ways:

Ungrouped distribution
Ungrouped frequency distribution
Grouped frequency distribution

The frequency of a value is the number of times it appears in a data set.

Ungrouped distribution

Consider the marks obtained (out of $100$ marks) by $30$ students of Class $XI$ of a school:

$\begin{equation}
\begin{array}{llllllllll}
10 & 20 & 36 & 92 & 95 & 40 & 50 & 56 & 60 & 70 \\
92 & 88 & 80 & 70 & 72 & 70 & 36 & 40 & 36 & 40 \\
92 & 40 & 50 & 50 & 56 & 60 & 70 & 60 & 60 & 88
\end{array}
\end{equation}$

This representation is called an ungrouped distribution, as all the values are simply listed one by one.

Ungrouped Frequency Distribution

Observe that $4$ students scored $70$ marks. So the frequency of $70$ marks is $4$.

To make the data readable, we construct a table. Such representation of data is called Ungrouped Frequency Distribution.

$\begin{array}{|c|c|}\hline \mathbf { Marks } & {\mathbf { Number\;of \;students }} \\ \hline 10 & {1} \\ 20 & {1} \\ {36} & {3} \\ {40} & {4} \\ {50} & {3} \\ {56} & {2} \\ {60} & {4} \\ {70} & {4} \\ {72} & {1} \\ {80} & {1} \\ {88} & {2} \\ {92} & {3} \\ {95} & {1} \\ \hline{\mathbf { Total }} & \mathbf{30} \\ \hline\end{array}$
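Counting frequencies by hand is error-prone. The following is a minimal Python sketch, assuming Python 3 and the standard-library `collections.Counter`; the list name `marks` is illustrative, and its values are the 30 marks listed above.

```python
from collections import Counter

# The 30 marks (out of 100) from the ungrouped distribution above
marks = [10, 20, 36, 92, 95, 40, 50, 56, 60, 70,
         92, 88, 80, 70, 72, 70, 36, 40, 36, 40,
         92, 40, 50, 50, 56, 60, 70, 60, 60, 88]

# Counter maps each distinct mark to the number of times it occurs
freq = Counter(marks)

# Print the frequency table in ascending order of marks
for mark in sorted(freq):
    print(mark, freq[mark])
print("Total", sum(freq.values()))  # Total 30
```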

Grouped Frequency Distribution

We can show data as ranges of marks and the number of students that obtained marks in that range.

So we can represent this data as

$\begin{array}{|l|c|c|c|c|c|c|}\hline \text { Class interval } & {10-25} & {25-40} & {40-55} & {55-70} & {70-85} & {85-100} \\ \hline \text { Number of students } & {2} & {3} & {7} & {6} & {6} & {6} \\ \hline\end{array}$

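Here the marks have been grouped into ranges, so this is called a grouped distribution.

Also, the width of each class interval is $15$ ($25-10=15$, $70-55=15$, and so on). This number is called the width of the class interval. Here the width is 15, but any convenient width can be chosen. A mark equal to the upper limit of a class is counted in the next class; for example, the four marks of $70$ are counted in $70-85$, not in $55-70$.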

The above table is called a Grouped frequency distribution.
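The grouping step can be sketched in Python as follows. It follows the convention the table uses, that a mark equal to an upper class limit belongs to the next class (so 70 is counted in 70-85); treating 100 as part of the last class is an added assumption, since no student scored 100. The names `width`, `last_upper` and `grouped` are illustrative.

```python
# The 30 marks from the ungrouped distribution above
marks = [10, 20, 36, 92, 95, 40, 50, 56, 60, 70,
         92, 88, 80, 70, 72, 70, 36, 40, 36, 40,
         92, 40, 50, 50, 56, 60, 70, 60, 60, 88]

width = 15        # class width
last_upper = 100  # upper limit of the last class

grouped = {}
for low in range(10, last_upper, width):  # lower limits 10, 25, ..., 85
    high = low + width
    # Lower limit included, upper limit excluded (70 goes to 70-85);
    # the last class is assumed to also include 100
    count = sum(1 for m in marks
                if low <= m < high or (high == last_upper and m == high))
    grouped[(low, high)] = count

for (low, high), count in grouped.items():
    print(f"{low}-{high}: {count}")
```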

Measures of Central Tendency

It is often convenient to have one number that represents the whole data. Such a number is called a measure of central tendency. A measure of central tendency usually lies near the middle of the data. There are several measures of central tendency; the most common are the mean, the median and the mode.


Mean: The mean of the given values is the sum of all the observations divided by the total number of observations. If a data set has $n$ values $x_1, x_2, x_3, \ldots, x_n$, then its mean, usually denoted by $\bar{x}$ (pronounced "$x$ bar"), is:

$
\bar{x}=\frac{x_1+x_2+\cdots+x_n}{n}
$

For example, to calculate the mean of maths marks of 50 students, add the 50 marks together and divide by 50. Technically this is the arithmetic mean.

Mean of the Ungrouped Data

If the $n$ observations in the data are $x_1, x_2, x_3, \ldots, x_n$, then the arithmetic mean $\bar{x}$ is given by

$
\bar{x}=\frac{x_1+x_2+x_3+\cdots+x_n}{n}=\frac{1}{n} \sum\limits_{i=1}^n x_i
$
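As a quick check of the formula, here is a minimal Python sketch (Python 3 assumed) that applies it to the 30 marks from the ungrouped distribution in the Class 9 section.

```python
# The 30 marks from the ungrouped distribution above
marks = [10, 20, 36, 92, 95, 40, 50, 56, 60, 70,
         92, 88, 80, 70, 72, 70, 36, 40, 36, 40,
         92, 40, 50, 50, 56, 60, 70, 60, 60, 88]

# Arithmetic mean: sum of the observations divided by their number
mean = sum(marks) / len(marks)
print(sum(marks), len(marks), mean)  # 1779 30 59.3
```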

Mean of Ungrouped Frequency Distribution

If the observations in the data are $x_1, x_2, x_3, \ldots, x_n$ with respective frequencies $f_1, f_2, f_3, \ldots, f_n$, then

Sum of the values of the observations $=f_1 x_1+f_2 x_2+f_3 x_3+\cdots+f_n x_n$
and number of observations $=f_1+f_2+f_3+\cdots+f_n$.
The mean of an ungrouped frequency distribution is therefore given by

$\bar{x}=\frac{f_1 x_1+f_2 x_2+f_3 x_3+\cdots+f_n x_n}{f_1+f_2+f_3+\cdots+f_n}=\frac{\sum\limits_{i=1}^n f_i x_i}{\sum\limits_{i=1}^n f_i}$
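A minimal Python sketch of this formula, assuming the frequencies are those counted from the 30 marks listed earlier (stored here in a dictionary named `table`, an illustrative choice). It gives the same mean as adding the 30 raw marks, which is the point of the formula.

```python
# Ungrouped frequency distribution of the 30 marks: mark -> frequency
table = {10: 1, 20: 1, 36: 3, 40: 4, 50: 3, 56: 2, 60: 4,
         70: 4, 72: 1, 80: 1, 88: 2, 92: 3, 95: 1}

sum_fx = sum(f * x for x, f in table.items())  # f1*x1 + f2*x2 + ... + fn*xn
sum_f = sum(table.values())                    # f1 + f2 + ... + fn
mean = sum_fx / sum_f
print(sum_fx, sum_f, mean)  # 1779 30 59.3
```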

Median: The median is the middle value of a set of data that has been arranged in ascending or descending order.

The median separates ordered data into two equal halves: half of the values are less than or equal to the median, and the other half are greater than or equal to it.

For example, consider the following data:

$\begin{array}{lllllllllll}65 & 55 & 89 & 56 & 35 & 14 & 56 & 55 & 87 & 45 & 92\end{array}$

To find the median, first rearrange the data in ascending order:
$\begin{array}{lllllllllll}14 & 35 & 45 & 55 & 55 & 56 & 56 & 65 & 87 & 89 & 92\end{array}$
There are 11 values, so the median is the value exactly in the middle, the 6th value, which in this case is 56.
When the number of values $n$ is even, take the two middle values and average them.

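The median is widely used in income distribution analysis because, unlike the mean, it is not pulled up or down by a few extremely high or low incomes.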

Median of Ungrouped Data

Let the number of observations be $n$. First arrange the observations in ascending or descending order.

If $n$ is odd:

$
\text { Median }=\left(\frac{n+1}{2}\right)^{t h} \text { observation }
$

If $n$ is even:

$
\text { Median }=\frac{\text { Value of }\left(\frac{n}{2}\right)^{t h} \text { observation }+ \text { Value of }\left(\frac{n}{2}+1\right)^{t h} \text { observation }}{2}
$

For example,

Consider the data: $1 ; 11.5 ; 6 ; 7.2 ; 4 ; 8 ; 9 ; 10 ; 6.8 ; 8.3 ; 2 ; 2 ; 10 ; 1$
Ordered from smallest to largest: $1 ; 1 ; 2 ; 2 ; 4 ; 6 ; 6.8 ; 7.2 ; 8 ; 8.3 ; 9 ; 10 ; 10 ; 11.5$
Since there are $14$ observations ($n$ is even), the median is the average of the $\left(\frac{n}{2}\right)^{th}=7^{th}$ and the $\left(\frac{n}{2}+1\right)^{th}=8^{th}$ observations. So the median is the average of 6.8 and 7.2, which equals 7.
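The odd/even rule can be sketched as a small Python function; `median` is a hypothetical helper name, and the two lists are the data sets from the examples above. The standard library's `statistics.median` follows the same rule.

```python
def median(data):
    """Median of a list of numbers, following the odd/even rule above."""
    values = sorted(data)  # arrange in ascending order
    n = len(values)
    if n % 2 == 1:
        # ((n + 1) / 2)-th observation; subtract 1 because Python lists are 0-indexed
        return values[(n + 1) // 2 - 1]
    # Average of the (n/2)-th and (n/2 + 1)-th observations
    return (values[n // 2 - 1] + values[n // 2]) / 2

odd_data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]
print(median(odd_data))   # 56
print(median(even_data))  # average of 6.8 and 7.2
```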

Median of Ungrouped Frequency Distribution

To find the median, first arrange the observations in ascending order and then compute the cumulative frequencies.

Let $N$ denote the sum of the frequencies.
If $N$ is odd, identify the observation whose cumulative frequency is equal to or just greater than $\frac{N+1}{2}$. This observation lies in the middle of the data and is therefore the required median.

If $N$ is even, find two observations: the first whose cumulative frequency is equal to or just greater than $\frac{N}{2}$, and the second whose cumulative frequency is equal to or just greater than $\frac{N}{2}+1$. The median is the average of these two observations.
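A minimal Python sketch of the cumulative-frequency method, applied (as an illustration) to the frequencies of the 30 marks counted earlier; `median_from_frequencies` is a hypothetical helper name. Here $N = 30$ is even, so it averages the 15th and 16th observations.

```python
from itertools import accumulate

# Ungrouped frequency distribution of the 30 marks: mark -> frequency
table = {10: 1, 20: 1, 36: 3, 40: 4, 50: 3, 56: 2, 60: 4,
         70: 4, 72: 1, 80: 1, 88: 2, 92: 3, 95: 1}

def median_from_frequencies(table):
    values = sorted(table)                            # observations in ascending order
    cum = list(accumulate(table[v] for v in values))  # cumulative frequencies
    N = cum[-1]                                       # sum of the frequencies

    def value_at(position):
        # First observation whose cumulative frequency is >= position
        for value, c in zip(values, cum):
            if c >= position:
                return value

    if N % 2 == 1:
        return value_at((N + 1) // 2)
    return (value_at(N // 2) + value_at(N // 2 + 1)) / 2

print(median_from_frequencies(table))  # 60.0
```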

Mode: The mode is the most frequent value in a data set. The mode is especially useful for categorical data, where it identifies the most common category. For example, consider the data

$
\begin{array}{llllllllllll}
65 & 55 & 89 & 56 & 35 & 14 & 56 & 55 & 87 & 45 & 92 & 55
\end{array}
$

In the above data set, 55 occurs three times, more often than any other value, so the mode is 55.
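In Python the counting can be done with `collections.Counter`. The sketch below returns every value that has the highest frequency, since a data set can have more than one mode; `statistics.multimode` (Python 3.8+) does the same job. Variable names are illustrative.

```python
from collections import Counter

data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92, 55]

counts = Counter(data)
highest = max(counts.values())
# Keep every value that reaches the highest frequency (there may be several modes)
modes = [value for value, count in counts.items() if count == highest]
print(modes, highest)  # [55] 3
```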

Statistics Class 10 Concept and Formulae

Mean of Grouped Frequency Distribution

Here each class is represented by its midpoint (class mark), i.e.,

$
m=\frac{\text { lower boundary }+ \text { upper boundary }}{2}
$

Then, if $m_i$ is the midpoint and $f_i$ the frequency of the $i$-th class, $\bar{x}=\frac{\sum\limits_{i=1}^n f_i m_i}{\sum\limits_{i=1}^n f_i}$

For example, find the mean of the following data.

$
\begin{array}{|c|c|}
\hline \text { Grade Interval } & \text { Number of Students } \\
\hline 10-12 & 1 \\
\hline 12-14 & 2 \\
\hline 14-16 & 0 \\
\hline 16-18 & 4 \\
\hline 18-20 & 1 \\
\hline
\end{array}
$

First, find the midpoint of each interval:

$
\begin{array}{|c|c|}
\hline \text { Grade Interval } & \text { Midpoint } \\
\hline 10-12 & 11 \\
\hline 12-14 & 13 \\
\hline 14-16 & 15 \\
\hline 16-18 & 17 \\
\hline 18-20 & 19 \\
\hline
\end{array}
$

Now multiply each frequency by the midpoint of its interval, add the products and divide by the total frequency $1+2+0+4+1=8$:

$
\begin{aligned}
\sum_{i=1}^n f_i m_i & =11(1)+13(2)+15(0)+17(4)+19(1)=124 \\
\bar{x} & =\frac{\sum\limits_{i=1}^n f_i m_i}{\sum\limits_{i=1}^n f_i}=\frac{124}{8}=15.5
\end{aligned}
$
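The worked example above can be reproduced with a short Python sketch; the names `classes`, `freqs` and `mids` are illustrative.

```python
# Grade intervals (lower, upper) and the number of students in each
classes = [(10, 12), (12, 14), (14, 16), (16, 18), (18, 20)]
freqs = [1, 2, 0, 4, 1]

# Class mark (midpoint) of each interval
mids = [(low + high) / 2 for low, high in classes]

sum_fm = sum(f * m for f, m in zip(freqs, mids))  # sum of f_i * m_i
mean = sum_fm / sum(freqs)
print(mids)          # [11.0, 13.0, 15.0, 17.0, 19.0]
print(sum_fm, mean)  # 124.0 15.5
```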

Median of Continuous Frequency Distribution

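For a continuous (grouped) frequency distribution with classes in ascending order, first compute the cumulative frequencies and locate the median class, i.e. the class whose cumulative frequency is equal to or just greater than $\frac{N}{2}$. The median is then given by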

$
\text { Median }=l+\frac{\left(\frac{N}{2}-c f\right)}{f} \times h
$

where
$l=$ lower limit of the median class,
$N=$ number of observations,
$cf=$ cumulative frequency of the class preceding the median class,
$f=$ frequency of the median class,
$h=$ class size (width) (assuming all class sizes are equal).
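A minimal Python sketch of this formula, assuming equal class widths and classes listed in ascending order; `grouped_median` is a hypothetical name. As an illustration it is applied to the grouped marks table from the Class 9 section ($N = 30$, so the median class is 55-70).

```python
from itertools import accumulate

def grouped_median(classes, freqs):
    """Median = l + (N/2 - cf) / f * h for classes in ascending order."""
    cum = list(accumulate(freqs))  # cumulative frequencies
    N = cum[-1]
    # Median class: first class whose cumulative frequency is >= N/2
    k = next(i for i, c in enumerate(cum) if c >= N / 2)
    l, upper = classes[k]
    h = upper - l                    # class width (assumed equal for all classes)
    cf = cum[k - 1] if k > 0 else 0  # cumulative frequency of the preceding class
    f = freqs[k]                     # frequency of the median class
    return l + (N / 2 - cf) / f * h

# Grouped marks table from the Class 9 section
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]
print(grouped_median(classes, freqs))  # 62.5
```

The result, 62.5, differs from the median of the 30 raw marks (60) because the formula assumes the observations are spread evenly within the median class, so the grouped value is an estimate.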

Mode of Grouped Frequency Distribution

Mode is the value that occurs most often among the observations, that is, the value of the observation having the maximum frequency.

In a grouped frequency distribution, it is not possible to determine the mode by looking at the frequencies. Here, we can only locate a class with the maximum frequency, called the modal class. The mode is a value inside the modal class, and is given by the formula:

Mode $=l+\left(\frac{f_1-f_0}{2 f_1-f_0-f_2}\right) \times h$
where
$l=$ lower limit of the modal class,
$\mathrm{h}=$ size of the class interval (assuming all class sizes to be equal),
$\mathrm{f}_1=$ frequency of the modal class,
$\mathrm{f}_0=$ frequency of the class preceding the modal class,
$\mathrm{f}_2=$ frequency of the class succeeding the modal class.
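The modal-class formula can be sketched the same way. This assumes equal class widths and a single modal class (if several classes tie, this sketch picks the first), and it treats a missing neighbouring class as having frequency 0, which is an added assumption. `grouped_mode` is a hypothetical name; the data are the grouped marks table from the Class 9 section, whose modal class is 40-55.

```python
def grouped_mode(classes, freqs):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h, with equal class widths."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class (first if tied)
    l, upper = classes[k]
    h = upper - l
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # preceding class (0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # succeeding class (0 if none)
    # Same formula, with the multiplication by h done before the division
    return l + (f1 - f0) * h / (2 * f1 - f0 - f2)

# Grouped marks table from the Class 9 section
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]
print(grouped_mode(classes, freqs))  # 52.0
```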

Statistics Class 11 Concept and Formulae

Measures of Dispersion of Data

Dispersion measures how scattered the values are. It tells us how the observations are spread out around a central value.

The following are the measures of dispersion:

Range
Mean Deviation
Standard deviation and Variance

Range: The difference between the largest value and the smallest value of a distribution is called the range.

Mean Deviation: Mean deviation is the mean of the absolute deviations of the observations from a central value, such as the mean or the median.

Mean deviation for ungrouped data

Let the $n$ observations be $x_1, x_2, x_3, \ldots, x_n$.
If $x$ is a number, then its absolute deviation from a given value $a$ is $|x-a|$.

To find the mean deviation about the mean, the median or any other value $a$:

Mean deviation about 'a', M.D. $(a)=\frac{1}{n} \sum_{i=1}^n\left|x_i-a\right|$
Mean deviation about mean, M.D. $(\bar{x})=\frac{1}{n} \sum_{i=1}^n\left|x_i-\bar{x}\right|$
Mean deviation about median, M.D.(Median) $=\frac{1}{n} \sum_{i=1}^n\left|x_i-\text{Median}\right|$
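A minimal Python sketch of mean deviation for ungrouped data, using the 11 values from the median example in the Class 9 section (mean 59, median 56); `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(data, a):
    """M.D.(a): the mean of the absolute deviations |x_i - a|."""
    return sum(abs(x - a) for x in data) / len(data)

data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
mean = sum(data) / len(data)           # 59.0
median = sorted(data)[len(data) // 2]  # 56 (middle value, since n = 11 is odd)

md_mean = mean_deviation(data, mean)
md_median = mean_deviation(data, median)
print(round(md_mean, 2), round(md_median, 2))  # 17.64 16.82
```

The deviation about the median (185/11) comes out smaller than the deviation about the mean (194/11).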

Mean deviation for ungrouped frequency distribution

Let the given data consist of ${n}$ distinct values ${x_1}, {x_2}, \ldots, x_n$ occurring with frequencies ${f_1}, {f_2}, \ldots, f_n$ respectively.

$
\begin{array}{lll}
x: x_1 & x_2 & x_3 \ldots x_n \\
f: f_1 & f_2 & f_3 \ldots f_n
\end{array}
$

1. Mean Deviation About Mean

First find the mean, i.e.

$
\bar{x}=\frac{\sum_{i=1}^n x_i f_i}{\sum_{i=1}^n f_i}=\frac{1}{\mathrm{~N}} \sum_{i=1}^n x_i f_i
$

where $N$ is the sum of all frequencies.
Then, find the deviations of observations $x_i$ from the mean $\bar{x}$ and take their absolute values, i.e., $\left|x_i-\bar{x}\right|$ for all $i=1,2, \ldots, n$

Now, find the mean of the absolute values of the deviations.
$\operatorname{M.D.}(\bar{x})=\frac{\sum_{i=1}^n f_i\left|x_i-\bar{x}\right|}{\sum_{i=1}^n f_i}=\frac{1}{N} \sum_{i=1}^n f_i\left|x_i-\bar{x}\right|$

2. Mean Deviation About any value 'a'

$
\text { M.D.(a) }=\frac{1}{\mathrm{~N}} \sum_{i=1}^n f_i\left|x_i-\mathrm{a}\right|
$
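For a frequency distribution the same idea weights each absolute deviation by its frequency. The sketch below applies it to the frequencies counted from the 30 marks listed earlier; `mean_deviation_freq` is a hypothetical name, and 60 is used as $a$ because it is the median of those marks.

```python
# Ungrouped frequency distribution of the 30 marks: mark -> frequency
table = {10: 1, 20: 1, 36: 3, 40: 4, 50: 3, 56: 2, 60: 4,
         70: 4, 72: 1, 80: 1, 88: 2, 92: 3, 95: 1}

def mean_deviation_freq(table, a):
    """M.D.(a) = (1/N) * sum of f_i * |x_i - a|, N = total frequency."""
    N = sum(table.values())
    return sum(f * abs(x - a) for x, f in table.items()) / N

N = sum(table.values())
mean = sum(f * x for x, f in table.items()) / N  # 59.3

md_mean = mean_deviation_freq(table, mean)
md_median = mean_deviation_freq(table, 60)  # 60 is the median of these marks
print(round(md_mean, 2), round(md_median, 2))  # 18.01 17.97
```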

Mean deviation for grouped frequency distribution

The formula for mean deviation for grouped frequency distribution is the same as in the case of ungrouped frequency distribution. Here, $x_i$ is the midpoint of each class.
Note: The mean deviation about the median is less than or equal to the mean deviation about any other value.

Standard Deviation: Standard deviation is the positive square root of the mean of the squares of the deviations of the given values from their mean. It measures how far the values are spread out from the mean. The standard deviation is usually denoted by $\sigma$ and is given by

$
\sigma=\sqrt{\frac{1}{n} \sum_{i=1}^n\left(x_i-\bar{x}\right)^2}
$

Variance: The mean of the squares of the deviations from the mean is called the variance and is denoted by $\sigma^2$ (read as sigma square).

The variance of $n$ observations $x_1, x_2, \ldots, x_n$ is given by

$
\sigma^2=\frac{1}{n} \sum_{i=1}^n\left(x_i-\bar{x}\right)^2
$
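A minimal Python sketch of these two definitions for the 11 values from the median example. Note that they divide by $n$; in Python's `statistics` module this matches `pvariance` and `pstdev`, whereas `variance` and `stdev` divide by $n-1$ (the sample versions) and so give larger values.

```python
import math
import statistics

data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
n = len(data)
mean = sum(data) / n  # 59.0

# Variance: mean of the squared deviations from the mean (divides by n)
variance = sum((x - mean) ** 2 for x in data) / n
# Standard deviation: positive square root of the variance
sd = math.sqrt(variance)
print(round(variance, 2), round(sd, 2))

# Population versions in the standard library use the same 1/n definition
print(round(statistics.pvariance(data), 2), round(statistics.pstdev(data), 2))
```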

Variance and Standard Deviation of Ungrouped Frequency Distribution

The given data is

$
\begin{aligned}
& x: x_1, \quad x_2, \quad x_3, \quad \ldots \quad x_n \\
& f: f_1, f_2, f_3, \ldots f_n
\end{aligned}
$

In this case, Variance $\left(\sigma^2\right)=\frac{1}{N} \sum_{i=1}^n f_i\left(x_i-\bar{x}\right)^2$ and, Standard Deviation $(\sigma)=\sqrt{\frac{1}{N} \sum_{i=1}^n f_i\left(x_i-\bar{x}\right)^2}$ where, $\mathrm{N}=\sum_{i=1}^n f_i$
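The frequency-weighted version can be checked by expanding the table back into a list of 30 marks and comparing with `statistics.pvariance`. The frequencies are those counted from the marks listed earlier; the variable names are illustrative.

```python
import math
import statistics

# Ungrouped frequency distribution of the 30 marks: mark -> frequency
table = {10: 1, 20: 1, 36: 3, 40: 4, 50: 3, 56: 2, 60: 4,
         70: 4, 72: 1, 80: 1, 88: 2, 92: 3, 95: 1}

N = sum(table.values())
mean = sum(f * x for x, f in table.items()) / N

# sigma^2 = (1/N) * sum of f_i * (x_i - mean)^2
variance = sum(f * (x - mean) ** 2 for x, f in table.items()) / N
sd = math.sqrt(variance)
print(round(variance, 2), round(sd, 2))

# Cross-check: expand the table back into the 30 individual marks
expanded = [x for x, f in table.items() for _ in range(f)]
print(round(statistics.pvariance(expanded), 2))
```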

Variance and Standard deviation of a grouped frequency distribution

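Here $x_i$ is the midpoint of the $i$-th class and $N=\sum_{i=1}^n f_i$. Expanding $\sum_{i=1}^n f_i\left(x_i-\bar{x}\right)^2$ and using $\sum_{i=1}^n f_i x_i=N \bar{x}$ gives

Variance, $\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^n f_i x_i^2+\bar{x}^2 N-2 \bar{x} \cdot N \bar{x}\right]=\frac{1}{N} \sum_{i=1}^n f_i x_i^2-\bar{x}^2$

Standard Deviation, $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^n f_i x_i^2-\bar{x}^2}$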
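A short Python sketch, using the grouped marks table from the Class 9 section with class midpoints as $x_i$ (names illustrative), confirms that the shortcut formula and the definition give the same variance.

```python
import math

# Grouped marks table from the Class 9 section
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]
mids = [(low + high) / 2 for low, high in classes]  # x_i = class midpoints

N = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / N

# Shortcut form: sigma^2 = (1/N) * sum of f_i x_i^2 - mean^2
variance_short = sum(f * x * x for f, x in zip(freqs, mids)) / N - mean ** 2
# Definition: sigma^2 = (1/N) * sum of f_i (x_i - mean)^2
variance_def = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / N

print(mean, variance_short, variance_def)  # 62.0 502.25 502.25
print(math.sqrt(variance_short))           # standard deviation
```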

Coefficient of Dispersion

A measure of variability that is independent of the units of measurement is called a coefficient of dispersion.

1. Coefficient of Range: The coefficient of the range equals $\frac{x_{\max }-x_{\min }}{x_{\max }+x_{\min }}$
where $x_{\max }$ is the highest observation and $x_{\min }$ is the lowest observation.

2. Coefficient of Mean Deviation: The coefficient of mean deviation is $\frac{\text{M.D.}}{\bar{x}}$, where M.D. is the mean deviation (about the mean) and $\bar{x}$ is the mean of the data.

3. Coefficient of Standard Deviation: The coefficient of standard deviation is $\frac{\sigma}{\bar{x}}$
where $\sigma$ and $\bar{x}$ are the standard deviation and mean of the data respectively.

4. Coefficient of Variation: The variance $\sigma^2$ and the standard deviation $\sigma$ defined above are expressed in the units of the data (squared, in the case of variance). To compare the variability of data sets with different units or different means, the coefficient of variation is used. It is defined as

$
\text { C.V. }=\frac{\sigma}{\bar{x}} \times 100, \bar{x} \neq 0
$

where $\sigma$ and $\bar{x}$ are the standard deviation and mean of the data. When two series are compared, the one with the smaller coefficient of variation is said to be more consistent (less variable) than the other.
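All four coefficients can be computed together in a minimal Python sketch; it uses the 11 values from the median example (mean 59) and takes the mean deviation about the mean. Variable names are illustrative.

```python
import math

data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
n = len(data)
mean = sum(data) / n
md = sum(abs(x - mean) for x in data) / n               # mean deviation about the mean
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)  # standard deviation

coeff_range = (max(data) - min(data)) / (max(data) + min(data))
coeff_md = md / mean
coeff_sd = sd / mean
cv = coeff_sd * 100  # coefficient of variation in percent (requires mean != 0)

print(round(coeff_range, 4), round(coeff_md, 4), round(coeff_sd, 4), round(cv, 2))
```

To compare the consistency of two series, compute `cv` for each; the series with the smaller value is the more consistent one.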

Importance of Statistics

Statistics is widely used to handle and analyze data and to draw valid inferences in diverse spheres of life, such as social, economic and political, with applications in almost all sciences, including biology, psychology, education and business management. It is hard to name a single field of human activity that does not use statistics.

Statistics has significant weightage in JEE Main, a national-level exam taken by Class 12 students for admission to the country's top engineering institutes. It is one of the most difficult exams in the country and has a significant impact on students' futures, so many students begin preparing as early as Class 11. Because of their weightage, these chapters are important for the mathematics paper. You can begin and continue your studies with the standard books and these revision notes, which cover the crucial ideas and can be used to revise before any test or the actual examination.

How to study Statistics?

Start by understanding the basic concepts, such as data, data collection methods, measures of central tendency and measures of dispersion. To score well in statistics, it is important to master the statistics formulas. After studying these concepts, go through solved examples and then practice MCQs and other problems to make sure you have understood the topic. Solve the questions in the books you are following, and then move on to previous years' papers.

If you are preparing for competitive exams, solve as many problems as you can. Do not look at the solution right away. If your basics are clear, you should be able to solve any question on this topic.

Important Books for Statistics

Start with the NCERT books; the explanations are simple and lucid, and you should be able to understand most of the material. Solve all the problems (including the miscellaneous exercises) in NCERT. If you do this, your basic level of preparation will be complete.

Then you can refer to Intermediate Mathematical Statistics by G. P. Beaumont or Introduction to the Theory of Statistics by Alexander M. Mood. These books explain statistics very well and contain ample questions with clear concepts. The choice of reference book varies from person to person; choose the one that suits you best, depending on how clear your concepts are and the difficulty of the questions you need.