Total Probability Theorem and Bayes' Theorem


Komal Miglani | Updated on 02 Jul 2025, 07:54 PM IST

Probability is the branch of mathematics that deals with the likelihood of different outcomes occurring. It plays an important role in predicting the chances of an event and is useful in many real-life applications that involve reasoning under uncertainty. Two fundamental results in probability are the Total Probability Theorem and Bayes' Theorem. These theorems help us find the likelihood of events under given conditions.


Theorem of Total Probability

Suppose $A_1, A_2, \ldots, A_n$ are $n$ mutually exclusive and exhaustive events (that is, they form a partition of the sample space $S$), and suppose that each of the events $A_1, A_2, \ldots, A_n$ has a nonzero probability of occurrence. Let $A$ be any event associated with $S$. Then

$
P(A)=P\left(A_1\right) P\left(A \mid A_1\right)+P\left(A_2\right) P\left(A \mid A_2\right)+\ldots+P\left(A_n\right) P\left(A \mid A_n\right)
$

Since $A_1, A_2, \ldots, A_n$ are $n$ mutually exclusive and exhaustive events, together they cover the sample space $S$.
Therefore,

$
S=A_1 \cup A_2 \cup \ldots \cup A_n
$

And

$
A_i \cap A_j=\varphi, \quad i \neq j, \quad i, j=1,2, \ldots, n
$

Now, for any event A

$
\begin{aligned}
A & =A \cap S \\
& =A \cap\left(A_1 \cup A_2 \cup \ldots \cup A_n\right) \\
& =\left(A \cap A_1\right) \cup\left(A \cap A_2\right) \cup \ldots \cup\left(A \cap A_n\right)
\end{aligned}
$

Also, $A \cap A_i$ and $A \cap A_j$ are subsets of $A_i$ and $A_j$ respectively.
Since $A_i$ and $A_j$ are disjoint for $i \neq j$, it follows that $A \cap A_i$ and $A \cap A_j$ are also disjoint for all $i \neq j$, $i, j=1,2, \ldots, n$.
Thus,

$
\begin{aligned}
P(A) & =P\left[\left(A \cap A_1\right) \cup\left(A \cap A_2\right) \cup \ldots . . \cup\left(A \cap A_n\right)\right] \\
& =P\left(A \cap A_1\right)+P\left(A \cap A_2\right)+\ldots+P\left(A \cap A_n\right)
\end{aligned}
$

Using the multiplication rule of probability

$
P(A \cap A_i)=P(A_i) P(A \mid A_i) \text { as } P(A_i) \neq 0 \ \forall \ i=1,2, \ldots, n
$

Therefore,

$
P(A)=P\left(A_1\right) P\left(A \mid A_1\right)+P\left(A_2\right) P\left(A \mid A_2\right)+\ldots+P\left(A_n\right) P\left(A \mid A_n\right)
$

or

$
\mathrm{P}(\mathrm{A})=\sum_{i=1}^n \mathrm{P}\left(\mathrm{A}_i\right) \mathrm{P}\left(\mathrm{A} \mid \mathrm{A}_i\right)
$
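
In code, the theorem is just a weighted sum over the partition. Below is a minimal Python sketch with made-up probabilities (the numbers are purely illustrative and are not taken from any example in this article):

```python
# Total probability: P(A) = sum over i of P(A_i) * P(A | A_i)
# Hypothetical partition of the sample space into three events A_1, A_2, A_3.
prior = [0.5, 0.3, 0.2]          # P(A_1), P(A_2), P(A_3); must sum to 1
likelihood = [0.10, 0.40, 0.70]  # P(A | A_1), P(A | A_2), P(A | A_3)

p_a = sum(p * q for p, q in zip(prior, likelihood))
print(p_a)  # 0.5*0.10 + 0.3*0.40 + 0.2*0.70 ≈ 0.31 (up to floating-point rounding)
```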

De Morgan's Laws
If $A$ and $B$ are any two sets, then

$
\begin{aligned}
& (A \cup B)^{\prime}=A^{\prime} \cap B^{\prime} \\
& (A \cap B)^{\prime}=A^{\prime} \cup B^{\prime}
\end{aligned}
$

Multiplication Rule for $n$ Events
If $A_1, A_2, \ldots, A_n$ are $n$ events, then

$
P\left(A_1 \cap A_2 \cap \cdots \cap A_n\right)=P\left(A_1\right) P\left(A_2 \mid A_1\right) P\left(A_3 \mid A_1 \cap A_2\right) \cdots P\left(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}\right)
$
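
As a quick illustration of the multiplication rule (an added example, not from the original text), consider drawing three aces in succession, without replacement, from a standard 52-card deck:

```python
# Multiplication rule: P(A1 ∩ A2 ∩ A3) = P(A1) * P(A2 | A1) * P(A3 | A1 ∩ A2)
# Three aces drawn in a row, without replacement, from a 52-card deck.
from fractions import Fraction

p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p)  # 1/5525
```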

Bayes’ Theorem

Suppose $A_1, A_2, \ldots, A_n$ are $n$ mutually exclusive and exhaustive events. Then the conditional probability that $A_i$ happens, given that event $A$ has happened, is given by

$
\begin{aligned}
& \mathrm{P}\left(\mathrm{A}_i \mid \mathrm{A}\right)=\frac{\mathrm{P}\left(\mathrm{A}_{\mathrm{i}} \cap \mathrm{A}\right)}{\mathrm{P}(\mathrm{A})}=\frac{\mathrm{P}\left(\mathrm{A}_i\right) \mathrm{P}\left(\mathrm{A} \mid \mathrm{A}_i\right)}{\sum_{j=1}^n \mathrm{P}\left(\mathrm{A}_j\right) \mathrm{P}\left(\mathrm{A} \mid \mathrm{A}_j\right)} \\
& \text { for any } i=1,2,3, \ldots, n
\end{aligned}
$
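
With the same made-up numbers as in the earlier sketch, Bayes' Theorem simply divides one term of the total-probability sum by the whole sum:

```python
# Bayes' theorem: P(A_i | A) = P(A_i) P(A | A_i) / sum_j P(A_j) P(A | A_j)
prior = [0.5, 0.3, 0.2]          # hypothetical P(A_i)
likelihood = [0.10, 0.40, 0.70]  # hypothetical P(A | A_i)

evidence = sum(p * q for p, q in zip(prior, likelihood))           # P(A)
posterior = [p * q / evidence for p, q in zip(prior, likelihood)]  # P(A_i | A)
print(posterior)  # approximately [0.161, 0.387, 0.452]; the posteriors sum to 1
```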

Solved Examples Based on Bayes' theorem and Theorem of Total Probability:

Example 1: There are three bags, each containing 10 marbles:
- Bag 1 has 6 red and 4 blue marbles
- Bag 2 has 5 red and 5 blue marbles.
- Bag 3 has 2 red and 8 blue marbles.

A bag is chosen at random and one marble is drawn from it. What is the probability that the marble is red?
1) $\frac{11}{30}$
2) $\frac{13}{30}$
3) $\frac{17}{30}$
4) $\frac{7}{30}$

Solution
The Law of Total Probability:
Let $S$ be the sample space and $E_1, E_2, \ldots, E_n$ be $n$ mutually exclusive and exhaustive events associated with a random experiment. Then

$
\Rightarrow P(A)=P\left(A \cap E_1\right)+P\left(A \cap E_2\right)+\cdots+P\left(A \cap E_n\right)
$

$
\Rightarrow P(A)=P\left(E_1\right) \cdot P\left(\frac{A}{E_1}\right)+P\left(E_2\right) \cdot P\left(\frac{A}{E_2}\right)+\cdots P\left(E_n\right) \cdot P\left(\frac{A}{E_n}\right)
$

where $A$ is any event which occurs with $E_1, E_2, E_3, \ldots, E_n$.

Let $B_1, B_2, B_3$ denote the events of choosing bag 1, bag 2 and bag 3 respectively, and let $R$ be the event of drawing a red marble. Since each bag is equally likely to be chosen, $P(B_1)=P(B_2)=P(B_3)=\frac{1}{3}$, and

$
\begin{aligned}
& P\left(\frac{R}{B_1}\right)=0.6 ; \quad P\left(\frac{R}{B_2}\right)=0.5 ; \quad P\left(\frac{R}{B_3}\right)=0.2 \\
& \text { Thus } P(R)=\frac{1}{3}(0.6)+\frac{1}{3}(0.5)+\frac{1}{3}(0.2) \\
& =\frac{1.3}{3} \\
& =\frac{13}{30}
\end{aligned}
$
Hence, the answer is option 2.
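
A quick cross-check of the arithmetic (a sketch, not part of the original solution; it assumes, as the solution does, that each bag is picked with probability $\frac{1}{3}$):

```python
# Example 1: P(red) = (1/3)(6/10) + (1/3)(5/10) + (1/3)(2/10)
from fractions import Fraction

p_red_given_bag = [Fraction(6, 10), Fraction(5, 10), Fraction(2, 10)]
p_red = sum(Fraction(1, 3) * p for p in p_red_given_bag)
print(p_red)  # 13/30
```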

Example 2: In a box, there are 20 cards, out of which 10 are labeled as A and the remaining 10 are labeled as B. Cards are drawn at random, one after the other, and with replacement, till a second A-card is obtained. The probability that the second A-card appears before the third B-card is:

1) $\frac{15}{16}$
2) $\frac{9}{16}$
3) $\frac{13}{16}$
4) $\frac{11}{16}$

Solution
Each draw is an A-card or a B-card with probability $\frac{1}{2}$, since the draws are with replacement. The favourable sequences of draws (those in which the second A appears before the third B) are
$A A, \; A B A, \; B A A, \; A B B A, \; B B A A, \; B A B A$
with total probability
$\frac{1}{4}+\frac{1}{8}+\frac{1}{8}+\frac{1}{16}+\frac{1}{16}+\frac{1}{16}=\frac{11}{16}$
Hence, the answer is the option 4.
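
The enumeration can be cross-checked by simulation; below is a sketch that uses the fact that, with replacement, every draw is an A-card or a B-card with probability $\frac{1}{2}$ each:

```python
# Example 2: draw A/B cards with replacement until either the second A
# or the third B appears; count how often the second A comes first.
import random

def second_A_before_third_B() -> bool:
    a_count = b_count = 0
    while a_count < 2 and b_count < 3:
        if random.random() < 0.5:   # each draw is A or B with probability 1/2
            a_count += 1
        else:
            b_count += 1
    return a_count == 2

trials = 200_000
wins = sum(second_A_before_third_B() for _ in range(trials))
print(wins / trials)  # should be close to 11/16 = 0.6875
```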

Example 3: In a game, two players A and B take turns in throwing a pair of fair dice, starting with player A, and the total of the scores on the two dice in each throw is noted. A wins the game if he throws a total of six before B throws a total of seven, and B wins the game if he throws a total of seven before A throws a total of six. The game stops as soon as either of the players wins. The probability of A winning the game is:
1) $\frac{5}{31}$
2) $\frac{31}{61}$
3) $\frac{5}{6}$
4) $\frac{30}{61}$

Solution
$
\begin{aligned}
& P(\text { Sum } 6)=\frac{5}{36} \\
& (1,5),(5,1),(2,4),(4,2),(3,3) \\
& P(\text { Sum } 7)=\frac{6}{36}=\frac{1}{6} \\
& (1,6),(6,1),(2,5),(5,2),(3,4),(4,3)
\end{aligned}
$
Let $A$ denote the event that player A throws a total of 6 (and $A'$ a total other than 6), and let $B$ denote the event that player B throws a total of 7 (and $B'$ a total other than 7).

$
\begin{aligned}
& \text { A wins when } A, A^{\prime} B^{\prime} A, A^{\prime} B^{\prime} A^{\prime} B^{\prime} A, \ldots \ldots \\
& \therefore P(\text { A wins })=\frac{5}{36}+\frac{31}{36} \times \frac{5}{6} \times \frac{5}{36}+\frac{31}{36} \times \frac{5}{6} \times \frac{31}{36} \times \frac{5}{6} \times \frac{5}{36}+\ldots \\
& =\frac{\frac{5}{36}}{1-\frac{31}{36} \times \frac{5}{6}}=\frac{30}{61}
\end{aligned}
$
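Hence, the answer is the option 4.

The sum of the geometric series can be cross-checked exactly (a sketch, not part of the original solution):

```python
# Example 3: P(A wins) = (5/36) / (1 - (31/36)(5/6))
from fractions import Fraction

p_six = Fraction(5, 36)    # A's winning throw: a total of 6
p_seven = Fraction(1, 6)   # B's winning throw: a total of 7

# A wins on his own throw after any number of rounds in which both players miss.
p_a_wins = p_six / (1 - (1 - p_six) * (1 - p_seven))
print(p_a_wins)  # 30/61
```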

Example 4: In a group of 400 people, 160 are smokers and non-vegetarians; 100 are smokers and vegetarians and the remaining 140 are non-smokers and vegetarians. Their chances of getting a particular chest disorder are 35%, 20% and 10% respectively. A person is chosen from the group at random and is found to be suffering from a chest disorder. The probability that the selected person is a smoker and non-vegetarian is :

1) $\frac{7}{45}$
2) $\frac{14}{45}$
3) $\frac{28}{45}$
4) $\frac{8}{45}$

Solution
Consider the following events:
A: The person chosen is a smoker and non-vegetarian.
B: The person chosen is a smoker and vegetarian.
C: The person chosen is a non-smoker and vegetarian.
E: The person chosen has a chest disorder.
Given

$
\begin{aligned}
& \mathrm{P}(\mathrm{A})=\frac{160}{400}, \mathrm{P}(\mathrm{B})=\frac{100}{400}, \mathrm{P}(\mathrm{C})=\frac{140}{400} \\
& \mathrm{P}\left(\frac{\mathrm{E}}{\mathrm{A}}\right)=\frac{35}{100}, \mathrm{P}\left(\frac{\mathrm{E}}{\mathrm{B}}\right)=\frac{20}{100}, \mathrm{P}\left(\frac{\mathrm{E}}{\mathrm{C}}\right)=\frac{10}{100}
\end{aligned}
$

We need to find the probability that the selected person is a smoker and non-vegetarian, given that they have a chest disorder, that is

$
\begin{aligned}
& P\left(\frac{A}{E}\right)=\frac{P(A) P\left(\frac{E}{A}\right)}{P(A) \cdot P\left(\frac{E}{A}\right)+P(B) \cdot P\left(\frac{E}{B}\right)+P(C) \cdot P\left(\frac{E}{C}\right)} \\
& P\left(\frac{A}{E}\right)=\frac{\frac{160}{400} \times \frac{35}{100}}{\frac{160}{400} \times \frac{35}{100}+\frac{100}{400} \times \frac{20}{100}+\frac{140}{400} \times \frac{10}{100}}=\frac{28}{45}
\end{aligned}
$
Hence, the answer is the option 3.
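
The same Bayes' theorem computation as a short sketch (not part of the original solution):

```python
# Example 4: P(A|E) = P(A) P(E|A) / [P(A) P(E|A) + P(B) P(E|B) + P(C) P(E|C)]
from fractions import Fraction

prior = [Fraction(160, 400), Fraction(100, 400), Fraction(140, 400)]   # P(A), P(B), P(C)
p_e_given = [Fraction(35, 100), Fraction(20, 100), Fraction(10, 100)]  # P(E|A), P(E|B), P(E|C)

p_e = sum(p * q for p, q in zip(prior, p_e_given))  # total probability P(E)
print(prior[0] * p_e_given[0] / p_e)                # P(A | E) = 28/45
```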

Example 5: If $A$ and $B$ are any two events such that $P(A)=2/5$ and $P(A \cap B)=3/20$, then the conditional probability $P\left(A /\left(A^{\prime} \cup B^{\prime}\right)\right)$, where $A^{\prime}$ denotes the complement of $A$, is equal to:
1) $1 / 4$
2) $5 / 17$
3) $8 / 17$
4) $11 / 20$

Solution
Given:

$
P(A)=\frac{2}{5}, P(A \cap B)=\frac{3}{20}
$
Now,

$
P\left(\frac{A}{A^{\prime} \cup B^{\prime}}\right)=\frac{P\left(A \cap\left(A^{\prime} \cup B^{\prime}\right)\right)}{P\left(A^{\prime} \cup B^{\prime}\right)}
$
Here,

$
P\left(A^{\prime} \cup B^{\prime}\right)=P(A \cap B)^{\prime}=1-P(A \cap B)=1-\frac{3}{20}=\frac{17}{20}
$

(Using De-Morgan's Law)

$
\begin{aligned}
& \text { And } P\left(A \cap\left(A^{\prime} \cup B^{\prime}\right)\right)=P\left(\left(A \cap A^{\prime}\right) \cup\left(A \cap B^{\prime}\right)\right)=P\left(A \cap B^{\prime}\right)=P(A)-P(A \cap B) \\
& =\frac{2}{5}-\frac{3}{20}=\frac{5}{20} \\
& P\left(\frac{A}{A^{\prime} \cup B^{\prime}}\right)=\frac{\frac{5}{20}}{\frac{17}{20}}=\frac{5}{17}
\end{aligned}
$
Hence, the answer is the option 2.
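
Since the answer depends only on $P(A)$ and $P(A \cap B)$, it can be cross-checked on any concrete sample space consistent with those two values; the 20-outcome space below is a hypothetical choice made purely for illustration:

```python
# Example 5: check P(A | A' ∪ B') on a 20-point equally likely sample space
# with |A| = 8 (so P(A) = 2/5) and |A ∩ B| = 3 (so P(A ∩ B) = 3/20).
from fractions import Fraction

omega = set(range(20))
A = set(range(8))                 # 8 outcomes, P(A) = 8/20 = 2/5
B = {5, 6, 7, 15, 16}             # chosen so that A ∩ B = {5, 6, 7}, P(A ∩ B) = 3/20

cond = (omega - A) | (omega - B)  # A' ∪ B', i.e. the complement of A ∩ B
print(Fraction(len(A & cond), len(cond)))  # 5/17
```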

Frequently Asked Questions (FAQs)

Q: How can the Total Probability Theorem be applied in the analysis of communication systems and information theory?
A:
In communication systems, a transmitted signal passes through a noisy channel, and the Total Probability Theorem gives the overall probability of receiving a particular symbol by conditioning on each possible transmitted symbol: the probability of the received symbol is the sum, over all symbols that could have been sent, of the probability of sending that symbol times the probability of receiving the observed symbol given that it was sent. Bayes' Theorem then inverts this relationship to infer which symbol was most likely transmitted given what was received, which underlies maximum a posteriori decoding and error correction in information theory.
Q: What is the "Monte Carlo method" and how can it be used to approximate probabilities in complex applications of these theorems?
A:
The Monte Carlo method is a computational technique that uses random sampling to obtain numerical results. In the context of the Total Probability Theorem and Bayes' Theorem, Monte Carlo methods can be used to approximate probabilities or expectations that are difficult or impossible to calculate analytically. By generating many random samples from the relevant distributions, we can estimate probabilities and update beliefs in complex scenarios. This approach is particularly valuable in high-dimensional problems, Bayesian inference with intractable posteriors, and in simulating complex systems in physics, finance, and other fields.
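
As a concrete illustration (a sketch using the bags from Example 1 above, not anything stated in this answer), here is a Monte Carlo estimate of the probability of drawing a red marble:

```python
# Monte Carlo estimate of P(red) from Example 1: pick a bag uniformly at random,
# then draw one marble from it; the long-run frequency approximates 13/30.
import random

bags = [(6, 4), (5, 5), (2, 8)]   # (red, blue) counts in bags 1, 2, 3
trials = 300_000
reds = 0
for _ in range(trials):
    red, blue = random.choice(bags)
    if random.randrange(red + blue) < red:  # marble drawn uniformly from the chosen bag
        reds += 1
print(reds / trials)  # close to 13/30 ≈ 0.4333
```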
Q: How can the Total Probability Theorem be used to solve problems involving mixture distributions?
A:
The Total Probability Theorem is particularly useful for analyzing mixture distributions, which are probability distributions that combine two or more component distributions. By treating each component as a separate scenario, we can use the theorem to calculate overall probabilities or expectations. This approach is valuable in modeling complex phenomena in fields like genetics (gene frequencies in populations), finance (mixed investment strategies), and pattern recognition (mixed Gaussian models). It allows us to break down complex distributions into simpler, more manageable components.
Q: What is the "law of total variance" and how does it relate to the Total Probability Theorem?
A:
The law of total variance is an extension of the Total Probability Theorem to variances. It states that the variance of a random variable can be decomposed into the expected value of the conditional variance plus the variance of the conditional expectation. This law is crucial in statistics and probability theory as it allows us to analyze the variability of a random variable by considering both the variability within each condition and the variability between conditions. Understanding this relationship helps in variance decomposition and in analyzing the sources of uncertainty in complex systems.
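
A small numerical check of the decomposition $\operatorname{Var}(X)=E[\operatorname{Var}(X \mid Y)]+\operatorname{Var}(E[X \mid Y])$, using hypothetical numbers for a two-component setup:

```python
# Law of total variance: Var(X) = E[Var(X|Y)] + Var(E[X|Y]).
# Y picks a component with the probabilities below; X is then drawn from it.
p_y = [0.3, 0.7]         # hypothetical P(Y = 1), P(Y = 2)
cond_mean = [0.0, 5.0]   # E[X | Y = 1], E[X | Y = 2]
cond_var = [1.0, 4.0]    # Var(X | Y = 1), Var(X | Y = 2)

e_x = sum(p * m for p, m in zip(p_y, cond_mean))                         # E[X] = 3.5
e_cond_var = sum(p * v for p, v in zip(p_y, cond_var))                   # E[Var(X|Y)] = 3.1
var_cond_mean = sum(p * (m - e_x) ** 2 for p, m in zip(p_y, cond_mean))  # Var(E[X|Y]) = 5.25

# Direct computation of Var(X) = E[X^2] - E[X]^2 for the mixture agrees:
e_x2 = sum(p * (v + m ** 2) for p, v, m in zip(p_y, cond_var, cond_mean))
print(e_cond_var + var_cond_mean, e_x2 - e_x ** 2)  # both 8.35 (up to float rounding)
```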
Q: What is the "prior predictive distribution" in Bayesian inference and how does it relate to the Total Probability Theorem?
A:
The prior predictive distribution in Bayesian inference is the distribution of new data points before observing any data. It's obtained by integrating the likelihood function over the prior distribution of parameters. This concept is directly related to the Total Probability Theorem, as it involves summing (or integrating) over all possible parameter values, weighted by their prior probabilities. Understanding the prior predictive distribution is crucial for model checking and comparison in Bayesian statistics, as it represents our predictions based solely on prior knowledge.
Q: How does the Total Probability Theorem relate to the concept of "marginalization" in probability theory?
A:
The Total Probability Theorem is closely related to the concept of marginalization in probability theory. Marginalization refers to the process of summing or integrating out variables from a joint probability distribution to obtain a marginal distribution. The Total Probability Theorem can be seen as a form of marginalization where we sum over all possible values of a conditioning variable to obtain the unconditional probability of an event. This connection highlights the theorem's role in simplifying complex probability calculations and in bridging joint and marginal probabilities.
Q: How does the concept of "conditional independence" impact the application of these theorems?
A:
Conditional independence occurs when two events are independent given a third event. This concept is important in both the Total Probability Theorem and Bayes' Theorem. In the Total Probability Theorem, conditional independence can simplify calculations by allowing us to treat certain probabilities as independent within specific scenarios. In Bayes' Theorem, conditional independence assumptions are often used to simplify complex probabilistic models, such as in naive Bayes classifiers. However, it's crucial to carefully assess whether conditional independence holds in a given situation, as incorrect assumptions can lead to inaccurate results.
Q: How can the Total Probability Theorem be applied in risk assessment and decision analysis?
A:
In risk assessment and decision analysis, the Total Probability Theorem helps quantify overall risks or outcomes by considering various scenarios. It allows decision-makers to break down complex situations into manageable components, assign probabilities to each scenario, and then calculate the overall probability of success or failure. This approach is valuable in fields like project management, financial planning, and policy-making, where decisions often involve multiple uncertain factors and potential outcomes.
Q: Can you explain the "chain rule of probability" and its relationship to the Total Probability Theorem?
A:
The chain rule of probability, also known as the multiplication rule, allows us to express the joint probability of multiple events as a product of conditional probabilities. It's closely related to the Total Probability Theorem because both involve breaking down complex probabilities into simpler components. The chain rule can be seen as a generalization of the Total Probability Theorem for multiple events. Understanding this relationship helps in solving more complex probability problems and in developing probabilistic models in machine learning and statistics.
Q: How does the concept of "independence" relate to the Total Probability Theorem?
A:
Independence is a crucial concept in probability theory, and it has important implications for the Total Probability Theorem. When events are independent, the occurrence of one does not affect the probability of the other. In the context of the Total Probability Theorem, independence can simplify calculations because conditional probabilities reduce to unconditional probabilities. However, it's essential to carefully assess whether events are truly independent before making this assumption, as misapplying independence can lead to incorrect results.