Previously, Ken, a high school student, and his math teacher Dr. Demystifier (Dr. D) discussed Bayes’ theorem. This time, Lily, a friend of both Ken and Dr. D, challenges them with the Monty Hall problem.

They will discuss the following topics:

- Monty Hall Problem
- Bayesian Solution
- Subjective Prior Belief

Ken went to a cafe near his high school to buy lunch. He ordered a tall cappuccino and a tuna sandwich and stood by the coffee machine, hearing the sound of steam coming out as if from a bull’s nostrils.

Lily — a barista at the cafe —…

This is a fictional story about Ken, a high school student, and his math teacher Dr. Demystifier (Dr. D). Ken has just learned Bayes’ theorem, but he is completely mystified by the formula and does not know how to make use of it.

They will discuss the following topics:

- Bayes Theorem Derivation
- Belief Update Once
- Belief Update Twice
- Belief Update Forever

Ken asked Dr. D, “May I ask a question about Bayes’ theorem?”

Dr. D nodded and said, “Please do.”

Ken continued, “Bayes’ theorem says:

Have you ever wondered why we often use the normal distribution?

How do we derive it anyway?

Why do many probability distributions have the exponential term?

Are they related to each other?

If any of the above questions make you wonder, you are in the right place.

I will demystify it for you.

Suppose we want to predict whether the weather in some place will be fine or not.

We use the calculus of variations to optimize **functionals**.

You read that right: functionals, not functions.

But what are functionals? What does a functional really look like?

Moreover, there is this thing called the **Euler-Lagrange equation**.

What is it? How is it useful?

How do we derive such an equation?

If you have any of the above questions, you are in the right place.

I’ll demystify it for you.

Suppose we want to find the shortest path from point A to point B.

Have you ever wondered why we use the Lagrange multiplier to solve constrained optimization problems?

Is it just a clever technique?

Since it is very easy to use, we learn it like basic arithmetic, practicing until we can do it by heart.

But have you ever wondered why it works? Does it always work? If not, why not?

If you want to know the answers to these questions, you are in the right place.

I’ll demystify it for you.

In case you are not familiar with what constrained optimization problems are, I have written an article that explains…

Have you ever wondered what constrained optimization problems are?

Oftentimes, the term **constrained optimization** is used as if everyone knows what it is.

But it may not be so obvious to people who have not been exposed to such terminology before.

If you would like to understand what constrained optimization is and how to approach such problems, you are in the right place.

I’ll demystify it for you.

Suppose you are driving a car on a mountain road. You want to climb as high as possible to have a better view of the moon. …

*What does KL stand for? Is it a distance measure? What does it mean to measure the similarity of two probability distributions?*

If you want to intuitively understand what the KL divergence is, you are in the right place. I’ll demystify the KL divergence for you.

As I’m going to explain the KL divergence from the information theory point of view, you need to know the concepts of entropy and cross-entropy to fully understand this article. …
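To give a rough feel for how the three concepts connect, here is a minimal Python sketch. It is not taken from the article itself: the function names and the two example distributions `p` and `q` are assumptions chosen for illustration. It uses the standard identity that the KL divergence from `p` to `q` equals the cross-entropy of `p` and `q` minus the entropy of `p`, and it also hints at why the KL divergence is not a true distance measure (swapping the arguments changes the value).

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i), skipping zero-probability outcomes
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i); assumes q_i > 0 wherever p_i > 0
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i); same assumption on q
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions over two outcomes
p = [0.5, 0.5]
q = [0.9, 0.1]

print("KL(p || q)          :", kl_divergence(p, q))
print("H(p, q) - H(p)      :", cross_entropy(p, q) - entropy(p))
print("KL(q || p) (differs):", kl_divergence(q, p))
```

The first two printed values should agree up to floating-point error, while the third differs, which is one reason the KL divergence is usually called a divergence rather than a distance.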

*What is it? Is there any relation to the entropy concept? Why is it used for classification loss? What about the binary cross-entropy?*

Some of us might have used the cross-entropy for calculating classification losses and wondered why we use the natural logarithm. Some might have seen the binary cross-entropy and wondered whether it is fundamentally different from the cross-entropy or not. If so, reading this article should help to demystify those questions.
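As a small preview of the second question, the sketch below compares the two formulas numerically. It is only an illustration under stated assumptions: the function names, the label `y`, and the predicted probability `y_hat` are hypothetical and not from the article. It treats a binary label as a two-class distribution `[y, 1 - y]` and checks that the binary cross-entropy gives the same number as the general cross-entropy applied to those two classes.

```python
import math

def cross_entropy(p, q):
    # General form: H(p, q) = -sum_i p_i * log(q_i)
    # Terms with p_i == 0 contribute nothing and are skipped.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def binary_cross_entropy(y, y_hat):
    # Two-class form: -(y * log(y_hat) + (1 - y) * log(1 - y_hat))
    # Assumes 0 < y_hat < 1 so both logarithms are defined.
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Hypothetical example: true label is class 1, model predicts 0.8 for class 1
y, y_hat = 1.0, 0.8

bce = binary_cross_entropy(y, y_hat)
ce = cross_entropy([y, 1 - y], [y_hat, 1 - y_hat])

print("binary cross-entropy:", bce)
print("cross-entropy       :", ce)
```

Under these assumptions the two values coincide, which suggests the binary version is the general formula written out for exactly two classes.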

The word “cross-entropy” has “cross” and “entropy” in it, and it helps to understand the “entropy” part to understand the “cross” part.

So, let’s review the entropy…

Is it a disorder, uncertainty or surprise?

The idea of entropy is confusing at first because so many words are used to describe it: disorder, uncertainty, surprise, unpredictability, amount of information and so on. If you’ve been confused by the word “entropy”, you are in the right place. I am going to demystify it for you.

In 1948, Claude Shannon introduced the concept of information entropy in his paper “A Mathematical Theory of Communication”.

Source: https://en.wikipedia.org/wiki/Claude_Shannon

Shannon was looking for a way to efficiently send messages without losing any information.