If you’ve heard about the transposed convolution and are confused about what it actually means, this article is written for you.

The content of this article is as follows:

- The Need for Up-sampling
- Why Transposed Convolution?
- Convolution Operation
- Going Backward
- Convolution Matrix
- Transposed Convolution Matrix
- Summary

The notebook is available on my GitHub.

Using neural networks to generate images usually involves up-sampling from low resolution to high resolution.

There are various methods for up-sampling:

- Nearest neighbor interpolation
- Bi-linear interpolation
- Bi-cubic interpolation

All these methods involve an interpolation scheme that we need to choose when deciding on a network architecture. It is like manual feature engineering: there is nothing for the network to learn. …
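As a rough illustration of why such methods leave nothing to learn, here is a minimal NumPy sketch of nearest neighbor up-sampling. The function name `upsample_nearest` and the integer `scale` parameter are assumptions made for this example, not code from the notebook.

```python
import numpy as np

def upsample_nearest(image: np.ndarray, scale: int) -> np.ndarray:
    """Up-sample a 2D array by an integer factor with nearest neighbor interpolation."""
    # Repeat every row `scale` times, then every column `scale` times.
    # The rule is fixed: there are no weights for a network to train.
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

low_res = np.array([[1, 2],
                    [3, 4]])
high_res = upsample_nearest(low_res, scale=2)
print(high_res)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

Each input pixel is simply copied into a 2x2 block, so the output is fully determined by the input and the chosen rule.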

Have you ever wondered why we often use the normal distribution?

How do we derive it anyway?

Why do many probability distributions have the exponential term?

Are they related to each other?

If any of the above questions make you wonder, you are in the right place.

I will demystify it for you.

Suppose we want to predict if the weather of some place is fine or not.

We use the calculus of variations to optimize **functionals**.

You read that right: functionals, not functions.

But what are functionals? What does a functional really look like?

Moreover, there is this thing called the **Euler-Lagrange equation**.

What is it? How is it useful?

How do we derive such an equation?

If you have any of the above questions, you are in the right place.

I’ll demystify it for you.

Suppose we want to find the shortest path from point A to point B.

Have you ever wondered why we use the Lagrange multiplier to solve constrained optimization problems?

Is it just a clever technique?

Since it is very easy to use, we learn it like basic arithmetic, practicing it until we can do it by heart.

But have you ever wondered why it works? Does it always work? If not, why not?

If you want to know the answers to these questions, you are in the right place.

I’ll demystify it for you.

In case you are not familiar with what constrained optimization is, I have written an article that explains it. …

Have you ever wondered what constrained optimization problems are?

Oftentimes, the term **constrained optimization** is used as if everyone knows what it is.

But it may not be so obvious to people who have not been exposed to such terminology before.

If you would like to understand what constrained optimization is and how to approach such problems, you are in the right place.

I’ll demystify it for you.

Suppose you are driving a car on a mountain road. You want to climb as high as possible to have a better view of the moon. …

*What does KL stand for? Is it a distance measure? What does it mean to measure the similarity of two probability distributions?*

If you want to intuitively understand what the KL divergence is, you are in the right place. I’ll demystify the KL divergence for you.

As I’m going to explain the KL divergence from the information theory point of view, you need to know the concepts of entropy and cross-entropy to fully understand this article. …
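To make the link between the three concepts concrete, here is a minimal NumPy sketch, assuming discrete distributions with strictly positive probabilities and the natural logarithm. The function names and the example distributions `p` and `q` are made up for illustration. It checks numerically that the KL divergence equals the cross-entropy minus the entropy.

```python
import numpy as np

def entropy(p):
    """Entropy H(p) of a discrete distribution, in nats."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """Cross-entropy H(p, q) of q relative to p, in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """KL divergence D_KL(p || q), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

# Hypothetical example distributions over two outcomes.
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)

# D_KL(p || q) = H(p, q) - H(p)
print(np.isclose(kl_pq, cross_entropy(p, q) - entropy(p)))
# Swapping the arguments changes the value for this pair.
print(np.isclose(kl_pq, kl_qp))
```

For this pair of distributions the two directions give different values, which is one reason the KL divergence is usually not treated as a distance in the usual sense.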

*What is it? Is there any relation to the entropy concept? Why is it used for classification loss? What about the binary cross-entropy?*

Some of us might have used the cross-entropy for calculating classification losses and wondered why we use the natural logarithm. Some might have seen the binary cross-entropy and wondered whether it is fundamentally different from the cross-entropy or not. If so, reading this article should help to demystify those questions.
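As a small numerical check of that last question, the sketch below compares the binary cross-entropy with the general cross-entropy applied to a two-class distribution. It assumes the natural logarithm and a single example. The names `y`, `y_hat`, and both functions are hypothetical and not taken from any particular library.

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy between a true distribution p and a predicted distribution q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def binary_cross_entropy(y, y_hat):
    """Binary cross-entropy for a true label y in {0, 1} and a predicted probability y_hat."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y, y_hat = 1.0, 0.8  # true label and predicted probability of the positive class

bce = binary_cross_entropy(y, y_hat)
# Write the same problem as a two-class distribution: [positive, negative].
ce = cross_entropy([y, 1 - y], [y_hat, 1 - y_hat])

print(np.isclose(bce, ce))  # True
```

Under these assumptions the two values agree, which suggests that the binary version is the two-class case of the same formula.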

The word “cross-entropy” has “cross” and “entropy” in it, and it helps to understand the “entropy” part to understand the “cross” part.

So, let’s review the entropy formula. …

Is it a disorder, uncertainty or surprise?

The idea of entropy is confusing at first because so many words are used to describe it: disorder, uncertainty, surprise, unpredictability, amount of information and so on. If you are confused by the word “entropy”, you are in the right place. I am going to demystify it for you.

In 1948, Claude Shannon introduced the concept of information entropy in his paper “A Mathematical Theory of Communication”.

Source: https://en.wikipedia.org/wiki/Claude_Shannon

Shannon was looking for a way to efficiently send messages without losing any information.

If you see the above image and it does not make much sense, this article is written for you. I explain how a GAN works using a simple project that generates hand-written digit images.

I use Keras on TensorFlow and the notebook code is available on **my GitHub**.

**GAN (Generative Adversarial Network)** is a framework proposed by Ian Goodfellow, Yoshua Bengio and others in 2014.

A GAN can be trained to generate images from random noise. For example, we can train a GAN to generate digit images that look like the hand-written digit images from the **MNIST** database.