What is it? How does it relate to the entropy concept? Why is it used as a classification loss? What about the binary cross-entropy?
Some of us have used cross-entropy to calculate classification losses and wondered why we use the natural logarithm. Some have seen the binary cross-entropy and wondered whether it is fundamentally different from the cross-entropy. If so, reading this article should help answer those questions.
The word “cross-entropy” contains both “cross” and “entropy”, and understanding the “entropy” part helps with understanding the “cross” part.
So, let’s review the entropy formula.
Review of Entropy Formula
My article Entropy Demystified should help you understand the entropy concept if you are not already familiar with it.
Claude Shannon (https://en.wikipedia.org/wiki/Claude_Shannon) defined entropy to calculate the minimum average encoding size: he was looking for a way to send messages efficiently without losing information.
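Before we get to the formulas, here is a minimal Python sketch (my own illustration, not code from this article) of that idea: with a base-2 logarithm, entropy gives the minimum average number of bits needed to encode outcomes drawn from a distribution. The helper name `entropy` and the example distributions are just for demonstration.

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a discrete distribution.
    With base=2 the result is in bits: the minimum average encoding size."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin is maximally unpredictable: 1 bit per outcome on average.
print(entropy([0.5, 0.5]))  # 1.0

# A biased coin is more predictable, so fewer bits suffice on average.
print(entropy([0.9, 0.1]))  # ~0.469
```

The more skewed the distribution, the more predictable each outcome is, and the smaller the entropy.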
As we will see below, there are various ways of expressing entropy.