Entropy and Information Gain
Neither 'Entropy' nor 'Information' is a concept with a very intuitive definition. Most people first meet entropy in chemistry class, where it is used to describe the degree of 'order' in a system. But how do you translate 'order' into a mathematical equation? And what about information?

In data science, the terms 'Entropy' and 'Information Gain' usually appear in the context of decision trees. Here entropy describes the 'purity' of a set, which is analogous to the order of a system in chemistry. A decision tree tries to split a dataset on a single feature such that the split results in the 'purest' branches, meaning the least variation in the target variable within each branch. The branches are then split again by the same criterion, until all branches are pure or we decide the model is strong enough. The entropy (or often you will see cross-entropy or dev...
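To make the splitting criterion above concrete, here is a minimal Python sketch. It is not from the original article: the helper names entropy and information_gain, and the toy labels, are assumptions for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, branches):
    """Entropy of the parent set minus the size-weighted entropy of its branches."""
    total = len(parent)
    weighted = sum(len(b) / total * entropy(b) for b in branches)
    return entropy(parent) - weighted

# A toy target variable and one candidate split (hypothetical data):
parent = ['yes', 'yes', 'yes', 'no', 'no', 'no']
left, right = ['yes', 'yes', 'yes'], ['no', 'no', 'no']  # a perfectly pure split

print(entropy(parent))                          # 1.0 bit: maximally impure set
print(information_gain(parent, [left, right]))  # 1.0: the split removes all impurity
```

In this framing, a tree-building algorithm would score every candidate split with information_gain and keep the one with the highest value, then repeat on each branch, matching the procedure described above.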