Posts

Kaggle Tensorflow Speech Recognition Challenge

Implementation of the ResNet and CTC models at https://github.com/chrisdinant/speech In November 2017 the Google Brain team hosted a speech recognition challenge on Kaggle. The goal of this challenge was to write a program that can correctly identify one of 10 words being spoken in a one-second-long audio file. Having just made up my mind to start seriously studying data science, with the goal of turning a new corner in my career, I decided to tackle this as my first serious Kaggle challenge. In this post I will talk about ResNets, RNNs, 1D and 2D convolution, Connectionist Temporal Classification, and more. Let's go! Exploratory Data Analysis The training data supplied by Google Brain consists of ca. 60,000 one-second-long .wav files in 32 directories that are named for the word spoken in the files. Only 10 of these are classes you need to identify; the others should go in the 'unknown' or 'silence' classes. There are a couple of things you can do to get a
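A typical first step with one-second audio clips like these is to frame the waveform and look at a log-magnitude spectrogram. The following is a minimal sketch of that idea; the 16 kHz sample rate, frame length, and hop size are assumptions for illustration, and a synthetic sine wave stands in for a loaded .wav file:

```python
import numpy as np

def log_spectrogram(wave, frame_len=400, hop=160):
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude spectrogram."""
    frames = [wave[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(wave) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log(spec + 1e-10)  # small offset avoids log(0)

sr = 16000                          # assumed: 1-second clips at 16 kHz
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)  # stand-in for a loaded .wav file
spec = log_spectrogram(wave)
print(spec.shape)                   # → (98, 201)
```

A 2D array like this is what you would feed to the 2D-convolutional models discussed later in the post.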

What is Naive Bayes?

In my blog post Bayes and Binomial Theorem I talk about Bayes' theorem and how it is used to determine, or rather estimate, a conditional probability by turning the conditions around: P(A|B) = P(B|A)·P(A) / P(B). In other words, you can use P(B|A) and the prior probabilities P(A) and P(B) to calculate P(A|B). This is very powerful because we often have information on the former three probabilities but not on the latter. Naive Bayes is a classification algorithm that does this with the features of a dataset. In non-math words: we calculate the probability of belonging to class A given feature vector B by multiplying the proportion of feature vector B within class A by the proportion of class A, and then dividing the whole thing by the proportion of vector B in the population. This is in principle a very straightforward calculation, but you can probably tell when it will be hard or impossible to do: if we have many features, it becomes more and more unlikely that a specific feature vector
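The calculation described above can be sketched on a toy dataset with a single binary feature; the class names and counts here are made up for illustration:

```python
# Each pair is (class, feature value); the data are invented.
data = [("spam", 1), ("spam", 1), ("spam", 0),
        ("ham", 0), ("ham", 0), ("ham", 1)]

def posterior(cls, feature):
    """P(class | feature) via Bayes' theorem on the toy data."""
    in_class = [f for c, f in data if c == cls]
    p_b_given_a = in_class.count(feature) / len(in_class)  # P(B|A)
    p_a = len(in_class) / len(data)                        # P(A)
    p_b = [f for _, f in data].count(feature) / len(data)  # P(B)
    return p_b_given_a * p_a / p_b

print(posterior("spam", 1))  # P(spam | feature=1) → 2/3
```

With more than one feature, Naive Bayes gets around the sparsity problem mentioned above by assuming the features are independent, so P(B|A) factors into a product of per-feature proportions.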

Entropy and Information Gain

Neither 'Entropy' nor 'Information' is a concept with a very intuitive definition. Most people learn about entropy in chemistry class, where it is used to describe the amount of 'order' in a system. But how do you translate 'order' into a mathematical equation? And what about information? In data science the terms 'Entropy' and 'Information Gain' are usually used in the context of decision trees. Here entropy describes the 'purity' of a set, which of course is equivalent to the order of a system in chemistry. Decision trees try to split up a dataset based on differences in a single feature such that the split results in the 'purest' branches, meaning the lowest amount of variation in the target variable. Then the branches are split again according to the same criterion, until we reach the point where all branches are pure, or we decide the model is strong enough. The entropy (or often you will see cross-entropy or dev
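The splitting criterion described above can be sketched in a few lines: Shannon entropy of a set of labels, and the information gain of a candidate split as the parent's entropy minus the weighted entropy of the branches. The labels below are illustrative:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two branches."""
    n = len(parent)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - child

parent = ["yes"] * 5 + ["no"] * 5      # maximally impure: entropy = 1 bit
left, right = ["yes"] * 5, ["no"] * 5  # a perfectly pure split
print(information_gain(parent, left, right))  # → 1.0
```

A decision tree greedily picks, at each node, the feature and threshold whose split maximizes this gain.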

What is Logistic Regression?

Logistic Regression is closely related to Linear Regression. Read my post on Linear Regression here. Logistic Regression is a classification technique, meaning that the target Y is qualitative instead of quantitative. For example, trying to predict whether a customer will stop doing business with you, a.k.a. churn. Logistic Regression models the probability that a measurement belongs to a class: if we tried to predict the target value directly (let's say churn = 1 and not-churn = 0), as you would with Linear Regression, the model might output negative target values or values larger than 1 for certain predictor values. Probabilities smaller than zero or larger than one make no sense, so instead we can use the logistic, or sigmoid, function to model probabilities. This is also: which is the form you usually see it in when it is used as the activation function in a neural network layer. This function will only output values between 0 and 1, which you can t
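A minimal sketch of the sigmoid function the excerpt refers to, applied to a one-feature model; the coefficient values are arbitrary illustration, not fitted estimates:

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, beta1):
    """P(y = 1 | x) under a one-feature logistic regression model."""
    return sigmoid(beta0 + beta1 * x)

print(sigmoid(0))                      # → 0.5, the decision boundary
print(predict_proba(2.0, -1.0, 1.5))   # always strictly between 0 and 1
```

Whatever linear combination β₀ + β₁x produces, the output stays in (0, 1), which is what makes it usable as a probability.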

What is Linear Regression?

Linear regression is used to model the relationship between continuous variables. For example, to predict the price of a house when you have features like size in square meters, crime in the neighborhood, etc. A linear regression function takes the form of ... Here y is the target we're trying to predict (house price), the x's are the p features or predictors (size, crime) and the β's are the coefficients, the parameters that we are trying to estimate by fitting the model to data. The little hats on top of the y and β's are called hats, and indicate we are dealing with estimates here. With multiple features it is called Multiple Linear Regression, and when there's only one feature it is Simple Linear Regression. For Linear Regression the function does not have to be linear with regard to the predictors as long as it is linear in the parameters. This means that you can model interactions between predictors by, for example, multiplying x's
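Estimating the β-hats is an ordinary least squares problem, which can be sketched with numpy; the house sizes and prices below are invented so the true relation is known:

```python
import numpy as np

size = np.array([50.0, 80.0, 120.0, 200.0])       # square meters (made up)
price = 10.0 + 2.0 * size                         # exact linear relation

# Stack a column of ones so the first coefficient is the intercept.
X = np.column_stack([np.ones_like(size), size])
beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
print(beta_hat)  # recovers intercept ≈ 10 and slope ≈ 2
```

Because the data were generated exactly on a line, the least-squares estimates match the true parameters; with noisy data they would only approximate them.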

What is k-Nearest Neighbors?

k-Nearest Neighbors, or kNN, is a classification algorithm that assigns a class to a new data point based on known data points that are similar, or nearby. What do you mean 'nearby'? To determine similarity of data you can use a few different distance measures. For example Euclidean distance, which is the square root of the sum of squared differences between the parameters of data points v and w. ... Or the City Block/Manhattan distance (yes, that's what it's called), which is the sum of the absolute differences of v and w. ... Or the Cosine distance, which measures the "angle" between two parameter vectors v and w. ... What does kNN do? It goes through all the known data points and measures the distance to the new point. Then it assigns the class of the majority of the k nearest data points to the new measurement. Yes, it goes through all training data every time you run this algorithm! No model is actually created. This is called a
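The procedure just described fits in a few lines; this sketch uses Euclidean distance, and the training points and classes are made up for illustration:

```python
import math
from collections import Counter

def euclidean(v, w):
    """Square root of the sum of squared differences between v and w."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def knn_predict(train, new_point, k=3):
    """train is a list of (point, class) pairs; returns the majority class
    among the k points nearest to new_point."""
    nearest = sorted(train, key=lambda pc: euclidean(pc[0], new_point))[:k]
    return Counter(cls for _, cls in nearest).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # the 3 nearest points are all class 'a'
```

Note that every prediction sorts the entire training set, which is exactly the "no model is created" behavior the excerpt points out.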

What is Machine Learning?

When a computer learns from data it is called machine learning (ML). How do you teach a computer? You use ML algorithms. What is a machine learning algorithm? An ML algorithm is a function that takes in data and outputs a prediction. What kind of prediction? Predictions like "This is a picture of a bicycle", or "Tomorrow it is going to rain", or "This customer will probably leave our company soon". Ok, so how does an algorithm do this? ML algorithms are designed to find attributes of datasets that describe groupings or trends within the data. This can be something as simple as "if a pixel in this location is blue, you're looking at a smurf", or something as complicated as "a combination of this pixel intensity with this shape in this orientation here and this other shape there and so on means this is a picture of Robert on a bicycle". Or another example: "if it rained yesterday and the air pressure is so and so and