What Is Supervised Learning (What Is Machine Learning | Machine Learning Basics)


Artificial intelligence, machine learning – lately these terms have been used synonymously, but should they be? In this third video in our artificial intelligence series, and over the course of this machine learning series, I’ll seek to answer that question, so sit back, relax, and join me on an exploration of the field of machine learning!

To answer the question posed at the start of this video, we first need to understand what machine learning is. Machine learning is an immense topic; many fields have adopted it or spawned from it, and the rate of adoption is only increasing. Such fields include natural language processing, computer vision, computational biology and robotics, just to list a few. To define what machine learning is, let us first define what learning is. As humans, we have two primary modes of learning: 1) declarative knowledge, in other words memorization, the accumulation of individual facts; and 2) imperative knowledge, in other words generalization, the ability to deduce new facts from old facts. Extending these two modes of learning to the field of computing, machine learning is any algorithm that can predict future results from past data. As Arthur Samuel, a pioneer in computing who coined the term ‘machine learning’ in 1959 and wrote a ‘self-learning’ program that played checkers, put it, machine learning is “a field of study that gives computers the ability to learn without being explicitly programmed”.

Having algorithms that can infer new data from past data isn’t necessarily a new idea; in fact, large portions of the field of statistics are dedicated to it. One such algorithm from statistics is known as regression and has been around since the early 1800s. In regression, the goal is to mathematically measure the relationships between variables, model them with a line-of-best-fit, and use that line to predict one variable from another.
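To make the idea concrete, here is a minimal sketch of least-squares regression, fitting a line-of-best-fit to a handful of hypothetical watch-time and engagement values (the numbers and function name are invented for illustration):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns slope m and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    return m, b

# Hypothetical sample: watch time (%) vs. engagement (%) of five videos
watch_time = [20, 40, 60, 80, 100]
engagement = [10, 22, 28, 43, 50]

m, b = fit_line(watch_time, engagement)
# Deductive capability: predict engagement for a new video with 70% watch time
prediction = m * 70 + b
```

The fitted slope comes out positive, matching the trend in the example: more watch time predicts more engagement.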
Let’s view this with an example, say the watch time and engagement (that being likes and comments) of some randomly sampled YouTube videos. Drawing a line through the data, we can see a trend: increasing watch time is correlated with increasing engagement. To demonstrate deductive capability, given the watch time of a new video, using our line-of-best-fit we can predict its level of engagement, and vice versa.

While our model so far predicts output variables through regression, what about sorting data into categories, known as classification problems? To explore this, let’s expand our example: along with tracking the watch time and engagement of select videos, we will also record whether each video is recommended or not. In other words, we are trying to determine the values of the variables the YouTube algorithm uses to recommend videos. As you can see, our original data points have now been given labels, what is referred to as labeled data. Once data has been labeled, we can proceed to classify the output label of a data point based on its input variables. As in the case of regression, we want to draw lines-of-best-fit to divide up our decision space; these lines define what are known as decision boundaries. From an eyeball perspective, let’s draw the boundaries at: if watch time is over 80% of the video duration and over 45% of the viewers of the video engage with it, then the video is recommended; otherwise, it is not.

Now let’s say we get a new video and have to determine whether it was recommended. First, we measure its variables and plot it onto our decision space; this is referred to as unlabeled data. Based on our decision boundaries, our model predicts the output label; in this case, the video is recommended. If we look closely at our now divided-up data, we can see that 86 videos are correctly classified as not recommended and 87 as recommended.
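The eyeballed boundaries above can be written directly as a tiny rule-based classifier. This sketch uses the thresholds from the example (80% watch time, 45% engagement); the function name and test points are illustrative:

```python
def classify(watch_time_pct, engagement_pct):
    """Predict a label for an unlabeled video from its two input variables."""
    if watch_time_pct > 80 and engagement_pct > 45:
        return "recommended"
    return "not recommended"

# A new, unlabeled video: measure its variables, then predict its label
print(classify(85, 50))   # -> recommended
print(classify(60, 30))   # -> not recommended
```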
However, 14 videos were misclassified as recommended when they were not, and 13 as not recommended when they were. This gives our model a predictive accuracy of 86.5%, calculated using the accuracy formula: the total number of correct model guesses, 173, divided by the number of data points, 200. More specifically, the total of the true positives, 87, and true negatives, 86, divided by the total of the true positives, true negatives, false positives, 14, and false negatives, 13. As shown in this matrix, referred to as a confusion matrix and used for assessing the performance of machine learning models, a false positive is a result in which the model predicts an attribute is present, in this case recommendation, when in reality it is not. In contrast, a false negative is when the model predicts an attribute is absent when in reality it is there.

Notice that with our current classification model, there is no way to draw straight lines that would yield 100% accuracy. Move our decision boundary to the right, requiring a higher engagement percentage, and we misclassify recommended videos as not recommended, increasing the false negatives; move it to require less engagement, and we misclassify not-recommended videos as recommended, increasing the false positives. At a high level, and as we’ll see shortly, the job of machine learning algorithms is to maximize model accuracy.

The example we just went through is a type of machine learning algorithm referred to as a decision tree. As a side note, this ‘tree-based’, in other words conditional-statement-based, machine learning approach draws many parallels to expert systems, which we discussed in the previous video in this series. This is why expert systems are referred to as the first machine learning systems.
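The accuracy calculation above can be sketched directly from the confusion-matrix counts in the example:

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions (true positives + true negatives) over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts from the example: 87 TP, 86 TN, 14 FP, 13 FN (200 videos total)
acc = accuracy(tp=87, tn=86, fp=14, fn=13)
print(acc)  # 173 / 200 = 0.865, i.e. 86.5%
```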
Now, shifting our focus, there are many other types of algorithms that use a variety of approaches beyond conditional statements to divide up a decision space, for instance support vector machines. For the sake of time we obviously won’t cover all these different types of models in this video, but they are shown to satisfy curiosity, so you can learn more about them from other creators and resources. The key point to take away from all these models is that the boundaries don’t have to be straight, in other words linear, modeled by the formula of a line, y = mx + b. They can be quadratic, polynomial, exponential, etc.

Our previous example used two variables to classify video data points: watch time and engagement. What happens if we can extract other properties of our video, adding another variable to the model to classify the videos better? Let’s say this variable is session time, the amount of time a user spends on the platform after watching your video. Well, our 2D lines now become 3D planes, dividing up our decision space in three dimensions. As in the case of our two-variable classifier, these planes don’t have to be flat either and can mould around the data points. As a side note, a truly useful classifier would also be able to contend with many states of labeled data. In our example we have two states, recommended or not recommended, but ideally we would want more information, like whether the video is recommended within a week, a month, etc. As you can see, with additional variables and output states we keep going to higher and higher dimensional spaces, and this starts to get out of hand fast, literally. The only practical way to model more complex real-world systems with these algorithms is with powerful data center computers or GPUs, which excel at repetitive calculations.
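As a sketch of a non-linear boundary, here is the same two-variable setup with a hypothetical quadratic decision boundary instead of straight lines (the coefficient and function name are invented for illustration):

```python
def curved_classify(watch_time_pct, engagement_pct):
    """Classify by which side of a quadratic boundary the point falls on."""
    # Hypothetical curved boundary: engagement = 0.01 * watch_time^2
    boundary = 0.01 * watch_time_pct ** 2
    return "recommended" if engagement_pct > boundary else "not recommended"

print(curved_classify(50, 30))  # boundary at 25 -> recommended
print(curved_classify(50, 20))  # boundary at 25 -> not recommended
```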
Imagine trying to visualize, and do the mathematics by hand for, a hyperplane going through a 1000-dimensional space. So, after walking through these simple examples and viewing various machine learning models, is machine learning just statistics rebranded? In a sense, yes; however, it goes much deeper than that. To better visualize the relationships between the various fields, I will illustrate them in this bubble diagram. The three primary fields in this diagram are artificial intelligence, big data and data science. As a side note before continuing, data science is itself made up of many fields, such as mathematics, statistics, etc., with the primary goal of making sense of data, in other words, structuring data. For the simplicity of our diagram, assume that data science and statistics are one and the same. The intersection of big data, data science and artificial intelligence is where the majority of machine learning takes place, and the intersection of data science and AI is where our examples took place.

The examples we’ve gone through in this video are a subset of machine learning referred to as supervised learning. Supervised learning is when we have both the input and the output of our data, in other words labeled, structured data, and we have to ‘train’ our models to maximize their predictive accuracy. Additionally, as you can hopefully infer from our examples, supervised learning is further subdivided into two primary modes of learning models: regression and classification. Regression is for predicting continuous outputs, in other words, outputs that lie on the line-of-best-fit of the model, whether that line is straight, curved, etc. Essentially, we are trying to map input variables to some continuous function. Classification, on the other hand, is for predicting discrete outputs, in other words, mapping input variables into discrete categories.
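The regression/classification split can be seen by running the same hypothetical input through both kinds of model: the first returns a point on a continuous function, the second one of a fixed set of categories (the slope, intercept and threshold are invented for illustration):

```python
def regress(watch_time_pct):
    """Continuous output: a predicted engagement value on a line-of-best-fit."""
    return 0.5 * watch_time_pct + 0.3  # hypothetical slope and intercept

def classify(watch_time_pct):
    """Discrete output: one of two categories."""
    return "recommended" if watch_time_pct > 80 else "not recommended"

print(regress(70))   # any real number is a possible output
print(classify(70))  # only 'recommended' or 'not recommended' are possible
```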
To add to this, many classification models, like those we saw earlier, implement regression algorithms as well. So, yes, supervised learning is essentially statistical mathematics for pattern recognition problems, rebranded as machine learning because it is applied in a way in which we iterate, in other words, train models to increase predictive accuracy. Supervised learning isn’t the only subset of machine learning; in the next video in this series we will cover unsupervised learning and, progressing forward, how deep learning ties into all of this. All in a quest to clear up the misconceptions between AI, machine learning and deep learning, and to come closer to answering the question posed at the start of this video!

However, this doesn’t mean you have to wait to learn more! If you want to learn more about machine learning, and I mean really learn how these algorithms work, from supervised methodologies such as regression and classification to unsupervised learning and more, then Brilliant.org is the place for you to go. For instance, their course on machine learning covers all the concepts we went through in this video. My primary goal with this channel is to inspire and educate about the various technologies and innovations that are changing the world, but doing so at a higher level requires going a step beyond these videos and actually learning the mathematics and science behind the concepts I discuss. Brilliant does this by making math and science learning exciting, and cultivates curiosity by showing the interconnectedness between a variety of different topics! To support Singularity Prosperity and learn more about Brilliant, go to Brilliant.org/singularity and sign up for free! Additionally, the first 200 people that go to that link will get 20% off their annual premium subscription! At this point the video has come to a conclusion; I’d like to thank you for taking the time to watch it!
If you enjoyed it, consider supporting me on Patreon to keep this channel growing, and if you have any topic suggestions, please leave them in the comments below! Consider subscribing for more content and like my Facebook page for more bite-sized chunks of content. This has been Ankur, you’ve been watching Singularity Prosperity, and I’ll see you again soon!

51 Replies to “What Is Supervised Learning (What Is Machine Learning | Machine Learning Basics)”

  1. Become a YouTube member for many exclusive perks from early previews, bonus content, shoutouts and more! https://www.youtube.com/c/singularityprosperity/join – AND – Join our Discord server for much better community discussions! https://discordapp.com/invite/HFUw8eM – ALSO – This video was made possible by Brilliant. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription with Brilliant! https://brilliant.org/singularity

  2. This channel is so good, keep going with those really high quality videos and you will get the subscribers you deserve

  3. I was just thinking about your channel and how it's one of the best on YouTube, but you haven't posted anything in a while. Glad to see you're back making new content.

  4. The best distinction to make between "human" learning and machine learning in practice is linear and exponential respectively. The former is characterised by classical and operant conditioning which invariably results in highly predictable behavioural patterns whereas the latter from a human perspective is quite the opposite, particularly beyond the technological "singularity".

  5. I am so very happy to see you continue these videos! Your channel is a fantastic source of very interesting and comprehensive information.

  6. Awesome, can't imagine how much time you've spent researching and creating the video itself. Great animations, keep up the work bro.

  7. Is it possible to use graphene or molybdenum disulfide or any other material to shrink the transistor to sub 1 nanometer?

  8. Really cool and informative video, but this effect (flickering) in your texts gets annoying over time

  9. the audio for this video is louder to the left ear, although it seems to be only your mic sound. the background music seems to be fine.
    please take note of that for the next video.
    otherwise great content as usual.

  10. Hey man. Great video! Just one thing. Given that these topics are difficult for most people, it might be nice to slow down your speech when explaining concepts. Thanks!
