With so many questions to answer, what are some of the most common machine learning problem types that come up while building out AI systems? Jake Shaver, Special Projects Manager at DataRobot, walks us through four problem types in this installment of AI Simplified.

1. Classification

Classification is a systematic grouping of observations into categories, such as when biologists categorize plants, animals, and other lifeforms into different taxonomies. It is one of the primary uses of data science and machine learning.

The most important use cases of Natural Language Processing are:

The goal of this task is to predict a class (label) of a document, or rank documents within in a list based on their relevance. It could be used in spam filtering (predicting whether an e-mail is spam or not) or content classification (selecting articles from the web about what is happening to your competitors).

2. Why is Classification Important?

There are many practical business applications for machine learning classification. For example, if you want to predict whether or not a person will default on a loan, you need to determine if that person belongs to one of two classes with similar characteristics: the defaulter class or the non-defaulter class. This classification helps you understand how likely the person is to become a defaulter, and helps you adjust your risk assessment accordingly.

3. Classification + DataRobot

The DataRobot automated machine learning platform includes a number of classification algorithms and automatically recognizes whether your target variable is a categorical variable that’s suitable for classification or a continuous variable that is suitable for regression. Furthermore, DataRobot’s various tools allow you to examine the performance of classification models for both binary and multiclass problems.

- Training data is used to train a model. It means that ML model sees that data and learns to detect patterns or determine which features are most important during prediction.

- Validation data is used for tuning model parameters and comparing different models in order to determine the best ones. The validation data should be different from the training data, and should not be used in the training phase. Otherwise, the model would overfit, and poorly generalize to the new (production) data.

- It may seem tedious, but there is always a third, final test set (also often called a hold-out). It is used once the final model is chosen to simulate the model’s behaviour on a completely unseen data, i.e. data points that weren’t used in building models or even in deciding which model to choose.

It’s important to understand which problem you’re solving as each problem can use different models, have different accuracy metrics, and other problem-specific parameters that you need to account for.

Aron Larsson

– CEO, Strategy Director

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Avatar

fash

Leave A Comment

Your email address will not be published. Required fields are marked *

image-blog
image-blog
image-blog
image-blog
image-blog
image-blog