However, time has passed and the processing capability of machines has improved. In addition,
the web has developed and the Internet has spread all over the world, so the amount of open data has increased. With
this development, anyone can try data mining simply by pulling data from the web. The environment is now set for everyone
to casually study machine learning. The web is a treasure box of text data. By making good use of this text data
in the field of machine learning, we are seeing great development, especially in statistical natural language
processing. Machine learning has also made outstanding achievements in the fields of image recognition and voice
recognition, and researchers have been working on finding the method with the best precision.
Machine learning is utilized in various parts of the business world as well. In the field of natural language
processing, predictive conversion in the input method editor (IME) might be the first example that comes to mind. Image
recognition, voice recognition, and the image search and voice search offered by search engines are also good examples. Of
course, it's not limited to these fields. It is also applied to a wide range of fields, from marketing targeting,
such as predicting the sales of specific products or optimizing advertisements, to designing store shelves
and floor space based on predictions of human behavior, to predicting the movements of the financial market. It can
be said that the most used method of data mining in the business world is now machine learning. Yes, machine learning
is that powerful. At present, if you hear the word "AI," it usually just refers to a
process done by machine learning.
What even machine learning cannot do
A machine learns by gathering data and predicting an answer. Indeed, machine learning is very useful. Thanks to
machine learning, questions that are difficult for a human to solve within a realistic time frame (such as using
a 100-dimensional hyperplane for categorization!) are easy for a machine. Recently, "big data" has become
a buzzword, and analyzing this big data is also mainly done using machine learning.
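To make the 100-dimensional hyperplane a little more concrete, here is a minimal sketch using a plain perceptron, which is only one of many possible algorithms and is not named in the text. The data is synthetic, and the names DIM, MARGIN, w_true, and train_perceptron are assumptions made up for this illustration.

```python
import random

DIM = 100      # dimensionality of each input vector
MARGIN = 0.5   # assumed minimum distance of every point from the true hyperplane
rng = random.Random(0)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hidden "true" hyperplane through the origin, normalized to unit length.
raw = [rng.gauss(0, 1) for _ in range(DIM)]
norm = sum(a * a for a in raw) ** 0.5
w_true = [a / norm for a in raw]

# Sample labeled points, skipping those that lie too close to the hyperplane
# so that the two classes are separated by a clear margin.
points, labels = [], []
while len(points) < 200:
    x = [rng.gauss(0, 1) for _ in range(DIM)]
    score = dot(w_true, x)
    if abs(score) >= MARGIN:
        points.append(x)
        labels.append(1 if score > 0 else -1)


def train_perceptron(points, labels, max_epochs=1000):
    """Classic perceptron rule: nudge the weights toward each misclassified point."""
    w = [0.0] * DIM
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w


w = train_perceptron(points, labels)
accuracy = sum(1 for x, y in zip(points, labels) if y * dot(w, x) > 0) / len(points)
print(f"training accuracy: {accuracy:.2f}")
```

A human could not draw this boundary by hand, but the update rule is the same in 100 dimensions as it is in two. Note that the data here was constructed to be linearly separable; real data rarely is.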
Unfortunately, however, even machine learning cannot create AI on its own. From the perspective of "can it actually achieve AI?"
machine learning has a big weak point. There is one big difference between the way a machine learns
and the way a human learns. You might have noticed the difference already, but let's look at it. Machine learning is a technique for pattern
classification and prediction based on input data. If so, what exactly is that input data? Can it use any data?
Of course… it can't. It's obvious that it can't correctly predict based on irrelevant data. For a machine to learn
correctly, it needs to have appropriate data, but then a problem occurs. A machine is not able to sort out what
is appropriate data and what is not. Only if it has the right data can machine learning find a pattern. No matter
how easy or difficult a question is, it's humans that need to find the right data.
Let's think about this question: "Is the object in front of you a human or a cat?" For a human, the answer is all
too obvious. It's not difficult at all to distinguish them. Now, let's do the same thing with machine learning.
First, we need to prepare the data in a format that a machine can read; in other words, we need to prepare image data
of a human and of a cat. This isn't anything special. The problem is the next step. You probably just
want to use the image data as input as it is, but this doesn't work. As mentioned earlier, a machine can't find out
what to learn from data by itself. The things a machine should learn need to be extracted from the original image data
and prepared by a human. In this example, we might need to use data that captures the differences, such
as face color, the positions of facial parts, and the facial outlines of a human and a cat, as input data. These
values, which humans need to work out and supply as inputs, are called features.
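As a rough illustration of what "processing features out of the original image data" can mean, the sketch below turns a tiny grayscale image into three numbers. The three features (average brightness and two simple edge measures) and the function name handcrafted_features are hypothetical choices for this example only. They are far simpler than the face colors or facial outlines mentioned above, but the principle is the same: a human decides what to measure.

```python
def handcrafted_features(image):
    """Turn a 2-D grid of grayscale pixels (0-255) into a short feature vector."""
    height, width = len(image), len(image[0])

    # Feature 1: average brightness over all pixels.
    mean_brightness = sum(sum(row) for row in image) / (height * width)

    # Feature 2: mean absolute difference between horizontally adjacent pixels
    # (large when the image contains vertical edges).
    horizontal_change = sum(
        abs(row[c + 1] - row[c]) for row in image for c in range(width - 1)
    ) / (height * (width - 1))

    # Feature 3: mean absolute difference between vertically adjacent pixels
    # (large when the image contains horizontal edges).
    vertical_change = sum(
        abs(image[r + 1][c] - image[r][c])
        for r in range(height - 1)
        for c in range(width)
    ) / ((height - 1) * width)

    return [mean_brightness, horizontal_change, vertical_change]


# A made-up 4x4 image: dark on the left half, bright on the right half.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

features = handcrafted_features(image)
print(features)
```

The learning algorithm would then receive this short list of numbers rather than the raw pixels, so its success depends on whether the chosen measurements actually separate the classes.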
Machine learning can't do feature engineering. This is the weakest point of machine learning. Features are, in short,
the variables in a machine learning model. Because these values describe the characteristics of an object quantitatively, a machine
can handle pattern recognition appropriately. In other words, how you choose the features makes a huge
difference in terms of the precision of prediction. Broadly, there are two types of limitations with machine
learning:
An algorithm can only work well on data that fits the assumptions of the training data. With data that has a different distribution, in many cases the
learned model does not generalize well.
Even a well-trained model lacks the ability to make a smart meta-decision. Therefore, in most cases, machine learning can be very successful only in a
very narrow direction.
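The first limitation can be sketched with a toy experiment. Below, a straight line is fitted to noisy samples of y = x * x drawn from the range 0 to 1, then evaluated on new data from the same range and on data from a shifted range, 2 to 3. The target function, the ranges, and the helper names (target, fit_line, mse) are all assumptions chosen for illustration.

```python
import random

rng = random.Random(1)


def target(x):
    """Made-up ground truth: a curve plus a little noise."""
    return x * x + rng.gauss(0, 0.05)


def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Train on inputs drawn from the range [0, 1].
train_x = [rng.uniform(0, 1) for _ in range(200)]
train_y = [target(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)

# New data from the same distribution as the training data.
same_x = [rng.uniform(0, 1) for _ in range(200)]
same_y = [target(x) for x in same_x]

# New data from a different distribution: inputs drawn from [2, 3].
shifted_x = [rng.uniform(2, 3) for _ in range(200)]
shifted_y = [target(x) for x in shifted_x]

mse_same = mse(same_x, same_y, slope, intercept)
mse_shifted = mse(shifted_x, shifted_y, slope, intercept)
print(f"error on same distribution:    {mse_same:.4f}")
print(f"error on shifted distribution: {mse_shifted:.4f}")
```

The line is a reasonable approximation inside the range it was trained on, but the error grows by orders of magnitude once the inputs move outside that range, because nothing in the training data told the model that the curve keeps bending.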
Let's look at a simple example so that you can easily imagine how features have a big influence on the prediction
precision of a model. Imagine there is a corporation that wants to promote an asset management package based on
the amount of assets a customer holds. The corporation would like to recommend an appropriate product, but as it can't ask such a personal
question, it needs to predict how many assets a customer might have and prepare in advance. In this case, which attributes
of potential customers should we consider as features? We can think of many candidates, such as their height, weight,
age, address, and so on, but clearly age or residence seems more relevant than height or weight. You
probably won't get a good result if you try machine learning based on height or weight, as it would predict based on
irrelevant data, meaning it's just a random prediction.
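The asset example can be tried on synthetic data. In the sketch below, assets are generated so that they depend on age and not at all on height; this relationship, the numbers, and the helper name r_squared are invented for illustration and say nothing about real customers. A one-variable linear model is then fitted to each candidate feature, and R squared reports how much of the variation in assets each one explains.

```python
import random

rng = random.Random(0)


def r_squared(xs, ys):
    """Fit y = slope * x + intercept by least squares and return R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic customers. ASSUMPTION: assets grow with age plus noise,
# and height is generated independently of everything else.
ages = [rng.uniform(20, 70) for _ in range(500)]
heights = [rng.gauss(170, 10) for _ in range(500)]
assets = [2.0 * age + rng.gauss(0, 10) for age in ages]

r2_age = r_squared(ages, assets)
r2_height = r_squared(heights, assets)
print(f"R^2 using age as the feature:    {r2_age:.3f}")
print(f"R^2 using height as the feature: {r2_height:.3f}")
```

The same algorithm and the same customers give a useful model with one feature and an essentially useless one with the other. The algorithm has no way to know in advance which column it should have been given.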
As such, machine learning can provide an appropriate answer to a question only after the machine is given
appropriate features. But, unfortunately, the machine can't judge what the appropriate features are, and the precision
of machine learning depends on this feature engineering!