【51CTO.com Quick Translation】 Artificial intelligence is not a new term; its development goes back decades. It started in the early 1980s, when computer scientists designed algorithms that could learn and imitate human behavior. On the learning side, the most important algorithm was the neural network, but it was not very successful: the model was too powerful, and there was not enough data to support it. Still, on some narrower tasks, the idea of fitting functions to data achieved great success, and this forms the basis of machine learning. On the imitation side, artificial intelligence found wide application in image recognition, speech recognition, and natural language processing. Experts spent a great deal of time hand-crafting edge detection, color profiles, n-gram language models, syntax trees, and so on, yet the results were mediocre.
Traditional Machine Learning
Machine learning (ML) techniques play an important role in prediction. Machine learning has gone through many generations and now offers a rich set of model structures, such as:
Linear regression
Logistic regression
Decision tree
Support vector machine
Bayesian model
Regularized model
Ensemble model
Neural networks
Each prediction model is based on a certain algorithmic structure with adjustable parameters. Training a prediction model involves the following steps:
1. Choose a model architecture (e.g., logistic regression, random forest, etc.).
2. Feed the model with training data (inputs and outputs).
3. The learning algorithm outputs the optimal model (i.e., a model with specific parameters that minimizes the training error).
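As a concrete illustration of these three steps, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (both are illustrative choices, not something prescribed by the article):

```python
# A minimal sketch of the three training steps, using scikit-learn
# and a synthetic dataset (both are assumptions for illustration).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Training data: inputs X and outputs y.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: choose a model architecture (here, logistic regression).
model = LogisticRegression(max_iter=1000)

# Step 2: feed the model with training data (inputs and outputs).
model.fit(X_train, y_train)

# Step 3: the learning algorithm outputs a fitted model, i.e. specific
# parameter values chosen to minimize the training error.
print("learned coefficients:", model.coef_)
print("held-out accuracy:", model.score(X_test, y_test))
```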
Each model has its own characteristics: it performs well on some tasks and poorly on others. Broadly speaking, we can divide models into low-power (simple) models and high-power (complex) models. Choosing between them is a tricky problem. Traditionally, using a low-power/simple model was preferred over a high-power/complex model, for the following reasons:
Until we had enough processing power, training a high-power model took too long.
Until we had a huge amount of data, training a high-power model led to overfitting (since high-power models have rich parameters and can adapt to the shape of many kinds of data, we may end up with a model that fits the current training data very well yet predicts future data poorly).
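The second point can be shown with a small experiment. The following is only a sketch with made-up data, assuming scikit-learn: a high-power model trained on very few points fits them almost perfectly but tends to predict unseen data much worse.

```python
# Illustrative sketch (made-up data): a high-power model (a degree-15
# polynomial) trained on only 15 noisy points fits the training data
# almost perfectly but typically predicts held-out data much worse.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)

def sample(n):
    X = rng.uniform(-1, 1, size=(n, 1))
    y = 5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=n)  # true signal + noise
    return X, y

X_train, y_train = sample(15)     # very little data
X_test, y_test = sample(200)      # "future" data from the same process

high_power = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
high_power.fit(X_train, y_train)

print("train error:", mean_squared_error(y_train, high_power.predict(X_train)))
print("test error: ", mean_squared_error(y_test, high_power.predict(X_test)))
```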
However, choosing a low-power model runs into the so-called "underfitting" problem: the model structure is too simple to fit the training data when the underlying relationship is more complex. (Suppose the data follows the quadratic relationship y = 5 * X^2; there is no way a linear regression y = a*X + b can fit it well, no matter which a and b we choose.)
To alleviate the "underfit problem", data scientists often apply their "domain knowledge" to generate "input features" that have a more direct relationship with the output (e.g., returning to the quadratic relationship y = 5*X squared), and then fit a linear regression by picking a = 5 and b = 0.
A major hurdle in machine learning is this feature engineering step, which requires domain experts to identify the important signals before the data ever enters the training process. Feature engineering is very manual and demands a lot of domain expertise, so it has become the main bottleneck of most machine learning tasks today. In other words, without enough processing power and enough data, we have to use low-power/simple models, which in turn requires spending a lot of time and effort on creating appropriate input features. This is where most data scientists spend their time.
The Return of Neural Networks
In the early 2000s, with the collection of large amounts of fine-grained event data in the era of big data, machine processing power increased greatly thanks to cloud computing and massively parallel processing infrastructure. We are no longer limited to low-power/simple models. For example, the two most popular mainstream machine learning models today are random forests and gradient boosted trees. However, although both are very powerful and provide a nonlinear fit to the training data, data scientists still need to carefully craft features to achieve good performance.
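For concreteness, this is roughly what using those two kinds of models looks like; scikit-learn and the synthetic dataset here are assumptions for illustration:

```python
# Sketch: the two mainstream models mentioned above, fitted on the same
# synthetic data (scikit-learn and the dataset are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, "accuracy:", scores.mean())
    # Both fit nonlinear relationships well, but the columns of X still have
    # to be meaningful features chosen by the data scientist.
```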
At the same time, computer scientists revisited neural networks with many layers for these imitation tasks, and the resulting DNNs (deep neural networks) achieved major breakthroughs in image classification and speech recognition.
The main difference with a DNN is that you can feed raw signals (e.g., RGB pixel values) directly into it without creating any domain-specific input features. Through multiple layers of neurons (which is why it is called a "deep" neural network), it automatically learns the appropriate features layer by layer and finally provides a good prediction. This greatly reduces the "feature engineering" effort that had been the major bottleneck for data scientists.
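A minimal sketch of this idea, assuming PyTorch and made-up image shapes, shows raw pixel values going straight into a stack of layers with no hand-crafted features:

```python
# Sketch of a small deep neural network that takes raw RGB pixel values
# directly as input (PyTorch and the shapes are assumptions for illustration).
import torch
from torch import nn

# A batch of 8 images, 3 color channels, 32x32 pixels: raw signals,
# no hand-crafted features.
images = torch.rand(8, 3, 32, 32)

model = nn.Sequential(
    nn.Flatten(),                 # 3*32*32 raw pixel values per image
    nn.Linear(3 * 32 * 32, 256),  # first layer learns its own features
    nn.ReLU(),
    nn.Linear(256, 64),           # second layer builds higher-level features
    nn.ReLU(),
    nn.Linear(64, 10),            # output scores for 10 classes
)

logits = model(images)
print(logits.shape)               # torch.Size([8, 10])
```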
DNNs have also evolved into many different network structures, so we have CNNs (convolutional neural networks), RNNs (recurrent neural networks), LSTMs (long short-term memory networks), GANs (generative adversarial networks), transfer learning, attention models, and so on. The whole spectrum is called "deep learning", which is the focus of the entire machine learning community today.
Reinforcement Learning
Another key part is how to mimic the way a person (or animal) learns. Imagine the very natural perception/action/reward cycle of animal behavior: a person or animal first perceives the environment to know what "state" it is in; based on that, it chooses an "action" that takes it to another "state"; it then receives a "reward"; and the cycle repeats.
This way of learning (called reinforcement learning) is very different from the curve-fitting approach of traditional supervised machine learning. In particular, reinforcement learning happens very quickly, because every new piece of feedback (performing an action and receiving a reward) immediately influences subsequent decisions. Reinforcement learning has achieved great success in self-driving cars as well as in AlphaGo (the Go-playing program).
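The perception/action/reward cycle and the immediate use of feedback can be sketched with tabular Q-learning on a tiny invented corridor environment (the environment, rewards, and hyperparameters are all made up for illustration):

```python
# Sketch of the perceive/act/reward cycle as tabular Q-learning on a tiny
# made-up corridor environment: 5 states in a row, goal at the right end.
import random

N_STATES, ACTIONS = 5, (-1, +1)        # move left or right along the corridor
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    state = 0                                      # perceive the initial state
    while state != N_STATES - 1:
        # Choose an action: mostly greedy, sometimes exploratory.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])

        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0

        # Each new piece of feedback immediately updates the value estimate,
        # which in turn influences the very next decision.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state                         # perceive the new state

# Learned greedy policy for each state (the goal state's entry is irrelevant).
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```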
Reinforcement learning also provides a smooth integration of prediction and optimization, because it maintains a belief about the current state and the transition probabilities of the different possible actions, and then decides which action leads to the best outcome.

Deep Learning + Reinforcement Learning = Artificial Intelligence

Compared to classical machine learning techniques, deep learning provides a more powerful prediction model that generally produces good predictions. Compared to classical optimization models, reinforcement learning provides a faster learning mechanism and is more adaptable to changes in the environment.
Author: Ricky Ho
Original link: https://dzone.com/articles/how-ai-differs-from-ml
Translated by: Liu Nina