Why did AlphaGo focus on Go instead of Mahjong? Legend has it that the sage-king Yao invented Go to teach his son Danzhu, which would give Go a history of more than 4,000 years. The 2009 LG Cup final between Lee Sedol and Gu Li — a clash of kings of the Korean and Chinese Go worlds — was hyped by the Koreans, who are good at creating momentum, as a "battle of 4,000 years." Now Lee Sedol stands at a crossroads of history once again, shouldering 4,000 years of human pride and dignity, and sitting opposite him is Google's computer Go player, AlphaGo. Google's Chinese name is Guge; Guge and Gu Li — is the echo predestined? What is Go? In the eyes of a computer it is just a board game: on a 19*19 board, Black and White take turns placing stones; a stone that runs out of liberties must be removed from the board; when there is nowhere left to play, whoever has surrounded the larger territory wins. The rules are that simple. But in human eyes, Go long ago outgrew the scope of a game. It carries history, etiquette, aesthetics, and a philosophy of life. The Honnoji Incident, the Huaishang letter, the ten-game matches, and the challenge matches witnessed so much history; the New Fuseki, the Cosmic Style, the aesthetics of Otake, the Stone Buddha and the Demon Sword embody so much elegance; "enter the opponent's sphere slowly" and "sacrifice stones to seize the initiative" have long become life creeds for countless people. When a computer actually sits down across from you, the feelings are mixed: I say this is life; you say it is only a game. Among the vast sea of human intellectual games, Go is only a drop in the ocean, and its popularity among ordinary people may not match Mahjong or poker. Compared with the Mahjong tables in ordinary Chinese alleys and the thousands of poker tables in casinos, Go is somewhat highbrow and niche. So why does artificial intelligence favor Go so much?
Why not an "AlphaMahjong" to challenge Sichuan Mahjong masters, or an "AlphaHoldem" to challenge Texas Hold'em champions? There are three reasons. First, Go has a simple, explicit winning condition. This matters because the computer needs an exact, quantitative assessment of the quality of every decision. It may take ten years to play Go well, but even a beginner can judge who won once a game ends. If the rules themselves are vague, imagine what would happen if a computer competed with humans at modern poetry or abstract painting. Second, Go is a game of perfect information. Facing the board, the computer and its human opponent see exactly the same thing; the same is true of Chinese chess and Western chess. Mahjong, poker, and Four Kingdoms military chess are different: each player sees only his own side and must infer the opponent's hand from behavior. This explains why experienced Mahjong players often lose to novices who don't play "by the book," and why Texas Hold'em is full of bluffs and psychological warfare. Complete, symmetric information lets a computer make absolutely rational decisions: no matter why you played that way, I only need to play the best move in the current position. Third, the vast search space of Go poses a challenge — and a temptation — that computer scientists cannot resist. Humans have long since been beaten by computers at both Chinese and Western chess, but in Go we could at least look forward to Ke Jie.

2. How difficult is it for a computer to learn Go?

How difficult is Go? For human players this is hard to quantify. Nie Weiping once said modestly, "The realm of Go is beyond reach; I can only be considered a beginner."
Professional players are often asked how far they are from the "God of Go": some say they would need a two-stone handicap, Ke Jie says he would need only the first move, and some believe the development of Go is nearing its end. Opinions differ. Human vision is always blocked by the mountain in front of us; only from the summit do we see the mountains beyond. For a computer, the question is much easier to pose: how many variations are there in Go? If I could judge, for every variation, whether the resulting position is good or bad, wouldn't I be the God of Go, making the best move at every step? Early designers of artificial intelligence did think exactly this way.
Imagine a game of Tic-Tac-Toe: a 3*3 board, players fill in the blanks, and the first to complete a row, column, or diagonal wins. Since each square has only three states — black, white, or empty — there are at most 3^9 = 19,683 board states. Even counting the order of moves, there are only 9! = 362,880 variations. Evaluating fewer than a million variations is a piece of cake for today's computers. But apply this method to Go and you freeze on the spot: there are far too many variations! So how many variations does Go have? If each intersection likewise has three states — black, white, or empty — the number of states of a Go board is 3^361. Excluding positions that are actually impossible, the number is about 10^170. By comparison, chess has fewer than 10^50 states, utterly negligible next to the complexity of Go. If we count the order of moves, Go has about 361! variations, roughly 10^768 (in practice fewer, since there are always points where you cannot play). Either way it is an astronomical number: the total number of atoms in the observable universe is no more than 10^80. Some will say the God of Go need not read every line to the end — looking 30 to 50 moves ahead should be enough. Well, in the opening (within the first 60 moves), reading 50 moves ahead still gives more than 10^120 variations. Ten moves? About 10^24. Even reading only 5 moves ahead yields more than 2×10^12 variations. Even if evaluating one variation took a single nanosecond (which is of course impossible), that one move would take 40 minutes. And there is a more serious problem for the computer: without reading all the way to the end, how can it know which result is good and which is bad? It seems too difficult! Would a smaller board be easier? The answer is yes.
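The counts quoted above are easy to check with exact big-integer arithmetic; here is a quick sketch, using digit counts to stand in for the powers of ten:

```python
# Verify the raw counts quoted above with exact arithmetic.
import math

ttt_states = 3 ** 9             # Tic-Tac-Toe: 3 states per square, 9 squares
ttt_orders = math.factorial(9)  # counting move orders instead
print(ttt_states, ttt_orders)   # 19683 362880

go_states = 3 ** 361            # naive Go state count, before removing
digits = len(str(go_states))    # illegal positions (~10^170 remain legal)
print(digits)                   # 173 digits, i.e. about 10^172

go_orders = math.factorial(361)   # move-order count
print(len(str(go_orders)))        # 769 digits, i.e. about 10^768
```

The naive 3^361 is a little above the 10^170 in the text precisely because it still counts the illegal positions.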
On a 13*13 board, the number of variations drops to about 10^304; on 9*9, to about 10^120. Zhang Xu's self-invented game on a four-line (4*4) board has only about 10^13 variations, and its number of states falls to the tens of millions: still a lot, but entirely manageable for a computer. So now we know that the God of Go and the God of the Universe are probably the same person: since she sees through every variation on the board, she is presumably on familiar terms with every atom in the universe. Could AlphaGo really exhaust every variation? It doesn't matter — even if it could, there would be nothing to fear: enlarge the board to 21 lines tomorrow, and even if every atom in the universe became an AlphaGo, it would not be enough.

3. Early computer Go relied on human teaching

Computers are certainly not gods of Go. Picture one as a gifted youth who wants to challenge a martial-arts master: tremendous internal energy and lightning-fast movement, but no idea how to fight (think of Zhang Junbao, or Zhang Wuji fresh from learning the Nine Yang Manual). How can he defeat a master skilled in technique? Daunted by the astronomical variations of Go, the earliest computer Go programs chose to imitate humans: I don't know the moves you know, but wherever you go, I will follow. Professional players jokingly call this "reciting the game records." If you approach my corner with a knight's move, I answer with the knight's move the book prescribes; if you press, I jump; if you jump, I jump with you — always letting the opponent lead. A large body of Go knowledge — joseki (standard corner and opening sequences), tesuji (clever local moves), and so on — was extracted from game records in this way and handed to the computer by programmers in the form of rules. In actual play, the computer then follows the script step by step.
The early versions of the famous Chinese Go program "Shou Tan" (Handtalk) took this path. The strength of such an algorithm naturally depends on the completeness of its rule base, but it is basically quite low: seeing a "falling meteor" move and answering with a "Buddha's blossom" is at best the level of Lin Pingzhi's father. An algorithm that memorizes fixed patterns loses its way the moment actual play deviates slightly — a small knight's approach becomes a large knight's approach, a jump becomes a one-space flight — and the computer suddenly finds the position unrecognizable. Programmers had countermeasures, of course: on top of rote memorization they gradually added fuzzy matching, so that when the program saw a slightly different position it could still produce a move. To a certain extent this can be seen as generalizing from one example. Naturally, in middle-game fighting, where a tiny difference leads to a huge mistake, such fuzzy matching rarely works. The "recite the records" approach has another major flaw: most of its rules apply only to a narrow local area, and it has no method for coordinating stones across the whole board. Early Go programs were famously terrified of ladders — a typical symptom of this flaw. If the flaws of memorizing records are so obvious, why was it still the first choice of computer Go programmers? From the computer's perspective, memorizing records drastically shrinks the space of choices: there are not many ways to approach a corner besides the knight's move, the jump, the pincer, the shoulder hit, and the slide. That leaves very few candidate moves worth considering, greatly reducing the computational load.

4. Another approach to computer Go: evaluating the situation

Some may then ask: if shrinking the range of choices is unreliable, can we shrink the depth of the search instead?
This is a very interesting idea. If we consider only the current move and not the future, each turn offers at most about 300 candidate moves. Suppose we could judge how good or bad the position is after each candidate is placed on the board — then wouldn't it suffice to pick the best one? Tempting indeed! But when a game is only a few dozen moves old, can we really judge the position? Can even the God of Go calculate the exact worth of every stone? Early computer Go did make many interesting attempts. While some programs were reciting game records, others were evaluating positions — and the evaluators actually started earlier. This is easy to understand: pushing one move ahead does not end the game, and neither does pushing ten. So as long as I can evaluate a position accurately, any depth of lookahead becomes useful. How? Divide and conquer! Isn't Go won by whoever surrounds the larger territory (more points)? Then suppose every stone on the board could be converted into points; add them up, and can't we judge who is better off? Attempts of this kind began as early as 50 years ago (imagine the computers of that era). The specific methods varied, but the general idea was the same: the closer an empty point is to my stones, the more likely it is mine, and the closer to the opponent's stones, the more likely it is theirs; live groups, dead groups, and unsettled groups are treated separately. The first Go program is said to have been written in 1968; its main idea was to evaluate positions by computing the "influence" of each stone. Unfortunately, its paper can no longer be found. I have read another article, published in 1981.
Its basic method was to count liberties (the empty points adjacent to a group) and choose the move that maximized one's own liberties while minimizing the opponent's (liberties being tied to a group's life and death, that is, its survivability). In the 1990s versions of "Shou Tan" (Handtalk), the author set the influence of each living stone as: 4 on adjacent points, 3 on diagonal points, 2 at the one-point jump and small knight's move, and 1 slightly farther away. On top of this accumulation of stone influence, designers added a series of corrections for single stones, adjacent stones, and whole groups. How well did this work? Still rather badly. Even combined with some artificial-intelligence search algorithms, the computer Go champions of the 1990s were roughly 14 to 16 handicap stones below strong amateurs. If the "recite the records" algorithm is like performing a fixed routine of Shaolin Long Fist and starting over when interrupted, this static evaluation is more like the "whirlwind chopping" style of swordsmanship: no follow-up plan, just hack wherever looks easiest and count whatever you hit. It is worth mentioning that, although weak as a playing engine, static evaluation survives as a quick middle- and late-game analysis tool much loved by Go fans. When I played on Sina Go, I often used its situation-analysis tool to count points (how much territory I had). Watching professional player Meng Tailing's online self-commentaries, we were surprised to find that he sometimes used this tool too. Throughout the 20th century, computer Go remained in the era of reciting records and evaluating situations.
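A toy influence map in the spirit of the Handtalk values quoted above can be sketched as follows. The weights (4/3/2/1) come from the text; how I map them to board distances (one-point jump and knight's move as "2", anything up to four steps as "1") is my own assumption, and real programs added many corrections on top:

```python
def influence_map(stones, size=9):
    """stones: dict mapping (x, y) -> +1 (our stone) or -1 (opponent's).
    Returns a size x size grid of accumulated influence per point."""
    grid = [[0] * size for _ in range(size)]
    for (sx, sy), color in stones.items():
        for x in range(size):
            for y in range(size):
                dx, dy = abs(x - sx), abs(y - sy)
                d = dx + dy
                if d == 0:
                    continue          # the stone's own point
                if d == 1:
                    w = 4             # adjacent point
                elif dx == 1 and dy == 1:
                    w = 3             # diagonal point
                elif d in (2, 3):
                    w = 2             # one-point jump / knight's move
                elif d <= 4:
                    w = 1             # slightly farther away
                else:
                    w = 0
                grid[x][y] += color * w
    return grid

m = influence_map({(4, 4): 1})
print(m[4][5], m[3][3], m[4][6])  # 4 3 2
```

Summing such a grid over the whole board gives exactly the kind of crude "who owns which empty points" estimate the early programs relied on.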
Designers added many heuristic algorithms — counting liberties, recognizing ko, fuzzy matching, endgame optimization — yet the computer's Go strength wandered into a long dark night and never improved. This bred a deep-rooted contempt for computers among Go masters: right up to the AlphaGo-Lee Sedol match, Luo Xihe still believed he could easily give AlphaGo a four-stone handicap. Indeed, a person who after thirty years of study could only play handicap games against strong amateurs would long ago have had his Go career pronounced dead. In my view, this kind of computer Go — reciting records and statically evaluating positions — hardly deserves the name "artificial intelligence." The early designers planted a seed, and in the dark night it slowly took root and sprouted beneath a stone, waiting for the day the stone would be lifted. That day was still many years away.

5. Search: learning to play by walking a maze

While the seed of computer Go grows slowly under its stone, let us set it aside and look at what the mainstream artificial-intelligence researchers were doing. Most of them had never played Go; their dream since childhood was to defeat the human chess champion. Unlike Easterners, who grow up around Go boards, a Western childhood runs from Tic-Tac-Toe to chess. As we said, Tic-Tac-Toe has fewer than a million variations, and chess does not seem to have impossibly many either. So when Western researchers first took the stage, they thought of exhaustive enumeration. But even exhaustion needs an order. Starting from Venice, all roads lead to Rome: Venice is the start, Rome the destination, and the process of getting there we call search.
Search ranks first among the weapons of artificial intelligence. After the 1990s, with the rise of the Internet and the decline of AI, "search" came to mean Google and Baidu: I ask, the computer answers. But don't forget that the original meaning of search is the process of finding Rome, not Rome itself! Humans are no strangers to searching — isn't it just walking a maze? At every Manhattan intersection there are four choices; whichever you take, four more await you at the next corner, until you escape the maze or exhaust every option. Mathematically, we call the maze entrance the "root," each intersection a "node," each road out of an intersection a "branch," and each dead end a "leaf" — and all the maze's variations together become a "tree." Searching means traversing this tree in some order until you find the exit leaf or have examined every leaf. How like Go! Starting from an empty board, every choice of move spawns tens or hundreds of branches, every finished game is a leaf, and every winning game is Rome. Finding the exit of a maze — finding Rome — is not hard: mark the intersections you have passed and keep turning left (or right); in computing this is called depth-first search, and it guarantees the tree is traversed without omission. The hard part is arriving in Rome while dinner is still hot. For that, we must abandon some branches and give up visiting most leaves. With limited time and limited choices, can we still find Rome? Countless AI pioneers studied this problem, including the famous Mr. Dijkstra, whose algorithm finds the shortest path from Venice to Rome (though the cost of finding it is no lower than a depth-first search).
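The mark-and-keep-going procedure above is depth-first search in miniature; a minimal sketch on a small grid maze (the maze layout is my own example):

```python
# Depth-first search through a grid maze: mark visited intersections,
# always push on, backtrack only at dead ends.
def dfs(maze, start, goal):
    """maze: list of strings, '#' = wall. Returns a path or None."""
    rows, cols = len(maze), len(maze[0])
    stack, visited = [(start, [start])], {start}
    while stack:
        (r, c), path = stack.pop()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != '#' and (nr, nc) not in visited):
                visited.add((nr, nc))          # mark the intersection
                stack.append(((nr, nc), path + [(nr, nc)]))
    return None  # searched every leaf without finding Rome

maze = [".#.",
        "...",
        ".#."]
print(dfs(maze, (0, 0), (2, 2)))
```

The `visited` set is exactly the chalk mark at each intersection: without it, the walker would circle forever.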
Computers navigate mazes better than humans: they can jump freely between already-discovered intersections (think of Doraemon's Anywhere Door). This lets them try every known intersection once before going deeper — the so-called breadth-first search. The famous A* search algorithm (a form of best-first search) combines the advantages of breadth-first search and shortest-path search: at each intersection it sends out scouts to report how far the next intersections are and where they lead, then weighs the length of the path so far plus the estimated distance remaining to the goal to decide where to go next. It certainly seems better to spend ten days through the Ziwu Valley and march straight on Chang'an than to spend days reaching Longyou, still hundreds of miles from Chang'an. Such algorithms greatly reduce the complexity of finding an optimal path. But while estimating the distance from Venice to Rome is easy, estimating the distance from a middle-game position to a won game is still hard! Wait — we have forgotten something important. A maze is walked alone, but chess is played by two. If you cannot predict your opponent's replies, how can you find your own best moves? Exploring how to apply search to two-player games, the pioneers of AI invented the minimax algorithm. The name sounds awkward, but the idea is simple: when choosing the next move, we prefer the move whose worst case — over all the opponent's replies — is least bad, rather than a move that wins big if the opponent errs but loses big if they answer correctly. Researchers then designed sophisticated algorithms to narrow the search space further, letting the computer search deeper along promising branches instead of wasting time on moves that are obviously bad at first glance.
The most important of these is called Alpha-Beta pruning; the 1990s computer Go champions mentioned above used it together with situation evaluation. Alpha-Beta, AlphaGo — past and present, what a coincidence! With these search algorithms in hand, beating (or at least tying) children at Tic-Tac-Toe was no problem for computers. But when AI researchers turned to chess, they found its search space unexpectedly large: no amount of pruning seemed to reach the bottom. And at the time, no one had a good way to accurately evaluate non-leaf positions (if more pieces simply meant better, every street hustler running rigged chess endgames would be out of work). Computer Go developers watched and concluded: if you cannot even search chess to the bottom, Go is not even worth dreaming about. So computer Go spent another twenty years in the dark, until a hero appeared.

6. Enlightenment from Deep Blue

On February 10, 1996, a computer called Deep Blue challenged the world chess champion Kasparov and, to everyone's surprise, won the first game — then drew two and lost three. Deep Blue was built by IBM, and the two sides agreed to a rematch a year later. In May 1997 they played six more games, and Deep Blue defeated the champion with two wins, three draws, and one loss. It was a milestone in the history of artificial intelligence. Notably, after losing, Kasparov found the intelligence and creativity Deep Blue displayed so incredible that he insisted human players must be behind the scenes. (On this point Google was obviously well prepared: high-profile publicity, with nearly all the top human players appearing to comment on the games, dispelled any speculation that "Ke Jie was hiding in the server cabinet.") Why did Deep Blue win?
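Minimax with Alpha-Beta pruning can be shown on a toy game tree; this sketch uses nested lists as the tree, with leaves holding scores from the maximizer's point of view (the tree itself is an invented example):

```python
# Minimax with Alpha-Beta pruning: once a branch is provably worse than
# an alternative already in hand, its remaining children are skipped.
def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    if not isinstance(node, list):        # leaf: its static score
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:             # opponent will never allow this line
                break                     # prune the remaining siblings
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 5], [2, [9, 1]], [4, 6]]
print(alphabeta(tree, True))  # 4
```

In the middle branch, after the opponent's reply scoring 2 is found, the `[9, 1]` subtree is never examined: that is the pruning doing its work.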
Beyond the huge increase in computing power brought by Moore's Law, Deep Blue's algorithms seem nothing special — minimax search, Alpha-Beta pruning — so how did its kung fu become so powerful overnight? When Deep Blue's mystery was unveiled, people found its algorithmic secrets came down to two things: situation evaluation, and looking ahead. Sounds familiar, doesn't it? Deep Blue's evaluation weighed the value of the pieces (the queen 9, a pawn 1, the rook in between), each piece's range of influence (familiar again?), the safety of the king, and the initiative (tempo). And this evaluation was not static: Deep Blue exhausted all variations several moves ahead and then evaluated every resulting position (reportedly looking 12 plies ahead against Kasparov). How hard is that? Roughly speaking, there are on the order of 100 possible moves per turn (each pawn at most 2-3 moves, each knight at most 8, each rook up to 14, minus illegal squares, and so on), so 12 plies means about 10^24 variations. With Alpha-Beta pruning and IBM's powerful parallel hardware, it could be handled! Deep Blue's success forced humanity to face the potential of artificial intelligence for the first time. Now consider the unprecedented enlightenment and opportunity Deep Blue brought to computer Go. On one hand, its success declared chess a solved problem, turning many AI researchers toward the next challenge: Go.
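The material part of such an evaluation is simple to sketch. The text gives only queen = 9, pawn = 1, and "rook in between"; taking rook = 5 and bishop/knight = 3 is my assumption (the conventional values), and a real engine like Deep Blue layered mobility, king safety, and tempo on top:

```python
# Material balance from White's point of view, using assumed conventional
# piece values (only queen=9, pawn=1 are stated in the text).
PIECE_VALUES = {'q': 9, 'r': 5, 'b': 3, 'n': 3, 'p': 1, 'k': 0}

def material(pieces):
    """pieces: string of piece letters, uppercase = White, lowercase = Black.
    Returns White's material advantage in pawns."""
    score = 0
    for ch in pieces:
        value = PIECE_VALUES.get(ch.lower(), 0)
        score += value if ch.isupper() else -value
    return score

# White has queen, rook, three pawns; Black has queen, rook, one pawn:
print(material("QRPPP" + "qrp"))  # 2
```

This is exactly the static score the text mocks: by material alone, every rigged endgame position looks won, which is why lookahead had to be added on top.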
On the other hand, computer Go designers drew two surprising lessons from Deep Blue: first, with little expert knowledge, seemingly brute-force exhaustive search could be astonishingly effective; second, accurate situation evaluation is crucial — and static evaluation is clearly not enough. From then on, reciting game records was no longer the way; situation evaluation would be built on dynamic search! In 2006, a black comedy called "Crazy Stone" swept China. The same year, a computer Go program of the same name (Crazy Stone) quietly won the 9*9 Go championship at the Computer Olympiad. The following year it won the 9*9 title again and took second place in the 19*19 event. The year after that, Crazy Stone beat a real professional, Aoba Kaori 4-dan, at an eight-stone handicap, and again at seven stones by the end of the year. Five years later (2013), Crazy Stone beat Ishida Yoshio 9-dan at four stones; the next year it also beat the stronger Yoda Norimoto at four stones, and its fame in the Go world reached its peak. Meanwhile, after 2010 a computer player named Zen appeared online and gradually climbed to 5-dan on KGS (a famous Go server). I often played on KGS in those days and traded wins and losses with Zen — but soon I could only watch it surpass me and move on. In 2012 Zen, too, defeated a professional 9-dan: the ever-popular Takemiya Masaki, at four stones. Computer Go had entered a new era, one that kept surprising everyone. But we cannot help asking two questions. First, ten years passed between Deep Blue and Crazy Stone — what were computers doing all that time? Second, why is it always about "stones"?

7. A strange tree: the Monte Carlo tree

Beginning with Crazy Stone, this era can be called the "Monte Carlo era."
Contemporary computer Go players all adopt a search algorithm called Monte Carlo Tree Search, and AlphaGo is no exception. What is its special skill? One revelation of Deep Blue was the need for an accurate evaluation function — one that is dynamic, accounting for the position several or even dozens of moves later. Others had thought of this before. But compared with chess, whose single goal is to kill the king, judging a Go position is far more subjective. What is influence? What is thickness? What is thinness? How does influence convert into territory? What is the whole-board perspective? These are perhaps the eternal questions of Go; even the top players often cannot answer them clearly. Watching top-level matches is most amusing: Korean commentators always think the Korean is better, Chinese commentators think the Chinese is better; one side calls it a sacrifice, the other says the stones were simply captured... A computer cannot afford such in-the-eye-of-the-beholder judgments; it needs rational, objective decisions. So what in Go is absolutely objective? As we said before: only the final result, win or lose. But the leaves representing final victories seem hopelessly out of reach on the Go search tree. Is it possible, then, to judge a branch without exhaustively enumerating all its leaves? The idea is refreshing. If a branch has 10 leaves, 8 winning and 2 losing, must we find the single best (or worst) leaf to judge the branch's quality? If it has 100 leaves, must we inspect every one? Friends who love statistics are already laughing out loud: to know whether the additives exceed the limit, of course you don't open every can — you take a sample!
Suppose that for each candidate move we could sample the final results it leads to; then we could probably judge the position without understanding territory, influence, thickness, or thinness at all. How to sample? Randomly — simple and reliable. When a new move's value is unclear, professional players like to say "let actual play decide." Computers do the same, except that instead of two experts playing one game, they have two players who know nothing about Go play a thousand games. How long does it take a computer to play a thousand random games? Measure it in milliseconds! Easy, then: from now on I can objectively estimate the quality of any position, and I need not traverse the whole search tree (so please stop calling me brute force!). Is it really that simple? I don't believe it. If, starting from a first move on the upper-right star point, we randomly simulate a thousand games and find White wins 501 of them, does that make Black 1 a losing move? Unfortunately, random sampling yields a statistical expectation, not the true optimum, and it must obey the laws of the god of statistics. How many continuations follow that first move? An astronomical number. How are the wins and losses distributed among them? Unknown. So a thousand games — ten thousand games — is a trivial sample for such an analysis, and it is hard to draw practical conclusions from it. No matter: if the sample is too small, play more — random games hardly cost electricity. How many games should we play? The more the better, of course, so let's just keep counting until the clock runs out. A randomized algorithm that runs within a fixed time budget is called a Monte Carlo algorithm. The term, by the way, comes from Monte Carlo, the famous casino resort in Monaco, because such algorithms were often used to calculate gambling odds.
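The sampling argument above can be seen in miniature. Instead of Go playouts, this sketch estimates a win rate whose true value we fix in advance (0.62 is an arbitrary choice), and shows the estimate tightening as the number of "games" grows:

```python
# Monte Carlo estimation in miniature: each random "game" is a coin flip
# with a fixed (but unknown to the estimator) win probability.
import random

def estimate_winrate(true_p, n_games, rng):
    wins = sum(1 for _ in range(n_games) if rng.random() < true_p)
    return wins / n_games

rng = random.Random(42)  # fixed seed for reproducibility
for n in (100, 10_000, 1_000_000):
    print(n, estimate_winrate(0.62, n, rng))
```

The error shrinks roughly like 1/sqrt(n), which is exactly why a thousand playouts say little about a position with astronomically many continuations: halving the error costs four times the games.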
Since there is a Monte Carlo, is there a Las Vegas? There is, but we will leave it aside for now. The search framework of the chess god Deep Blue can still be used: simply replace the evaluation function with Monte Carlo — use the win rate of simulated games, rather than a point score, to evaluate the current position. This is the so-called Monte Carlo Tree Search. It sounds quite reliable — so why did Monte Carlo not take off until a decade after Deep Blue? Because Monte Carlo has an obvious flaw: being random, it cannot guarantee the right answer, only an answer that is correct with some probability. What determines that probability bound? The number of random simulations, of course. And that brings back the original problem: the Go tree is so large that too many simulations per node take too long, while too few give inaccurate answers. This contradiction delayed Monte Carlo Go for a full ten years.

8. The shining debut of the multi-armed bandit

Crazy Stone offered a good way out of the contradiction, and it is not hard to state: for branches that look good, run more simulations, making their evaluations more accurate; for branches that look worse, play fewer games; for branches that are truly hopeless, don't look at all. The search space is still huge, but the tree actually searched becomes very small. But, one may ask, won't Crazy Stone then "hang itself on one tree"? What if it clings to a good-looking branch and keeps searching downward forever? The problem is real, and it is famous in artificial intelligence as the exploration-versus-exploitation dilemma.
Geologists (or StarCraft players) would put it more vividly as the balance between exploration and development: over-exploiting the current oilfield reduces your chance of finding richer fields, while too much exploration makes extraction inefficient. Frustrating — but there is an elegant solution. Since we are in Monte Carlo anyway, let's solve the casino's problem the casino's way! Slot-machine gamblers know the anguish: some machines pay out easily, some don't. The one I'm on now is decent, but if I keep playing I can't shake the feeling the next one over pays more; yet if I move, I'm afraid this one will hit the jackpot the moment I leave. In machine learning there is an algorithm for exactly this problem, the "multi-armed bandit." Using a strategy called UCB, it calculates precisely which machines deserve more tries and which fewer, and tells the gambler which arm to pull next. Applied to the Monte Carlo tree, this strategy becomes UCT (UCB applied to Trees), the algorithm that made Crazy Stone famous: branches near the local focus are tried more, distant ones less — but never zero. Branches recommended by UCT are searched first and given more simulated games. Monte Carlo and the multi-armed bandit — two gods of gambling — ushered in a flourishing era for computer Go. Crazy Stone and Zen, complementing each other, kept raising expectations over the past decade; it seemed they would soon defeat professional players. But ninety li is only halfway in a journey of a hundred li: in the last two years their progress slowed again, and even Zen's rank stopped rising.
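The UCB rule at the heart of UCT fits in a few lines. This is the standard UCB1 formula (average reward plus an exploration bonus that shrinks as a branch accumulates visits); the exploration constant and the toy numbers are my own choices:

```python
# UCB1 selection: balance exploiting the best-looking branch against
# exploring branches that have barely been tried.
import math

def ucb1_select(wins, visits, c=math.sqrt(2)):
    """wins[i]/visits[i]: simulation record of branch i.
    Returns the index of the branch to simulate next."""
    total = sum(visits)
    best, best_score = 0, float('-inf')
    for i, (w, n) in enumerate(zip(wins, visits)):
        if n == 0:
            return i                  # try every branch at least once
        score = w / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# A well-tried strong branch (60/100) vs. a barely-tried weak one (2/5):
print(ucb1_select([60, 2], [100, 5]))  # 1 -- the bonus forces exploration
```

As the second branch accumulates losing simulations, its bonus shrinks and the search drifts back to the first: exactly the "tried more, but never zero" behavior described above.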
Everyone finally realized they had hit another bottleneck. No wonder: every Monte Carlo playout must run to the end of the game, and even the multi-armed bandit must still try many branches. Computing power remained the bottleneck, unless the width and depth of the search could be cut far more effectively. Yet even the gods of gambling had been called in, and the masters of artificial intelligence seemed to be out of ideas. But people forgot that through those long dark decades, a seed had been quietly growing. Its day finally came: it would push the stone aside and break out of its cocoon. "Wei Xiaobao, I will definitely come back. The next time I show up, you won't recognize me."

9. The two secret arts of human players

The long night finally passed, and almost without noticing we had waited our way to AlphaGo. Its appearance was so sudden and surprising that the first reports of the Nature paper were dismissed as rumors on many people's social feeds. While we were still expecting Crazy Stone and Zen to inch toward professional level, AlphaGo swallowed the European champion in one bite and threw down a challenge to another "stone", Lee Sedol. What secrets does it hold? Let us approach the bottleneck of Crazy Stone and Zen from the other direction. We know how computers challenge humans, but how do humans hold their own against computers? However strong professional players are, they obviously cannot calculate thousands of variations per second; they know neither Monte Carlo nor Alpha-Beta pruning, so how do they walk the maze of Go with such ease? Raw calculation is clearly no contest, so there must be something humans do better than computers. And indeed, when searching the Go tree, human players have secret arts of their own.
The first human art is the ability to drastically shrink the search space, miraculously picking out just a handful of plausible moves in a complex, open position. This is the so-called chess sense, and for top players it is almost a "first instinct". If computers could learn this ability, could they not concentrate their precious computing resources on exploring those few branches? The second human art is a powerful ability to judge the position: sensing the overall balance of the game without precise calculation, the so-called "whole-board view". With this ability, could a computer not evaluate the position correctly without searching deep? Are chess sense and the whole-board view magic that humans are born with? Of course not. They are the crystallized wisdom of thousands of years of human play, built on the Go knowledge and experience handed down through game records and generations of players. The idea is exciting: good chess sense reduces the search width, and a good whole-board view reduces the search depth; is this not exactly what Monte Carlo had been dreaming of? So where do chess sense and the whole-board view come from? When I was a child, my Go teacher said: replay more games of the masters. At this thought, the artificial intelligence researchers could breathe again: the answer had been in the game records all along. We had collected hundreds of thousands of human game records, yet never put them to good use. "We can only destroy the enemy's inner strength, but cannot turn it to our own use; it is like taking thousands of gold pieces and scattering them on the ground, a laughable waste of treasure." History rolls forward like a wheel. Having searched for it a thousand times in the crowd, we suddenly turn our heads, and there it stands, in the dim lamplight.
It was time to call back "Meiqi Pu" and "Luanpi Feng", dormant for decades. But they had long since broken from their cocoons and become butterflies, mastered a secret martial arts manual, and now reappeared before us in a brand-new guise. That manual is called "deep learning".

10. The finishing touch of deep learning

What is "deep learning"? What is a "convolutional neural network"? The terms may sound like science fiction, but we need not dig too deep here. All you need to know is that deep learning is a kind of machine learning. It is a sophisticated assembly line: a whole pig goes in one end and sausages come out the other. If the pig is the board position and the sausage is a move, deep learning can predict the likely next move in the current position. If the pig is the board position and the sausages are labeled good or bad, deep learning can also judge whether the current position is favorable. How is such an assembly line built? It is not designed from first principles; it is computed after seeing millions of pigs and the sausages they produced. Where do so many pigs come from? The game records of human masters supply roughly 30 million moves (30 million pigs), and the computer can also raise pigs itself (by playing games against itself). One might ask: why deep learning rather than some other kind of learning? How does it differ from memorizing game records? Traditional machine learning requires the pig to be decomposed into "features" according to fixed rules (color, weight, a scar on the hind leg, and so on), with the sausage rules defined over those features. When there are only a few rigid rules, this is not much different from a recipe book of game records (if the pig has a scar on its hind leg, play a knight's move). But "the Tao that can be spoken is not the eternal Tao": chess sense can be grasped, yet not put into words.
If you insist on writing it out as rules and features, you lose the essence in the telling. You cannot force Gu Li to explain that he conjured his brilliant move on the cliff's edge just because there is a three-space bend on the left and a four-space bend on the right! Deep learning skips this step and lets the machine find the features, and their combinations, by itself. Remember the position-evaluation heuristic mentioned earlier, "adjacent stone plus 4, diagonal plus 3"? What deep learning finds is not such simple addition and subtraction (a linear combination); in theory, it can approximate any nonlinear function. Now we roughly understand AlphaGo's secret. Its backbone is still the Monte Carlo tree, but it cleverly adds two deep learning models: one predicts the next move, the other judges the position. The prediction reduces the search width; the position judgment reduces the search depth. Deep learning distills chess sense and the whole-board view from human experience. Together they gave the Monte Carlo tiger its wings, and it flew straight at Lee Sedol. Lee "the Stone", who had stood atop the Go world for 15 years, bowed his proud head before another "Gu Ge". Having written this far, the past and present of the "AlphaDog" have been laid out almost completely. We should recognize that this is not simply a victory of artificial intelligence. AlphaGo's success owes a great deal to Google's engineers, who parallelized complex algorithms effectively, shuttled skillfully between CPU and GPU, and used "cloud computing" to ease the bottleneck of computing power. We should also see that beyond Google and AlphaGo, many designers, engineers, and computer Go programs have been pressing forward at the same time: Crazy Stone, Zen, and Facebook's Dark Forest, led by a Chinese scientist, are all moving ahead together.
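The division of labor just described, a prediction model to narrow the search width and a position-judgment model to cut the search depth, can be sketched as follows. The two `fake_*` "networks" here are hypothetical placeholders, not AlphaGo's real models:

```python
import heapq

def expand(position, policy_net, top_k=5):
    """Chess sense: ask the policy model for move probabilities and
    keep only the top-k branches (this reduces search WIDTH)."""
    probs = policy_net(position)          # {move: probability}
    return heapq.nlargest(top_k, probs, key=probs.get)

def evaluate(position, value_net):
    """Whole-board view: score the position directly with the value
    model instead of playing it out (this reduces search DEPTH)."""
    return value_net(position)            # estimated win rate in [0, 1]

# Stand-in "networks" on a dummy position, purely for demonstration:
def fake_policy(pos):
    return {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.04, "E": 0.01}

def fake_value(pos):
    return 0.62

moves = expand("dummy-position", fake_policy, top_k=3)
winrate = evaluate("dummy-position", fake_value)
```

Inside the Monte Carlo tree, `expand` decides which few children a node grows, and `evaluate` lets a leaf be scored without simulating all the way to the end of the game.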
This is a carnival shared by human experience and computer algorithms, and in the end, a feast belonging to Go.

11. What Monte Carlo gives, Monte Carlo takes away

As this article was being written, AlphaGo was playing its fifth game against Lee Sedol. Having all but cruised through the first three games, it astonishingly short-circuited and lost the fourth. For a moment, opinions and rumors flew everywhere. Once we understand AlphaGo's past and present, some of these questions become clearer, and knowing its secret manual, human players can contend with it on better terms. What follows are some thoughts of my own: one observer's opinion, to be taken with a grain of salt. Since AlphaGo keeps to the shadows and Google's paper discloses only so much, many of these views are my own speculation. I believe that as AlphaGo plays more and more opponents, it will become more and more transparent, and the challenges it faces will grow harder and harder. Will AlphaGo make mistakes? Of course it will: in the opening, the middle game, and the endgame alike, because Monte Carlo makes mistakes. Monte Carlo only guarantees the correct result with a certain probability, and the more simulations, the smaller the chance of error. Computing power is always the bottleneck. When the computer can concentrate its resources on a small search space, the chance of error there is very small: think of the top side in the first game and the left side in the third, locally complex but with a very small search space. In such small spaces, humans can hardly beat the machine at straight-line calculation (recall AlphaGo's brilliant move on the right side of the first game).
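The claim above, that more simulations mean a smaller chance of error, can be checked with a tiny experiment: the error of a playout-based win-rate estimate shrinks roughly as one over the square root of the number of playouts. The win probability 0.55 below is an arbitrary made-up number:

```python
import random

def estimate_winrate(true_p, n, rng):
    """Estimate a position's win rate from n random playouts whose
    true (unknown) win probability is true_p."""
    return sum(rng.random() < true_p for _ in range(n)) / n

# Average absolute estimation error over 200 repeated evaluations:
rng = random.Random(7)
true_p = 0.55                 # the position is slightly favorable
avg_error = {}
for n in (10, 100, 10000):
    trials = [abs(estimate_winrate(true_p, n, rng) - true_p)
              for _ in range(200)]
    avg_error[n] = sum(trials) / 200
```

With a small, concentrated search space the program can afford the many playouts on the right of this table; spread thin over a large space, it is stuck with the noisy estimates on the left.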
Because of its algorithms, computer Go always wants to shrink the search space, so it is easy to understand why AlphaGo dislikes leaving complicated follow-up variations on the board, does not mind playing "vulgar" moves, and always simplifies the position. What, then, should a human do? In short: the opposite. Try to enlarge the search space, forcing the computer to search more branches and to search each branch deeper. When limited computing resources have to be spread over many local areas, the chance of error grows. When AlphaGo wants to concentrate its force on breaking one finger, we make it push with all ten. Lee Sedol's "wedge" in the fourth game was exactly the move AlphaGo miscalculated. Perhaps deep learning and the multi-armed bandit did not recommend it, or ranked it too low for deep calculation, or it was pruned away entirely; we cannot know. But in any case, the position before that move was one where a single touch moved the whole board, with many possible points across the whole board; the move stirred up latent threats that complicated the judgment of the position, sharply increasing the required search depth. Truly, what Monte Carlo gives, Monte Carlo takes away!

12. The dog-beating staff technique: fight ko!

Easy to say, but that wedge was clearly a once-in-a-lifetime move. Can it be reproduced? In fact there is a more accessible weapon: the "ko" that everyone has been discussing these past days (a Go term: the two sides capture back and forth at a single critical point). Go experts have already analyzed the specifics thoroughly; in short, a ko can increase the width and the depth of the search at the same time.
Because ko threats may be scattered all over the board, and most of them have nothing to do with the ko itself, deep learning cannot accurately predict how to find them, forcing the computer into a global search. The uncertainty of the ko and its threats also makes the position hard to judge, so AlphaGo cannot effectively reduce the search depth either. Two interesting points follow. First, it is not that AlphaGo cannot fight a ko: Crazy Stone and Zen, which also rest on Monte Carlo, can do it. But its ko fighting may be no stronger than other computer players', and indeed in several games AlphaGo showed a clear tendency to avoid ko. Imagine Ke Jie facing an opponent who plays like a 13-dan when there is no ko but like a 5-dan in a ko fight: he would surely find a way to exploit it. Second, a ko should be fought but not finished; better still, start several kos in different places and keep them all burning. Why? Because once a ko is resolved, the search difficulty collapses right back down. And why does AlphaGo seem to "lose its mind" after a miscalculation? It is quite possible that Google's algorithm has a flaw in handling discovered miscalculations, but even if not, the behavior is easy to explain algorithmically. Even with deep learning to help evaluate positions, the computer cannot afford to evaluate every variation: no amount of computing power could bear it. Even the "chess god" Deep Blue, whose job was far simpler, could evaluate the position only once for every several further plies it searched. As Hassabis himself said, at move 79 AlphaGo still believed its winning rate was very high, and only some ten moves later did it realize the situation had changed completely.
By then there was no medicine for regret, and its ensuing flailing is not hard to understand. And why did AlphaGo play those baffling second-line moves on the right? My understanding is this. Remember the minimax algorithm mentioned earlier? The computer prefers moves such that, no matter how the opponent responds, its position will not be too bad. A move like that, whatever the reply, may at least not make the current evaluation worse! Such a strategy would also explain why AlphaGo always seems to keep just a small lead and wait for you. This offers human players an opening: because of minimax, AlphaGo rarely plays risky overplays, so how it fights back from a losing position is a genuinely interesting question. As for the "wedge" in the lower left that lost all the way to grandma's house, the author honestly cannot explain it.

13. The dog is not the god of Go, and artificial intelligence is still in its infancy

Can AlphaGo improve by sparring with itself, left hand against right? This is another interesting question. My view: self-play helps, but it cannot raise playing strength without limit. The Nature paper shows that self-play mainly addresses the overfitting caused by training deep learning on human game records; from a machine learning perspective, it cannot keep improving its martial arts forever. Professor Zhou Zhihua of Nanjing University compared self-play to Wudang's Tiyunzong lightness skill, a wonderful analogy: can you really keep rising by stepping on your own left foot in mid-air? Without something external to push against, it defies the laws of physics. There is no perpetual motion machine in physics, and none in artificial intelligence either. Two AlphaGos share the same stubborn habits: where one is right the other is more right, and where one is wrong the other is more wrong.
Rather than self-play, what worries me more is Google using it to build a "Las Vegas" search tree, for that is a fellow who truly makes no mistakes. So can AlphaGo improve its play without playing against humans? Certainly, but that would require two AlphaGos of extremely high strength and very different styles. Anyone who plays knows: you gain from playing stronger opponents, and occasionally even from weaker ones; but playing against yourself can only be called arranging stones. We look forward to a ten-game match, or an unprecedented hundred-game match, between AlphaGo and Dark Forest. AlphaGo has brought people endless excitement, but also unprecedented restlessness and panic:
In fact, artificial intelligence is far from cause for panic. Will it take our jobs? People panicked when the steam engine arrived, and panicked again when the PC arrived. Yet these did not "take away" human work; they freed us from the work we are bad at, raised efficiency, and let us do what we are better at and what matters more. Think of Microsoft's chatty XiaoIce, of robot vacuums, of self-driving cars: they are our friends, not our gravediggers. Is it omnipotent? No. Artificial intelligence is still very young and has a long way to go. Consider the twenty years from Deep Blue to AlphaGo, so long that people almost forgot. Will it rule humanity? Not in any future I can foresee. To do that it would first need creativity, and as a fan of artificial intelligence, what I see ahead is still a long night. The key always rests in human hands. Even if artificial intelligence one day gains the ability to rule us, we can at least pull the plug when that day comes, right?