Pluribus Poker Bot Crushes Pro Players in Six-Handed Cash Game
The poker bots have been at it again, crushing human players on the virtual felt. But no, this is not an article about a bot ring being discovered at an online-poker site. There is nothing to fear. Or maybe there is? The developers of the award-winning poker artificial intelligence program Libratus have built a new bot called Pluribus, and this one did not beat players heads-up like Libratus did. Instead, it beat five professional poker players at a time in a no-limit hold’em game. Handily.
There were two different experiments conducted by Pluribus’ creators, Noam Brown and Tuomas Sandholm of Carnegie Mellon University. In one, Pluribus went up against five poker pros, while in the other, there were five instances of Pluribus at the table against one pro.
Five Pros, One Bot
In the five humans and one Pluribus matches (called 5H+1AI in the published paper), the poker AI competed against a random selection of the following players: Jimmy Chou, Seth Davies, Michael Gagliano, Anthony Gregg, Dong Kim, Jason Les, Linus Loeliger, Daniel McAulay, Greg Merson, Nicholas Petrangelo, Sean Ruane, Trevor Savage, and Jacob Toole.
In each contest, the six players – five humans and one Pluribus in a six-handed cash game – played 10,000 hands over the course of a dozen days. Each player was given an anonymous screen name so that the players did not know which human was behind each name and could not gain an advantage from knowledge of a particular player’s lifetime tendencies. The screen names stayed consistent throughout the 10,000-hand match so that players could try to figure out how each of their opponents played. Pluribus was not disguised. Think of it like sitting down in a live poker room with four complete strangers and a computer: you know nothing about the humans, but can try to pick up on their play styles during the game.
So that the poker pros would play seriously, $50,000 in prize money was divvied up. Each player received at least $0.40 per hand just for playing and as much as $1.60 per hand, depending upon performance. Players were initially asked to play on four tables at a time, but the players themselves requested that be increased to six tables at a time.
Their performance against Pluribus was not good. The poker AI beat the five human players in the six-handed cash games by an average of 48 mbb/game (milli-big-blinds per game, where a “game” is a single hand). To translate, that is 48 big blinds per thousand hands, or 4.8 big blinds per hundred.
Blinds were $50/$100 and each player began with 100 big blinds, so that win rate works out to about $4.80 per hand, or roughly $48,000 over a 10,000-hand match, at a table with just $60,000 in play.
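For readers who want the arithmetic spelled out, here is a minimal Python sketch of the unit conversion. The helper name is ours, not from the paper:

```python
# Hypothetical helper (not from the paper): convert a win rate in
# mbb/game (milli-big-blinds per hand) into dollars for a given match.
def mbb_to_dollars(mbb_per_game, big_blind, hands):
    bb_per_hand = mbb_per_game / 1000           # 48 mbb -> 0.048 big blinds per hand
    dollars_per_hand = bb_per_hand * big_blind  # 0.048 bb * $100 = $4.80
    return dollars_per_hand * hands             # total over the match

# Pluribus's reported win rate: 48 mbb/game at $50/$100 blinds.
print(mbb_to_dollars(48, big_blind=100, hands=10_000))  # 48000.0 dollars
print(48 / 1000 * 100)                                  # 4.8 big blinds per 100 hands
```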
Five Bots, One Pro
In the other experiment, it was five Pluribus bots against a single human player. Those human players were quite the pair: Chris Ferguson and Darren Elias. They were paid $2,000 each and whichever did better than the other against the AI received an additional $2,000.
Neither human knew who the other one was or how they were doing. Even though the two were not directly competing against each other, the scientists likely did not want them adjusting their strategies based on knowledge of how the other was doing.
The five Pluribus bots at the table did not know the identities of their opponents, human or AI, so they could not collude with one another. They also do not adjust their strategies based on their opponents.
And once again, Pluribus destroyed. Elias trailed Pluribus by 40 mbb/game and Ferguson was behind by 25 mbb/game.
Pluribus trained entirely on its own, without ever playing against human opponents; it played against copies of itself. As the scientists described it, “The AI starts from scratch by playing randomly, and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy.”
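To make that loop concrete, here is a toy self-play sketch in Python using regret matching on rock-paper-scissors. This is not Pluribus’s actual algorithm (its blueprint training uses a far more elaborate form of Monte Carlo counterfactual regret minimization over an abstracted game tree), but it shows the core idea the quote describes: start random, track which actions would have done better, and shift probability toward them against earlier versions of your own strategy.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # PAYOFF[my_action][opp_action] from my point of view

def strategy_from_regrets(regrets):
    # Regret matching: play each action in proportion to its positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret yet: play randomly

def train(iterations=100_000):
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        for a in range(ACTIONS):
            strategy_sum[a] += strategy[a]
        # Self-play: both "players" sample from the current strategy.
        me = random.choices(range(ACTIONS), weights=strategy)[0]
        opp = random.choices(range(ACTIONS), weights=strategy)[0]
        # Accumulate regret: how much better each alternative would have done.
        for a in range(ACTIONS):
            regrets[a] += PAYOFF[a][opp] - PAYOFF[me][opp]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # average strategy over training

print(train())  # converges toward the equilibrium [1/3, 1/3, 1/3]
```

Note that it is the average strategy over training, not the final one, that converges toward equilibrium, which is why the sketch accumulates strategy_sum rather than returning the last strategy.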
Once in a game with humans, Pluribus would keep looking for ways to improve on that baseline, or “blueprint,” strategy.
To simplify things a little for Pluribus, the developers reduced some of the decision-making it needed to do. For instance, it didn’t consider every possible bet size, instead choosing from between one and fourteen bet sizes depending on the situation. This is similar to how humans behave: in a $50/$100 game with $10,000 stacks, we aren’t generally weighing every single bet size between $100 and $10,000, but rather a smaller number of possible bets depending on the pot size and the specific situation. Pluribus also had options to veer from these bet restrictions when necessary.
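As an illustration of that kind of action abstraction, here is a hypothetical Python sketch. The pot fractions are our assumption for the example, not the sizes Pluribus actually used (those are detailed in the paper’s supplementary materials):

```python
# Hypothetical bet abstraction (illustrative only): instead of every legal
# amount, choose from a few pot-relative sizes, clipped to stack and minimum.
def candidate_bets(pot, stack, min_bet):
    fractions = [0.5, 0.75, 1.0, 2.0]  # assumed pot fractions for this example
    sizes = {min(stack, max(min_bet, round(f * pot))) for f in fractions}
    sizes.add(stack)  # the all-in is always available
    return sorted(sizes)

# A $600 pot with $10,000 behind and a $100 minimum bet:
print(candidate_bets(600, 10_000, 100))  # [300, 450, 600, 1200, 10000]
```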
“Pluribus achieved superhuman performance at multiplayer poker, which is a recognized milestone in artificial intelligence and in game theory that has been open for decades,” said Tuomas Sandholm in a press release. “Thus far, superhuman AI milestones in strategic reasoning have been limited to two-party competition. The ability to beat five other players in such a complicated game opens up new opportunities to use AI to solve a wide variety of real-world problems.”
“It was incredibly fascinating getting to play against the poker bot and seeing some of the strategies it chose,” poker player Michael Gagliano said. “There were several plays that humans simply are not making at all, especially relating to its bet sizing. Bots/AI are an important part in the evolution of poker, and it was amazing to have first-hand experience in this large step toward the future.”
If you want to dive deeper into the Pluribus poker bot, check out the research article published by Brown and Sandholm on science.sciencemag.org. It also links to further detail in their supplementary materials.