Neural network optimization of the DeepStack algorithm for playing Leduc Hold’em

Дорогий, Ярослав Юрійович; Цуркан, Василь Васильович; Лісовий, Владислав Юрійович

dc.contributor.author	Дорогий, Ярослав Юрійович
dc.contributor.author	Цуркан, Василь Васильович
dc.contributor.author	Лісовий, Владислав Юрійович
dc.date.accessioned	2020-04-22T19:39:09Z
dc.date.available	2020-04-22T19:39:09Z
dc.date.issued	2017
dc.description.abstracten	Artificial intelligence is used for the automatic control of cars, underwater vehicles, aircraft, rockets and robots, for text and speech recognition, for medical diagnostics, and so on. In recent years it has also been widely used in games: in computer games, for pathfinding in two- or three-dimensional space and for simulating unit behavior, and in board and card games. In most card games a player cannot work out a guaranteed winning strategy, because the opponent always acts on information that is unknown to the player. As artificial intelligence evolved, the question arose whether such a system can play at the same level as a human and whether it can beat one. The article considers the implementation of the neural network used in the DeepStack algorithm and the selection of its structure. A detailed description of the algorithm and its principle of operation is given. The algorithm is used to make decisions while playing poker, which is represented as a game with incomplete information. The strategy is calculated from two parameters: the counterfactual values of the opponent and the range of the player. The player's range is the probability distribution over the player's possible hands, given that a certain public state has been reached. The counterfactual values are described as the probability of the opponent receiving a hand that in any case is higher than the player's hand. Both parameters are described in detail using formulas from game theory. The proposed neural network is used to calculate the strategy, namely the counterfactual values of the opponent. A feedforward network was chosen. The network inputs are the bet size as a fraction of the player's total bank and the encoded ranges of the players, which depend on the open cards. The network output is a vector of counterfactual values for each player and hand, which are then interpreted as a bet size. The training data was a set of 1 000 000 solved poker situations covering various bet sizes and hand combinations. Six network structures are considered: 2 hidden layers with the Tanh, ReLU, or SoftPlus activation function, and 4 hidden layers with the Tanh, ReLU, or SoftPlus activation function. The optimal structure was chosen from these. The selection criterion is the exploitability of the strategy, which shows the number of chips that a player loses by playing a certain strategy. Of the considered structures, the optimal network has 4 hidden layers with the SoftPlus activation function.	uk
dc.description.abstractru	The article considers the implementation of a neural network and the selection of its structure for use in the DeepStack algorithm. A detailed description of the algorithm and its principle of operation is given. The algorithm is used to make decisions during a game of poker. Poker is represented as a game with incomplete information. The strategy is calculated from two parameters: the counterfactual values of the opponent and the range of the player. The proposed neural network is used to calculate the strategy, namely the counterfactual values of the opponent. A feedforward network was chosen as the neural network. The training data is a set of solved poker situations that includes various bet sizes and hand combinations. Several network structures are considered and the optimal one is selected. The selection criterion is an estimate of the exploitability of the strategy.	uk
dc.description.abstractuk	The article addresses the implementation of a neural network and the selection of its structure for use in the DeepStack algorithm. The algorithm and its principle of operation are described in detail. The algorithm is used for decision making while playing poker. Poker is represented as a game with incomplete information. The strategy is computed from two parameters: the counterfactual values of the opponent and the range of the player. The proposed neural network is used to compute the strategy, namely the counterfactual values of the opponent. A feedforward network was chosen as the neural network. The training data was a set of solved poker situations that included various bet sizes and hand combinations. Several network structures were considered and the optimal one was selected. The selection criterion is an estimate of the exploitability of the strategy.	uk
dc.format.pagerange	pp. 63-72	uk
dc.identifier.citation	Дорогий, Я. Ю. Neural network optimization of the DeepStack algorithm for playing Leduc Hold’em / Дорогий Я. Ю., Цуркан В. В., Лісовий В. Ю. // Мікросистеми, Електроніка та Акустика : scientific and technical journal. – 2017. – Vol. 22, No. 5(100). – pp. 63–72. – Bibliography: 13 titles.	uk
dc.identifier.doi	https://doi.org/10.20535/2523-4455.2017.22.5.105016
dc.identifier.uri	https://ela.kpi.ua/handle/123456789/33035
dc.language.iso	uk	uk
dc.publisher	КПІ ім. Ігоря Сікорського	uk
dc.publisher.place	Kyiv	uk
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	uk
dc.source	Мікросистеми, Електроніка та Акустика : scientific and technical journal, 2017, Vol. 22, No. 5(100)	uk
dc.subject	neural network	uk
dc.subject	poker	uk
dc.subject	strategy	uk
dc.subject	counterfactual values	uk
dc.subject	lookahead tree	uk
dc.subject.udc	004.89	uk
dc.title	Neural network optimization of the DeepStack algorithm for playing Leduc Hold’em	uk
dc.title.alternative	Neural network optimization of algorithm DeepStack for playing in Leduc Hold’em	uk
dc.type	Article	uk
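
The English abstract describes a feedforward network whose inputs are the bet size (as a fraction of the player's bank) and the players' ranges, and whose output is a vector of counterfactual values for each player and hand, tried with 2 or 4 hidden layers and Tanh, ReLU, or SoftPlus activations. The sketch below only illustrates that shape. It is not the authors' implementation: the hidden width of 50, the 6-hand Leduc deck, the plain probability-vector range encoding, and the random untrained weights are all assumptions made for the example, and the names `build_network` and `forward` are hypothetical.

```python
import math
import random

NUM_HANDS = 6  # assumption: 6-card Leduc deck, one private card per player

# The three activation functions named in the abstract.
ACTIVATIONS = {
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
    # Numerically stable softplus: log(1 + exp(x)).
    "softplus": lambda x: max(x, 0.0) + math.log1p(math.exp(-abs(x))),
}

# The six candidate structures listed in the abstract: (hidden layers, activation).
CANDIDATES = [(layers, act) for layers in (2, 4)
              for act in ("tanh", "relu", "softplus")]


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(hidden_layers, hidden_width, rng):
    """Build an untrained network: input -> hidden layers -> linear output."""
    n_in = 1 + 2 * NUM_HANDS   # bet fraction + one range per player
    n_out = 2 * NUM_HANDS      # one counterfactual value per player and hand
    sizes = [n_in] + [hidden_width] * hidden_layers + [n_out]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, activation, bet_fraction, range_p1, range_p2):
    """Run one forward pass and split the output into per-player values."""
    act = ACTIVATIONS[activation]
    x = [bet_fraction] + list(range_p1) + list(range_p2)
    last = len(network) - 1
    for i, (weights, biases) in enumerate(network):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        # The hidden layers use the chosen activation; the output layer is linear.
        x = z if i == last else [act(v) for v in z]
    return x[:NUM_HANDS], x[NUM_HANDS:]


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = [1.0 / NUM_HANDS] * NUM_HANDS
    for layers, act in CANDIDATES:
        net = build_network(layers, 50, rng)  # hidden width 50 is hypothetical
        cfv_p1, cfv_p2 = forward(net, act, 0.5, uniform, uniform)
        print(layers, act, len(cfv_p1), len(cfv_p2))
```

Selecting among the six candidates by exploitability, as the abstract reports, would additionally require training each network on the solved situations and evaluating the resulting strategy, which this sketch does not attempt.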

Files

Name: MEA2017_22-5_p63-72.pdf
Size: 581.19 KB
Format: Adobe Portable Document Format

Collections

Мікросистеми, Електроніка та Акустика: scientific and technical journal, Vol. 22, No. 5(100)