TY - JOUR
T1 - Millimeter-wave concurrent beamforming
T2 - A multi-player multi-armed bandit approach
AU - Mohamed, Ehab Mahmoud
AU - Hashima, Sherief
AU - Hatano, Kohei
AU - Kasban, Hani
AU - Rihan, Mohamed
N1 - Publisher Copyright:
© 2020 Tech Science Press. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Communication in the millimeter-wave (mmWave) band, i.e., 30-300 GHz, is characterized by short-range transmissions and the use of antenna beamforming (BF). Thus, multiple mmWave access points (APs) should be installed to fully cover a target environment with gigabit-per-second (Gbps) connectivity. However, inter-beam interference prevents maximizing the sum rates of the established concurrent links. In this paper, a reinforcement learning (RL) approach is proposed for enabling mmWave concurrent transmissions by finding beam directions that maximize the long-term average sum rate of the concurrent links. Specifically, the problem is formulated as a multiplayer multiarmed bandit (MAB), where the mmWave APs act as players aiming to maximize their achievable rewards, i.e., data rates, and the arms to play are the available beam directions. In this setup, a selfish concurrent multiplayer MAB strategy is advocated. Four MAB algorithms, namely ϵ-greedy, upper confidence bound (UCB), Thompson sampling (TS), and the exponential-weight algorithm for exploration and exploitation (EXP3), are examined by employing them in each AP to selfishly improve its beam selection based only on its previous observations. After a few rounds of interaction, the mmWave APs learn to select concurrent beams that enhance the overall system performance. The proposed MAB-based mmWave concurrent BF shows performance comparable to the optimal solution.
KW - Concurrent transmissions
KW - Millimeter wave (mmWave)
KW - Multiarmed bandit (MAB)
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85091827824&partnerID=8YFLogxK
U2 - 10.32604/cmc.2020.011816
DO - 10.32604/cmc.2020.011816
M3 - Article
AN - SCOPUS:85091827824
SN - 1546-2218
VL - 65
SP - 1987
EP - 2007
JO - Computers, Materials and Continua
JF - Computers, Materials and Continua
IS - 3
ER -