DMQ: Dual-Mode Q-Learning Hardware Accelerator for Shortest Path and Coverage Area
Abstract: In this paper, we propose a novel dual-mode Q-learning hardware accelerator (DMQ) for shortest path and coverage area problems. The accelerator uses a single agent to handle multiple modes, in this case the shortest path and coverage area problems for mobile robots. The work proposes a modified policy generator that supports two reward functions, one for the shortest path and one for the coverage area. The coverage area mode has 4× the state space of the shortest path mode. We also explore policy generators such as decreasing epsilon and greedy for the dual-mode Q-learning accelerator. Experimental results show that with a greedy policy generator the agent learns faster on both problems. Moreover, the hardware architecture requires only 1199 LUTs, 4 LUTRAMs, and 6 BRAMs for the dual-mode functions. With a throughput of 185.63 MHz, the proposed work outperforms other methods by up to 13× in energy efficiency. The proposed work is useful for disaster recovery, smart navigation, and other artificial intelligence applications.
I. Syafalni, M. I. Firdaus, N. Sutisna, T. Adiono, T. Juhana and R. Mulyawan, “DMQ: Dual-Mode Q-Learning Hardware Accelerator for Shortest Path and Coverage Area,” 2024 IEEE 37th International System-on-Chip Conference (SOCC), Dresden, Germany, 2024, pp. 1-6, doi: 10.1109/SOCC62300.2024.10737818.
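
The dual-mode idea in the abstract (one agent, one Q-table update rule, a mode-dependent reward, and a selectable policy generator) can be illustrated with a small software sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 4×4 grid, the reward values, the hyperparameters, and the function names `reward`, `epsilon_at`, `choose_action`, `q_update`, and `train`. The sketch keeps a position-only state in both modes, so it does not model the 4× larger coverage state space or any part of the hardware architecture.

```python
import random

# Four grid moves: up, down, left, right (assumed action set).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def reward(mode, next_pos, goal, visited):
    """Mode-dependent reward; the numeric values are illustrative assumptions."""
    if mode == "shortest_path":
        return 10.0 if next_pos == goal else -1.0
    if mode == "coverage":
        return 1.0 if next_pos not in visited else -1.0
    raise ValueError(f"unknown mode: {mode}")


def epsilon_at(episode, episodes):
    """Linearly decreasing exploration rate, from 1.0 toward 0.0."""
    return max(0.0, 1.0 - episode / episodes)


def choose_action(q_row, policy, epsilon, rng):
    """'greedy' always takes the argmax; 'epsilon' explores with prob. epsilon."""
    if policy == "epsilon" and rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))  # ties resolve to the lowest action index


def q_update(q_old, r, max_q_next, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update."""
    return q_old + alpha * (r + gamma * max_q_next - q_old)


def train(mode, policy, size=4, episodes=200, seed=0):
    """Train one agent on a size x size grid in the given mode."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(size) for c in range(size)}
    goal = (size - 1, size - 1)
    for ep in range(episodes):
        eps = epsilon_at(ep, episodes)
        pos, visited = (0, 0), {(0, 0)}
        for _ in range(4 * size * size):  # step cap per episode
            a = choose_action(q[pos], policy, eps, rng)
            dr, dc = ACTIONS[a]
            # Moves into a wall leave the agent in place.
            nxt = (min(max(pos[0] + dr, 0), size - 1),
                   min(max(pos[1] + dc, 0), size - 1))
            r = reward(mode, nxt, goal, visited)
            q[pos][a] = q_update(q[pos][a], r, max(q[nxt]))
            visited.add(nxt)
            pos = nxt
            if mode == "shortest_path" and pos == goal:
                break
            if mode == "coverage" and len(visited) == size * size:
                break
    return q


if __name__ == "__main__":
    for mode in ("shortest_path", "coverage"):
        for policy in ("greedy", "epsilon"):
            table = train(mode, policy)
            print(mode, policy, "Q-table states:", len(table))
```

Only `reward` changes between modes, which mirrors the abstract's description of a single agent serving both problems. Comparing how quickly the `greedy` and `epsilon` policies learn would require measuring episode lengths over training, which this sketch leaves out.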