2025 Volume 34 Issue 1
Article Contents

Li Hong(洪莉), Yu Liu(刘宇), Mengqiao Xu(徐梦俏), and Wenhui Deng(邓文慧). 2025: Combining deep reinforcement learning with heuristics to solve the traveling salesman problem, Chinese Physics B, 34(1): 018705. doi: 10.1088/1674-1056/ad95f1

Combining deep reinforcement learning with heuristics to solve the traveling salesman problem

  • Received Date: 27/07/2024
    Accepted Date: 13/11/2024
  • Fund Project:

    Project supported by the National Natural Science Foundation of China (Grant Nos. 72101046 and 61672128).

  • Recent studies employing deep learning to solve the traveling salesman problem (TSP) have mainly focused on learning construction heuristics. Such methods can improve TSP solutions but still depend on additional programs, whereas methods that learn improvement heuristics to iteratively refine solutions remain insufficiently explored. Traditional improvement heuristics are guided by manually designed search strategies and may achieve only limited improvements. This paper proposes a novel framework for learning improvement heuristics, which automatically discovers better improvement policies to iteratively solve the TSP. The framework first designs a new transformer-based architecture to parameterize the policy network, introducing an action-dropout layer to prevent action selection from overfitting. It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism (named RL-SA) to learn the pair-selection policy, aiming to improve the performance of the 2-opt algorithm. RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses a Gaussian perturbation strategy to tackle the sparse-reward problem of reinforcement learning. The experimental results show that the proposed approach significantly outperforms state-of-the-art learning-based methods and further narrows the gap between learning-based methods and highly optimized solvers on the benchmark datasets. Moreover, our pre-trained model M can be applied to guide the SA algorithm (named M-SA (ours)), which performs better than existing deep models on small-, medium-, and large-scale TSPLIB datasets. Additionally, M-SA (ours) achieves excellent generalization performance on a real-world dataset of global liner shipping routes, with optimization percentages in distance reduction ranging from 3.52% to 17.99%.
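The abstract builds on the classical combination of 2-opt moves with a simulated-annealing acceptance rule. The sketch below is illustrative only and is not the authors' RL-SA method: where the paper's policy network learns which node pair to select, this sketch samples pairs uniformly at random, and the function name, parameters, and cooling schedule are assumptions chosen for the example.

```python
import math
import random

def tour_length(tour, dist):
    """Total length of a closed tour given a symmetric distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_sa(tour, dist, t0=1.0, cooling=0.995, steps=20000, seed=0):
    """2-opt improvement with simulated-annealing acceptance.

    Illustrative sketch: the paper replaces the uniform random pair
    choice below with a learned (transformer-based) selection policy.
    """
    rng = random.Random(seed)
    cur = list(tour)
    best = list(cur)
    cur_len = best_len = tour_length(cur, dist)
    t = t0
    n = len(tour)
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))
        # 2-opt move: reverse the segment between positions i+1 and j
        cand = cur[:i + 1] + cur[i + 1:j + 1][::-1] + cur[j + 1:]
        cand_len = tour_length(cand, dist)
        delta = cand_len - cur_len
        # Metropolis criterion: always accept improvements; accept a
        # worse tour with probability exp(-delta / t) at temperature t
        if delta < 0 or rng.random() < math.exp(-delta / max(t, 1e-12)):
            cur, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = list(cur), cur_len
        t *= cooling  # geometric cooling schedule
    return best, best_len
```

On a small instance (e.g. four points on a unit square), the search recovers the optimal tour; the annealing temperature lets early iterations escape local optima that plain 2-opt descent would get stuck in.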
References (54)

[1] Tirkolaee E B, Cakmak E and Karadayi-Usta S 2024 International Transactions in Operational Research 0 13452
[2] Lamperti R D and de Arruda L V R 2023 Engineering Applications of Artificial Intelligence 126 106884
[3] Zhang X X 2020 Proceedings of the Sixth International Forum on Decision Sciences (Singapore: Springer) p. 65
[4] Xu Y and Che C 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication, July 12-14, 2019, Beijing, China, p. 7
[5] Si J, Yang S, Cen Y, et al. 2024 Nat. Commun. 15 3457
[6] Cheikhrouhou O and Khoufi I 2021 Computer Science Review 40 100369
[7] Yang L, Wang X, He Z, et al. 2023 International Conference on Bio-Inspired Computing: Theories and Applications, December, 2023, Singapore, p. 3
[8] Kizilates G and Nuriyeva F 2013 Advances in Computational Science, Engineering and Information Technology (Heidelberg: Springer) p. 111
[9] Snyder L V and Daskin M S 2006 European Journal of Operational Research 174 38
[10] Ezugwu A E S, Adewumi A O and Frîncu M E 2017 Expert Systems with Applications 77 189
[11] Cui Y, Zhong J, Yang F, et al. 2020 IEEE Access 8 227497
[12] Zhang J, Hong L and Liu Q 2020 Symmetry 13 48
[13] Juan A A, Kelton W D, Currie C S M, et al. 2018 Winter Simulation Conference, December 9-12, 2018, Gothenburg, Sweden, p. 3048
[14] Khalil E, Dai H, Zhang Y, et al. 2017 Advances in Neural Information Processing Systems, December, 2017, Long Beach, California, USA, p. 6351
[15] Arulkumaran K, Deisenroth M P, Brundage M, et al. 2017 IEEE Signal Processing Magazine 34 26
[16] Li Z, Liu F, Yang W, et al. 2021 IEEE Transactions on Neural Networks and Learning Systems 33 6999
[17] Ling Z, Tao X, Zhang Y, et al. 2020 IEEE Transactions on Systems, Man, and Cybernetics: Systems 51 7475
[18] Vaswani A, Shazeer N, Parmar N, et al. 2017 Advances in Neural Information Processing Systems, 2017, Long Beach, California, USA, p. 6000
[19] Wang J, Xiao C, Wang S, et al. 2023 The Journal of Engineering 9 e12303
[20] Ling Z, Zhang Y and Chen X 2023 IEEE Transactions on Intelligent Transportation Systems 24 5871
[21] Xing Z, Tu S and Xu L 2020 arXiv:2005.06879 [cs.LG]
[22] Luo J, Li C, Fan Q, et al. 2022 Engineering Applications of Artificial Intelligence 112 104848
[23] Deudon M, Cournut P, Lacoste A, et al. 2018 International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, June, 2018, p. 170
[24] Kool W, van Hoof H and Welling M 2019 Proceedings of the 7th International Conference on Learning Representations (ICLR), May 6-9, 2019, New Orleans, Louisiana, USA, p. 25
[25] da Costa P R d O, Rhuggenaath J, Zhang Y, et al. 2020 Asian Conference on Machine Learning, November 18-20, 2020, Bangkok, Thailand, p. 465
[26] Ouyang W, Wang Y, Han S, et al. 2021 IEEE Symposium Series on Computational Intelligence, December 5-7, 2021, Orlando, FL, USA
[27] Vitali T, Mele U J, Gambardella L M, et al. 2021 arXiv:2108.00938 [cs.AI]
[28] Parasteh S, Khorram A, Mouhoub M, et al. 2022 IEEE International Conference on Systems, Man, and Cybernetics, October 9-12, 2022, Prague, Czech Republic, p. 2514
[29] Wu Y, Song W, Cao Z, et al. 2021 IEEE Transactions on Neural Networks and Learning Systems 33 5057
[30] Navarro D J, Newell B R and Schulze C 2016 Cognitive Psychology 85 43
[31] Pei M, An H, Liu B, et al. 2021 IEEE Transactions on Systems, Man, and Cybernetics: Systems 52 4415
[32] Ghannadi P, Kourehli S S and Mirjalili S 2023 Frattura ed Integrità Strutturale 64 51
[33] Mirjalili S and Lewis A 2016 Advances in Engineering Software 95 51
[34] Zheng J, Yuan T, Xie W, et al. 2023 Sensors 23 6463
[35] Benyamin A, Farhad S G and Saeid B 2021 International Journal of Intelligent Systems 36 1270
[36] Sutton R and Barto A 1998 IEEE Transactions on Neural Networks 9 1054
[37] Srivastava N, Hinton G, Krizhevsky A, et al. 2014 The Journal of Machine Learning Research 15 1929
[38] Williams R J 1992 Machine Learning 8 229
[39] Al-Hamadani M N A, Fadhel M A, Alzubaidi L, et al. 2024 Sensors 24 2461
[40] Chen X and Tian Y 2019 Advances in Neural Information Processing Systems 32
[41] Reinelt G 2014 TSPLIB: a library of sample instances for the TSP (and related problems) from various sources and of various types
[42] Xu M, Pan Q, Muscoloni A, et al. 2020 Nat. Commun. 11 2849
[43] Helsgaun K 2017 Roskilde: Roskilde University 12 966
[44] Applegate D, Bixby R, Chvatal V, et al. 2006 Concorde TSP solver
[45] Perron L and Furnon V 2019 OR-Tools (version 7.2)
[46] Lei K, Guo P, Wang Y, et al. 2022 Neurocomputing 508 79
[47] Wang Q and Tang C 2021 Knowledge-Based Systems 233 107526
[48] Vinyals O, Fortunato M and Jaitly N 2015 Advances in Neural Information Processing Systems, December, 2015, Montreal, Canada, p. 2692
[49] Liu H, Zong Z, Li Y, et al. 2023 Applied Soft Computing 146 110680
[50] Rozić T, Naletina D and Zając M 2022 Applied Sciences 12 8452
[51] Verschuur J, Salmon N, Hall J, et al. 2024 Environmental Research: Infrastructure and Sustainability 4 015001
[52] Rbihou S and Haddouch K 2021 Fifth International Conference on Intelligent Computing in Data Sciences, October 20-22, 2021, Fez, Morocco, p. 1
[53] Wang S, Liu M and Chu F 2020 Journal of Cleaner Production 249 119433
[54] Kalatzantonakis P, Sifaleras A and Samaras N 2023 Expert Systems with Applications 213 118812


Article Metrics

Article views(181) PDF downloads(2) Cited by(0)
