2025 Volume 34 Issue 1
Article Contents

Li Hong(洪莉), Yu Liu(刘宇), Mengqiao Xu(徐梦俏), and Wenhui Deng(邓文慧). 2025: Combining deep reinforcement learning with heuristics to solve the traveling salesman problem, Chinese Physics B, 34(1): 018705. doi: 10.1088/1674-1056/ad95f1

Combining deep reinforcement learning with heuristics to solve the traveling salesman problem

  • Received Date: 27/07/2024
    Accepted Date: 13/11/2024
  • Fund Project:

    Project supported by the National Natural Science Foundation of China (Grant Nos. 72101046 and 61672128).

  • Recent studies employing deep learning to solve the traveling salesman problem (TSP) have mainly focused on learning construction heuristics. Such methods can improve TSP solutions but still depend on additional programs, whereas methods that learn improvement heuristics to iteratively refine solutions remain insufficiently explored. Traditional improvement heuristics are guided by manually designed search strategies and may achieve only limited improvements. This paper proposes a novel framework for learning improvement heuristics, which automatically discovers better improvement policies to iteratively solve the TSP. The framework first designs a new transformer-based architecture to parameterize the policy network, introducing an action-dropout layer to prevent action selection from overfitting. It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism (named RL-SA) to learn the pair-selection policy, aiming to improve the performance of the 2-opt algorithm. RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses a Gaussian perturbation strategy to tackle the sparse-reward problem of reinforcement learning. The experimental results show that the proposed approach significantly outperforms state-of-the-art learning-based methods and further narrows the gap between learning-based methods and highly optimized solvers on the benchmark datasets. Moreover, our pre-trained model M can be applied to guide the SA algorithm (named M-SA (ours)), which performs better than existing deep models on small-, medium-, and large-scale TSPLIB datasets. Additionally, M-SA (ours) achieves excellent generalization performance on a real-world dataset of global liner shipping routes, with optimization percentages in distance reduction ranging from 3.52% to 17.99%.
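The abstract builds on the classical combination of 2-opt moves with a simulated-annealing acceptance rule. The sketch below is illustrative only and is not the authors' RL-SA method: where the paper's policy network learns which node pair to select, this sketch samples pairs uniformly at random, and the function name, parameters, and cooling schedule are assumptions chosen for the example.

```python
import math
import random

def tour_length(tour, dist):
    """Total length of a closed tour given a symmetric distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_sa(tour, dist, t0=1.0, cooling=0.995, steps=20000, seed=0):
    """2-opt improvement with simulated-annealing acceptance.

    Illustrative sketch: the paper replaces the uniform random pair
    choice below with a learned (transformer-based) selection policy.
    """
    rng = random.Random(seed)
    cur = list(tour)
    best = list(cur)
    cur_len = best_len = tour_length(cur, dist)
    t = t0
    n = len(tour)
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))
        # 2-opt move: reverse the segment between positions i+1 and j
        cand = cur[:i + 1] + cur[i + 1:j + 1][::-1] + cur[j + 1:]
        cand_len = tour_length(cand, dist)
        delta = cand_len - cur_len
        # Metropolis criterion: always accept improvements; accept a
        # worse tour with probability exp(-delta / t) at temperature t
        if delta < 0 or rng.random() < math.exp(-delta / max(t, 1e-12)):
            cur, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = list(cur), cur_len
        t *= cooling  # geometric cooling schedule
    return best, best_len
```

On a small instance (e.g. four points on a unit square), the search recovers the optimal tour; the annealing temperature lets early iterations escape local optima that plain 2-opt descent would get stuck in.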
References (54)

[1] Tirkolaee E B, Cakmak E and Karadayi-Usta S 2024 International Transactions in Operational Research 0 13452
[2] Lamperti R D and de Arruda L V R 2023 Engineering Applications of Artificial Intelligence 126 106884
[3] Zhang X X 2020 Proceedings of the Sixth International Forum on Decision Sciences (Singapore: Springer) p. 65
[4] Xu Y and Che C 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication, July 12-14, 2019, Beijing, China, p. 7
[5] Si J, Yang S, Cen Y, et al. 2024 Nat. Commun. 15 3457
[6] Cheikhrouhou O and Khoufi I 2021 Computer Science Review 40 100369
[7] Yang L, Wang X, He Z, et al. 2023 International Conference on Bio-Inspired Computing: Theories and Applications, December, 2023, Singapore, p. 3
[8] Kizilates G and Nuriyeva F 2013 Advances in Computational Science, Engineering and Information Technology (Heidelberg: Springer) p. 111
[9] Snyder L V and Daskin M S 2006 European Journal of Operational Research 174 38
[10] Ezugwu A E S, Adewumi A O and Frîncu M E 2017 Expert Systems with Applications 77 189
[11] Cui Y, Zhong J, Yang F, et al. 2020 IEEE Access 8 227497
[12] Zhang J, Hong L and Liu Q 2020 Symmetry 13 48
[13] Juan A A, Kelton W D, Currie C S M, et al. 2018 Winter Simulation Conference, December 9-12, 2018, Gothenburg, Sweden, p. 3048
[14] Khalil E, Dai H, Zhang Y, et al. 2017 Advances in Neural Information Processing Systems, December, 2017, Long Beach, California, USA, p. 6351
[15] Arulkumaran K, Deisenroth M P, Brundage M, et al. 2017 IEEE Signal Processing Magazine 34 26
[16] Li Z, Liu F, Yang W, et al. 2021 IEEE Transactions on Neural Networks and Learning Systems 33 6999
[17] Ling Z, Tao X, Zhang Y, et al. 2020 IEEE Transactions on Systems, Man, and Cybernetics: Systems 51 7475
[18] Vaswani A, Shazeer N, Parmar N, et al. 2017 Advances in Neural Information Processing Systems, 2017, Long Beach, California, USA, p. 6000
[19] Wang J, Xiao C, Wang S, et al. 2023 The Journal of Engineering 9 e12303
[20] Ling Z, Zhang Y and Chen X 2023 IEEE Transactions on Intelligent Transportation Systems 24 5871
[21] Xing Z, Tu S and Xu L 2020 arXiv:2005.06879 [cs.LG]
[22] Luo J, Li C, Fan Q, et al. 2022 Engineering Applications of Artificial Intelligence 112 104848
[23] Deudon M, Cournut P, Lacoste A, et al. 2018 International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, June, 2018, p. 170
[24] Kool W, van Hoof H and Welling M 2019 Proceedings of the 7th International Conference on Learning Representations (ICLR), May 6-9, 2019, New Orleans, Louisiana, USA, p. 25
[25] da Costa P R d O, Rhuggenaath J, Zhang Y, et al. 2020 Asian Conference on Machine Learning, November 18-20, 2020, Bangkok, Thailand, p. 465
[26] Ouyang W, Wang Y, Han S, et al. 2021 IEEE Symposium Series on Computational Intelligence, December 5-7, 2021, Orlando, FL, USA
[27] Vitali T, Mele U J, Gambardella L M, et al. 2021 arXiv:2108.00938 [cs.AI]
[28] Parasteh S, Khorram A, Mouhoub M, et al. 2022 IEEE International Conference on Systems, Man, and Cybernetics, October 9-12, 2022, Prague, Czech Republic, p. 2514
[29] Wu Y, Song W, Cao Z, et al. 2021 IEEE Transactions on Neural Networks and Learning Systems 33 5057
[30] Navarro D J, Newell B R and Schulze C 2016 Cognitive Psychology 85 43
[31] Pei M, An H, Liu B, et al. 2021 IEEE Transactions on Systems, Man, and Cybernetics: Systems 52 4415
[32] Ghannadi P, Kourehli S S and Mirjalili S 2023 Frattura ed Integrità Strutturale 64 51
[33] Mirjalili S and Lewis A 2016 Advances in Engineering Software 95 51
[34] Zheng J, Yuan T, Xie W, et al. 2023 Sensors 23 6463
[35] Benyamin A, Farhad S G and Saeid B 2021 International Journal of Intelligent Systems 36 1270
[36] Sutton R and Barto A 1998 IEEE Transactions on Neural Networks 9 1054
[37] Srivastava N, Hinton G, Krizhevsky A, et al. 2014 The Journal of Machine Learning Research 15 1929
[38] Williams R J 1992 Machine Learning 8 229
[39] Al-Hamadani M N A, Fadhel M A, Alzubaidi L, et al. 2024 Sensors 24 2461
[40] Chen X and Tian Y 2019 Advances in Neural Information Processing Systems 32
[41] Reinelt G 2014 TSPLIB: a library of sample instances for the TSP (and related problems) from various sources and of various types
[42] Xu M, Pan Q, Muscoloni A, et al. 2020 Nat. Commun. 11 2849
[43] Helsgaun K 2017 Roskilde: Roskilde University 12 966
[44] Applegate D, Bixby R, Chvatal V, et al. 2006 Concorde TSP solver
[45] Perron L and Furnon V 2019 OR-Tools (version 7.2)
[46] Lei K, Guo P, Wang Y, et al. 2022 Neurocomputing 508 79
[47] Wang Q and Tang C 2021 Knowledge-Based Systems 233 107526
[48] Vinyals O, Fortunato M and Jaitly N 2015 Advances in Neural Information Processing Systems, December, 2015, Montreal, Canada, p. 2692
[49] Liu H, Zong Z, Li Y, et al. 2023 Applied Soft Computing 146 110680
[50] Rozić T, Naletina D and Zając M 2022 Applied Sciences 12 8452
[51] Verschuur J, Salmon N, Hall J, et al. 2024 Environmental Research: Infrastructure and Sustainability 4 015001
[52] Rbihou S and Haddouch K 2021 Fifth International Conference on Intelligent Computing in Data Sciences, October 20-22, 2021, Fez, Morocco, p. 1
[53] Wang S, Liu M and Chu F 2020 Journal of Cleaner Production 249 119433
[54] Kalatzantonakis P, Sifaleras A and Samaras N 2023 Expert Systems with Applications 213 118812


Article Metrics

Article views(181) PDF downloads(2) Cited by(0)
