This is a PyTorch implementation of NEURAL COMBINATORIAL OPTIMIZATION WITH REINFORCEMENT LEARNING, Bello et al. 2016 [https://arxiv.org/abs/1611.09940].
Pointer Networks is the model architecture proposed by Vinyals et al. 2015 [https://arxiv.org/abs/1506.03134]
This model uses an attention mechanism to output a permutation of the input indices.
In this work, we tackle the Traveling Salesman Problem (TSP), a combinatorial optimization problem known to be NP-hard. TSP seeks the shortest tour in which a salesman visits each city exactly once.
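For reference, the quantity being optimized, the tour length, can be computed directly from a permutation of city coordinates. The snippet below is a minimal sketch with assumed function names and tensor shapes, not the repository's actual code:

```python
import torch

def tour_length(cities: torch.Tensor, tour: torch.Tensor) -> torch.Tensor:
    """Total length of a closed TSP tour.

    cities: (n_nodes, 2) 2D coordinates of the cities.
    tour:   (n_nodes,)   a permutation of node indices.
    """
    ordered = cities[tour]                         # cities in visiting order
    nxt = torch.roll(ordered, shifts=-1, dims=0)   # next city at each step (wraps back to the start)
    return (ordered - nxt).norm(dim=-1).sum()

# Example: 5 random cities in the unit square and a random tour
cities = torch.rand(5, 2)
tour = torch.randperm(5)
print(tour_length(cities, tour))
```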
In the training phase, this TSP solver optimizes two Pointer Networks: an Actor model and a Critic model.
Given a graph of cities where the cities are the nodes, the critic model predicts the expected tour length, which is generally called the state-value. The critic's parameters are optimized so that its estimate catches up with the actual length of the tour (city permutation) predicted by the actor model. The actor model updates its policy parameters using the advantage, which is the actual tour length minus the state-value.
Actor: Defines the agent's behavior, its policy
Critic: Estimates the state-value
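To make the advantage concrete, below is a minimal, hypothetical sketch of one actor-critic training step. The actor/critic call signatures, the `batch_tour_length` helper (a batched version of the length sketch above), and the optimizer handling are assumptions, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def actor_critic_step(actor, critic, actor_opt, critic_opt, cities):
    """One hypothetical training step (not the repository's exact code).

    cities: (batch, n_nodes, 2) city coordinates.
    Assumes actor(cities) -> (tours, log_probs) and critic(cities) -> predicted lengths.
    """
    tours, log_probs = actor(cities)            # sampled city permutations and their log-probabilities
    values = critic(cities)                     # state-values: expected tour lengths
    lengths = batch_tour_length(cities, tours)  # actual tour lengths (assumed batched helper)

    # Critic: regress the state-value onto the actual tour length
    critic_loss = F.mse_loss(values, lengths)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: REINFORCE, weighting log-probabilities by the advantage
    advantage = lengths - values.detach()        # actual length minus state-value
    actor_loss = (advantage * log_probs).mean()  # minimizing this pushes the policy toward shorter tours
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```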
The paper proposes two approaches to find the best tour at inference time, referred to as Sampling and Active Search.
Active Search takes the actor model and keeps updating its parameters with policy gradients to find a shorter tour. Sampling simply selects the shortest tour out of one batch of sampled tours.
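As a rough illustration of the Sampling strategy (Active Search would additionally keep updating the actor's parameters on the test instance), assuming the same hypothetical actor interface and `batch_tour_length` helper as above:

```python
import torch

def sampling_inference(actor, cities, n_samples=1280):
    """Hypothetical 'Sampling' inference: decode many stochastic tours for a single
    instance and keep the shortest one (interfaces are assumptions, not the repo's API)."""
    with torch.no_grad():
        batch = cities.unsqueeze(0).expand(n_samples, -1, -1)  # replicate the instance n_samples times
        tours, _ = actor(batch)                                 # stochastic (sampled) decoding
        lengths = batch_tour_length(batch, tours)               # assumed batched length helper
        best = lengths.argmin()
    return tours[best], lengths[best]
```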
First, generate the pickle file containing the hyperparameter values by running the following command
(in this example: train mode, batch size 512, 20 city nodes, 13000 steps):
python config.py -m train -b 512 -t 20 -s 13000
`-m train` can be replaced with `-m train_emv`. Here, emv is the abbreviation of 'Exponential Moving Average', which does not need a critic model (a rough sketch of such a baseline follows the training command below). Then start training:
python train.py -p Pkl/train20.pkl
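As mentioned above, the emv mode replaces the critic with an exponential moving average of recent tour lengths as the baseline. A minimal sketch, assuming hypothetical actor/optimizer interfaces and the same `batch_tour_length` helper as before (the decay value 0.9 is also an assumption):

```python
import torch

def train_with_ema_baseline(actor, actor_opt, data_loader, decay=0.9):
    """Hypothetical training loop with an exponential-moving-average baseline
    instead of a critic (interfaces and the decay value are assumptions)."""
    ema = None
    for cities in data_loader:                       # (batch, n_nodes, 2) coordinates
        tours, log_probs = actor(cities)             # sampled tours and their log-probabilities
        lengths = batch_tour_length(cities, tours)   # actual tour lengths (assumed batched helper)

        # Running average of observed tour lengths acts as the baseline
        mean_len = lengths.mean().item()
        ema = mean_len if ema is None else decay * ema + (1 - decay) * mean_len

        # Advantage = actual length - baseline; only log_probs carries gradients
        advantage = lengths - ema
        actor_loss = (advantage * log_probs).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
```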
Once training is done, set the configuration for inference.
You can see how the training process went from the CSV files in the Csv directory.
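For a quick visual check, something like the following matplotlib snippet could plot a training curve; the file name and column layout here are assumptions, so adjust them to whatever the CSV actually contains:

```python
import csv
import matplotlib.pyplot as plt

# Hypothetical quick plot of a training log; the path and column order
# are assumptions -- adapt them to the CSV written to the Csv directory.
steps, lengths = [], []
with open("Csv/train20_example.csv") as f:
    for row in csv.reader(f):
        steps.append(int(row[0]))        # assumed: first column is the training step
        lengths.append(float(row[1]))    # assumed: second column is the actual tour length

plt.plot(steps, lengths)
plt.xlabel("step")
plt.ylabel("actual tour length")
plt.show()
```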
You may use my pre-trained weights Pt/train20_1113_12_12_step14999_act.pt, which I trained for 20 nodes.
python config.py -m test -t 20 -s 10 -ap Pt/train20_1113_12_12_step14999_act.pt --islogger --seed 123
python test.py -p Pkl/test20.pkl
My environment is listed below. I tested it on a single GPU.
- OS:
- Linux(Ubuntu 18.04.5 LTS)
- GPU:
- NVIDIA® GeForce® RTX 2080 Ti VENTUS 11GB OC
- CPU:
- Intel® Xeon® CPU E5640 @ 2.67GHz
- NVIDIA® Driver = 455.45.01
- Docker = 20.10.3
- nvidia-docker2(for GPU)
- Python = 3.6.10
- PyTorch = 1.6.0
- numpy
- tqdm (optional)
- matplotlib (only for plotting)
Make sure you've already installed Docker (check with `docker version`), the latest NVIDIA® Driver (check with `nvidia-smi`), and nvidia-docker2 (for GPU).
- build or pull docker image
build the image (this might take some time)
./docker.sh build
pull the image from Docker Hub
docker pull docker4rintarooo/tspdrl:latest
- run a container using the docker image (the -v option mounts a directory)
./docker.sh run
If you don't have a GPU, you can run
./docker.sh run_cpu
- https://github.com/higgsfield/np-hard-deep-reinforcement-learning
- https://github.com/zhengsr3/Reinforcement_Learning_Pointer_Networks_TSP_Pytorch
- https://github.com/pemami4911/neural-combinatorial-rl-pytorch
- https://github.com/MichelDeudon/neural-combinatorial-optimization-rl-tensorflow
- https://github.com/jingw2/neural-combinatorial-optimization-rl
- https://github.com/dave-yxw/rl_tsp
- https://github.com/shirgur/PointerNet
- https://github.com/MichelDeudon/encode-attend-navigate
- https://github.com/qiang-ma/HRL-for-combinatorial-optimization
- https://www.youtube.com/watch?v=mxCVgVrUw50&ab_channel=%D0%9A%D0%BE%D0%BC%D0%BF%D1%8C%D1%8E%D1%82%D0%B5%D1%80%D0%BD%D1%8B%D0%B5%D0%BD%D0%B0%D1%83%D0%BA%D0%B8