Implementation of a genetic algorithm for the Berlin52 instance of the traveling salesman problem.
Task: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" In short, we have to find the shortest Hamiltonian circuit in an undirected graph. TSP is an NP-hard problem, so exact methods are impractical for large instances and heuristic, nondeterministic approaches are used instead. This project tries to give the best approximation using a genetic algorithm. There are 52 cities in data_tsp.txt, given by their coordinates.
TSP is a minimization problem because we are searching for the smallest total distance. For a tour p = (p_1, ..., p_n) we can write the fitness function like this:

fitness(p) = D(p_n, p_1) + Σ_{i=1..n-1} D(p_i, p_{i+1})

where D is the distance between two vertices, calculated with the Euclidean formula:

D(a, b) = √((x_a − x_b)² + (y_a − y_b)²)
At the beginning, we fix the population size and randomly generate individuals, each a permutation of the numbers 1 to n. Once we have made the initial population, we evaluate each individual according to the fitness function.
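The distance, fitness, and initialization steps above can be sketched as follows (a minimal illustration, not the project's actual code; function names are mine and cities are indexed from 0):

```python
import math
import random

def distance(a, b):
    """Euclidean distance D between two cities given as (x, y) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(tour, coords):
    """Fitness of a tour: total length of the closed route, including
    the edge from the last city back to the first."""
    return sum(distance(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def initial_population(n_cities, pop_size):
    """Random initial population: each individual is a random
    permutation of the city indices 0 .. n_cities - 1."""
    base = list(range(n_cities))
    return [random.sample(base, n_cities) for _ in range(pop_size)]
```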
Parents are selected by roulette selection with ranking. To avoid problems such as super-individuals taking over the population, we rank the entire current generation from 1 to n according to the calculated fitness function. Since we are looking for a minimum, the best individual gets rank 1 and the worst rank n. We then run n/2 iterations and choose two parents in each. So that the same two parents are not chosen every time, we introduce a random factor, multiply it by the computed rank, and use the resulting values as the final selection distribution.
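One way to read the scheme above is: give every individual a key equal to its rank times a fresh random factor, then pick the two smallest keys as parents. A hedged sketch of that interpretation (names are mine, and the project's selections.py may differ in detail):

```python
import random

def rank_roulette_pairs(population, fitness, n_pairs=None):
    """Rank-based parent selection: individuals are ranked 1..n by
    fitness (rank 1 = best, since we minimize). In each iteration every
    rank is multiplied by a fresh random factor, and the two individuals
    with the smallest products become a parent pair. The random factor
    keeps a super-individual from winning every single pairing."""
    n = len(population)
    if n_pairs is None:
        n_pairs = n // 2
    ranked = sorted(population, key=fitness)  # index 0 has rank 1
    pairs = []
    for _ in range(n_pairs):
        keyed = sorted(
            ((random.random() * rank, ind)
             for rank, ind in enumerate(ranked, start=1)),
            key=lambda t: t[0])
        pairs.append((keyed[0][1], keyed[1][1]))
    return pairs
```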
There are various ways to cross individuals; I opted for the order crossover operator (OX in the literature).
First we choose random cut points and cut both parents at them, then we copy what lies between the cut points, in order, into the first and second child.
P1 = (1 2 4 | 5 3 8 | 7 6) --> C1 = (* * * | 5 3 8 | * *)
P2 = (3 2 5 | 8 1 4 | 6 7) --> C2 = (* * * | 8 1 4 | * *)
We fill in the remaining positions of the first child by walking through the second parent cyclically, starting from the second cut point, and skipping the elements that are already in the first child.
The cycle of the second parent is 6 7 3 2 5 8 1 4; after removing the elements already in the first child, we are left with 6 7 2 1 4.
We now fill in the rest of the first child, starting from the second cut point, which gives C1 = (2 1 4 | 5 3 8 | 6 7). The procedure is identical in the opposite direction: the cycle of the first parent is 7 6 2 5 3, so the second child becomes C2 = (2 5 3 | 8 1 4 | 7 6).
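The OX procedure above can be sketched as a single function (an illustrative implementation, not necessarily identical to the one in crossovers.py; it reproduces the worked example with 0-indexed cut points 3 and 6):

```python
import random

def order_crossover(p1, p2, i=None, j=None):
    """Order crossover (OX): copy p1[i:j] into the child, then fill the
    remaining slots, starting after the second cut point and wrapping
    around, with p2's cities in the cyclic order they appear in p2."""
    n = len(p1)
    if i is None or j is None:
        i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j] = p1[i:j]
    segment = set(p1[i:j])
    # p2's cities cycled from the second cut point, minus the copied segment
    fill = [c for k in range(n) if (c := p2[(j + k) % n]) not in segment]
    for k, c in enumerate(fill):
        child[(j + k) % n] = c
    return child
```

Called with the parents from the example and cut points (3, 6), it yields exactly C1 and C2 from the text.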
Another crucial factor in a genetic algorithm is the choice of mutation. I tried several mutation operators, but reverse sequence mutation (RSM) proved to be the best: we select two random positions and then reverse all the cities within that range. For example:
Let P1 be the observed individual, and let us randomly choose two numbers from 1 to n − 1 as the limits of the sequence:
P1 = (1 2 4 | 5 3 8 | 7 6)
The next step is to reverse all the numbers within those limits, which gives the mutated individual:
P1 = (1 2 4 | 8 3 5 | 7 6)
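RSM is a one-liner over a slice; a hedged sketch (name and signature are mine, the version in mutations.py may differ), matching the example above with 0-indexed limits 3 and 6:

```python
import random

def reverse_sequence_mutation(tour, i=None, j=None):
    """Reverse sequence mutation (RSM): reverse the cities between two
    randomly chosen positions. Returns a mutated copy of the tour."""
    n = len(tour)
    if i is None or j is None:
        i, j = sorted(random.sample(range(n), 2))
    mutated = list(tour)
    mutated[i:j] = reversed(mutated[i:j])
    return mutated
```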
I have also decided to support elitism: a certain fraction of individuals survives unchanged into the next generation, rather than the whole generation being replaced by a new one.
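A minimal sketch of such generational replacement with elitism (assuming, as illustration, that the elite fraction of the best current individuals is copied over and the remaining slots are filled by offspring):

```python
def next_generation(population, offspring, fitness, elitism_rate=0.1):
    """Replace the generation, but copy the best `elitism_rate` fraction
    of the current population unchanged into the next one; the remaining
    slots are filled with offspring."""
    n = len(population)
    n_elite = int(n * elitism_rate)
    elite = sorted(population, key=fitness)[:n_elite]  # best = shortest tour
    return elite + offspring[:n - n_elite]
```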
The algorithm was tested on the Berlin52 TSP instance, for which the shortest tour is known to be 7544.3659. The modules crossovers.py, selections.py and mutations.py contain the main operators of the algorithm and make it easy to add new ones. For example, mutations.py contains a number of different mutation operators, of which, as noted above, reverse sequence mutation gives the best results in combination with the order crossover operator.
An individual is represented by the class Individual, which stores the path and the value of its fitness function.
All the necessary methods are in the GeneticAlgorithm class; when running the algorithm we can pass in all the required parameters as well as the functions to use for the key operations: selection, crossover and mutation.
An appropriate balance needs to be found between the execution speed of the algorithm and the quality of the results. In our example, the algorithm performs best with the following parameters:
- mutation_rate = 0.3
- elitis_rate = 0.1
with a maximum of 500 iterations and a population of 200 individuals (larger values do not give drastically better solutions while making execution noticeably longer, so I limited myself to these).
At program startup (main.py) the result of the run is recorded in tsp_current_result.txt, and if that result is better than the best result measured in previous runs, the contents of tsp_best_result.txt are automatically updated. Below are the results of running the algorithm 100 times, each run with 500 iterations and the parameters above.
As can be seen in the graph, the algorithm deviates on average by about 9.3% from the optimal solution, while the best found solution deviates by only 0.7%.