\documentclass[12pt]{article}
\usepackage{cite}
\usepackage{graphicx}
\title{\includegraphics[scale=0.6]{screenshots/crest.png}\\[1cm]
The Ant Colony Optimization Algorithm for the Travelling Salesman Problem\vspace{-2ex}
}
\date{}
\author{By\\Dylan Galea\\[1cm]}
\usepackage{listings}
\usepackage[tbtags]{amsmath}
\usepackage{amsthm}
\usepackage{amssymb}
\usepackage{physics}
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{slashbox}
\usepackage{titling}
\usepackage{algpseudocode,algorithm,algorithmicx}
\usepackage{titlepic}
\usetikzlibrary{arrows.meta,automata,positioning}
\renewcommand{\contentsname}{\centering Contents}
\usepackage{url}
\newtheorem{definition}{Definition}[subsection]
\newtheorem{theorem}[definition]{Theorem}
\newtheorem{lemma}[definition]{Lemma}
\newtheorem{corollary}[definition]{Corollary}
\newtheorem*{remark}{Remark}
\newtheorem{example}[definition]{Example}
\newtheorem{case}{Case}
\newtheorem{claim}{Claim}
\numberwithin{equation}{subsection}
\numberwithin{table}{subsection}
\numberwithin{algorithm}{subsection}
\numberwithin{figure}{subsection}
\DeclareMathOperator*{\argmax}{argmax}
\renewcommand{\baselinestretch}{1.5}
\usepackage{geometry}
\geometry{
right = 20mm,
bottom = 20mm,
left = 30mm,
top = 20mm,
}
\begin{document}
\clearpage\maketitle
\thispagestyle{empty}
{\centering A Dissertation submitted in partial fulfillment of the requirements\\
for the Degree of Bachelor of Science (Honours)\\ in Mathematics as main area\\[2cm]}
{\centering DEPARTMENT OF MATHEMATICS\\
FACULTY OF SCIENCE\\ UNIVERSITY OF MALTA\\[2cm]}
{\centering May 2019\\}
\newpage
{\center\section*{Acknowledgements}}
\addcontentsline{toc}{section}{Acknowledgements}
\noindent
I would like to thank everyone who either helped me academically or on a personal level throughout the writing of this dissertation. Firstly, I would like to express my deepest appreciation to my supervisor Professor Lauri, whose guidance and assistance was invaluable.\\\\
Secondly, I would like to thank my mother, father, brother, and my girlfriend for always being supportive. Without your constant love and support, especially during hard times, I would not have reached university, let alone written this dissertation.
\pagenumbering{roman}
\newpage
{\center\section*{Abstract}}
\addcontentsline{toc}{section}{Abstract}
\noindent
The Travelling Salesman Problem is a classical NP-Complete problem. Being NP-Complete, no algorithm is known that can solve the Travelling Salesman Problem in feasible time. As a result, heuristic algorithms and metaheuristic algorithms must be used to approximate a solution in feasible time. A metaheuristic algorithm which can approximate the Travelling Salesman Problem in feasible time is the Ant Colony Optimization algorithm. In this dissertation, a variant of the Ant Colony Optimization algorithm called the Ant Colony System (ACS) will be theoretically analyzed. The main theoretical result states that, given enough time to execute, the ACS algorithm applied to the Travelling Salesman Problem always finds an optimal solution. That being said, this result does not specify how many iterations need to be executed for the algorithm to find an optimal solution. As a result, the Ant Colony System will also be empirically analyzed, where the main aim of this empirical analysis is to study how the ACS algorithm approaches the optimal value as the number of iterations is increased. Apart from this, another aim of the empirical analysis is to check whether the Ant Colony System gives better approximations than the Nearest Neighbour algorithm and the Twice Around the Minimum Spanning Tree algorithm, which are two classical heuristic algorithms.\\\\
After presenting the theory behind the algorithms mentioned in the previous paragraph, these algorithms were implemented and applied to Travelling Salesman Problem instances defined in 2-D Euclidean space. The approximation results indicate that for such instances, the Ant Colony System algorithm gives better approximations than the Nearest Neighbour algorithm and the Twice Around the Minimum Spanning Tree algorithm. From the empirical analysis it was also concluded that, for TSP instances defined in 2-D Euclidean space, the Ant Colony System algorithm approaches the optimal value in a manner which is inversely proportional to the number of iterations executed. As a result, it can be concluded that, although the Ant Colony System algorithm is theoretically guaranteed to find an optimal solution if given enough time, it may do so in infeasible time.
\newpage
\tableofcontents
\newpage
\pagenumbering{arabic}
{\center{\section{Introduction} \label{introduction}}}
The Travelling Salesman Problem is a mathematical problem that has been studied since the 19th century, notably by the mathematicians Sir William Rowan Hamilton and Thomas Penyngton Kirkman. In its most general form, the Travelling Salesman Problem can be described as follows. Suppose that a salesman is given a number of cities where the cost of travelling between the cities is known. In the Travelling Salesman Problem, the salesman is asked to find a trip of least cost which goes through every city exactly once, and returns back to the starting city. Although the Travelling Salesman Problem is intuitively easy to understand, it has a number of important real life applications. For example, suppose that a company has distinct items stored at different warehouses, and assume that an order consisting of different items is received from a customer. In addition to this, suppose that a truck is to pick up the items ordered from the different warehouses, and deliver the order to the customer. Due to expenditure on fuel and other deliveries, the company may need to find the shortest way to gather the distinct items from the different warehouses, and deliver them to the customer. This problem can be easily formulated as the Travelling Salesman Problem by letting the salesman be the truck, the cities be the different warehouses, and the cost of travelling between cities be the cost of travelling from one warehouse to another. As a result, the company can determine the trip of least cost by solving the Travelling Salesman Problem. Unfortunately, however, no algorithm that can solve the Travelling Salesman Problem in feasible time is known. \cite{dorigo_stutzle_thomas_2004}, \cite{Matai10}\\\\
Although no algorithm that can solve the Travelling Salesman Problem in feasible time is known, this does not mean that the problem cannot be approximated in feasible time \cite{cormen_leiserson_rivest_stein}. Therefore, the aim of this dissertation is to theoretically and empirically analyze an algorithm called the Ant Colony Optimization algorithm, which can approximate the Travelling Salesman Problem in feasible time. The theoretical aim of this dissertation is to present an Ant Colony Optimization algorithm variant called the Ant Colony System algorithm, and then argue that if given enough time to execute, the Ant Colony System algorithm will always find an optimal solution when applied to the Travelling Salesman Problem \cite{dorigo_stutzle_thomas_2004}. It is important to note that this result does not state how many iterations need to be executed for the Ant Colony System algorithm to find an optimal solution when applied to the Travelling Salesman Problem. In addition to this, according to \cite{dorigo_stutzle_thomas_2004}, there are no results which state how many iterations need to be executed for the Ant Colony System to find an optimal solution for the Travelling Salesman Problem. Hence, the main aim of the empirical analysis is to study how the Ant Colony System approaches an optimal solution of the Travelling Salesman Problem in practice. Another aim of the empirical analysis is to apply the Ant Colony System to specific instances of the Travelling Salesman Problem, and compare its results to those obtained by the Twice Around the Minimum Spanning Tree algorithm and the Nearest Neighbour algorithm. As a result, one can then deduce which algorithm should be used to approximate the Travelling Salesman Problem for the considered instances. \\\\
Therefore, this dissertation is divided into four main chapters. The second chapter is split into two sections: the first section presents the basic graph theoretic concepts which will be used throughout the dissertation, while the second section presents the theory of computation which will help in formalizing the theory of NP-Completeness, and what it means to say that the Travelling Salesman Problem is NP-Complete. The third chapter is also divided into two sections. The first section presents the mathematical definition of the Travelling Salesman Problem, and properties belonging to the Travelling Salesman Problem. Afterwards, it will be shown that the Travelling Salesman Problem is NP-Complete, and hence approximation algorithms are needed to obtain sub-optimal solutions. The second section then presents two classical algorithms which are known to approximate the Travelling Salesman Problem: the Twice Around the Minimum Spanning Tree algorithm, and the Nearest Neighbour algorithm. In this section, the aim is to show that when applied to specific instances of the Travelling Salesman Problem, both algorithms return a solution with cost smaller than some upper bound. The fourth chapter presents the Ant Colony Optimization algorithm. In this chapter, the Ant Colony Optimization algorithm will be described, along with a variant of the algorithm called the Ant Colony System algorithm. The main aim of this chapter is to deduce that when applied to the Travelling Salesman Problem, the Ant Colony System will always find an optimal solution if given enough time to execute. Finally, the fifth chapter presents the empirical analysis of the Nearest Neighbour algorithm, the Twice Around the Minimum Spanning Tree algorithm, and the Ant Colony System algorithm. One of the aims of this chapter is to present the approximation results of these three algorithms when applied to specific instances of the Travelling Salesman Problem. The results from the different algorithms are then compared, and hence it can be deduced which algorithm produces the best approximations for the considered instances. Another aim of this chapter is to study how the Ant Colony System approaches an optimal solution for the considered Travelling Salesman Problem instances.
\newpage
{\center\section{Background Theory}
\label{background_theory}}
As already discussed in Chapter \ref{introduction}, the main aim of this chapter is to present the background theory which will be needed throughout this dissertation. The background theory which will be needed is related to graph theory and computational theory. The following section starts by discussing the graph theoretic concepts which will be used throughout.
\subsection{Some Graph Theory}
\label{some_graph_theory}
Graph theory is the study of a mathematical structure called a graph. A graph is defined formally in Definition \ref{Graph} below.
\begin{definition}
\label{Graph}
A graph G is a pair (V,E), where V is any non-empty finite set called the set of vertices of G, and E $\subseteq$ \{$\{u,v\}$ $:$ u,v $\in$ V, u $\neq$ v\} is called the set of edges of G {\normalfont{\cite{black_tanenbaum_2017}}}. A graph G defined by the pair (V,E) will be denoted as G(V,E) or G.
\end{definition}
A graph defined using Definition \ref{Graph} is called an undirected graph. There is also the concept of a directed graph, where $\mathit{E \subseteq \{(u,v) : u,v \in V, u \neq v\}}$ \cite{black_tanenbaum_2017}. However, in this dissertation it can be assumed that any graph considered is undirected, and that for every vertex $u$ there are no edges of the form $\{u, u\}$, unless otherwise stated. It must also be noted that by this definition there cannot be multiple edges joining any two vertices: since sets do not allow repetition of elements, each element in the edge set is unique. The discussion will now proceed by introducing more graph theoretic terminology, together with examples that illustrate it.\\
\\When two vertices are joined by an edge, they are said to be adjacent. This is defined formally in Definition \ref{adjacent} below.
\begin{definition}
\label{adjacent}
Given a graph G(V,E), $\forall$ u,v $\in$ V, u and v are said to be adjacent if \{$u,v\}$ $\in$ E. Also, if u and v are adjacent, then u and v are said to be end vertices of the edge \{$u,v\}$. \normalfont{\cite{harris_hirst_mossinghoff_2008}}
\end{definition}
There is also a special name associated with the case when a vertex is an end vertex of an edge.
\begin{definition}
\label{incident}
A vertex v in a graph G is incident to an edge e in G if v is an end vertex of e. {\normalfont{\cite{harris_hirst_mossinghoff_2008}}}
\end{definition}
It is sometimes also required to know how many edges are incident to a specific vertex in a graph.
\begin{definition}
\label{degree}
The degree of a vertex v in a graph G denoted by deg(v) is, the number of edges incident with v in G. {\normalfont{\cite{harris_hirst_mossinghoff_2008}}}
\end{definition}
Any graph $\mathit{G(V,E)}$ can also be represented pictorially by drawing the vertices of $\mathit{G}$ as circles, and by drawing the edges of $\mathit{G}$ as lines between adjacent vertices. As a result, in this dissertation a graph is sometimes given formally using sets and sometimes as a pictorial representation, since each representation can be converted into the other. Example \ref{Example 1} below depicts how a graph can be represented pictorially, and illustrates Definitions \ref{adjacent}, \ref{incident}, and \ref{degree}.
\begin{example}
\label{Example 1}
{\normalfont{Consider the graph}} G(V,E) {\normalfont{such that}} V=\{$v_1, v_2, v_3, v_4\}$ {\normalfont{and}} E = \{$\{$ $v_1$, $v_2$\}$, \{v_2,v_3\}, \{v_3,v_4\}, \{v_4,v_1\}\}$.\\{\normalfont{Then,}} G {\normalfont{can be represented pictorially as :}}\\
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = black,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{}(v2);
\path[->] (v2) edge node[]{} (v3);
\path[->] (v3) edge node[]{}(v4);
\path[->] (v4) edge node[]{}(v1);
\end{tikzpicture}
\end{center}
{\normalfont{Using definition \ref{adjacent}, two adjacent vertices in }}G {\normalfont{are}} $v_1$ {\normalfont{and}} $v_2$.{\normalfont{ On the other hand, two non adjacent vertices in }}G {\normalfont{are}} $v_1$ {\normalfont{and}} $v_3.$\\ {\normalfont{By definition \ref{incident}}}, $v_1$ {\normalfont{and}} $v_2$ {\normalfont{are incident to the edge}} \{$v_1, v_2\}$.\\
{\normalfont{Using definition \ref{degree}, the degree of every vertex in}} G {\normalfont{is 2}}.
\end{example}
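Definitions \ref{Graph}, \ref{adjacent} and \ref{degree} translate almost verbatim into code. The following is a minimal illustrative sketch in Python (the language, the identifiers, and the use of \texttt{frozenset} for unordered pairs are choices made here purely for exposition, not part of the formal development), encoding the graph of Example \ref{Example 1} and checking adjacency and degrees.
\begin{lstlisting}[language=Python]
# The example graph as a pair (V, E); edges are frozensets
# because an edge {u, v} is an unordered pair of vertices.
V = {"v1", "v2", "v3", "v4"}
E = {frozenset({"v1", "v2"}), frozenset({"v2", "v3"}),
     frozenset({"v3", "v4"}), frozenset({"v4", "v1"})}

def adjacent(u, v):
    # u and v are adjacent exactly when {u, v} is an edge.
    return frozenset({u, v}) in E

def degree(v):
    # The degree of v is the number of edges incident with v.
    return sum(1 for e in E if v in e)

assert adjacent("v1", "v2") and not adjacent("v1", "v3")
assert all(degree(v) == 2 for v in V)  # every vertex has degree 2
\end{lstlisting}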
There are many important graphs in graph theory, one of them being the complete graph on $\mathit{n}$ vertices.
\begin{definition}
\label{Complete Graph}
A graph G(V,E) is said to be complete if $\forall$ v,w $\in$ V, v $\neq$ w, v is adjacent to w. The complete graph on n vertices is denoted by $K_n$. \normalfont{\cite{harris_hirst_mossinghoff_2008}}
\end{definition}
Given any graph $\mathit{G(V,E)}$, one can also define graph theoretic structures that lie within $\mathit{G}$ such as walks and paths.
\begin{definition}
\label{walk}
Given a graph G(V,E), a walk is a sequence of vertices (not necessarily distinct) u= $u_1$, $u_2$, ...,$u_n$ = v such that $\forall$ i $\in$ [n-1], $\{u_i, u_{i+1}\}$ $\in$ E. Such a walk is usually referred to as a u-v walk, where u and v are the end vertices of the walk. \normalfont{\cite{harris_hirst_mossinghoff_2008}}
\end{definition}
\begin{definition}
\label{Path}
Given a graph G(V,E), a path in G joining any 2 vertices u, v $\in$ V, is a u-v walk with no repeated vertices.
\end{definition}
Definition \ref{Path} can now be used to define cycles and connectivity in a graph.
\begin{definition}
\label{connectedgraph}
A graph G(V,E) is said to be connected if $\forall$ u,v $\in$ V, u $\neq$ v, u and v are joined by a path. \normalfont{\cite{harris_hirst_mossinghoff_2008}}
\end{definition}
\begin{definition}
\label{cycle}
Given a graph G(V,E), a cycle in G is a path $u_1$, ..., $u_k$ together with the edge \{$u_1, u_k\}$, where k $\geq$ 3 \normalfont{\cite{harris_hirst_mossinghoff_2008}}.
\end{definition}
By Definitions \ref{walk} and \ref{Path} above, it is clear that a path is a special instance of a walk. Similarly, from Definitions \ref{Path} and \ref{cycle}, a cycle is a special instance of a path, with the only difference being that in a cycle, the first vertex and the last vertex are equal. Another thing worth mentioning is that, since according to Definitions \ref{walk}, \ref{Path} and \ref{cycle} cycles and paths are formulated in terms of walks, they might be perceived only as sequences of vertices and not as actual graphs. However, this is not the case, because cycles and paths can easily be represented as graphs. For example, given the path/cycle $\mathit{u_1, u_2, ...,u_n}$, a new graph $\mathit{G(V,E)}$ can be created such that $\mathit{V= \{ u_1, u_2, ..., u_n\}}$ and $\mathit{E = \{ \{u_i, u_{i+1}\} : 1 \leq i \leq n-1\}}$. For instance, consider the cycle $v_1$, $v_2$, $v_3$, $v_4$, $v_1$; the graph depicted in Example \ref{Example 1} is the graph representing this cycle. Such graphs are known as Cycle/Path graphs and are denoted by $\mathit{C_n/P_n}$ respectively, $\mathit{n}$ being the number of vertices in the graph. Since this construction can be done, cycles/paths will be treated as both graphs and sequences throughout this dissertation. The usefulness of this remark will be seen when defining Hamiltonian cycles. Example \ref{example3} is constructed for a better understanding of Definitions \ref{Complete Graph}, \ref{Path}, \ref{connectedgraph} and \ref{cycle}.
\begin{example}
\label{example3}
{\normalfont{Consider the graph}} G(V,E) {\normalfont{below:}}\\
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = black,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{}(v2);
\path[->] (v1) edge node[]{}(v3);
\path[->] (v2) edge node[]{} (v3);
\path[->] (v3) edge node[]{}(v4);
\path[->] (v2) edge node[]{}(v4);
\path[->] (v4) edge node[]{}(v1);
\end{tikzpicture}
\end{center}
{\normalfont{Since every vertex in}} G {\normalfont{is adjacent to every other vertex, then by Definition \ref{Complete Graph}, }}G {\normalfont{must be complete. Hence,}} G {\normalfont{must be}} $K_4$. {\normalfont{Since}} G {\normalfont{is complete, it must also be connected because there is a path}} $P_2$ {\normalfont{between any two distinct vertices of}} G.\\
{\normalfont{Some examples of walks in}} G {\normalfont{are:}}\\
1. $v_1$ $v_2$ $v_3$ $v_4$ $v_1$ $v_2$\\
2. $v_1$ $v_4$ $v_2$ $v_3$ $v_1$ $v_3$\\
3. $v_4$ $v_3$ $v_1$ $v_4$ $v_1$\\
{\normalfont{Some examples of paths in}} G {\normalfont{are:}}\\
1. $v_1$ $v_2$ $v_3$ $v_4$\\
2. $v_1$ $v_4$\\
3. $v_4$ $v_3$ $v_1$\\
{\normalfont{Some examples of cycles in}} G {\normalfont{are:}}\\
1. $v_1$ $v_2$ $v_3$ $v_4$ $v_1$\\
2. $v_1$ $v_4$ $v_2$ $v_3$ $v_1$\\
3. $v_4$ $v_3$ $v_1$ $v_4$
\end{example}
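The construction described before Example \ref{example3}, which turns a path or cycle written as a vertex sequence into a Path/Cycle graph, is mechanical enough to state as code. The sketch below (illustrative Python, continuing the set-based representation used earlier) builds the edge set $\{\{u_i,u_{i+1}\}\}$ from a sequence; when the sequence ends at its starting vertex, the closing edge of the cycle is produced by the same loop.
\begin{lstlisting}[language=Python]
def sequence_to_graph(seq):
    # Build the Path/Cycle graph of a vertex sequence u_1, ..., u_n:
    # the vertex set is the set of entries, and each pair of
    # consecutive entries contributes one edge.
    V = set(seq)
    E = {frozenset({seq[i], seq[i + 1]}) for i in range(len(seq) - 1)}
    return V, E

# The cycle v1, v2, v3, v4, v1 yields a graph with 4 vertices
# and 4 edges, namely the graph drawn in the first example.
V, E = sequence_to_graph(["v1", "v2", "v3", "v4", "v1"])
assert len(V) == 4 and len(E) == 4
\end{lstlisting}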
Another important graph theoretic concept is that of subgraphs.
\begin{definition}
\label{subgraph}
Given a graph G(V,E) and a graph H($V^\prime$,$E^\prime$), H is a subgraph of G if $V^\prime$ $\subseteq$ V and $E^\prime$ $\subseteq$ E \normalfont{\cite{harris_hirst_mossinghoff_2008}}.
\end{definition}
Clearly, by Definition \ref{subgraph} and the construction of Cycle/Path graphs, if $A$ is a cycle/path in $G$ then the Cycle/Path graph representing $A$ is a subgraph of $G$. This observation helps to define the concept of two distinct paths/cycles. Two paths/cycles in $G$ are said to be distinct if, when they are constructed as Path/Cycle subgraphs of $G$, they differ in at least one edge. After defining these concepts, the next step is to extend Definition \ref{Graph} to define another important class of graphs called weighted graphs. It must also be noted that all definitions presented so far apply also to weighted graphs.
\begin{definition}
\label{Weighted Function}
Given a graph G(V,E), a weight function is a function f $:$ E $\rightarrow$ $\mathbb{R}^+$ {\normalfont{\cite{harris_hirst_mossinghoff_2008}}}. The real numbers assigned to each edge are called weights.
\end{definition}
Note that in Definition \ref{Weighted Function}, the weights are taken to be positive. According to \cite{harris_hirst_mossinghoff_2008}, there could be cases where negative weights would be appropriate; however, in this dissertation it is assumed that when considering a weight function, the weights are positive.
\begin{definition}
\label{Weighted Graph}
A weighted graph is a graph G(V,E) with a weight function f {\normalfont{\cite{harris_hirst_mossinghoff_2008}}}. This will be denoted by the triple G(V,E,f) or G.
\end{definition}
According to Bondy and Murty \cite{bondy_murty_1982}, weighted graphs occur regularly in applied graph theory. For example, a railway network can be represented by a weighted graph where the vertices are the set of towns in the railway network, and two vertices are connected if there is a direct route using the railway from one town to another, without visiting other towns in the process. The weight function would then represent the cost of travelling directly from one town to another. In addition to this, the shortest path between two towns in the network may be required. It is clear that in order to try and solve such problems, the total weight of a subgraph must first be defined.
\begin{definition}
\label{weightofasubgraph}
Given a weighted graph G(V,E,f), the total weight of any subgraph H($V^\prime$,$E^\prime$,f) of G is $\sum_{e \in E^\prime} f(e)$ \normalfont{\cite{bondy_murty_1982}}.
\end{definition}
It is important to note that by Definition \ref{subgraph}, any weighted graph G is a subgraph of itself; therefore, its weight can be calculated. This is highlighted in Example \ref{example4} below.
\begin{example}
\label{example4}
{\normalfont{Consider the weighted graph}} G(V,E,f) {\normalfont{such that,}} G(V,E) {\normalfont{ is the graph in Example \ref{example3} with the weight function}} f {\normalfont{defined below:}}\\
f(\{$v_1, v_2\}$) = 4\\
f(\{$v_1, v_3\}$) = 5\\
f(\{$v_2, v_3\}$) = 2\\
f(\{$v_3, v_4\}$) = 10\\
f(\{$v_2, v_4\}$) = 4\\
f(\{$v_4, v_1\}$) = 7\\
{\normalfont{Then by Definition \ref{Weighted Graph}, the graph below is a weighted graph}}.\\
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge node[pos=0.2,below right]{2} (v3);
\path[->] (v3) edge node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
{\normalfont{As highlighted previously, the above graph is a subgraph of itself. Therefore, its weight can be calculated. As a result, by Definition \ref{weightofasubgraph}, the weight of}} G {\normalfont{is 32.}}
\end{example}
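Definition \ref{weightofasubgraph} is equally direct to compute. As a minimal illustrative sketch (again in Python, with names chosen here for exposition), the weight function of Example \ref{example4} can be stored as a dictionary keyed by edges, and the total weight of a subgraph is then a single sum over its edge set.
\begin{lstlisting}[language=Python]
# The weight function f of the example, keyed by (unordered) edges.
f = {frozenset({"v1", "v2"}): 4, frozenset({"v1", "v3"}): 5,
     frozenset({"v2", "v3"}): 2, frozenset({"v3", "v4"}): 10,
     frozenset({"v2", "v4"}): 4, frozenset({"v4", "v1"}): 7}

def total_weight(edges):
    # The total weight of a subgraph is the sum of f over its edges.
    return sum(f[e] for e in edges)

# The graph is a subgraph of itself, so its own weight is defined.
assert total_weight(f.keys()) == 32
\end{lstlisting}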
According to Guichard \cite{guichard_2018}, trees are another useful class of graphs.
\begin{definition}
\label{tree}
A tree is a connected graph with no cycles \normalfont{\cite{guichard_2018}}.
\end{definition}
Having defined the basic building blocks of graph theory, it is now time to define more advanced concepts that build on the previous definitions. It is important to note that the following concepts apply to both weighted and unweighted graphs. Therefore, in the remaining definitions the graph being considered can be either weighted or unweighted.
\begin{definition}
\label{spanning subgraph}
H($V^\prime$,$E^\prime$) is a spanning subgraph of G(V,E) if H is a subgraph of G and $V^\prime$ = V \normalfont{\cite{ray_2013}}.
\end{definition}
There are many types of spanning subgraphs; the ones relevant to this dissertation are spanning trees and spanning cycles, the latter better known as Hamiltonian cycles.
\begin{definition}
A graph H is a spanning tree of G if H is a tree, and H is a spanning subgraph of G \normalfont{\cite{ray_2013}}.
\label{spanning tree}
\end{definition}
\begin{definition}
\label{hamiltonian cycle}
Given a graph G, C is a Hamiltonian cycle of G if C is a cycle, and C is a spanning subgraph of G. Also, a graph that contains a Hamiltonian cycle is called a Hamiltonian graph. \normalfont{\cite{ray_2013}}
\end{definition}
It is worth mentioning that Definition \ref{hamiltonian cycle} makes sense because cycles can be represented by Cycle graphs, due to the construction discussed earlier. What follows is an example that better illustrates Definitions \ref{spanning subgraph}, \ref{spanning tree} and \ref{hamiltonian cycle}.
\begin{example}
\label{example5}
{\normalfont{Let}} G {\normalfont{be the graph in Example \ref{example4}. Then, according to Definition \ref{spanning subgraph}, the two graphs below are two spanning subgraphs of}} G {\normalfont{because, they contain all the vertices of}} G {\normalfont{and are subgraphs of }}G .\\
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v3) edge node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge node[pos=0.2,below right]{2} (v3);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
{\normalfont{It must also be said that by Definition \ref{hamiltonian cycle}, the two graphs above are Hamiltonian cycles of}} G {\normalfont{because they are spanning subgraphs of}} G{\normalfont{, and are Cycle subgraphs of}} G. {\normalfont{Since the above graphs are subgraphs of }}G{\normalfont{, by Definition \ref{weightofasubgraph}, their weight can be calculated by summing up the weights of the edges. Thus, the Hamiltonian cycles above have weight 23 and 18 respectively.}}\\
{\normalfont{Given the graph }}G{\normalfont{ in Example \ref{example4}, the two graphs below are spanning trees of }}G{\normalfont{ of weight 18 and 19 respectively.}}\\
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{4}(v2);
\path[->] (v3) edge node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\end{tikzpicture}
\end{center}
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v3) edge node[]{10}(v4);
\end{tikzpicture}
\end{center}
{\normalfont{This example also shows that within the same weighted graph, there could be multiple Hamiltonian cycles and spanning trees of different weight.}}
\end{example}
Having defined Hamiltonian cycles and spanning trees, it is natural to ask whether there are necessary and sufficient conditions which guarantee that a graph is Hamiltonian, or that it contains a spanning tree as a subgraph. In fact, Theorem \ref{spanningtreetheorem} gives a necessary and sufficient condition for a graph to have a spanning tree. Note that the proof of Theorem \ref{spanningtreetheorem} was constructed using ideas from \cite{ray_2013}.
\begin{theorem}
A graph G has a spanning tree $\iff$ it is connected {\normalfont{\cite{ray_2013}}}.
\label{spanningtreetheorem}
\end{theorem}
\begin{proof}
($\implies$) Let $\mathit{G(V,E)}$ be a graph having a spanning tree $\mathit{T(V^\prime,E^\prime)}$ as one of its subgraphs. Let $\mathit{v_1, v_2}$ $\in$ $\mathit{V}$. Since $\mathit{T}$ is a spanning tree of $\mathit{G}$, then $\mathit{T}$ is a spanning subgraph of $\mathit{G}$. Thus, $\mathit{v_1, v_2}$ $\in$ $V^\prime$. Also, since $\mathit{T}$ is a tree, $\mathit{T}$ must be connected. Therefore, $\exists$ a path $\mathit{P}$ joining vertices $\mathit{v_1}$ and $\mathit{v_2}$ in $\mathit{T}$. But since $\mathit{T}$ is a subgraph of $\mathit{G}$, $\mathit{P}$ is also a path in $\mathit{G}$. Therefore $\mathit{G}$ must be connected.\\
($\Leftarrow$) Conversely, let $\mathit{G(V,E)}$ be a connected graph. If $\mathit{G}$ has no cycles, $\mathit{G}$ itself must be a spanning tree. If $\mathit{G}$ has cycles, delete an edge from a cycle in $\mathit{G}$. Clearly, the resultant graph is still connected and contains one less cycle. Repeat this procedure until no more cycles are left in the graph. Then, the resultant graph $\mathit{G^\prime}$ is a connected subgraph of $\mathit{G}$ having no cycles (i.e., a tree). Also, since by the deletion procedure no vertex was deleted from $\mathit{G}$, $\mathit{G^\prime}$ is a spanning subgraph of $\mathit{G}$. Therefore $\mathit{G^\prime}$ is a spanning tree of $\mathit{G}$.
\end{proof}
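The forward construction in the proof (repeatedly locating and breaking cycles) establishes existence, but it is not how a spanning tree would be computed in practice. A standard alternative, sketched below in illustrative Python and not part of the cited proof, grows a spanning tree of a connected graph by depth-first search: an edge enters the tree exactly when it first reaches an undiscovered vertex, so no cycle can ever be formed.
\begin{lstlisting}[language=Python]
def spanning_tree(V, E):
    # Depth-first search from an arbitrary start vertex; keep an
    # edge whenever it discovers a new vertex.  For a connected
    # graph this returns |V| - 1 tree edges spanning all of V.
    adj = {v: [] for v in V}
    for e in E:
        u, w = tuple(e)
        adj[u].append(w)
        adj[w].append(u)
    start = next(iter(V))
    seen, stack, tree_edges = {start}, [start], set()
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                tree_edges.add(frozenset({u, w}))
                stack.append(w)
    return tree_edges
\end{lstlisting}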
Theorem \ref{spanningtreetheorem} confirms that a graph has a spanning tree if and only if it is connected. Thus, for spanning trees, the necessary and sufficient condition is connectivity. However, the same cannot be said about Hamiltonian cycles, because no necessary and sufficient conditions are known for a graph to be Hamiltonian \cite{guichard_2018}. There are sufficient conditions for a graph to be Hamiltonian, but these conditions are not necessary. There are also necessary conditions, some of which are trivial; for example, if $\mathit{G}$ is Hamiltonian then $\mathit{G}$ must be connected, but connectivity alone is not sufficient. According to Guichard \cite{guichard_2018}, the known sufficient conditions typically say that for a graph to be Hamiltonian it must have a lot of edges. But it is also argued in \cite{guichard_2018} that these conditions are not necessary, because there are Hamiltonian graphs that do not have many edges. For example, $\mathit{C_n}$ has only $\mathit{n}$ edges but is Hamiltonian. A sufficient but not necessary condition for Hamiltonicity is Ore's Theorem below. Note that the proof of Theorem \ref{ore's theorem} was constructed using ideas from {\normalfont{\cite{ray_2013}}}.
\begin{theorem}[Ore's Theorem]
\label{ore's theorem}
Let G be a graph on n $\geq$ 3 vertices such that if v and w are not adjacent in G, then deg(v) + deg(w) $\geq$ n. Then G is Hamiltonian. {\normalfont{\cite{ray_2013}}}
\end{theorem}
\begin{proof}
Suppose that $\mathit{G(V,E)}$ is a graph satisfying all the conditions in the theorem statement, but is not Hamiltonian. Since $\mathit{G}$ is not Hamiltonian and $K_n$ is Hamiltonian, $\mathit{G}$ must be a subgraph of $\mathit{K_n}$ having fewer edges than $\mathit{K_n}$. Therefore, add edges to $\mathit{G}$ between non-adjacent vertices to obtain a subgraph $\mathit{H(V^\prime,E^\prime)}$ of $\mathit{K_n}$, such that adding any further edge to $\mathit{H}$ would create a subgraph of $\mathit{K_n}$ which is Hamiltonian. Let $\mathit{u, v}$ $\in$ $V^\prime$ be two non-adjacent vertices in $\mathit{H}$. Since by construction $\mathit{G}$ is a subgraph of $\mathit{H}$, $u$ and $v$ must be non-adjacent in $\mathit{G}$. Therefore, $\mathit{deg(u) + deg(v) \geq n}$ in both $\mathit{G}$ and $\mathit{H}$. Since adding an edge to $\mathit{H}$ creates a graph that is Hamiltonian, adding the edge $\{\mathit{u},\mathit{v}\}$ creates a Hamiltonian graph, and any Hamiltonian cycle of this graph must use the new edge, for otherwise $\mathit{H}$ itself would be Hamiltonian. Removing the edge $\{\mathit{u},\mathit{v}\}$ from such a cycle shows that in $\mathit{H}$ there must be a path joining $\mathit{u}$ and $\mathit{v}$ containing all the vertices of $\mathit{H}$. Let the path be $\mathit{u = v_1, v_2, ..., v_n = v}$.\\
Now, suppose $\mathit{deg(v_1)}$ = $\alpha$ in $\mathit{H}$. Then $\forall \mathit{i}, 1 < i < \mathit{n}$, if there is an edge between $\mathit{v_1}$ and $\mathit{v_i}$ in $\mathit{H}$, there must not be an edge between $\mathit{v_{i-1}}$ and $\mathit{v_n}$, because $\mathit{v_1, v_i, v_{i+1}, ..., v_n, v_{i-1}, v_{i-2}, ..., v_1}$ would be a Hamiltonian cycle in $\mathit{H}$, and thus $\mathit{H}$ would be Hamiltonian. Therefore, $\mathit{deg(v_n)}$ $\leq$ $\mathit{n-1-\alpha}$\\
$\implies$ $\mathit{deg(v_1) + deg(v_n) \leq \alpha + n-1 - \alpha}$ in $\mathit{H}$\\
$\implies$ $\mathit{deg(v_1) + deg(v_n) \leq n-1}$ in $\mathit{H}$\\
$\implies$ $\mathit{deg(v_1) + deg(v_n) < n}$ in $\mathit{G}$ since $\mathit{G}$ is a subgraph of $\mathit{H}$.\\
This contradicts the assumption that $\mathit{deg(u) + deg(v) \geq n}$ in $\mathit{G}$. \\
Therefore, $\mathit{G}$ must be Hamiltonian.
\end{proof}
It is important to note that the above proof uses the fact that $K_n$ is Hamiltonian. This is true because any cycle $C$ consisting only of vertices of $K_n$ is always a subgraph of $K_n$. Therefore, any cycle $C$ that spans $K_n$ is a spanning subgraph of $K_n$. Hence, by Definition \ref{hamiltonian cycle}, $C$ must be a Hamiltonian cycle of $K_n$, and therefore $K_n$ is Hamiltonian. Example \ref{example 6} below is a counterexample which shows that the condition in Ore's Theorem is sufficient but not necessary.\\
\begin{example}
\label{example 6}
{\normalfont{Consider the graph}} $C_5$ below.
\\
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [below left =of v1] {$v_2$};
\node[state] (v3) [below right =of v1] {$v_3$};
\node[state] (v4) [below=of v2] {$v_4$};
\node[state] (v5) [below=of v3] {$v_5$};
\path[->] (v1) edge node[]{}(v2);
\path[->] (v2) edge node[]{}(v4);
\path[->] (v4) edge node[]{}(v5);
\path[->] (v5) edge node[]{}(v3);
\path[->] (v3) edge node[]{}(v1);
\end{tikzpicture}
\end{center}
$C_5$ {\normalfont{above is Hamiltonian because it contains the Hamiltonian cycle}} $v_1, v_2, v_4, v_5, v_3, v_1$. {\normalfont{However,}} deg($v_1$) + deg($v_5$) = 4 $<$ 5 = n. {\normalfont{Therefore, the condition in Ore's Theorem is not a necessary condition}}.
\end{example}
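Ore's condition is easy to test mechanically, which makes the distinction between sufficiency and necessity concrete. The illustrative Python sketch below checks the condition over every pair of distinct non-adjacent vertices; applied to the $C_5$ of this example it returns \texttt{False}, even though $C_5$ is Hamiltonian.
\begin{lstlisting}[language=Python]
def satisfies_ore(V, E):
    # Ore's condition: deg(v) + deg(w) >= n for every pair of
    # distinct non-adjacent vertices v, w.  Sufficient for a
    # graph on n >= 3 vertices to be Hamiltonian, not necessary.
    n = len(V)
    deg = {v: sum(1 for e in E if v in e) for v in V}
    return all(deg[v] + deg[w] >= n
               for v in V for w in V
               if v != w and frozenset({v, w}) not in E)

C5_V = {"v1", "v2", "v3", "v4", "v5"}
C5_E = {frozenset(p) for p in [("v1", "v2"), ("v2", "v4"),
                               ("v4", "v5"), ("v5", "v3"),
                               ("v3", "v1")]}
assert not satisfies_ore(C5_V, C5_E)  # Hamiltonian, yet fails Ore
\end{lstlisting}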
The discussion in this section indicates that determining whether a graph is Hamiltonian is a very difficult problem. This suggests that certain problems are inherently harder than others. To reason formally about such problems, some computational theory must first be presented. This is done in the next section.
\subsection{Some Computational Theory}
\label{computational_theory}
After defining the required graph theoretic concepts, it is now time to present some computational theory that will be used in the upcoming chapters. The discussion proceeds by defining two different types of problems, and describing the relationship between them.
\begin{definition}
\label{optimization problems}
Suppose that a problem P has many possible solutions such that each solution has an associated value. Then, P is an optimization problem if P requires finding the solution with the best (minimum/maximum) value. \normalfont{\cite{cormen_leiserson_rivest_stein}}
\end{definition}
Therefore, the aim of an algorithm solving an optimization problem is to find the solution with the best (minimum/maximum) value. A maximization problem is an optimization problem that requires finding the solution with the maximum value. On the other hand, a minimization problem is an optimization problem that requires finding the solution with the minimum value. A solution with the best value is called an optimal solution to the optimization problem being solved, and its value is known as the optimal value. Note that an optimal solution is not termed `the' optimal solution because there could be many solutions having the same optimal value. Another type of problem is the decision problem. \cite{cormen_leiserson_rivest_stein}
\begin{definition}
\label{decision problems}
A decision problem is a problem whose answer is either a yes or a no \normalfont{\cite{cormen_leiserson_rivest_stein}}.
\end{definition}
In \cite{cormen_leiserson_rivest_stein} it is argued that a given optimization problem can be transformed into a related decision problem, where the decision problem is no harder to solve than the optimization problem. Therefore, if some optimization problem is easy to solve, the decision problem version is easy to solve as well \cite{cormen_leiserson_rivest_stein}. On the other hand, if the decision problem is hard to solve, then the original optimization problem is also hard to solve \cite{cormen_leiserson_rivest_stein}. Note that the notions of `hard' and `easy' problems will be rigorously defined later in this section; the concept is only mentioned here to show that there exists a relationship between optimization problems and decision problems. Before going into harder computational theory, the example below shows how an optimization problem can be transformed into a related decision problem.
\begin{example}
\label{example decision/optimization}
{\normalfont{Two optimization problems are:\\
1. The Shortest Path Problem\\
2. The Minimum Weight Spanning Tree problem\\
Given a connected weighted graph}} G{\normalfont{ and two vertices $u$ and $v$ of $G$, the Shortest Path Problem is the task of finding a path}} P {\normalfont{between $u$ and $v$ in}} G {\normalfont{such that the weight of }}P {\normalfont{is the minimum amongst all paths joining $u$ and $v$ in }}G {\normalfont{\cite{bondy_murty_1982}}}. {\normalfont{On the other hand, given a connected weighted graph}} G{\normalfont{, the Minimum Weight Spanning Tree Problem is the task of finding a spanning tree in}} G {\normalfont{of minimum weight \cite{harris_hirst_mossinghoff_2008}. The Shortest Path Problem is an optimization problem because, from all the possible paths joining $u$ and $v$ in the graph $G$, a path having the optimal value (minimum weight) is chosen. In this case, the optimal solution is the path with minimum weight. Similarly, by the same reasoning, the Minimum Weight Spanning Tree Problem is an optimization problem. }}\\
{\normalfont{The decision problems related to the optimization problems above can be obtained by specifying a bound on the value to be optimized \cite{cormen_leiserson_rivest_stein}. Therefore, the decision problems related to the Shortest Path Problem and the Minimum Weight Spanning Tree Problem are the following:}}\\
{\normalfont{1. Given a connected weighted graph}} G, {\normalfont{two vertices}} u {\normalfont{and}} v, {\normalfont{and an integer}} k,{\normalfont{ does a path exist from}} u {\normalfont{to}} v {\normalfont{in}} G {\normalfont{of weight at most}} k?\\
{\normalfont{2. Given a connected weighted graph}} G, {\normalfont{and an integer}} k{\normalfont{, does}} G {\normalfont{contain a spanning tree of weight at most}} k?\\
{\normalfont{The two problems above are decision problems because the answer to these problems is either a yes or a no.}}
\end{example}
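The second decision problem above also shows the general recipe at work: if the optimization problem can be solved, its decision version is answered by comparing the optimal value with the bound $k$. The sketch below (illustrative Python; it uses Kruskal's classical minimum weight spanning tree algorithm, which is standard material rather than something developed in this dissertation) answers ``does $G$ contain a spanning tree of weight at most $k$?'' in exactly this way.
\begin{lstlisting}[language=Python]
def mst_weight(V, f):
    # Kruskal's algorithm: scan edges by increasing weight and
    # keep an edge whenever it joins two different components,
    # tracked with a union-find structure.
    parent = {v: v for v in V}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    total = 0
    for e, w in sorted(f.items(), key=lambda item: item[1]):
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru != rv:            # the edge joins two components
            parent[ru] = rv
            total += w
    return total

def spanning_tree_of_weight_at_most(V, f, k):
    # Decision version: compare the optimal value with the bound.
    return mst_weight(V, f) <= k

# With the weighted graph used in the earlier examples, the
# minimum weight spanning tree has weight 2 + 4 + 4 = 10.
f = {frozenset({"v1", "v2"}): 4, frozenset({"v1", "v3"}): 5,
     frozenset({"v2", "v3"}): 2, frozenset({"v3", "v4"}): 10,
     frozenset({"v2", "v4"}): 4, frozenset({"v4", "v1"}): 7}
assert spanning_tree_of_weight_at_most({"v1", "v2", "v3", "v4"}, f, 18)
\end{lstlisting}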
Reasoning about computational problems is usually done by analyzing the properties belonging to the algorithms that solve these problems. A common property which is analyzed is the running time of an algorithm. This is defined in Definition \ref{running_time} below.
\begin{definition}
\label{running_time}
The running time of an algorithm A given an input I is the number of steps executed by A when given I. The running time of an algorithm A can be represented by the function T $:$ $\mathbb{N}$ $\rightarrow$ $\mathbb{N}$, where the domain represents the size of the input, and the co-domain represents the number of steps executed by the algorithm, given an input size. {\normalfont{\cite{cormen_leiserson_rivest_stein}, \cite{adamchik_2009}}}
\end{definition}
What follows is an example which demonstrates how the running time of an algorithm can be computed.
\begin{example}
\label{example_running}
{\normalfont{Consider an algorithm that performs the addition of two}} n-{\normalfont{bit strings. If the addition of 2 bits takes}} a {\normalfont{computational steps, the addition of two}} $\mathit{n}${\normalfont{-bit strings will take}} a $\times$ n {\normalfont{steps, since each per-bit computation must be performed}} n {\normalfont{times. Therefore, the running time of this algorithm for an input of size}} n {\normalfont{can be represented by the function\\}} T(n) = a $\times$ n. {\normalfont{\cite{cormen_leiserson_rivest_stein}, \cite{adamchik_2009}}}
\end{example}
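To make the count in Example \ref{example_running} concrete, the following illustrative Python sketch adds two $n$-bit strings with the classical carry method and counts one unit of work per bit position; the count is exactly $n$, so $\mathit{T(n) = a \times n}$ with the constant $a$ corresponding to the work done inside one loop iteration.
\begin{lstlisting}[language=Python]
def add_bit_strings(x, y):
    # Right-to-left addition of two equal-length bit strings;
    # `steps` counts one unit of work per bit position.
    assert len(x) == len(y)
    result, carry, steps = [], 0, 0
    for a, b in zip(reversed(x), reversed(y)):
        steps += 1
        s = int(a) + int(b) + carry
        result.append(str(s % 2))
        carry = s // 2
    if carry:
        result.append("1")   # final carry-out, if any
    return "".join(reversed(result)), steps

bits, steps = add_bit_strings("1011", "0110")  # 11 + 6 = 17
assert bits == "10001" and steps == 4          # n = 4 steps
\end{lstlisting}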
Expressing the running time in this way seems easy enough to understand. However, this representation has a number of problems. Typically, the running time of an algorithm is fixed for any particular input; however, the running time function is not defined on the actual inputs but on the input sizes. Since the same input size might represent different inputs, there could be different running times for the same input size $\mathit{n}$, depending on which input is given. This means that for any input size $\mathit{n}$, there may be different running time functions giving different running times. These different running time functions represent the different cases that could arise in an algorithm, given any input size $\mathit{n}$. The most important running time functions are the worst-case running time, the best-case running time, and the average-case running time, which are respectively the longest, shortest, and average running time of the algorithm over all inputs of size $\mathit{n}$. The problem is therefore which running time function should be used to reason about algorithms. Although the average-case running time generally has greater practical significance, it is often very difficult to compute. Therefore, the worst-case running time is usually used, that is, the running time function that gives the longest running time for any input of size $\mathit{n}$. The worst-case running time is used because it is an upper bound on all other running times; this assures that the algorithm will never have a longer running time for any input of size $\mathit{n}$. So, if the worst-case running time of an algorithm is polynomial in the input size, then it is known that the problem is in the class P (these terms will be defined later). \cite{cormen_leiserson_rivest_stein}, \cite{adamchik_2009}\\\\
Another problem is that the running time of an algorithm can be difficult to compute precisely, because it depends on the computer the algorithm is executed on. Informally, this means that an algorithm may need a different number of computational steps on different computers for the same input. For example, the number of steps taken for adding 2 bits as seen in Example \ref{example_running} may be different on different computers, resulting in running time functions that represent the same worst/best/average running time case, differing only in multiplicative constants and low order terms. To get around this problem, a mechanism that identifies a common property between all such running time functions is needed. This mechanism is called asymptotic analysis, and it amounts to letting the input size increase without bound so that only the order of growth of the running time function is relevant. This works because, as the input size increases, the low-order terms and multiplicative constants of the running time function get dominated by the size of the input itself. This means that, given any running time function, the low-order terms and multiplicative constants of the function can be ignored, leaving only the order of growth (which is common across all running time functions representing the same best/worst/average running time case). In a running time function, the order of growth is given by the highest order term. This is true because the highest order term has the greatest effect on the running time function as the input size increases. For example, given the running time function $\mathit{T(n) = an^2+bn+c}$, $\mathit{n^2}$ is the term that increases the running time the most as the input size increases without bound. \cite{cormen_leiserson_rivest_stein}, \cite{adamchik_2009} \\\\
The following example is used to explain better what has been discussed in the previous paragraph.
\begin{example}
{\normalfont{Suppose that the}} n-{\normalfont{bit addition algorithm in Example \ref{example_running} has a worst-case running time function}} T(n) = a $\times$ n {\normalfont{on computer}} A, {\normalfont{and a worst-case running time function}} T(n) = b $\times$ n {\normalfont{on computer}} B, {\normalfont{where}} a {\normalfont{and}} b {\normalfont{represent the number of computational steps required to perform 2-bit addition on computers}} A {\normalfont{and}} B {\normalfont{respectively. The asymptotic analysis of the algorithm concludes that the}} n-{\normalfont{bit addition algorithm has linear asymptotic behaviour in}} $n$. {\normalfont{Note that this is common to both running time functions}}. {\normalfont{\cite{cormen_leiserson_rivest_stein}, \cite{adamchik_2009}}}
\end{example}
To express the asymptotic behaviour of functions mathematically, asymptotic notations are normally used. One of these notations is called the Big O Notation, whose exact definition is given below.
\begin{definition}
\label{bigonotation}
Let f $:$ $\mathbb{N}$ $\mapsto$ $\mathbb{N}$ and g $:$ $\mathbb{N}$ $\mapsto$ $\mathbb{N}$ be two monotonic functions. Then, f(n) = O(g(n)) if $\exists$ c $>$ 0 and $n_0$ $>$ 0 such that f(n) $\leq$ cg(n) $\forall$ n $\geq$ $n_0$. \normalfont{\cite{adamchik_2009}}
\end{definition}
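As a small worked instance of Definition \ref{bigonotation} (constructed here purely for illustration), take $\mathit{f(n) = 3n^2 + 10n + 5}$ and $\mathit{g(n) = n^2}$. Choosing $c = 4$ and $n_0 = 11$,
\begin{equation*}
3n^2 + 10n + 5 \leq 4n^2 \iff n^2 - 10n - 5 \geq 0,
\end{equation*}
which holds for all $n \geq 11$. Hence $\mathit{f(n) = O(n^2)}$: the lower order term and the constants are absorbed once $n$ is large enough.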
Definition \ref{bigonotation} implicitly states that when writing $\mathit{f(n) = O(g(n))}$, $\mathit{g(n)}$ is an upper bound to $\mathit{f(n)}$ as the input size $\mathit{n}$ grows without bound \cite{adamchik_2009}. Therefore, if $\mathit{f(n)}$ is the worst-case running time function of an algorithm and $\mathit{f(n) = O(g(n))}$, then it is guaranteed that as $\mathit{n \rightarrow \infty}$, the algorithm will never have a running time longer than $\mathit{g(n)}$ times some constant. Note that Definition \ref{bigonotation} can also be used for the asymptotic analysis of the average and best case running times. However, unless otherwise stated, it can be assumed that in this dissertation Big O Notation will be used for the worst-case asymptotic analysis of algorithms. The following is a continuation of Example \ref{example_running}.
\begin{example}
\label{bigonotationexample}
{\normalfont{By Example \ref{example_running}, the algorithm that adds two}} n-{\normalfont{bit strings has a running time function}} T(n)= an. {\normalfont{Therefore, by Definition \ref{bigonotation},}} T(n) = O(n) {\normalfont{because, since}} a {\normalfont{is a natural number,}} $\exists$ z $\geq$ a {\normalfont{such that}} an $\leq$ zn $\forall$ n $\in$ $\mathbb{N}$. {\normalfont{Therefore, the order of growth of}} T(n) {\normalfont{is linear with respect to the input size}} n.
\end{example}
It is important to note that if $\mathit{T(n) = O(g(n))}$, and $\mathit{h(n)}$ has a higher order of growth than $\mathit{g(n)}$, then $\mathit{T(n) = O(h(n))}$. For example, in Example \ref{bigonotationexample}, $\mathit{T(n) = O(n^2)}$ as well, since $\mathit{n^2}$ is an upper bound to $\mathit{n}$, and hence $n^2$ is an upper bound to $\mathit{T(n)}$. However, unless otherwise stated, it can be assumed that whenever $\mathit{T(n) = O(g(n))}$ is written, $\mathit{g(n)}$ is a tight asymptotic bound to $\mathit{T(n)}$. A tight asymptotic bound to $\mathit{T(n)}$ is a function $\mathit{g(n)}$ such that if $\mathit{T(n)=O(f(n))}$, then $\mathit{g(n)=O(f(n))}$.\\\\
Given the above, the time complexity of an algorithm can now be defined formally.
\begin{definition}
\label{time_complexity}
Given a function g(n) and an algorithm A having a running time function T(n), A is said to have a time complexity of O(g(n)) if T(n) = O(g(n)). \normalfont{\cite{adamchik_2009}}
\end{definition}
Definition \ref{time_complexity} does not apply only to the worst-case running time function. However, recall that when discussing running time functions it was highlighted that the worst-case running time function is always taken. Therefore, it can be assumed that the worst-case running time function is taken when saying that an algorithm has an $\mathit{O(g(n))}$ time complexity, or when saying that an algorithm is an $\mathit{O(g(n))}$ algorithm. For clarity, this will sometimes also be termed the `worst-case time complexity' of the algorithm. Note that when it is said that a problem can be solved in $\mathit{O(g(n))}$ time, this means that the problem can be solved by an algorithm that has an $\mathit{O(g(n))}$ worst-case time complexity.\\\\
The following table summarizes some of the most common time complexities encountered in computational theory, and what they are commonly known as. The table was adapted using information from \cite{pettis} and \cite{carter_1999}.
\begin{table}[!htbp]
\begin{center}
\caption{Time Complexities and Terminology Used.}
\label{tab:table1}
\begin{tabular}{l|c} % 1st column left-aligned, 2nd centred, with a vertical line in between
\textbf{Time Complexity} & \textbf{Terminology Used}\\
\hline
O(1) & Constant time \\
$O(\mathit{\log n})$ & Logarithmic time \\
$O(\mathit{n})$ & Linear time \\
$O(\mathit{n \log n})$ & Log-linear time \\
$O(\mathit{n^2})$ & Quadratic time \\
$O(\mathit{n^c})$, $\mathit{c > 0}$ constant & Polynomial time \\
$O(\mathit{a^n})$, $\mathit{a \geq 2}$ constant & Exponential time \\
$O(\mathit{n!})$ & Factorial time
\end{tabular}
\end{center}
\end{table}
\\In Table \ref{tab:table1}, the time complexities are ordered in ascending order of growth using the curves in \cite{rowell}. This means that, for example, constant time algorithms have a smaller order of growth than logarithmic time algorithms. It must be noted that the higher the order of growth of an algorithm, the more steps the algorithm needs to execute for a given input, and therefore the less efficient the algorithm is. In general, problems that can be solved in polynomial time are tractable problems \cite{cormen_leiserson_rivest_stein}. On the other hand, problems that require super-polynomial (exponential/factorial) time algorithms to be solved are intractable problems \cite{cormen_leiserson_rivest_stein}. In general, tractable problems are considered to be easy to solve, whereas intractable problems are hard to solve. Table \ref{tab:table2} below is constructed to show why problems that can be solved in polynomial time are tractable, whilst problems that cannot be solved in polynomial time are intractable. Table \ref{tab:table2} gives the execution time of an algorithm given an input of size $\mathit{10}$ or $\mathit{100}$, provided that the algorithm's time complexity is $\mathit{O(n)}$, $\mathit{O(n^2)}$, $\mathit{O(2^n)}$ or $\mathit{O(n!)}$.\\\\
\begin{table}[h]
\begin{center}
\caption{Execution Time Given Input Size and Time Complexity}
\label{tab:table2}
\begin{tabular}{l|c|c} % <-- Alignments: 1st column left, 2nd and 3rd centred, with vertical lines in between
\textbf{\backslashbox{Time Complexity}{Input Size}} & \textbf{10} & \textbf{100}\\
\hline
&&\\
$\mathit{O(n)}$ & $3.171\times10^{-16}$ years & $3.171\times10^{-15}$ years\\
$\mathit{O(n^2)}$ & $3.171\times10^{-15}$ years & $3.171\times10^{-13}$ years \\
$\mathit{O(2^n)}$ & $3.247\times10^{-14}$ years & $4.0197\times10^{13}$ years\\
$\mathit{O(n!)}$ & $1.1507\times10^{-10}$ years & Too large to tabulate
\end{tabular}
\end{center}
\end{table}
\\Table \ref{tab:table2} above was adapted from \cite{pettis}. In Table \ref{tab:table2} it is assumed that the machine on which the algorithms are executed performs $\mathit{10^9}$ computational steps per second, as in \cite{pettis}. It can be concluded from Table \ref{tab:table2} that as the size of the input increases, only the polynomial functions retain small values. This means that a polynomial time algorithm does not have a very large execution time even if the size of the input is increased substantially. On the other hand, an algorithm with an exponential or factorial time complexity has a very large execution time if the input size is large. This is true because the rate of growth of the time complexity function is very large. What follows is an example which wraps up all the time complexity concepts discussed so far.
\begin{example}
\label{time_complexity_example}
{\normalfont{In Example \ref{bigonotationexample},}} T(n) = O(n). {\normalfont{Therefore, according to Definition \ref{time_complexity}, the}} n-{\normalfont{bit addition algorithm has a time complexity of}} O(n). {\normalfont{Therefore, by Table \ref{tab:table1}, the}} n-{\normalfont{bit addition algorithm has a linear time complexity, or simply is a linear time algorithm. As a result, the}} n-{\normalfont{bit addition problem can be solved in linear time, and hence in polynomial time.}}
\end{example}
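As an aside, the entries of Table \ref{tab:table2} can be reproduced with a short Python sketch. This is purely illustrative: the helper name \texttt{years} and the representation of a year as $60 \times 60 \times 24 \times 365$ seconds are assumptions, and the machine is assumed to perform $10^9$ steps per second as in \cite{pettis}.
\begin{verbatim}
# Illustrative sketch: recompute the entries of Table 2, assuming a
# machine performing 10**9 computational steps per second.
import math

STEPS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years(steps):
    # Convert a number of computational steps into years of execution.
    return steps / STEPS_PER_SECOND / SECONDS_PER_YEAR

for n in (10, 100):
    print("n =", n)
    print("  O(n)  :", years(n), "years")
    print("  O(n^2):", years(n ** 2), "years")
    print("  O(2^n):", years(2 ** n), "years")
    print("  O(n!) :", years(math.factorial(n)), "years")
\end{verbatim}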
In what has been presented so far in this section, it has always been assumed that any problem has an algorithm that solves it. However, this is not the case because there are computational problems that cannot be solved by any computer, no matter how much processing power and time is dedicated to them. In addition to that, there are problems that can be solved in polynomial time and others which cannot. This seems to indicate that there are several classes of problems. In fact, one of these classes is the class P defined below. \cite{cormen_leiserson_rivest_stein}
\begin{definition}
\label{P}
The class P consists of all the problems that can be solved in polynomial time {\normalfont{\cite{cormen_leiserson_rivest_stein}}}.
\end{definition}
Therefore, according to Definition \ref{P}, the class P consists of tractable problems. An example of a problem in the class $\mathit{P}$ is the $\mathit{n}$-bit addition problem defined in Example \ref{example_running}. The $\mathit{n}$-bit addition problem is in the class P because it can be solved in linear time (as discussed in Example \ref{time_complexity_example}). Apart from the class P, there is also the class NP. The following two definitions help in defining the class NP. Note that Definition \ref{certificate} was adapted from information in \cite{kaveh_tehada_2013}.
\begin{definition}
\label{certificate}
A certificate of a solution is a mathematical object that guarantees the solution.
\end{definition}
It can be deduced from Definition \ref{certificate} that if a certificate is correct, then the solution is correct. Therefore, one can verify that a given solution is correct by verifying that the certificate is correct. For convenience, in this dissertation, `verifying whether a solution/certificate is correct' will be referred to as `verifying a solution'. Note that Definition \ref{certificate} is not the exact mathematical definition of a certificate. The exact mathematical definition is not given because it requires discussing concepts about languages and verification algorithms which are beyond the scope of this project. A more formal discussion about certificates is given in \cite{cormen_leiserson_rivest_stein}.
\begin{definition}
\label{polynomial_time_verification}
Suppose that S is an arbitrary solution to a problem A. Then, A can be verified in polynomial time if, given a certificate C of S, $\exists$ a polynomial time algorithm such that, when supplied C and an instance (input) of A as inputs, the algorithm can verify whether C is correct. {\normalfont{\cite{cormen_leiserson_rivest_stein}}}
\end{definition}
In other words, Definition \ref{polynomial_time_verification} states that a problem can be verified in polynomial time if the certificate that led to the solution can be verified by a polynomial time algorithm. Note that, as discussed before, when verifying a certificate, the solution is implicitly verified. What follows now is an example that explains Definition \ref{certificate} and Definition \ref{polynomial_time_verification}.
\begin{example}
\label{example_polynomial_time_verification}
{\normalfont{To explain Definition \ref{certificate} and Definition \ref{polynomial_time_verification}, the Hamiltonian Cycle problem is going to be used. Given a graph}} G(V,E){\normalfont{, the Hamiltonian cycle problem is the task of determining whether}} G {\normalfont{has a Hamiltonian cycle \cite{cormen_leiserson_rivest_stein}. A certificate of a solution to the Hamiltonian cycle problem can be a sequence of distinct vertices}} S = $v_1$, ..., $v_n$ {\normalfont{on }}$|$V$|$ = n {\normalfont{vertices. Note that if the solution to a Hamiltonian cycle problem instance is `yes', then the certificate guarantees the solution if it is a Hamiltonian cycle in}} G{\normalfont{. On the other hand, if the solution to the Hamiltonian cycle problem is `no', then any certificate given guarantees the solution. The solution is guaranteed when the answer is `no' because, since there is no Hamiltonian cycle in}} G{\normalfont{, there is no certificate that represents a Hamiltonian cycle in}} G{\normalfont{. As a consequence of these observations, for the Hamiltonian cycle problem to be verifiable in polynomial time, there must be a polynomial time algorithm that checks whether the certificate is a Hamiltonian cycle when the answer is `yes'. This can be done by creating an algorithm that checks if }} $\forall$ i $\in$ [n-1], \{$v_i, v_{i+1}\} \in$ E and \{$v_n$, $v_1\} \in$ E. {\normalfont{Therefore, the checker algorithm has a worst-case}} O(n) {\normalfont{time complexity because it must check all edges in the cycle. Hence, the checker algorithm takes polynomial time to execute. As a result, the Hamiltonian cycle problem can be verified in polynomial time.}}
\end{example}
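To make the checker algorithm in Example \ref{example_polynomial_time_verification} concrete, the following Python sketch verifies a certificate in the manner just described. The graph representation (a set of \texttt{frozenset} edges) and the function name are illustrative assumptions, not part of the formal definitions.
\begin{verbatim}
# Illustrative sketch of the certificate checker in the example above.
def is_hamiltonian_cycle(certificate, vertices, edges):
    n = len(vertices)
    # The certificate must list every vertex exactly once.
    if len(certificate) != n or set(certificate) != set(vertices):
        return False
    # Every consecutive pair, including the closing pair {v_n, v_1},
    # must be an edge of the graph: n edge tests in total, i.e. O(n).
    return all(frozenset((certificate[i], certificate[(i + 1) % n])) in edges
               for i in range(n))

V = ["v1", "v2", "v3", "v4"]
E = {frozenset(e) for e in [("v1", "v2"), ("v2", "v3"),
                            ("v3", "v4"), ("v4", "v1")]}
print(is_hamiltonian_cycle(["v1", "v2", "v3", "v4"], V, E))  # True
\end{verbatim}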
Note that, as shown in Example \ref{example_polynomial_time_verification}, to show that a problem is verifiable in polynomial time, it suffices to show that a certificate of the solution `yes' can be verified in polynomial time. This is because any certificate is correct when the answer is `no'. Now the class NP can be defined.
\begin{definition}
\label{NP}
The class NP consists of all decision problems that can be verified in polynomial time {\normalfont{\cite{cormen_leiserson_rivest_stein}, \cite{dorigo_stutzle_thomas_2004}}}.
\end{definition}
It is important to note that according to Definition \ref{NP}, a problem $\mathit{A}$ is in the class NP only if $\mathit{A}$ is a decision problem. This is not a limiting factor because, as discussed before, every optimization problem can be transformed into a related decision problem by specifying a bound on the quantity to be optimized (see Example \ref{example decision/optimization}). Reasoning about decision problems is easier because the related decision problem is not harder than the original optimization problem. As a result, one can prove that an optimization problem cannot be solved in polynomial time by showing that the related decision problem cannot be solved in polynomial time. \cite{cormen_leiserson_rivest_stein}\\\\
Another important class of problems is the class of NP-Complete problems. To define this class, the concept of a reduction algorithm must first be defined.
\begin{definition}
\label{reduction_algorithm}
Suppose that A and B are two decision problems. A reduction algorithm is an algorithm that transforms any instance $\alpha$ of A into an instance $\beta$ of B with the following properties:
\begin{itemize}
\item The algorithm takes polynomial time to execute.
\item The solution to $\alpha$ is yes $\iff$ the solution to $\beta$ is yes.
\end{itemize}
For convenience, if there exists a reduction algorithm R which transforms any instance $\alpha$ of $A$ into an instance $\beta$ of $B$, then it will be said that `R transforms A to B' or `A is reducible to B'. {\normalfont{\cite{cormen_leiserson_rivest_stein}}}
\end{definition}
A reduction algorithm has important applications in computational theory. In fact, suppose that there are two decision problems $\mathit{A}$ and $\mathit{B}$ such that an instance $\mathit{\alpha}$ of the problem $\mathit{A}$ needs to be solved in polynomial time. Suppose also that $\mathit{B}$ can be solved in polynomial time using algorithm $\mathit{B^\star}$, and there exists a reduction algorithm $\mathit{R}$ from $\mathit{A}$ to $\mathit{B}$. Then, $\mathit{A}$ can be solved in polynomial time in the following way:
\begin{itemize}
\item Use $\mathit{R}$ to transform $\mathit{\alpha}$ into an instance $\mathit{\beta}$ of $\mathit{B}$
\item Execute $\mathit{B^\star}$ on the input $\beta$ to get answer $\gamma$ = yes/no
\item Give $\mathit{\gamma}$ as an answer for $\mathit{\alpha}$.
\end{itemize}
It is important to note that $\mathit{A}$ can be solved in polynomial time because the total time of the above procedure is polynomial. The procedure takes polynomial time because running two polynomial time algorithms one after the other takes time bounded by the sum of two polynomials, which by properties of polynomials is still a polynomial. (To be precise, the output of $\mathit{R}$ has size polynomial in the size of $\alpha$, so running $\mathit{B^\star}$ on $\beta$ takes time polynomial in the size of $\alpha$ as well.) It is important to note that the above could only be done because the solution to $\alpha$ is yes $\iff$ the solution to $\beta$ is yes. \cite{cormen_leiserson_rivest_stein}\\\\
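The three-step procedure above can be captured in a few lines of Python. This is a minimal sketch assuming two hypothetical callables: \texttt{reduce\_A\_to\_B} implementing the reduction algorithm $R$, and \texttt{solve\_B} implementing the polynomial time algorithm $B^\star$.
\begin{verbatim}
# Minimal sketch of solving A via a reduction to B. The callables
# reduce_A_to_B (the reduction algorithm R) and solve_B (the polynomial
# time algorithm B*) are hypothetical placeholders.
def solve_A(alpha, reduce_A_to_B, solve_B):
    beta = reduce_A_to_B(alpha)  # transform the instance of A into one of B
    gamma = solve_B(beta)        # decide B in polynomial time (True/False)
    return gamma                 # correct for alpha, since alpha is a yes
                                 # instance iff beta is a yes instance
\end{verbatim}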
Another important application of reduction algorithms is to show that a polynomial time algorithm does not exist for a particular decision problem. Let $\mathit{A}$ and $\mathit{B}$ be decision problems such that $\mathit{A}$ cannot be solved in polynomial time. Suppose that $\exists$ a reduction algorithm $\mathit{R}$ from $\mathit{A}$ to $\mathit{B}$. Then it can be shown that $\mathit{B}$ cannot be solved in polynomial time. Indeed, if $\mathit{B}$ could be solved in polynomial time, then $\mathit{R}$ could be used to transform any instance $\alpha$ of $\mathit{A}$ into an instance $\beta$ of $\mathit{B}$, and $\alpha$ could then be solved in polynomial time as discussed in the previous paragraph. Since $\alpha$ is arbitrary, $\mathit{A}$ could be solved for all inputs in polynomial time and hence, $\mathit{A}$ could be solved in polynomial time. This contradicts the assumption that $\mathit{A}$ cannot be solved in polynomial time. Therefore, when using a reduction algorithm from a decision problem $\mathit{A}$ to a decision problem $\mathit{B}$, one implicitly makes the statement that decision problem $\mathit{B}$ is at least as hard (in terms of tractability, as discussed before) as decision problem $\mathit{A}$; otherwise, the intractability of $\mathit{A}$ would be contradicted. After defining reduction algorithms, the class of NP-Complete problems can now be defined. \cite{cormen_leiserson_rivest_stein}
\begin{definition}
\label{NPC}
A decision problem A is in the class NP-Complete (NPC) if A is in NP, and A is as hard as any problem in NP {\normalfont{\cite{cormen_leiserson_rivest_stein}} }.
\end{definition}
In other words, Definition \ref{NPC} states that for a problem $\mathit{A}$ to be in the class NPC, $\mathit{A}$ must be in NP, and any problem in NP must be transformable into $\mathit{A}$ using a reduction algorithm. Therefore, to show that a problem $\mathit{A}$ is in NPC, one must show that it is in NP, and that every problem in NP can be reduced to $\mathit{A}$. Showing that $\mathit{A}$ is in NP is not difficult because it only requires showing that an alleged solution for an instance of A can be verified in polynomial time. As discussed before, this can be done by checking that a certificate for the solution `yes' can be verified in polynomial time. On the other hand, showing that every problem in NP is reducible to $\mathit{A}$ may seem difficult because this appears to require checking every problem in NP individually. However, this is not the case because, by Exercise 34.3-2 in \cite{cormen_leiserson_rivest_stein}, reduction algorithms are transitive. Hence, consider a known NP-Complete problem $\mathit{H}$; since $\mathit{H}$ is NP-Complete, every problem in NP can be reduced to $\mathit{H}$. Therefore, if a reduction algorithm from $\mathit{H}$ to $\mathit{A}$ can be found, then by transitivity of reduction algorithms, every problem in NP can be reduced to $\mathit{A}$ using a reduction algorithm \cite{cormen_leiserson_rivest_stein}.\\\\
The classes P, NP and NPC have very important implications in computational theory. One of the most important problems in computational theory is whether P = NP. It is known that P $\subseteq$ NP. This is true because every problem in P can be solved in polynomial time, so the polynomial time verification is implicitly done by the algorithm solving the problem (it is guaranteed that the algorithm gives a correct solution in polynomial time). Note that to be precise, the problems in P must first be converted to a related decision problem if they are to be in NP. But since this can be done, and the related decision problem is no harder than the original problem, the result holds. Therefore, every problem in P is in NP and hence, P $\subseteq$ NP. Showing that NP $\subseteq$ P may seem to be easy as well. In fact, to show that NP $\subseteq$ P, it must be shown that $\exists$ an NP-Complete problem $\mathit{C}$ that can be solved in polynomial time. This would be enough because, by the NP-Completeness of $\mathit{C}$, every problem in NP can be reduced using a reduction algorithm to $\mathit{C}$ and hence, every problem in NP could be solved in polynomial time, since the reduction algorithm and the algorithm solving $\mathit{C}$ would take a total time that is still polynomial. Unfortunately, no polynomial time algorithm has yet been discovered for any problem in NPC. In addition to this, no one has yet been able to prove that such polynomial time algorithms do not exist for NP-Complete problems. This essentially means that whether NP $\subseteq$ P (and hence whether P = NP) is still an open question. \cite{cormen_leiserson_rivest_stein}\\\\
It is important to note that, as indicated in \cite{dorigo_stutzle_thomas_2004}, the classes P, NP and NPC are usually defined more formally using models of computation such as Turing Machines. This was not done here because Turing Machines are out of the scope of this dissertation. For a more formal definition of these classes, the reader is referred to \cite{garey_johnson_1979}.\\\\
After defining the computational theory required, it is now time to mathematically define the Travelling Salesman Problem, and present some important properties about it. This will lead to showing that the Travelling Salesman Problem is NP-Complete, hence, showing that no polynomial time algorithm which solves the Travelling Salesman Problem has been discovered yet. All of this will be done in the next chapter.
\newpage
{\center{\section{The Travelling Salesman Problem} \label{TSP_CHAPTER}}}
As discussed in Chapter \ref{introduction}, this chapter is divided into two main sections. In the first section, the Travelling Salesman Problem will be mathematically defined, and some properties about it will be discussed. This will lead to showing that the Travelling Salesman Problem is in NPC. As a result, by the discussion in Section \ref{computational_theory}, no polynomial time algorithm which solves the Travelling Salesman Problem is known. According to \cite{cormen_leiserson_rivest_stein}, this does not mean that the Travelling Salesman Problem cannot be approximated in polynomial time. Hence in the second section, two algorithms that can approximate the Travelling Salesman Problem in polynomial time will be presented, along with some theoretical results about them.
\subsection{Problem Definition, Properties and Complexity Class}
The Travelling Salesman Problem can be defined mathematically as follows.
\begin{definition}
\label{TSP}
Given a complete weighted graph G on at least three vertices, the Travelling Salesman Problem (TSP) is the problem of finding a minimum weight Hamiltonian cycle in G {\normalfont{\cite{bondy_murty_1982}}}.
\end{definition}
According to Definition \ref{TSP}, a weighted graph $\mathit{G}$ must be complete for the TSP to be defined. However, this is not a limiting factor because every weighted graph $\mathit{G^\prime}$ can be converted into a complete weighted graph by adding the missing edges in $\mathit{G^\prime}$ with a very large weight. Note that this can be done because, if $\mathit{G^\prime}$ is an incomplete weighted Hamiltonian graph, the minimum weight Hamiltonian cycle in $\mathit{G^\prime}$ is never affected when transforming $\mathit{G^\prime}$ into a complete graph in this way. The minimum weight Hamiltonian cycle can never be affected because, since the missing edges have a very large weight, they can never be part of the minimum weight Hamiltonian cycle when added. Therefore, since incomplete weighted graphs can be represented as complete weighted graphs, it can be assumed that complete weighted graphs are always being considered when discussing the TSP.\\\\
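The completion step just described can be sketched in Python as follows. The representation (a weight dictionary keyed by \texttt{frozenset} vertex pairs) and the name \texttt{BIG} are assumptions; any value of \texttt{BIG} exceeding the total weight of the original edges suffices.
\begin{verbatim}
# Illustrative sketch: complete a weighted graph by adding the missing
# edges with a prohibitively large weight, so that no added edge can
# ever appear in a minimum weight Hamiltonian cycle.
from itertools import combinations

def complete(vertices, weights, BIG):
    completed = dict(weights)
    for u, v in combinations(vertices, 2):
        # setdefault keeps original weights and adds BIG only where
        # the edge {u, v} was missing.
        completed.setdefault(frozenset((u, v)), BIG)
    return completed
\end{verbatim}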
Definition \ref{TSP} also assumes indirectly that if $\mathit{G}$ is a complete weighted graph, then $\mathit{G}$ is Hamiltonian. This is true because any two distinct vertices in $K_n$ are adjacent; therefore, any cyclic ordering of the $n$ vertices of $K_n$ forms a Hamiltonian cycle in $K_n$. Note that for the TSP, $K_2$ and $K_1$ are implicitly excluded because by Definition \ref{cycle}, cycles are defined on graphs having at least 3 vertices. The following example is used to clearly explain what the TSP defined in Definition \ref{TSP} is all about.
\begin{example}
\label{example_2.1}
{\normalfont{Consider the complete graph}} $K_4$ {\normalfont{with the weight function}} f {\normalfont{defined in Example \ref{example4}. The distinct Hamiltonian cycles of}} G {\normalfont{are:}}
\begin{itemize}
\item $v_1$, $v_2$, $v_3$, $v_4$, $v_1$ {\normalfont{with weight}} 23
\item $v_1$, $v_3$, $v_4$, $v_2$, $v_1$ {\normalfont{with weight}} 23
\item $v_1$, $v_3$, $v_2$, $v_4$, $v_1$ {\normalfont{with weight}} 18
\end{itemize}
{\normalfont{Therefore, by Definition \ref{TSP}, the solution of the TSP given the graph in Example \ref{example4} would be the minimum weight Hamiltonian cycle}} $v_1$, $v_3$, $v_2$, $v_4$, $v_1$. {\normalfont{Note that $K_4$ has three distinct Hamiltonian cycles. This fact is a special case of Lemma \ref{distinct_hamiltonian_cycles} for any $K_n$}}.
\end{example}
Example \ref{example_2.1} suggests implicitly that the TSP can be solved on any complete weighted graph by a brute-force algorithm. A brute-force algorithm for the TSP is one that computes all the Hamiltonian cycles in a graph, and chooses the one of least weight as a solution \cite{Sanchit}. However, $\mathit{K_n}$ contains $\mathit{\frac{(n-1)!}{2}}$ distinct Hamiltonian cycles. Therefore, a brute-force algorithm is inefficient because it must generate all the distinct Hamiltonian cycles in the graph, taking factorial time to do so \cite{Sanchit}.
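As an illustration, the brute-force algorithm just described can be sketched in Python. The graph representation (a weight dictionary keyed by \texttt{frozenset} vertex pairs) is an assumption; fixing the first vertex and skipping reversed orderings enumerates exactly the $\frac{(n-1)!}{2}$ distinct Hamiltonian cycles counted in Lemma \ref{distinct_hamiltonian_cycles} below.
\begin{verbatim}
# Illustrative brute-force TSP sketch: enumerate the (n-1)!/2 distinct
# Hamiltonian cycles of K_n and return one of least weight.
from itertools import permutations

def brute_force_tsp(vertices, f):
    first, rest = vertices[0], list(vertices[1:])
    best_cycle, best_weight = None, float("inf")
    for perm in permutations(rest):
        if perm[0] > perm[-1]:
            continue  # skip the reversed copy of each cycle
        cycle = (first,) + perm + (first,)
        weight = sum(f[frozenset((cycle[i], cycle[i + 1]))]
                     for i in range(len(cycle) - 1))
        if weight < best_weight:
            best_cycle, best_weight = cycle, weight
    return best_cycle, best_weight
\end{verbatim}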
\begin{lemma}
\label{distinct_hamiltonian_cycles}
For n $\geq$ 3, $K_n$ has $\frac{(n-1)!}{2}$ distinct Hamiltonian cycles {\normalfont{\cite{Sanchit}}}.
\end{lemma}
\begin{proof}
Suppose that $V$ and $E$ are the set of vertices and the set of edges of $K_n$ respectively. Any Hamiltonian cycle in $K_n$ can be represented by a permutation of $V$ and vice-versa. For example, the Hamiltonian cycle $h$ = $v_1$, ..., $v_n$, $v_1$ can be represented by the permutation $v_1$ $v_2$ ... $v_n$ and vice-versa. Since there are $n!$ permutations of $V$, there are $n!$ such representations of Hamiltonian cycles in $K_n$. However, distinct permutations may represent the same Hamiltonian cycle. Indeed, in terms of Hamiltonian cycles, $h$ is the same as the Hamiltonian cycle $v_2$, ..., $v_n$, $v_1$, $v_2$. This is true because, although the order of vertices is different, they use the same edges from $E$. Therefore, in general there are $n$ permutations (one for every vertex in $K_n$ when used as a starting vertex of the vertex sequence) representing the same Hamiltonian cycle. In addition to this, each permutation would still represent the same Hamiltonian cycle if the vertices are in reversed order. This is true because, again, the same edges from $E$ are used. Therefore, each Hamiltonian cycle is represented by $2n$ permutations, and there must be $\frac{n!}{2n}$ = $\frac{(n-1)!}{2}$ distinct Hamiltonian cycles in $K_n$.
\end{proof}
As discussed previously, Lemma \ref{distinct_hamiltonian_cycles} confirms that a brute-force algorithm is not efficient enough to solve the TSP for large inputs. This means that either a new algorithm that solves the TSP in polynomial time must be designed, or it can be shown that the TSP is in NPC. As discussed in Section \ref{computational_theory}, the latter would confirm that no polynomial time algorithm is known for the TSP. It is important to note that, this does not mean that TSP cannot be solved in polynomial time.\\\\ In what follows, it will be proven that the TSP is an NP-Complete problem. Therefore, first it will be shown that TSP is in NP, and secondly it will be shown that TSP is as hard as any problem in NP. It is important to note that to prove that TSP is in NP, the TSP defined in Definition \ref{TSP} must be converted into a related decision problem. This is needed because in Definition \ref{TSP}, the TSP was presented as an optimization problem, and because, by Definition \ref{NP}, the class NP contains decision problems only. The TSP's related decision problem is the following :
\begin{definition}[TSP defined as a decision problem]
\label{dec_prob}
Given an instance $(G,k)$ where $G$ is a complete weighted graph and $k$ is an integer, does $G$ contain a Hamiltonian cycle of weight at most $k$ \cite{cormen_leiserson_rivest_stein}?
\end{definition}
Having converted the TSP in Definition \ref{TSP} into a related decision problem, it is now time to show that the TSP is in NP. This is shown in Theorem \ref{TSP_in_NP} below. Note that the proof of Theorem \ref{TSP_in_NP} was constructed using ideas in \cite{cormen_leiserson_rivest_stein}.
\begin{theorem}
\label{TSP_in_NP}
TSP is in NP {\normalfont{\cite{cormen_leiserson_rivest_stein}}}.
\end{theorem}
\begin{proof}
Consider the TSP's decision problem variant. To show that TSP is in NP, it must be shown that an alleged solution for the TSP can be verified in polynomial time. As discussed in Section \ref{computational_theory}, this can be done by verifying in polynomial time a certificate for the solution `yes'. Suppose that $G(V,E)$ is a complete weighted graph on $n$ vertices. A certificate of a solution to the TSP given G can be a sequence of $n$ vertices. Suppose that $v_1$, $v_2$, ..., $v_n$ is an arbitrary certificate. To check that the certificate is correct, two checks on this certificate must be carried out by an algorithm. Firstly, the algorithm must check that no vertex in the certificate is repeated (i.e. that the certificate is a Hamiltonian cycle). Secondly, the algorithm must check that, the total weight of the Hamiltonian cycle obtained from the certificate is at most $k$. Therefore, the algorithm is made up of the following two parts:
\begin{itemize}
\item In the first part, the algorithm must check that for all $i$ $\in$ [$n$], $v_i$ appears only once in the certificate. This part of the algorithm can be implemented by keeping an array $A$ of boolean values having length $n$, with every entry initialised to $0$. When the algorithm encounters $v_i$ in the given sequence, it checks $A[i]$: if $A[i]$ is $0$, the algorithm sets $A[i]$ to $1$ and moves to the next vertex in the solution sequence. If, while traversing the certificate, the checker algorithm notices that for some $j$ $\in$ [$n$], $A[j]$ has already been set to 1, the algorithm stops because this means that $v_j$ is repeated and hence the certificate is not a Hamiltonian cycle. As a result, the certificate is incorrect. If this does not happen, the algorithm starts executing the next part (next bullet point). Clearly, this first part of the algorithm has a worst-case $O(n)$ time complexity where $n$ is the length of the solution sequence. The worst case occurs when the checker algorithm needs to traverse all the vertices in the solution sequence. As a result, this part of the checker algorithm can be executed in polynomial time.
\item The checker algorithm must now check that the total weight of the Hamiltonian cycle represented by the sequence of vertices is at most $k$. This part of the checker algorithm can be implemented by keeping a floating point variable named `total\_weight', and adding the weight of each edge used in the Hamiltonian cycle to the variable `total\_weight'. Since a Hamiltonian cycle contains $n$ edges, this part of the checker algorithm has a worst-case $O(n)$ time complexity because all edge weights must be summed to calculate the total weight of the Hamiltonian cycle. After the total weight of the solution sequence is calculated, it is checked in $O(1)$ time whether `total\_weight' $\leq k$. If this inequality is satisfied, the checker algorithm declares the solution as correct, otherwise it declares the solution as incorrect. Therefore, it can be concluded that this part of the checker algorithm has an $O(n)$ worst-case time complexity. Hence, this part of the checker algorithm can be executed in polynomial time.
\end{itemize}
Since the checker algorithm is made up of two parts that execute in polynomial time, the whole algorithm runs in polynomial time. Therefore, it can be concluded that whether any certificate is correct can be verified in polynomial time by the above checker algorithm. Therefore, the TSP can be verified in polynomial time and hence, it is in NP.
\end{proof}
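The checker algorithm described in the proof can be sketched in Python as follows. Vertices are assumed to be numbered $0$ to $n-1$ and the weight function is assumed to be a dictionary keyed by \texttt{frozenset} vertex pairs; both are illustrative choices.
\begin{verbatim}
# Illustrative sketch of the two-part checker in the proof above.
def verify_tsp_certificate(certificate, n, f, k):
    if len(certificate) != n:
        return False          # wrong number of vertices
    # Part 1 (O(n)): a boolean array detects repeated vertices.
    seen = [False] * n
    for v in certificate:
        if seen[v]:
            return False      # repeated vertex: not a Hamiltonian cycle
        seen[v] = True
    # Part 2 (O(n)): sum the n edge weights of the cycle and compare
    # the total against the bound k.
    total_weight = 0.0
    for i in range(n):
        total_weight += f[frozenset((certificate[i],
                                     certificate[(i + 1) % n]))]
    return total_weight <= k
\end{verbatim}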
To show that the TSP is an NP-Complete problem, it remains to be shown that the TSP is as hard as any problem in NP. This can be done, as discussed in Section \ref{computational_theory}, by taking a known NP-Complete problem and reducing it to the TSP. The known NP-Complete problem that is going to be used is the Hamiltonian cycle problem. Note that it has already been shown in Example \ref{example_polynomial_time_verification} that the Hamiltonian cycle problem is in NP. However, it will not be shown that the Hamiltonian cycle problem is as hard as any problem in NP, because the proof is too long. In \cite{cormen_leiserson_rivest_stein}, it is shown that the Hamiltonian cycle problem is as hard as any problem in NP, by reducing the Vertex-Cover problem to the Hamiltonian cycle problem.
\begin{theorem}
\label{TSP_NP-Complete}
The TSP is an NP-Complete problem {\normalfont{\cite{cormen_leiserson_rivest_stein}}}.
\end{theorem}
\begin{proof}
Consider the TSP as in Definition \ref{dec_prob}. By Theorem \ref{TSP_in_NP}, the TSP is in NP. Therefore, it remains to be shown that the TSP is as hard as any problem in NP. By Theorem 34.13 in \cite{cormen_leiserson_rivest_stein}, the Hamiltonian cycle problem is NP-Complete. It will now be shown that the Hamiltonian cycle problem can be reduced to the TSP. Let $G(V,E)$ be an arbitrary instance of the Hamiltonian cycle problem such that $\abs{V} = n$. An instance of the TSP can be constructed from $G$ as follows. From $G$ construct the complete weighted graph $G^\prime(V^\prime,E^\prime,f)$ such that, $V^\prime=V$, $E^\prime=\{\{u, v\} : u, v \in V$ and $u \neq v\}$ and \\
$f\colon E^\prime \to \{0,1\},\ f(\{u, v\}) = \begin{cases} 0& \text{if } \{u, v\} \in E\\ 1 & \text{otherwise} \end{cases}$\\
Therefore, an instance of the TSP can be the pair $(G^\prime,0)$, where $G^\prime$ is the constructed complete weighted graph, and $k =0$ ($k$ as defined in Definition \ref{dec_prob}). From Definition \ref{reduction_algorithm}, the above transformation is a reduction algorithm if the transformation can be carried out in polynomial time, and if the solution to the Hamiltonian cycle problem given $G$ is yes $\iff$ the solution to the TSP given $(G^\prime,0)$ is yes. According to \cite{ray_2013}, $K_n$ contains $\frac{n(n-1)}{2}$ edges. Therefore, the algorithm implementing the above transformation has an $O(n^2)$ worst-case time complexity because it needs to create at most $\frac{n(n-1)}{2}$ edges between $n$ vertices. Hence, the transformation takes polynomial time. It remains to show that $G$ has a Hamiltonian cycle $\iff$ $G^\prime$ has a Hamiltonian cycle of weight at most 0. Suppose that $G$ contains some Hamiltonian cycle $h$. Since $h$ is a Hamiltonian cycle in $G$, each edge of $h$ is in $E$, therefore $h$ has weight 0 in $G^\prime$. Therefore, $h$ must be a Hamiltonian cycle in $G^\prime$ of weight at most 0. Conversely, suppose that $G^\prime$ contains a Hamiltonian cycle $C$ of weight at most 0. Since the weight of every edge of $G^\prime$ is either 0 or 1, every edge of $C$ must have weight 0, and therefore each edge of $C$ must belong to $G$. Therefore, $C$ must be a Hamiltonian cycle in $G$. Therefore, it has been shown that there is a reduction algorithm from the Hamiltonian cycle problem to the TSP, and hence the TSP is as hard as the Hamiltonian cycle problem. Now, since the Hamiltonian cycle problem is NP-Complete, it is as hard as any problem in NP. This means that, by the transitivity of reduction algorithms, the TSP is as hard as any problem in NP. Therefore, the TSP must be NP-Complete. \cite{cormen_leiserson_rivest_stein}
\end{proof}
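The reduction in the proof can be sketched in Python as follows; the representations (vertex list, edge set of \texttt{frozenset} pairs, dictionary-valued weight function) are illustrative assumptions.
\begin{verbatim}
# Illustrative sketch of the reduction from the Hamiltonian cycle
# problem to the TSP: edges of E get weight 0, missing edges weight 1,
# and the bound is k = 0. Building the n(n-1)/2 pairs takes O(n^2) time.
from itertools import combinations

def reduce_ham_cycle_to_tsp(V, E):
    f = {frozenset((u, v)): (0 if frozenset((u, v)) in E else 1)
         for u, v in combinations(V, 2)}
    return (V, f), 0   # the TSP instance (G', k) with k = 0
\end{verbatim}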
Theorem \ref{TSP_NP-Complete} confirms that no polynomial time algorithm solving the TSP is known. However, according to \cite{cormen_leiserson_rivest_stein}, there are three ways to get around NP-Completeness. Firstly, if the instance of an NP-Complete problem is small, an exponential algorithm may be good enough \cite{cormen_leiserson_rivest_stein}. Secondly, there could be special cases of the problem that can be solved in polynomial time \cite{cormen_leiserson_rivest_stein}. Thirdly, there are algorithms that can be designed to find near-optimal solutions (solutions whose value is very close to the optimal value) in polynomial time \cite{cormen_leiserson_rivest_stein}. Algorithms that return near-optimal solutions to a problem are called approximation algorithms \cite{cormen_leiserson_rivest_stein}. In this dissertation, the third case will be considered. In fact, in the next section, two approximation algorithms that can return near-optimal solutions to the TSP in polynomial time will be presented.
\subsection{Approximation Algorithms}
\label{approx_section}
As already discussed, the main aim of this section is to present two approximation algorithms that can return near-optimal solutions to the Travelling Salesman Problem in polynomial time, and to show that for specific instances of the TSP, the value of the near-optimal solutions is bounded from above. However, before doing so, some background theory on approximation algorithms must be presented.
\begin{definition}
\label{p(n)-approximation algorithm}
Suppose that A is an optimization problem with positive values assigned to each solution, and X is an approximation algorithm for A. Let $C^\star$ be the value of an optimal solution of A, and C be the value of the solution returned by X. The algorithm X is said to have an approximation ratio p(n) if for any input of size n, max($\frac{C}{C^\star}$, $\frac{C^\star}{C}$) $\leq$ p(n). If an algorithm X has an approximation ratio p(n), then X is said to be a p(n)-approximation algorithm. {\normalfont{\cite{cormen_leiserson_rivest_stein}}}
\end{definition}
In other words, Definition \ref{p(n)-approximation algorithm} states that if an approximation algorithm has an approximation ratio $p(n)$, then for any instance of size $n$, the approximation algorithm will return a solution whose value is at worst $p(n)\times$ the optimal value. In Definition \ref{p(n)-approximation algorithm}, if an optimization problem is a maximization problem, then 0 $\le$ $C$ $\leq$ $C^\star$. Hence, the approximation ratio is $\frac{C^\star}{C}$, giving the factor by which the optimal value exceeds the value of the approximate solution. On the other hand, if an optimization problem is a minimization problem, then 0 $\le$ $C^\star$ $\leq$ C. Hence, the approximation ratio is $\frac{C}{C^\star}$, which gives the factor by which the value of the approximate solution exceeds the optimal value. Note that a 1-approximation algorithm is an algorithm that produces the optimal solution. This is true because max($\frac{C}{C^\star}$, $\frac{C^\star}{C}$) $\leq 1$ forces $\frac{C^\star}{C} = \frac{C}{C^\star} = 1$, which implies $C = C^\star$. \cite{cormen_leiserson_rivest_stein}
\\\\
It is important to note that many problems have polynomial time approximation algorithms with a small constant approximation ratio \cite{cormen_leiserson_rivest_stein}. However, this is not always the case because, there are problems whose best known polynomial time approximation algorithms have approximation ratios that grow as the size of the input increases \cite{cormen_leiserson_rivest_stein}. It is now natural to ask whether the TSP has a polynomial time approximation algorithm with a constant approximation ratio. According to \cite{cormen_leiserson_rivest_stein}, it turns out that if a polynomial time approximation algorithm with a constant approximation ratio for the TSP exists, then P=NP. Note that the proof of Theorem \ref{no_approx_tsp} was constructed using ideas from \cite{cormen_leiserson_rivest_stein}.
\begin{theorem}
Suppose that P $\neq$ NP, then, for any constant p $\geq$ 1, no polynomial time approximation algorithm with approximation ratio p exists for the TSP {\normalfont{\cite{cormen_leiserson_rivest_stein}}}.
\label{no_approx_tsp}
\end{theorem}
\begin{proof}
Consider the TSP as defined in Definition \ref{TSP}. Suppose that P $\neq$ NP and that there exists a constant $p$ $\geq$ 1 such that $A$ is a polynomial time $p$-approximation algorithm for the TSP. WLOG assume that $p$ is an integer; if not, round it up to the nearest integer (a $p$-approximation algorithm is also a $p^\prime$-approximation algorithm for any $p^\prime \geq p$). Let $G(V,E)$, $\abs{V} = n$, be an instance of the Hamiltonian cycle problem defined in Example \ref{example_polynomial_time_verification}. Construct an instance $G^\prime(V^\prime,E^\prime,f)$ of the TSP from $G$ such that, $V^\prime = V$, $E^\prime = \{ \{u, v\} : u, v \in V$ and $u \neq v\}$ and \\
$f\colon E^\prime \to \{1, p\abs{V} + 1\},$ $f(\{u, v\}) = \begin{cases} 1& \text{if } \{u, v\} \in E\\ p\abs{V} + 1 & \text{otherwise} \end{cases}$\\
As discussed in Theorem \ref{TSP_NP-Complete}, this construction can be done in polynomial time. Now, consider the TSP on the graph $G^\prime$. If $G$ contains a Hamiltonian cycle $h$, then $f$ assigns a weight of $1$ to each edge of $h$ in $G^\prime$. Therefore, by Definition \ref{weightofasubgraph}, $h$ has weight $\abs{V}$ in $G^\prime$. If $G$ is not Hamiltonian, any Hamiltonian cycle in $G^\prime$ must contain edges which are not in $E$. As a result, the weight of any Hamiltonian cycle in $G^\prime$ is at least $p\abs{V} + 1 + \abs{V} - 1 = p\abs{V} + \abs{V} = (p+1)\abs{V} > p\abs{V} \geq \abs{V}$ $(\star)$. Now, apply the approximation algorithm $A$ on $G^\prime$, and let $w^\star$ be the optimal value of the TSP defined on $G^\prime$. It will now be shown that $A$ can be used to check if $G$ is Hamiltonian. Therefore, one gets the following two cases :
\begin{case}
\label{case1}
$G$ is Hamiltonian
\end{case}
Let $h^\prime$ be a Hamiltonian cycle in $G^\prime$. Since $G$ is Hamiltonian, $h^\prime$ either consists only of edges that are in $E$, or consists of edges some of which are in $E$ and some of which are not. Therefore, by the previous paragraph, $h^\prime$ can either have weight $\abs{V}$ or weight at least $p\abs{V} + \abs{V}$. This means that in general, the Hamiltonian cycles in $G^\prime$ can only have a weight of $\abs{V}$ or a weight of at least $p\abs{V} + \abs{V}$. Now, since by $(\star)$, $p\abs{V} + \abs{V} > \abs{V}$, the minimum weight Hamiltonian cycle in $G^\prime$ must have weight $\abs{V}$. Therefore, $w^\star = \abs{V}$. Now, since $A$ is a $p$-approximation algorithm, $A$ guarantees that it will return a Hamiltonian cycle which has weight less than or equal to $pw^\star = p\abs{V}$ when such a cycle exists. Therefore, since $G^\prime$ has Hamiltonian cycles of weight $\abs{V}$ $\leq$ $p\abs{V}$, and by $(\star)$ the remaining Hamiltonian cycles are of weight at least $ p\abs{V} + \abs{V} > p\abs{V}$, the approximation algorithm will always return a Hamiltonian cycle of weight $\abs{V}$. Therefore, whenever $G$ is Hamiltonian, $A$ will return a Hamiltonian cycle of weight $\abs{V}$.
\begin{case}
$G$ is not Hamiltonian
\end{case}
Since $G$ is not Hamiltonian, by the paragraph before Case \ref{case1} above, every Hamiltonian cycle in $G^\prime$ has weight at least $p\abs{V} + \abs{V}$. Therefore, the minimum weight Hamiltonian cycle in $G^\prime$ has weight at least $p\abs{V} + \abs{V}$. Now, since the TSP is a minimization problem, any solution returned by $A$ must have a value greater than or equal to the optimal value. Therefore, $A$ must return a Hamiltonian cycle having weight at least $p\abs{V} + \abs{V}$. Now, since by $(\star)$, $p\abs{V} + \abs{V} > p\abs{V}$, $A$ must return a Hamiltonian cycle of weight greater than $p\abs{V}$. As a result, whenever $G$ is not Hamiltonian, $A$ must return a Hamiltonian cycle having weight greater than $p\abs{V}$. \\\\
From Case 1 and Case 2 above it can be concluded that, by checking the weight of the Hamiltonian cycle returned by A, one can determine whether $G$ is Hamiltonian or not. As discussed in Theorem \ref{TSP_in_NP}, this check can be done in polynomial time. Therefore, since $A$ is a polynomial time algorithm, the whole procedure solves the Hamiltonian Cycle Problem in polynomial time. As discussed in Section \ref{computational_theory}, if one NP-Complete problem can be solved in polynomial time, then P=NP. Now, by Theorem 34.4 in \cite{cormen_leiserson_rivest_stein}, the Hamiltonian Cycle problem is NP-Complete, therefore P=NP. Contradiction. \\
$\therefore$ There is no polynomial time approximation algorithm with a constant approximation ratio for the TSP.
\end{proof}
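The construction used in the proof can be sketched in Python as follows. The callable \texttt{approx\_tsp} stands for the assumed $p$-approximation algorithm $A$ (returning the weight of the Hamiltonian cycle it finds); it is a hypothetical placeholder, which is precisely the point of the theorem.
\begin{verbatim}
# Illustrative sketch of the gap construction in the proof above.
from itertools import combinations

def gap_instance(V, E, p):
    heavy = p * len(V) + 1
    # Edges of E get weight 1; missing edges get weight p|V| + 1.
    return {frozenset((u, v)): (1 if frozenset((u, v)) in E else heavy)
            for u, v in combinations(V, 2)}

def is_hamiltonian(V, E, p, approx_tsp):
    f = gap_instance(V, E, p)
    # G is Hamiltonian iff the p-approximation returns weight <= p|V|.
    return approx_tsp(V, f) <= p * len(V)
\end{verbatim}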
As discussed in Section \ref{computational_theory}, the P=NP problem is an open problem. Therefore, a polynomial time approximation algorithm with a constant approximation ratio for the TSP is not known; otherwise, the contrapositive of Theorem \ref{no_approx_tsp} would imply that P=NP, contradicting the fact that the P=NP problem is an open problem. It is important to note that it is not easy to determine the approximation ratio of an approximation algorithm. In fact, there are cases where an approximation ratio may only be determined for specific instances of a problem \cite{cormen_leiserson_rivest_stein}. One of the specific instances of the TSP that will be considered in the following subsections is called the Metric-TSP, defined below.
\begin{definition}
\label{Metric-TSP}
The Metric-TSP is the TSP defined on a complete weighted graph $G(V,E,f)$ on at least 3 vertices, with the additional property that $\forall$ u,v,w $\in$ V, $f(\{u,v\}) \leq f(\{u,w\}) + f(\{w,v\})$. The additional property is called the triangle inequality. {\normalfont{\cite{vazirani_2003}}}
\end{definition}
Having presented some background theory on approximation algorithms, it is now time to present two approximation algorithms that are suitable for the TSP. These approximation algorithms are the Nearest Neighbour Approximation Algorithm, and the Twice Around The Minimum Spanning Tree Approximation Algorithm. The approximation ratios of both these algorithms are not known when applied to the TSP defined in Definition \ref{TSP}. That being said, it will be shown that their approximation ratios are known when applied to Metric-TSP instances. The following subsection presents the Nearest Neighbour Algorithm.
\subsubsection{Nearest Neighbour Approximation Algorithm}
\label{nna_section}
The Nearest Neighbour Algorithm (NNA) is a simple algorithm that can find near-optimal solutions to the TSP \cite{arora_agarwal_tanwar_2016}. The NNA takes as input an instance of the TSP, and outputs a sequence of all visited vertices in the order of how they were visited \cite{arora_agarwal_tanwar_2016}. The NNA can be described as shown in Algorithm \ref{nna_alg} below. Note that Algorithm \ref{nna_alg} was constructed using ideas from \cite{arora_agarwal_tanwar_2016}.
\begin{algorithm}[H]
\begin{algorithmic}[1]
\State Randomly select any vertex $u$ as starting vertex, mark it as visited, and set $current\_vertex = u$.
\State Find an edge of least weight between vertex $current\_vertex$ and an unvisited vertex $v$.
\State Mark $v$ as visited and set $current\_vertex = v$.
\State If all vertices have been visited, move to vertex $u$ and go to line five, otherwise, go to line two.
\State Return the sequence of vertices in the order of how they were visited.
\caption{: NNA($G(V,E,f)$)}
\label{nna_alg}
\end{algorithmic}
\end{algorithm}
From Algorithm \ref{nna_alg} above it can be deduced that, the idea behind the NNA is to try and find a minimum weight Hamiltonian cycle by always moving from one vertex to another using the edge of least weight. This means that the NNA tries to choose an edge which looks the best at that point in time, hoping that it would lead to a minimum weight Hamiltonian cycle. It is important to note that although the goal of the NNA is to find a minimum weight Hamiltonian cycle, the goal is not guaranteed (as it will be shown in Example \ref{example_nna_explanation}).
\\\\What must be guaranteed however is that the sequence of vertices returned by the NNA forms a Hamiltonian cycle. This guarantee is given because all the vertices in the graph must be visited before the algorithm terminates. In addition to this, excluding the starting vertex, each vertex cannot be visited more than once because, by line two in Algorithm \ref{nna_alg}, the algorithm only chooses edges that lead to unvisited vertices. Therefore, by Definition \ref{hamiltonian cycle}, the sequence of vertices outputted by the algorithm represents a Hamiltonian cycle.
\\\\It is also important to check that the NNA can execute in polynomial time. Otherwise, for large inputs, approximations cannot be given in feasible time (note that feasible time is at worst polynomial time). According to \cite{Rosenkrantz}, the NNA has a quadratic time complexity. This can be seen as follows. Suppose that $current\_vertex = u$, where $u$ is an arbitrary vertex in the TSP instance. Then, the NNA determines the next vertex $v$ to move to by checking every vertex adjacent to $u$. Therefore, since each vertex is adjacent to $n-1$ other vertices, the NNA must perform at most $n-1$ steps for each of the $n$ vertices, which yields a total of at most $n(n-1)$ steps. Therefore, the NNA has a quadratic time complexity. A short implementation sketch is given below, followed by an example which shows how the NNA constructs an approximate solution for the TSP.
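The following Python sketch implements Algorithm \ref{nna_alg} under the same representation assumptions as the earlier sketches (a weight dictionary keyed by \texttt{frozenset} vertex pairs); the function and parameter names are illustrative. Each of the $n$ iterations scans at most $n-1$ unvisited vertices, matching the quadratic bound discussed above.
\begin{verbatim}
# Illustrative Python sketch of the NNA (Algorithm above).
def nna(vertices, f, start):
    tour = [start]
    unvisited = set(vertices) - {start}
    current = start
    while unvisited:
        # Line two of the algorithm: the cheapest edge from the
        # current vertex to an unvisited vertex.
        nxt = min(unvisited, key=lambda v: f[frozenset((current, v))])
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    tour.append(start)  # close the Hamiltonian cycle
    weight = sum(f[frozenset((tour[i], tour[i + 1]))]
                 for i in range(len(tour) - 1))
    return tour, weight
\end{verbatim}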
\begin{example}
\label{example_nna_explanation}
\normalfont{Consider the graph} $G$ in Example \ref{example4}. The NNA constructs a solution to the TSP defined on $G$ in the following way. Assume that the NNA randomly chooses to start from vertex $v_1$. The following graphs give a pictorial representation of how the NNA progresses step by step on $G$, using Algorithm \ref{nna_alg} above. In these graphs, the blue vertices are the visited vertices, and black vertices are the unvisited vertices. Also, the chosen edges are coloured red, whilst the unchosen edges are coloured black.\\\\
\underline{Step 1:}\\\\
Mark the starting vertex $v_1$ as visited
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state,blue] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge node[pos=0.2,below right]{2} (v3);
\path[->] (v3) edge node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
\underline{Step 2:}\\\\
Edge $\{v_1, v_2\}$ is chosen because it has the least weight amongst all edges incident to $v_1$. Therefore, move to $v_2$ and mark $v_2$ as visited.
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state,blue] (v1) {$v_1$};
\node[state, blue] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge[red] node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge node[pos=0.2,below right]{2} (v3);
\path[->] (v3) edge node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
\underline{Step 3:}\\\\
Since not all vertices are visited, lines two and three of Algorithm \ref{nna_alg} are executed again and thus, edge $\{v_2, v_3\}$ is chosen. Therefore, move to vertex $v_3$ and mark $v_3$ as visited.
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state,blue] (v1) {$v_1$};
\node[state,blue] (v2) [right=of v1] {$v_2$};
\node[state,blue] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge[red] node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge[red] node[pos=0.2,below right]{2} (v3);
\path[->] (v3) edge node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
\underline{Step 4:}\\\\
Since not all vertices are visited, lines two and three of Algorithm \ref{nna_alg} are executed again and thus, edge $\{v_3, v_4\}$ is chosen. Therefore, move to vertex $v_4$ and mark $v_4$ as visited.
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state,blue] (v1) {$v_1$};
\node[state,blue] (v2) [right=of v1] {$v_2$};
\node[state,blue] (v3) [below =of v1] {$v_3$};
\node[state,blue] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge[red] node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge[red] node[pos=0.2,below right]{2} (v3);
\path[->] (v3) edge[red] node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
\underline{Step 5:}\\\\
Since all vertices are visited, the NNA moves back to the starting vertex $v_1$.
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state,blue] (v1) {$v_1$};
\node[state,blue] (v2) [right=of v1] {$v_2$};
\node[state,blue] (v3) [below =of v1] {$v_3$};
\node[state,blue] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge[red] node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge[red] node[pos=0.2,below right]{2} (v3);
\path[->] (v3) edge[red] node[]{10}(v4);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge[red] node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
Finally, the NNA returns the Hamiltonian cycle $v_1$, $v_2$, $v_3$, $v_4$, $v_1$ of weight 23.
\end{example}
It is important to note that in Example \ref{example_nna_explanation}, the minimum weight Hamiltonian cycle was not found. In fact, according to Example \ref{example_2.1}, the solution for the TSP given $G$ is the Hamiltonian cycle $v_1$, $v_3$, $v_2$, $v_4$, $v_1$ of weight 18. Note that the NNA would have found the optimal solution had it started from vertex $v_2$. Since the approximate solution value does not differ by much from the optimal value, this example seems to indicate that the NNA returns good approximations. However, this is not always the case because there are instances where the NNA returns very bad approximations. The following example shows when the NNA can give bad approximations.
\begin{example}
\label{nna_fail}
\normalfont{Let} $M \in \mathbb{Z}^+$. Consider the graph $G$ below:
\begin{center}
\begin{tikzpicture}[
> = , % arrow head style
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state}=[
draw = black,
thick,
fill = white,
minimum size = 1mm
]
\node[state] (v1) {$v_1$};
\node[state] (v2) [right=of v1] {$v_2$};
\node[state] (v3) [below =of v1] {$v_3$};
\node[state] (v4) [below =of v2] {$v_4$};
\path[->] (v1) edge[] node[]{4}(v2);
\path[->] (v1) edge node[]{5}(v3);
\path[->] (v2) edge[] node[pos=0.2,below right]{2} (v3);
\path[->] (v3) edge[] node[]{M}(v4);
\path[->] (v2) edge node[]{4}(v4);
\path[->] (v4) edge[] node[pos=0.2,below left]{7}(v1);
\end{tikzpicture}
\end{center}
Assuming that $M$ is a very large number, the minimum weight Hamiltonian cycle of $G$ is $v_2, v_3, v_1, v_4, v_2$ of weight 18. This minimum weight Hamiltonian cycle is found if the NNA starts from vertex $v_2$. Now, suppose that the NNA starts from vertex $v_1$. Then, NNA returns the Hamiltonian cycle $v_1$, $v_2$, $v_3$, $v_4$, $v_1$ of weight $M + 13$. This means that, since $M$ is a very large number, the NNA would return an approximate result that differs a lot in weight compared to the optimal solution. Therefore, in this case, the NNA returns a very bad approximation.
\end{example}
It can be concluded from Example \ref{nna_fail} that the quality of the result returned by the NNA depends heavily on which vertex the algorithm starts from, especially when the edge weights differ a lot. As seen in Example \ref{nna_fail}, a bad approximation results when an edge with a very large weight has to be used, either to complete the Hamiltonian cycle, or because all the remaining unvisited vertices can only be reached through such an edge. In fact, in Example \ref{nna_fail}, edge $\{v_3, v_4\}$ had to be used because $current\_vertex$ was equal to $v_3$ and only $v_4$ was unvisited.\\\\
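Running the \texttt{nna} sketch from Section \ref{nna_section} on the graph of Example \ref{nna_fail}, with, say, $M = 1000$, makes the dependence on the starting vertex concrete:
\begin{verbatim}
# The graph of the example above, with M = 1000 for illustration.
M = 1000
V = ["v1", "v2", "v3", "v4"]
f = {frozenset(("v1", "v2")): 4, frozenset(("v1", "v3")): 5,
     frozenset(("v2", "v3")): 2, frozenset(("v3", "v4")): M,
     frozenset(("v2", "v4")): 4, frozenset(("v1", "v4")): 7}
print(nna(V, f, "v1"))  # weight M + 13 = 1013: a very bad tour
print(nna(V, f, "v2"))  # weight 18: the optimal tour
\end{verbatim}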
Since the NNA is an approximation algorithm, it is natural to ask what its approximation ratio is. It follows from Theorem \ref{no_approx_tsp} that the NNA does not have a constant approximation ratio unless P=NP. In addition to this, by Theorem 1 in \cite{chekuri_im_2011}, unless P=NP, the approximation ratio of the NNA is not known in general. However, Rosenkrantz, Stearns and Lewis \cite{Rosenkrantz} proved that the NNA applied to an instance $G$ of the Metric-TSP has an approximation ratio that is logarithmic in the number of vertices in $G$. In what follows, a lemma will be proved. This will help in proving that the approximation ratio of the NNA applied to an instance $G$ of the Metric-TSP is logarithmic in the number of vertices in $G$. Note that Lemma \ref{to_proove_bound} was constructed using ideas from \cite{Rosenkrantz}.
\begin{lemma}
\label{to_proove_bound}
Let G(V,E,f) be a complete weighted graph on $n$ vertices, and let $W^\star$ be the optimal value of the Metric-TSP defined on G. Suppose that there exists a mapping g : V $\to$ $\real^+$ such that, for all v $\in$ V, g(v) = $l_v$, and that the following two conditions hold:
\begin{itemize}
\item $f(\{u, v\})$ $\geq$ min( $l_u$, $l_v$) for all $u$,$v$ $\in$ $V$
\item $l_u$ $\leq$ $\frac{1}{2}W^\star$ for all u $\in$ V
\end{itemize}
Then $\sum_{u \in V} l_u \leq \frac{1}{2}(\lceil \log_2 n \rceil + 1)W^\star$. {\normalfont{\cite{Rosenkrantz}}}
\end{lemma}
\begin{proof}
Let $G(V,E,f)$ be a complete weighted graph on $n$ vertices, satisfying all the conditions in the lemma statement. Suppose that $V=\{i \mid 1 \leq i \leq n\}$ and that, for all $i$, $j$ $\in$ V, if $i$ $\leq$ $j$ then $l_i$ $\geq$ $l_j$. Clearly, this just re-names and re-orders the vertices; hence, the generality of the proof is not affected.
\begin{claim}
\label{claim1}
$W^\star$ $\geq$ 2 $\sum_{i = k+1}^{\min(2k, n)} l_i$, for all 1 $\leq$ $k$ $\leq$ n, where $k$ $\in$ $\mathbb{N}$.
\end{claim}
Let $H(V^\prime,E^\prime,f)$ be the complete weighted subgraph of $G$ such that $V^\prime$ = $\{i \mid 1 \leq i \leq \min(2k, n)\}$. Let $T$ be a sequence of vertices representing a Hamiltonian cycle in $H$, such that the vertices are ordered as they would appear in a minimum weight Hamiltonian cycle of $G$. Let $C(V^\prime, E^{\prime\prime}, f)$ be the cycle graph obtained from $T$, with weight $T^\star$. By the triangle inequality, for all $\{i, j\}$ $\in$ $E^{\prime\prime}$, $f(\{i, j\})$ is less than or equal to the weight of the path from $i$ to $j$ in a minimum weight Hamiltonian cycle of $G$. This holds because the vertices in $T$ appear in the same order as in the minimum weight Hamiltonian cycle. Therefore, $T^\star$ $\leq$ $W^\star$. Now, by Definition \ref{weightofasubgraph}, $T^\star$ = $\sum_{\{i, j\} \in E^{\prime\prime}} f(\{i, j\})$. Therefore, by the first bullet point in Lemma \ref{to_proove_bound}, $T^\star$ = $\sum_{\{i, j\} \in E^{\prime\prime}} f(\{i, j\})$ $\geq$ $\sum_{\{i, j\} \in E^{\prime\prime}} \min(l_i, l_j)$. For each $i$ $\in$ [$\min(2k, n)$], let $\alpha_i$ be the number of edges $\{i, j\}$ $\in$ $E^{\prime\prime}$ such that $i > j$ (i.e. $l_i$ = $\min(l_i, l_j)$). Then, $\sum_{\{i, j\} \in E^{\prime\prime}} \min(l_i, l_j)$ = $\sum_{i \in V^\prime} \alpha_il_i$\\
$\implies$ $T^\star$ $\geq$ $\sum_{i \in V^\prime} \alpha_i l_i$ = $\sum_{i =1}^{\min(2k, n)} \alpha_i l_i$.\\ Now, for all $i$ $\in$ [$\min(2k, n)$], $\alpha_i$ $\leq$ 2. This is true because every vertex $i$ is incident to exactly two edges of $C$. In addition to this, $\alpha_1$ + ... + $\alpha_{\min(2k, n)}$ = $\abs{E^{\prime\prime}}$ because, for all $\{i,j\}$ $\in$ $E^{\prime\prime}$, either $i > j$ or $j > i$, and therefore exactly one of $\alpha_i$ or $\alpha_j$ is incremented by one. Another observation is that $k$ is at least half of the number of vertices in $C$. This is true because of the following. Suppose that $\min(2k, n)$ = 2k. Then, by the construction of $C$ and $H$, $C$ contains $2k$ vertices. Hence, $k$ must be half of the number of vertices in $C$. Now, suppose that $\min(2k, n)$ = $n$. Then, by the construction of $C$ and $H$, $C$ contains $n$ vertices. Now, $n$ $\leq$ $2k$; hence, $k$ must be at least half of the number of vertices in $C$. Therefore, from both cases, it can be deduced that $k$ is at least half of the number of vertices in $C$.
\begin{claim}
\label{claim2}
$\sum_{i =1}^{\min(2k, n)} \alpha_i l_i$ $\geq$ 2 $\sum_{i = k + 1}^{\min(2k, n)} l_i$.
\end{claim}
Since each $\alpha_i \leq 2$, $\sum_{i=1}^{\min(2k,n)} \alpha_i = \abs{E^{\prime\prime}} = \min(2k, n)$, and the $l_i$ are in non-increasing order, the sum $\sum_{i =1}^{\min(2k, n)} \alpha_i l_i$ is minimised by concentrating the $\alpha_i$ on the vertices of largest index (i.e. those with the smallest values $l_i$): setting $\alpha_i$ = 2 on the top half of the indices and $\alpha_i$ = 0 elsewhere exhausts $\abs{E^{\prime\prime}}$. Now, $k$ is at least half of the number of vertices in $C$, so the interval [$k+1$, $\min(2k, n)$] is contained in that top half of the indices. Therefore, $\sum_{i =1}^{\min(2k, n)} \alpha_i l_i$ $\geq$ $\sum_{i = k + 1}^{\min(2k, n)}2 l_i$ = 2 $\sum_{i = k + 1}^{\min(2k, n)} l_i$, and hence, Claim \ref{claim2} is proved.\\\\
Now, $W^\star \geq T^\star \geq \sum_{i =1}^{\min(2k, n)} \alpha_i l_i$, so by Claim \ref{claim2}, $W^\star \geq 2 \sum_{i = k + 1}^{\min(2k, n)} l_i$, and hence Claim \ref{claim1} is proved as well.\\\\
By Claim \ref{claim1}, $W^\star \geq 2 \sum_{i = k + 1}^{\min(2k, n)} l_i$ for every admissible $k$. \\Hence, $\sum_{j = 0}^{\lceil \log_2 n \rceil - 1} W^\star \geq \sum_{j = 0}^{\lceil \log_2 n \rceil -1} 2 \sum_{i = k + 1}^{\min(2k, n)} l_i$. Letting $k$ = $2^{j}$ in the $j$-th summand,\\$\sum_{j = 0}^{\lceil \log_2 n \rceil - 1} W^\star \geq \sum_{j = 0}^{\lceil \log_2 n \rceil -1} 2 \sum_{i = 2^j + 1}^{\min(2^{j+1}, n)} l_i$. \\As $j$ ranges over $0, \dots, \lceil \log_2 n \rceil - 1$, the index intervals $[2^j + 1, \min(2^{j+1}, n)]$ cover $\{2, \dots, n\}$ without overlap. \\$\implies$ $\lceil \log_2 n \rceil W^\star \geq 2 \sum_{i = 2}^{n} l_i$. Now, the second condition in Lemma \ref{to_proove_bound} implies that $W^\star \geq 2l_1$. Adding this inequality to the previous one,\\$(\lceil \log_2 n \rceil + 1) W^\star \geq 2 \sum_{i = 1}^{n} l_i$\\$\implies$ $\frac{1}{2} (\lceil \log_2 n \rceil + 1) W^\star \geq \sum_{i = 1}^{n} l_i = \sum_{u \in V} l_u$.
\end{proof}
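As a quick illustration of the index bookkeeping in the final step of the proof, take $n = 8$, so that $\lceil \log_2 n \rceil = 3$ and $j$ ranges over $0, 1, 2$, giving $k = 2^j \in \{1, 2, 4\}$. The corresponding index intervals are
\[
[2, \min(2, 8)] = \{2\}, \qquad [3, \min(4, 8)] = \{3, 4\}, \qquad [5, \min(8, 8)] = \{5, 6, 7, 8\},
\]
which together cover $\{2, \dots, 8\}$ exactly once. Summing the three corresponding instances of Claim \ref{claim1} therefore yields $3 W^\star \geq 2 \sum_{i=2}^{8} l_i$, matching the general derivation above.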
Having proved Lemma \ref{to_proove_bound}, it is now time to prove that, when applied to an instance $G$ of the Metric-TSP, the NNA has an approximation ratio that depends logarithmically on the number of vertices in $G$. Note that the proof of Theorem \ref{log_bound_thrm} was constructed using ideas from \cite{Rosenkrantz}.
\begin{theorem}
\label{log_bound_thrm}
Suppose that $G(V,E,f)$ is an instance of the Metric-TSP with $n = \abs{V}$. Let $W^\star$ be the optimal value of $G$, and apply the NNA on $G$. Suppose that $W$ is the value of the approximation returned by the NNA. Then, $\frac{W}{W^\star} \leq \frac{1}{2}(\lceil \log_2 n \rceil + 1)$. {\normalfont\cite{Rosenkrantz}}
\end{theorem}
\begin{proof}
Define $G(V,E,f)$, $W^\star$, and $W$ as stated in Theorem \ref{log_bound_thrm}, and apply Algorithm \ref{nna_alg} to $G$. Let $g : V \to \real^+$ be the mapping defined by $g(v) = f(\{v,u\})$, where $u$ is the vertex chosen by the NNA when $current\_vertex = v$. For all $v \in V$, let $l_v = g(v)$. It will be shown that $g$ satisfies the two conditions of Lemma \ref{to_proove_bound}.\\\\
\textbf{Condition 1}: $f(\{u, v\}) \geq \min(l_u, l_v)$ for all $u, v \in V$\\\\
Let $u, v \in V$. Suppose that $u$ was set as visited by the NNA before $v$. Then, since $G$ is complete, $\{u, v\}$ was a candidate edge for the NNA when $current\_vertex$ was equal to $u$. Therefore, $f(\{u, v\})$ is greater than or equal to the weight of the edge selected when $current\_vertex$ was equal to $u$, and the weight of that selected edge is $l_u$. Therefore, $f(\{u, v\}) \geq l_u$ (*). By the same argument, if $v$ was set as visited by the NNA before $u$, then $f(\{v, u\}) = f(\{u, v\}) \geq l_v$ (**). Now, since the NNA must visit all vertices, either $u$ is visited before $v$ or vice-versa. As a result, one of (*) or (**) must hold. Hence, $f(\{u, v\}) \geq \min(l_u, l_v)$. Therefore, since $u$ and $v$ are arbitrary, Condition 1 holds.\\\\
\textbf{Condition 2}: $l_u \leq \frac{1}{2}W^\star$ for all $u \in V$\\\\
Let $u, v \in V$ be such that $l_u = f(\{u, v\})$. Let $h$ be a minimum weight Hamiltonian cycle in $G$. Since $h$ is a spanning subgraph, both $u$ and $v$ are in $h$. Therefore, $h$ can be expressed as the disjoint union of two paths joining $u$ and $v$: one path is obtained by starting from vertex $u$ and traversing $h$ clockwise until reaching $v$, and the other by starting from vertex $u$ and traversing $h$ anti-clockwise until reaching $v$. Let these paths be $p_1$ and $p_2$ respectively. By the triangle inequality, the weight of $p_1$ and the weight of $p_2$ are both greater than or equal to $f(\{u,v\})$. Therefore, summing both inequalities, $2l_u = 2f(\{u,v\}) \leq$ weight of $p_1$ $+$ weight of $p_2$ $=$ weight of $h$ $= W^\star$. As a result, $l_u \leq \frac{1}{2}W^\star$. Therefore, since $u$ is arbitrary, Condition 2 holds.\\\\Now, for all $v$ in $V$, $l_v$ is the weight of the edge chosen by the NNA when $current\_vertex = v$. Therefore, $\sum_{u \in V} l_u = W$. Using Lemma \ref{to_proove_bound}, $W = \sum_{u \in V} l_u \leq \frac{1}{2} (\lceil \log_2 n \rceil + 1) W^\star$\\ $\implies$ $\frac{W}{W^\star} \leq \frac{1}{2} (\lceil \log_2 n \rceil + 1)$.
\end{proof}
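To make the mapping $g$ used in the proof concrete, the following Python sketch runs the NNA on a graph represented as a symmetric weight matrix and records $l_v = g(v)$ for every vertex. The representation and the function name are illustrative assumptions rather than part of Algorithm \ref{nna_alg}; the point is only that summing the recorded values reproduces the tour weight $W$, exactly as used at the end of the proof.
\begin{verbatim}
# Illustrative sketch (assumed input: a symmetric weight matrix).
# l[v] records g(v): the weight of the edge chosen when current_vertex = v.
def nna_with_edge_lengths(weights, start=0):
    n = len(weights)
    visited = {start}
    l = {}
    current = start
    while len(visited) < n:
        # the nearest unvisited neighbour of the current vertex
        nxt = min((v for v in range(n) if v not in visited),
                  key=lambda v: weights[current][v])
        l[current] = weights[current][nxt]
        visited.add(nxt)
        current = nxt
    l[current] = weights[current][start]  # closing edge of the tour
    return l  # sum(l.values()) equals the tour weight W
\end{verbatim}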
The implication of Theorem \ref{log_bound_thrm} is that when applying the NNA to an instance $G$ of the Metric-TSP, the guaranteed quality of the approximation depends on the number of vertices in $G$: as the number of vertices increases, the bound on the approximation ratio grows logarithmically. For instance, when $n = 2^{20}$, the bound is already $\frac{1}{2}(20 + 1) = 10.5$. Therefore, for very large $n$, the bound $\frac{1}{2}(\lceil \log_2 n \rceil + 1)$ may become so large that it permits the NNA to return poor approximations, since their value would still be less than $\frac{1}{2}(\lceil \log_2 n \rceil + 1)$ times the optimal value. One way of addressing this problem is to determine whether the NNA can have a constant approximation ratio when applied to the Metric-TSP; in that case, no matter how the number of vertices grows, the quality of the approximation could not worsen beyond some fixed bound. Unfortunately, Rosenkrantz, Stearns and Lewis \cite{Rosenkrantz} proved that there are instances $G$ of the Metric-TSP for which the approximation ratio of the NNA is strictly bounded below by a logarithmic function of the number of vertices in $G$. This means that, in general, the NNA cannot have a constant approximation ratio when applied to the Metric-TSP. This fact is stated in Theorem \ref{no_proof} below without proof, as the proof is too long to include here.
\begin{theorem}
\label{no_proof}
For all $m > 3$, $m \in \mathbb{N}$, there is an instance $G(V,E,f)$ of the Metric-TSP such that $\abs{V} = 2^m - 1$ and $\frac{W}{W^\star} > \frac{1}{3}\log_2 (n+1) + \frac{4}{9}$, where $n = \abs{V}$, $W^\star$ is the optimal value of $G$, and $W$ is the value of the approximation returned by the NNA when it is applied to $G$.{\normalfont{\cite{Rosenkrantz}}}
\end{theorem}
Theorems \ref{log_bound_thrm} and \ref{no_proof} imply that, in general, the NNA cannot have a constant approximation ratio when applied to the Metric-TSP; for instance, already for $m = 4$ (so $n = 15$), the ratio exceeds $\frac{1}{3}\log_2 16 + \frac{4}{9} = \frac{16}{9}$, and the lower bound keeps growing with $m$. However, this does not mean that the Metric-TSP cannot be approximated by an algorithm with a constant approximation ratio. In fact, one approximation algorithm that is known to have a constant approximation ratio when applied to the Metric-TSP is the Twice Around the Minimum Spanning Tree Approximation Algorithm, which is presented in the next subsection.
\subsubsection{Twice Around the Minimum Spanning Tree Approximation Algorithm}
\label{TAMSA_SECTION}
Like the NNA, the Twice Around the Minimum Spanning Tree Algorithm (TAMSA) is an algorithm that can find near-optimal solutions to the TSP \cite{cormen_leiserson_rivest_stein}. The TAMSA computes an approximate solution to the TSP by making use of two other algorithms. In what follows, these two algorithms will be briefly described, without delving into the details of their implementation.\\\\
The first algorithm used by the TAMSA is Prim's algorithm. Prim's algorithm takes a connected weighted graph $G(V,E,f)$ and an arbitrary starting vertex $r \in V$ as inputs, and returns a minimum weight spanning tree $T(V^\prime,E^\prime,f)$ of $G$. Prim's algorithm computes $T$ in the following way. The algorithm starts with $V^\prime$ and $E^\prime$ both empty, and begins by adding $r$ to $V^\prime$. It then adds to $E^\prime$ the least-weight edge connecting a vertex $v^\prime \in V^\prime$ to a vertex $v \notin V^\prime$, and adds $v$ to $V^\prime$. This step is repeated until all vertices in $V$ have been added to $V^\prime$. \cite{harris_hirst_mossinghoff_2008}\\\\
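Before turning to the question of correctness, it may help to see the procedure written out. The following Python sketch follows the description above step by step; the adjacency-dictionary representation is an assumption made for readability, and none of the priority-queue optimisations used in practice are included.
\begin{verbatim}
# Illustrative sketch of the procedure described above.
# Assumed input: adj[u] = {v: f({u, v}), ...}, symmetric and connected.
def prim(adj, r):
    tree_vertices = {r}   # V'
    tree_edges = []       # E'
    while len(tree_vertices) < len(adj):
        # the least-weight edge joining a tree vertex to a non-tree vertex
        u, v = min(((x, y) for x in tree_vertices
                    for y in adj[x] if y not in tree_vertices),
                   key=lambda e: adj[e[0]][e[1]])
        tree_edges.append((u, v))  # add the chosen edge to E'
        tree_vertices.add(v)       # add the new vertex to V'
    return tree_edges
\end{verbatim}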
An important question that must be answered is whether the procedure just described really does return a minimum weight spanning tree. That is, one needs to show that the mathematical structure returned by Prim's algorithm is a spanning tree whose weight is the least possible among all spanning trees of the input graph. The answer to this question is vital for algorithms that rely on Prim's algorithm to produce a minimum weight spanning tree: if Prim's algorithm did not return one, such algorithms would need a different method of computing a minimum weight spanning tree. The next two theorems confirm that, given an arbitrary connected weighted graph $G$ together with an arbitrary starting vertex $r$ as input, Prim's algorithm always returns a minimum weight spanning tree of $G$. The proof of Theorem \ref{correctness1} was constructed using ideas from \cite{prim's_algorithm}.
\begin{theorem}
\label{correctness1}
Prim's algorithm always returns a spanning tree {\normalfont{\cite{prim's_algorithm}}}.
\end{theorem}
\begin{proof}
Let $G(V,E,f)$ be a connected weighted graph, and $T(V^\prime,E^\prime,f)$ be the graph returned by Prim's algorithm when given $G$ and an arbitrary vertex $s \in V$ as input. Note that $s$ is the starting vertex for Prim's algorithm. To show that $T$ is a spanning tree, it must be shown that $T$ is a connected spanning subgraph of $G$ with no cycles. Note that $T$ is a subgraph of $G$ because Prim's algorithm constructs $T$ using vertices and edges that belong to $G$.\\\\
Suppose that $T$ does not span $G$. Then, there exists a non-empty set of vertices $X \subseteq V$ such that $X \cap V^\prime = \emptyset$. First, observe that when Prim's algorithm stops adding edges, there can be no edge in $E$ of the form $\{v,w\}$ with $v \in V^\prime$ and $w \notin V^\prime$; otherwise, such an edge would still be eligible, and Prim's algorithm would have chosen an eligible edge and added its endpoint to $V^\prime$. Now, let $x \in X$. Since $G$ is connected, there exists a path $P$ in $G$ joining $s$ to $x$. In addition, $s \in V^\prime$ and $x \notin V^\prime$. Hence, $P$ starts at a vertex in $V^\prime$ and ends at a vertex outside $V^\prime$, and so $P$ must contain an edge of the form $\{v, w\}$, where $v \in V^\prime$ and $w \notin V^\prime$. This means that $E$ contains such an edge. Contradiction. Therefore, $T$ must be a spanning subgraph of $G$.\\\\
To show that $T$ is connected, it must be shown that any two distinct vertices in $V^\prime$ are connected by a path. Let $u, v \in V^\prime$ with $u \neq v$. Before Prim's algorithm adds $u$ to $V^\prime$, it connects $u$ by an edge to a vertex that is already in $V^\prime$. But before that vertex was added to $V^\prime$, Prim's algorithm must have connected it to a vertex added still earlier. Repeating this argument, one obtains a sequence of vertices ending at a vertex $z$ that was not connected to a previously added vertex. Such a vertex $z$ exists because $V$ is finite, and hence only finitely many vertices could have been added to $V^\prime$ before $u$. From the description of Prim's algorithm above, $z = s$, because $s$ is the only vertex in $V$ that was not connected to another vertex of $V^\prime$ prior to being added to $V^\prime$. Hence, there is a sequence of vertices $s = u_1, u_2, \dots, u_k, u_{k+1} = u$ with $\{u_i, u_{i+1}\} \in E^\prime$ for all $i \in [k]$, that is, a path $P$ in $T$ joining $s$ to $u$. The same argument applied to $v$ yields a path $P^\prime$ in $T$ joining $s$ to $v$. Therefore, one can start from vertex $u$, travel to $s$ along $P$, and then travel from $s$ to $v$ along $P^\prime$; the resulting walk contains a path joining $u$ and $v$. Hence, $T$ must be a connected spanning subgraph of $G$.\\\\
Suppose that $T$ contains a cycle $C$. Let $\{c_1, c_2\}$ be the last edge of $C$ that was added by Prim's algorithm to $E^\prime$ (hence the edge that caused the cycle to be created). WLOG, suppose that $c_1$ was added to $V^\prime$ by Prim's algorithm before $c_2$. Then, since Prim's algorithm always adds an edge between a vertex in $V^\prime$ and a vertex not in $V^\prime$, it must be that $c_2$ is not in $V^\prime$ before $\{c_1, c_2\}$ is added to $E^\prime$. Hence, before $\{c_1, c_2\}$ is added to $E^\prime$, there is no path in $T$ joining $c_1$ to $c_2$. This means that no cycle is created when $\{c_1, c_2\}$ is added to $E^\prime$, because after the addition, $c_1$ is connected to $c_2$ by exactly one path, namely the edge $\{c_1, c_2\}$ itself. Contradiction, since $\{c_1, c_2\}$ is the edge that caused the cycle to be created. Therefore, $T$ must be a connected spanning subgraph of $G$ without cycles, and hence, by Definition \ref{spanning tree}, $T$ is a spanning tree of $G$.
\end{proof}
Theorem \ref{correctness1} confirms that Prim's algorithm returns a spanning tree. It remains to show that the spanning tree returned by Prim's algorithm has the least possible weight among all spanning trees of $G$. The proof of Theorem \ref{correctness2} was constructed using ideas from \cite{prim's_algorithm}.
\begin{theorem}
Prim's algorithm always returns a minimum weight spanning tree {\normalfont{\cite{prim's_algorithm}}}.
\label{correctness2}
\end{theorem}
\begin{proof}
Let $G(V,E,f)$ be a connected weighted graph, and $T(V^\prime,E^\prime,f)$ be the spanning tree returned by Prim's algorithm when given $G$ and an arbitrary vertex $s$ $\in$ $V$ as inputs. Note that $s$ is the starting vertex for Prim's algorithm. Let $T^\star(V^\star,E^\star,f)$ be a minimum weight spanning tree of $G$. If $T$ = $T^\star$, the theorem holds.\\\\
Suppose that $T \neq T^\star$. Then, $E^\prime \setminus E^\star \neq \emptyset$. Let $e = \{v_1, v_2\} \in E^\prime \setminus E^\star$, and let $S \subset V^\prime$ be the set of vertices in $V^\prime$ just before Prim's algorithm adds the edge $\{v_1, v_2\}$ to $E^\prime$. Although $e \notin E^\star$, there is a path $P$ joining $v_1$ to $v_2$ in $T^\star$, because $T^\star$ is a spanning tree. Now, by the construction of $S$, $v_1 \in S$ and $v_2 \notin S$. Hence, $P$ begins at a vertex in $S$ and ends at a vertex which is not in $S$. Therefore, $P$ contains some edge $e^\prime = \{x,y\} \in E^\star$ such that $x \in S$ and $y \notin S$. Now, in the iteration of Prim's algorithm when $V^\prime = S$, both $e$ and $e^\prime$ connect a vertex which is in $V^\prime$ to a vertex which is not in $V^\prime$. Hence, both $e$ and $e^\prime$ are eligible to be chosen by Prim's algorithm at this stage. However, it is known that $e$ is chosen instead of $e^\prime$. Since Prim's algorithm always chooses an eligible edge of least weight, $f(e) \leq f(e^\prime)$. Now, consider adding the edge $e$ to $T^\star$. This creates exactly one cycle, namely the cycle formed by $e$ together with the path $P$. Since $e^\prime$ lies on this cycle, removing $e^\prime$ breaks the cycle without disconnecting the graph. Hence, the graph $T^\sim$ with edge set $(E^\star \cup \{e\}) \setminus \{e^\prime\}$ is still a spanning tree. Then, $T^\sim$ differs from $T^\star$ exactly in the edges $e$ and $e^\prime$, so $f(T^\sim) = f(T^\star) - f(e^\prime) + f(e) \leq f(T^\star)$. Now, since $T^\star$ is a minimum weight spanning tree, no spanning tree has weight less than $f(T^\star)$. Hence, $f(T^\sim) = f(T^\star)$. Therefore, if this procedure is repeated for every edge in $E^\prime \setminus E^\star$, $T^\star$ is converted, one exchange at a time, into a spanning tree equal to $T$, whose weight satisfies $f(T) = f(T^\star)$. Therefore, $T$ must be a minimum weight spanning tree. Note that this holds because the procedure is repeated only a finite number of times: since $G$ is a finite graph, $\abs{E^\prime \setminus E^\star}$ is finite.
\end{proof}
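As an informal sanity check of Theorem \ref{correctness2} (not a substitute for the proof), the output of the Prim sketch given earlier can be compared against a brute-force search over all edge subsets of a small graph; the particular graph below is an arbitrary choice.
\begin{verbatim}
# Informal check: compare prim() from the earlier sketch with brute force.
from itertools import combinations

def is_spanning_tree(vertices, edges):
    # a spanning tree on n vertices has n - 1 edges and is connected
    if len(edges) != len(vertices) - 1:
        return False
    reached, frontier = set(), {next(iter(vertices))}
    while frontier:
        w = frontier.pop()
        reached.add(w)
        for a, b in edges:
            if a == w and b not in reached:
                frontier.add(b)
            if b == w and a not in reached:
                frontier.add(a)
    return reached == set(vertices)

adj = {0: {1: 2, 2: 3}, 1: {0: 2, 2: 1, 3: 4},
       2: {0: 3, 1: 1, 3: 5}, 3: {1: 4, 2: 5}}
edges = [(u, v) for u in adj for v in adj[u] if u < v]
best = min((t for t in combinations(edges, len(adj) - 1)
            if is_spanning_tree(set(adj), t)),
           key=lambda t: sum(adj[u][v] for u, v in t))
assert (sum(adj[u][v] for u, v in prim(adj, 0))
        == sum(adj[u][v] for u, v in best))
\end{verbatim}
On this small instance, both computations produce a spanning tree of weight $7$.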
What follows is an example that demonstrates how Prim's algorithm constructs a minimum weight spanning tree from a connected weighted graph.
\begin{example}
\label{example_prim}
\normalfont{Consider} the graph $G$ in Example \ref{example4}, and the brief description of Prim's algorithm given in this subsection. Let $T(V^\prime,E^\prime)$ be the graph with an empty set of vertices and an empty set of edges. Assuming that the starting vertex is $v_1$, Prim's algorithm starts by adding $v_1$ to $V^\prime$. Hence, $T$ is now the following graph:
\begin{center}
\begin{tikzpicture}[
> = stealth, % arrow head style (assumed value; unused in this undirected drawing)
shorten > = 1pt, % don't touch arrow head to node
auto,
node distance = 3cm, % distance between nodes
semithick % line style
]
\tikzset{every state/.style = {
draw = black,
thick,
fill = white,
minimum size = 1mm
}}
\node[state] (v1) {$v_1$};
\end{tikzpicture}
\end{center}
The algorithm now proceeds by adding to $E^\prime$ the least-weight edge of $E$ connecting a vertex $v^\prime \in V^\prime$ to a vertex $v \notin V^\prime$, and by adding $v$ to $V^\prime$. In this example, these properties are satisfied by the edge $\{v_1, v_2\}$; hence, $T$ is the following graph.
\begin{center}
\begin{tikzpicture}[
> = stealth, % arrow head style (assumed value; unused in this undirected drawing)
shorten > = 1pt, % don't touch arrow head to node