forked from Captainarash/The_Holy_Book_of_X86
book_vol_1.txt
___ ___ ___ ___ ___ ___ ___ ___ ___ ___
/\__\ /\ \ /\ \ /\__\ /\ \ /\__\ /\ \ /\ \ /\ \ /\__\
/:/ / /::\ \ /::\ \ /::| | /::\ \ /:/ / /::\ \ /::\ \ /::\ \ /::| |
/:/__/ /:/\:\ \ /:/\:\ \ /:|:| | /:/\:\ \ /:/ / /:/\:\ \ /:/\:\ \ /:/\:\ \ /:|:| |
/::\__\____ /::\~\:\ \ /::\~\:\ \ /:/|:| |__ /::\~\:\ \ /:/ / /::\~\:\ \ /::\~\:\ \ /::\~\:\ \ /:/|:|__|__
/:/\:::::\__\ /:/\:\ \:\__\ /:/\:\ \:\__\ /:/ |:| /\__\ /:/\:\ \:\__\ /:/__/ /:/\:\ \:\__\ /:/\:\ \:\__\ /:/\:\ \:\__\ /:/ |::::\__\
\/_|:|~~|~ \:\~\:\ \/__/ \/_|::\/:/ / \/__|:|/:/ / \:\~\:\ \/__/ \:\ \ \/__\:\ \/__/ \/__\:\/:/ / \/_|::\/:/ / \/__/~~/:/ /
|:| | \:\ \:\__\ |:|::/ / |:/:/ / \:\ \:\__\ \:\ \ \:\__\ \::/ / |:|::/ / /:/ /
|:| | \:\ \/__/ |:|\/__/ |::/ / \:\ \/__/ \:\ \ \/__/ /:/ / |:|\/__/ /:/ /
|:| | \:\__\ |:| | /:/ / \:\__\ \:\__\ /:/ / |:| | /:/ /
\|__| \/__/ \|__| \/__/ \/__/ \/__/ \/__/ \|__| \/__/
=====================================================================================================================================================
_______ _ _ _ _ ____ _ __
|__ __| | | | | | | | | _ \ | | / _|
| | | |__ ___ | |__| | ___ | |_ _ | |_) | ___ ___ | | __ ___ | |_
| | | '_ \ / _ \ | __ |/ _ \| | | | | | _ < / _ \ / _ \| |/ / / _ \| _|
| | | | | | __/ | | | | (_) | | |_| | | |_) | (_) | (_) | < | (_) | |
|_| |_| |_|\___| |_| |_|\___/|_|\__, | |____/ \___/ \___/|_|\_\ \___/|_|
__/ |
|___/
888888888 66666666
88:::::::::88 6::::::6
88:::::::::::::88 6::::::6
8::::::88888::::::8 6::::::6
xxxxxxx xxxxxxx8:::::8 8:::::8 6::::::6
x:::::x x:::::x 8:::::8 8:::::8 6::::::6
x:::::x x:::::x 8:::::88888:::::8 6::::::6
x:::::xx:::::x 8:::::::::::::8 6::::::::66666
x::::::::::x 8:::::88888:::::8 6::::::::::::::66
x::::::::x 8:::::8 8:::::86::::::66666:::::6
x::::::::x 8:::::8 8:::::86:::::6 6:::::6
x::::::::::x 8:::::8 8:::::86:::::6 6:::::6
x:::::xx:::::x 8::::::88888::::::86::::::66666::::::6
x:::::x x:::::x 88:::::::::::::88 66:::::::::::::66
x:::::x x:::::x 88:::::::::88 66:::::::::66
xxxxxxx xxxxxxx 888888888 666666666 v0.1.5 Delivered to you by Arash TC with the spirit of OpenSecurityTraining.info
=====================================================================================================================================================
Are you such a dreamer to put the world to rights?
I stay home forever
where 2 and 2
always makes a 5
[Thom Yorke - 2 + 2 = 5]
=====================================================================================================================================================
Acknowledgement
I owe everything I know about x86 architecture to Xeno Kovah, a man who made his class videos and slides freely available to everyone, which
is a noble act. In return for his great efforts, I decided to write this tutorial on x86 architecture and assembly and publish it for free so whoever
is interested can learn and contribute.
=====================================================================================================================================================
About the Author[s]:
Arash TC is the main author and maintainer of this book. He is currently studying IT in Finland. He's got OSCP/OSCE and a long long long way
ahead to become a pro. He will appreciate readers' comments, criticisms and contributions. His main interest is low level security and kernel
internals.
Amir Ali Torkaman is the editor of this volume. He studies English Literature. He's very enthusiastic about Information Security and Privacy.
That's all I can tell about him now. Maybe later you will see more info about him here.
Other contributors are very much appreciated as they help me to complete this project and present you a book which hopefully will be flawless.
You can contact the author[s] by visiting http://www.kernelfarm.com/
=====================================================================================================================================================
Introduction:
This book/guide/tutorial/wiki is about assembly and x86 architecture. It's written by a low level security dude for low level security dudes.
If you wanna learn Assembly and its structure, reversing basics, Segmentation, Paging, etc. keep on reading. I highly recommend you check
opensecuritytraining.info website and watch Intro to x86 videos as you read this book.
You need Intel Developer's Manual as a quick reference throughout this book. You can download it from the link below:
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
=====================================================================================================================================================
Table of Contents:
Since the book is not finished yet, it's not so convenient to put a table of contents here for now. Just read and enjoy ;) :P
=====================================================================================================================================================
Data Types:
There are 5 different data types to deal with in the world of Assembly. The C equivalents shown are the sizes typical x86 compilers use. They are as follows:
Byte: A byte is simply an 8-bit value (1 byte). The C equivalent of a Byte is a character:
char alpha = 'a';
Word: A word is twice the size of a byte: a 16-bit value (2 bytes). It translates to this piece of code in C:
short int var = 'a';
DoubleWord: As its name suggests, a Double-Word or a DWORD is a 32-bit value
(4 bytes) and it would be translated to:
int var = 3;
QuadWord: A 64-bit value (8 bytes) with the C equivalent of:
long long int var = ...
Double-QuadWord: A 128-bit value (16 bytes). You do the math :)
7 0
+--------+
char | | Byte
+--------+
15 7 0
+------------------+
short |High Byte|Low Byte| Word
+------------------+
31 15 0
+------------------------------------+
int/long | High Word || Low Word | DoubleWord
+------------------------------------+
63 31 0
+--------------------------------------------------------------------------+
double/long long | High DoubleWord || Low DoubleWord | QuadWord
+--------------------------------------------------------------------------+
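To see how the boxes in the diagrams above nest inside each other, here is a minimal C sketch that glues two Bytes into a Word, two Words into a DoubleWord and two DoubleWords into a QuadWord. The function names (make_word, make_dword, make_qword) are made up for this example, and the fixed-width types from <stdint.h> are used instead of char/short/int so the sizes don't depend on the compiler.

```c
#include <stdint.h>

/* Word = High Byte | Low Byte (16 bits). */
uint16_t make_word(uint8_t high, uint8_t low) {
    return (uint16_t)(((unsigned)high << 8) | low);
}

/* DoubleWord = High Word | Low Word (32 bits). */
uint32_t make_dword(uint16_t high, uint16_t low) {
    return ((uint32_t)high << 16) | low;
}

/* QuadWord = High DoubleWord | Low DoubleWord (64 bits). */
uint64_t make_qword(uint32_t high, uint32_t low) {
    return ((uint64_t)high << 32) | low;
}
```

For instance, make_word(0x12, 0x34) gives 0x1234 and make_dword(0x1234, 0x5678) gives 0x12345678.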
Binary-Decimal-Hex Refresher:
If you don't know how to work with Hexadecimal and Binary values, the rest of this book is going to hurt.
Seriously. OK, here's a refresher for you:
+-------------------+ +-------------------+ +-------------------+
| Decimal (base 10) | | Binary (Base 2) | | HEX (Base 16) |
+-------------------+ +-------------------+ +-------------------+
| 00 | | 0000 | | 0x00 |
+-------------------+ +-------------------+ +-------------------+
| 01 | | 0001 | | 0x01 |
+-------------------+ +-------------------+ +-------------------+
| 02 | | 0010 | | 0x02 |
+-------------------+ +-------------------+ +-------------------+
| 03 | | 0011 | | 0x03 |
+-------------------+ +-------------------+ +-------------------+
| 04 | | 0100 | | 0x04 |
+-------------------+ +-------------------+ +-------------------+
| 05 | | 0101 | | 0x05 |
+-------------------+ +-------------------+ +-------------------+
| 06 | | 0110 | | 0x06 |
+-------------------+ +-------------------+ +-------------------+
| 07 | | 0111 | | 0x07 |
+-------------------+ +-------------------+ +-------------------+
| 08 | | 1000 | | 0x08 |
+-------------------+ +-------------------+ +-------------------+
| 09 | | 1001 | | 0x09 |
+-------------------+ +-------------------+ +-------------------+
| 10 | | 1010 | | 0x0A |
+-------------------+ +-------------------+ +-------------------+
| 11 | | 1011 | | 0x0B |
+-------------------+ +-------------------+ +-------------------+
| 12 | | 1100 | | 0x0C |
+-------------------+ +-------------------+ +-------------------+
| 13 | | 1101 | | 0x0D |
+-------------------+ +-------------------+ +-------------------+
| 14 | | 1110 | | 0x0E |
+-------------------+ +-------------------+ +-------------------+
| 15 | | 1111 | | 0x0F |
+-------------------+ +-------------------+ +-------------------+
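If you'd rather check the table than memorize it, here is a small C sketch that reads a string of digits in base 2, 10 or 16 and returns its value. The helper names (digit_value, parse_digits) are made up for this example; the parser stops at the first character that is not a valid digit for the base, and it does not understand a "0x" prefix.

```c
/* Value of one digit character, or -1 if it is not a hex digit. */
int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Accumulate digits left to right: value = value * base + digit. */
unsigned parse_digits(const char *s, unsigned base) {
    unsigned value = 0;
    for (; *s != '\0'; ++s) {
        int d = digit_value(*s);
        if (d < 0 || (unsigned)d >= base) {
            break; /* not a digit in this base: stop here */
        }
        value = value * base + (unsigned)d;
    }
    return value;
}
```

The last row of the table is then parse_digits("15", 10), parse_digits("1111", 2) and parse_digits("0F", 16), which all come out as the same number.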
Negative numbers:
Negative numbers in x86 architecture may seem a little bit weird at first. A negative number N with the
positive value P is stored as P's two's complement, which is equal to P's one's complement plus one.
Holy shit! What was that again? OK! It is very simple and clear if you see it in action.
P = 1 in decimal = 0x01 in Hex = 00000001 in Binary.
One's complement is when you flip all the bits of the number P. So:
P = 00000001 and P's one's complement is all of P's bits flipped, which equals:
Flipped_P = 11111110 in Binary.
Now what happens if you add one to it and convert it to Hex?
Flipped_P + 1 = 11111110 + 00000001 = 11111111 ---> 0xFF in Hex.
So negative one as a single byte is 0xFF (and 0xFFFFFFFF as a DWORD). You can take a look at the following table to completely
comprehend it.
+----------+-----------------------------+----------------+
|    P     | Flipped_P (One's Complement)|Two's Complement|
+----------+-----------------------------+----------------+
|0x00000001|         0xFFFFFFFE          |   0xFFFFFFFF   |
+----------+-----------------------------+----------------+
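The flip-then-add-one recipe is short enough to write down in C. This is just a sketch with made-up function names; it works on unsigned fixed-width types so that the bit patterns are exactly what you see in the table.

```c
#include <stdint.h>

/* One's complement: flip every bit of a byte. */
uint8_t ones_complement8(uint8_t p) {
    return (uint8_t)~p;
}

/* Two's complement of a byte: one's complement plus one. */
uint8_t twos_complement8(uint8_t p) {
    return (uint8_t)(ones_complement8(p) + 1u);
}

/* Same recipe for a DWORD. */
uint32_t twos_complement32(uint32_t p) {
    return ~p + 1u;
}
```

Applying the recipe twice gets you back where you started: twos_complement8(0x01) is 0xFF and twos_complement8(0xFF) is 0x01 again.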
For signed integers, we have these ranges:
-From byte 0x01 to byte 0x7F, all bytes are positive (1 to 127); 0x00 is zero.
-From byte 0x80 to byte 0xFF, all bytes are negative: 0xFF is -1, and as the hex value gets smaller the number gets more negative,
down to 0x80, which is -128.
-From DWORD 0x00000001 to DWORD 0x7FFFFFFF, all DWORDs are positive.
-From DWORD 0x80000000 to DWORD 0xFFFFFFFF, all DWORDs are negative: 0xFFFFFFFF is -1, all the way down to 0x80000000, which is the
smallest (-2147483648).
+----------+----------+
| positive | negative |
+--------------------------+
|from|0x00000001|0x80000000|
+--------------------------+
| to |0x7fffffff|0xffffffff|
+--------------------------+
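To convince yourself of these ranges, you can reinterpret the same bits as a signed value in C. The sketch below uses made-up helper names and copies the bits with memcpy, so nothing is converted, only looked at differently.

```c
#include <stdint.h>
#include <string.h>

/* Look at the 8 bits of b as a signed byte. */
int8_t as_signed8(uint8_t b) {
    int8_t s;
    memcpy(&s, &b, sizeof s);
    return s;
}

/* Look at the 32 bits of d as a signed DWORD. */
int32_t as_signed32(uint32_t d) {
    int32_t s;
    memcpy(&s, &d, sizeof s);
    return s;
}

/* The top bit decides the sign: set means negative. */
int is_negative32(uint32_t d) {
    return (d & 0x80000000u) != 0;
}
```

With these, 0x7F comes out as 127, 0x80 as -128 and 0xFF as -1, which matches the byte ranges above.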
Little Endian or Big Endian?
Endianness comes from Jonathan Swift's "Gulliver's Travels". It doesn't matter which way you eat your eggs and it certainly
doesn't matter in computer architecture, right? Well, I don't know exactly. Probably not. But what matters is that Intel
Architecture is "Little-Endian". So what's up with that?
In a Little-Endian Architecture, values are stored in RAM starting from the lowest (least significant) byte, which goes to the lowest address. For example, this is what happens if
you want to store the value 0x12345678 in memory:
0x12345678 --> 0x12 0x34 0x56 0x78 -->
| |
Into RAM <-- 0x78 0x56 0x34 0x12 <--
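So it's simple. It just starts storing from the lowest byte up to the highest byte.
In Big-Endian Architecture, values are stored in RAM as they are written, most significant byte first. Network traffic is Big-Endian.
PowerPC, SPARC, MIPS, etc. are traditionally Big-Endian unless otherwise configured. ARM can run either way, but in practice it is almost always configured as Little-Endian.
*** Note: Endianness does not really apply to registers. Debuggers show register values most significant byte first, so they look Big-Endian. Little-Endian only applies when writing to and reading from RAM (Memory).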
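Here is a small C sketch of both byte orders. The function names are made up for this example; store_le32 and store_be32 spell the byte order out by hand, while host_is_little_endian just asks the machine the code runs on (it should return 1 on any x86 box).

```c
#include <stdint.h>
#include <string.h>

/* Little-Endian: least significant byte goes to the lowest address. */
void store_le32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)((v >> 8) & 0xFF);
    out[2] = (uint8_t)((v >> 16) & 0xFF);
    out[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Big-Endian: most significant byte goes to the lowest address. */
void store_be32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)((v >> 24) & 0xFF);
    out[1] = (uint8_t)((v >> 16) & 0xFF);
    out[2] = (uint8_t)((v >> 8) & 0xFF);
    out[3] = (uint8_t)(v & 0xFF);
}

/* Copy a DWORD to memory the way the CPU does and inspect the first byte. */
int host_is_little_endian(void) {
    uint32_t v = 0x12345678u;
    uint8_t bytes[4];
    memcpy(bytes, &v, sizeof v);
    return bytes[0] == 0x78;
}
```

Storing 0x12345678 with store_le32 leaves 0x78 0x56 0x34 0x12 in the buffer, the same order as in the drawing above.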
REGISTERS:
Registers are small memory storage areas built into the processor. They are still a volatile type of memory storage so if you power off
your PC, you're gonna lose the state of your registers. Intel architecture defines 8 General Purpose Registers (GPR) for 32 bit platforms
and 16 GPRs for 64 bit platforms as follows:
32-bit | 64-bit
------------------------------------------------
|
EAX | RAX
|
EBX | RBX
|
ECX | RCX
|
EDX | RDX
|
ESI | RSI
|
EDI | RDI
|
EBP | RBP
|
ESP | RSP
|
| R8
|
| R9
|
| R10
|
| R11
|
| R12
|
| R13
|
| R14
|
| R15
Each of the registers above is 4 bytes long in the 32-bit version, and 8 bytes long in the 64-bit version. Besides those General
Purpose Registers, we have EIP (RIP for 64-bit), which is called the Instruction Pointer and holds the address of the next instruction to be executed;
and we have EFLAGS, which is a 32-bit register whose individual bits are flags. Yeah, I know that might sound crazy but I will explain them fully
in later chapters.
EAX is mostly used when a function wants to return a value, and it is also used for lots of different purposes. You have to see it
in action in order to recognize its usability in different scenarios. EBX is the Base Pointer for the data section and EDX is the I/O pointer
but let's save these conventions for later chapters. ECX is mostly used as a counter for repetitive instructions, e.g. for a
loop. ESI and EDI are used as Source Index and Destination Index respectively, e.g. when copying a string. ESP is the stack pointer
which always points to the top of the current stack. EBP is the base pointer which always (actually not always :D) points to the bottom
of the current stack frame by convention. I have to mention that these are only conventions and you don't have to use the registers in
exactly this way. The conventions are there for the simplicity and readability of your code.
The Stack
There are two very important concepts you need to know: Stack and Heap. I'm gonna tell you what the Stack is now and save the Heap for later
chapters. The Stack is a conceptual area of memory (RAM) which mostly holds a function's local variables. The Stack is a Last-In-First-Out
data structure, meaning that the first thing that is pushed onto the stack is the last thing that is gonna pop out. Imagine a bucket
full of apples. The first apple that you put inside the bucket is the last one that you can pull out (of course if you don't
just turn the bucket upside down :D).
The Stack grows down from higher memory addresses to lower memory addresses. For example, if the top of the stack is at address 0x7fff4444
(ESP), the next DWORD (4 bytes, remember?) that you push onto the stack decrements ESP by 4 bytes and then ESP will point to
0x7fff4444 - 4 = 0x7fff4440. You will see this in greater detail in the next few paragraphs.
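The arithmetic above fits in two one-liners. This is only a sketch of the address math for DWORD-sized pushes and pops; the function names are made up for this example and no memory is touched.

```c
#include <stdint.h>

/* Each pushed DWORD moves ESP 4 bytes toward lower addresses. */
uint32_t esp_after_push(uint32_t esp, uint32_t dwords) {
    return esp - 4u * dwords;
}

/* Each popped DWORD moves ESP 4 bytes back toward higher addresses. */
uint32_t esp_after_pop(uint32_t esp, uint32_t dwords) {
    return esp + 4u * dwords;
}
```

So esp_after_push(0x7fff4444, 1) is 0x7fff4440, and popping the same number of DWORDs that were pushed brings ESP back to where it started.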
OK, you may want me to cut the bullshit and show you some real stuff, right? How about we see a simple C program? Hell no!
We're just getting started. Kidding aside, there are still some major things you need to know in order to fully understand even a simple
"Hello World" code in Assembly. So stick with me and be patient.
Caller - Callee Convention
Caller Save Registers means that whenever you want to call a function, you save these registers (EAX, EDX, ECX for 32-bit or RAX, RDX, RCX
for 64-bit) somehow, so that your data remains intact when the execution is handed over to the function. That means the caller is
responsible for saving these registers in order to prevent their destruction when the function modifies the values held in these
registers. The caller is also responsible for restoring the saved values in registers when execution gets back to it.
Callee Save Registers means that when the function (the callee) needs more registers than those which are already saved by the caller,
the callee is responsible for saving their values before going to its actual execution point. The registers that the callee is responsible for
are EBP, EBX, ESI, EDI (RBP, RBX, RSI, RDI for 64-bit). The callee is responsible for restoring these saved values back to their place before
handing the execution back to the caller.
*** Note: On 64-bit the exact sets depend on the platform's calling convention. For example, RSI and RDI are callee-saved on Windows but
caller-saved on Linux, and R8-R15 are split between the two groups.
Structure of Registers
Every register is divided into some smaller pieces like below:
63 0
+----------------------------------+
| RAX |
+----------------+-----------------+
| RESERVED | EAX |
+----------------------------------+
|EXTENDED| AX |
+-----------------+
| |AH | AL|
+--------+--------+
31 16 8 0
63 0
+----------------------------------+
| RBX |
+----------------+-----------------+
| RESERVED | EBX |
+----------------------------------+
|EXTENDED| BX |
+-----------------+
| |BH | BL|
+--------+--------+
31 16 8 0
63 0
+----------------------------------+
| RCX |
+----------------+-----------------+
| RESERVED | ECX |
+----------------------------------+
|EXTENDED| CX |
+-----------------+
| |CH | CL |
+--------+--------+
31 16 8 0
63 0
+----------------------------------+
| RDX |
+----------------+-----------------+
| RESERVED | EDX |
+----------------------------------+
|EXTENDED| DX |
+-----------------+
| |DH | DL |
+--------+--------+
31 16 8 0
63 0
+----------------------------------+
| RSI |
+----------------+-----------------+
| RESERVED | ESI |
+----------------------------------+
|EXTENDED| SI |
+-----------------+
31 16 0
63 0
+----------------------------------+
| RDI |
+----------------+-----------------+
| RESERVED | EDI |
+----------------------------------+
|EXTENDED| DI |
+-----------------+
31 16 0
63 0
+----------------------------------+
| RBP |
+----------------+-----------------+
| RESERVED | EBP |
+----------------------------------+
|EXTENDED| BP |
+-----------------+
31 16 0
63 0
+----------------------------------+
| RSP |
+----------------+-----------------+
| RESERVED | ESP |
+----------------------------------+
|EXTENDED| SP |
+-----------------+
31 16 0
63 0
+----------------------------------+
| R8 |
+----------------+-----------------+
| RESERVED | R8D |
+----------------------------------+
|EXTENDED| R8W |
+-----------------+
| | |R8L |
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| R9 |
+----------------+-----------------+
| RESERVED | R9D |
+----------------------------------+
|EXTENDED| R9W |
+-----------------+
| | |R9L |
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| R10 |
+----------------+-----------------+
| RESERVED | R10D |
+----------------------------------+
|EXTENDED| R10W |
+-----------------+
| | |R10L|
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| R11 |
+----------------+-----------------+
| RESERVED | R11D |
+----------------------------------+
|EXTENDED| R11W |
+-----------------+
| | |R11L|
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| R12 |
+----------------+-----------------+
| RESERVED | R12D |
+----------------------------------+
|EXTENDED| R12W |
+-----------------+
| | |R12L|
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| R13 |
+----------------+-----------------+
| RESERVED | R13D |
+----------------------------------+
|EXTENDED| R13W |
+-----------------+
| | |R13L|
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| R14 |
+----------------+-----------------+
| RESERVED | R14D |
+----------------------------------+
|EXTENDED| R14W |
+-----------------+
| | |R14L|
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| R15 |
+----------------+-----------------+
| RESERVED | R15D |
+----------------------------------+
|EXTENDED| R15W |
+-----------------+
| | |R15L|
+--------+--------+
31 16 7 0
63 0
+----------------------------------+
| RFLAGS |
+----------------+-----------------+
| RESERVED | EFLAGS |
+----------------------------------+
|EXTENDED| FLAGS |
+-----------------+
31 0
You can access the small portions of registers in an assembly code. The concept of accessing and using these small portions becomes
clear in shellcoding when you want your shellcode to be as small as possible.
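If the diagrams are hard to picture, the sketch below models RAX as a plain 64-bit number in C and carves EAX, AX, AH and AL out of it with shifts and masks. All function names are made up for this example. One detail that the diagrams don't show: in 64-bit mode, writing a 32-bit register such as EAX clears the upper 32 bits of RAX, while writing AX, AH or AL leaves the rest of the register alone.

```c
#include <stdint.h>

/* Reading the sub-registers of RAX. */
uint32_t get_eax(uint64_t rax) { return (uint32_t)rax; }        /* bits 31..0 */
uint16_t get_ax(uint64_t rax)  { return (uint16_t)rax; }        /* bits 15..0 */
uint8_t  get_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 15..8 */
uint8_t  get_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 7..0  */

/* Writing AL, AH or AX only replaces those bits. */
uint64_t set_al(uint64_t rax, uint8_t v) {
    return (rax & ~0xFFull) | v;
}

uint64_t set_ah(uint64_t rax, uint8_t v) {
    return (rax & ~0xFF00ull) | ((uint64_t)v << 8);
}

uint64_t set_ax(uint64_t rax, uint16_t v) {
    return (rax & ~0xFFFFull) | v;
}

/* Writing EAX in 64-bit mode zero-extends into the whole of RAX. */
uint64_t set_eax(uint64_t rax, uint32_t v) {
    (void)rax; /* the old upper half is discarded */
    return v;
}
```

With RAX = 0x1122334455667788, AL is 0x88, AH is 0x77, AX is 0x7788 and EAX is 0x55667788.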
Here are some instructions for you, but before you begin, you must know the basic syntax of an assembly instruction. We have 2 different notations of
assembly, Intel notation and AT&T notation.
In Intel notation, the instruction is followed first by the destination, then a comma, then the source.
instruction destination, source
In AT&T notation, the instruction is followed first by the source, then a comma, then the destination. Every register has a percent sign (%)
prepended to it. It looks like this:
instruction %source, %destination
*** NOTE: The percent sign is only applied to registers. It doesn't apply to immediate values, which take a dollar sign ($) prefix instead.
1.
_ _ ___ ___
| \| |/ _ \| _ \
| .` | (_) | _/
|_|\_|\___/|_|
YES! The first instruction for you to learn is NOP. NOP stands for No Operation. Better wipe that smile off your face and tell me what NOP does. Huh? Nothing?
Well, not quite! NOP is actually an alias. The one-byte NOP (opcode 0x90) is encoded like this:
XCHG eax,eax
It changes no register and no flag, but behind the scenes its encoding is that of exchanging (XCHG, as you guessed) the value in EAX with EAX.
2.
___ _ _ ___ _ _
| _ \ | | / __| || |
| _/ |_| \__ \ __ |
|_| \___/|___/_||_|
The PUSH instruction pushes a word, a dword or a quadword onto the stack (a byte-sized immediate is sign-extended to the full operand size before it is pushed).
For this part of the tutorial I will only explain pushing a dword (4-byte value) onto the stack. The rest of them take seconds to
understand. In order to fully understand what a push instruction does, you have to see it demonstrated. For the following
instructions:
(1) PUSH 0x41414141
(2) PUSH 0x42424242
(3) PUSH 0x43434343
Assume ESP points to some address that holds the content 0xDEADBEEF (0) before executing the 3 lines above. On each PUSH
instruction, ESP gets decremented by 4 and the value is written to the stack at the address the new ESP points at.
(0) (1) (2) (3)
+----------+ +----------+ +----------+ +----------+
ESP--> |0xDEADBEEF| |0xDEADBEEF| |0xDEADBEEF| |0xDEADBEEF| Higher Memory Addresses
+----------+ +----------+ +----------+ +----------+ .
| | ESP--> | A A A A | | A A A A | | A A A A | .
+----------+ +----------+ +----------+ +----------+ .
| | | | ESP--> | B B B B | | B B B B | .
+----------+ +----------+ +----------+ +----------+ .
| | | | | | ESP--> | C C C C | .
+----------+ +----------+ +----------+ +----------+ Lower Memory Addresses
ESP = 0x7fffff50 ESP = 0x7fffff4C ESP = 0x7fffff48 ESP = 0x7fffff44
3.
___ ___ ___
| _ \/ _ \| _ \
| _/ (_) | _/
|_| \___/|_|
POP is exactly the opposite of a PUSH instruction. It pops (moves) whatever value ESP is currently pointing at into a register (or a memory location)
and increments ESP by 4 (in the case of a DWORD). Look at the demonstration below, assuming EAX holds the value 0xDEADCE11 before the execution of the
following 3 lines. When the first POP EAX instruction (1) is issued, the value at the address that ESP is pointing at at that time (CCCC or
0x43434343) is popped off the stack and shows up in the EAX register, and ESP is incremented by 4. Notice that popping a value off the
stack does not erase it from memory. POP just copies it to the register the instruction names and adds 4 bytes to ESP.
(1) POP EAX
(2) POP EAX
(3) POP EAX
(0) (1) (2) (3)
+----------+ +----------+ +----------+ +----------+
|0xDEADBEEF| |0xDEADBEEF| |0xDEADBEEF| ESP--> |0xDEADBEEF|
+----------+ +----------+ +----------+ +----------+
| A A A A | | A A A A | ESP--> | A A A A | | A A A A |
+----------+ +----------+ +----------+ +----------+
| B B B B | ESP--> | B B B B | | B B B B | | B B B B |
+----------+ +----------+ +----------+ +----------+
ESP--> | C C C C | | C C C C | | C C C C | | C C C C |
+----------+ +----------+ +----------+ +----------+
+----------+ +----------+ +----------+ +----------+
EAX |0xDEADCE11| EAX |0x43434343| EAX |0x42424242| EAX |0x41414141|
+----------+ +----------+ +----------+ +----------+
ESP |0x7fffff44| ESP |0x7fffff48| ESP |0x7fffff4C| ESP |0x7fffff50|
+----------+ +----------+ +----------+ +----------+
*** POP DWORD to a REGISTER
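The PUSH and POP demonstrations can be replayed with a toy model in C. This is a sketch, not how a real CPU is built: the Machine struct, the helper names and the 64-byte simulated memory region are all assumptions made up for this example, the addresses are the ones from the diagrams, and there is no bounds checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_LIMIT 0x7fffff54u /* hypothetical: one past the highest simulated address */
#define STACK_BYTES 64u         /* hypothetical size of the simulated region */

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* addresses STACK_LIMIT-STACK_BYTES .. STACK_LIMIT-1 */
    uint32_t esp;
} Machine;

/* Translate a simulated address into a pointer into mem[]. */
uint8_t *cell(Machine *m, uint32_t addr) {
    return &m->mem[addr - (STACK_LIMIT - STACK_BYTES)];
}

uint32_t read32(Machine *m, uint32_t addr) {
    uint32_t v;
    memcpy(&v, cell(m, addr), sizeof v);
    return v;
}

void write32(Machine *m, uint32_t addr, uint32_t v) {
    memcpy(cell(m, addr), &v, sizeof v);
}

/* State (0) of the diagrams: ESP points at a DWORD holding 0xDEADBEEF. */
void machine_init(Machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->esp = 0x7fffff50u;
    write32(m, m->esp, 0xDEADBEEFu);
}

/* PUSH: decrement ESP by 4, then store the value at the new ESP. */
void push32(Machine *m, uint32_t v) {
    m->esp -= 4;
    write32(m, m->esp, v);
}

/* POP: load the value at ESP, then increment ESP by 4.
   The old slot is left untouched in memory. */
uint32_t pop32(Machine *m) {
    uint32_t v = read32(m, m->esp);
    m->esp += 4;
    return v;
}
```

After machine_init, pushing 0x41414141, 0x42424242 and 0x43434343 walks ESP from 0x7fffff50 down to 0x7fffff44, and three calls to pop32 return the values in reverse order while ESP climbs back up. Reading address 0x7fffff44 after the first pop still gives 0x43434343, because popping doesn't wipe the slot.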
4.
___ _ _ _
/ __| /_\ | | | |
| (__ / _ \| |__| |__
\___/_/ \_\____|____|
One of the most important instructions for you to understand is the CALL instruction and its conventions. Understanding the calling conventions is
crucial in the field of reverse engineering. So before I tell you what happens while executing a CALL instruction, let's dive into the calling
conventions themselves. The calling conventions define how the code calls a function (subroutine) and how the parameters are passed to the function.
The convention mostly depends on the compiler, which can be configured to use a certain one. There are several of them, and the most commonly used ones
are the CDECL and STDCALL conventions.
CDECL:
"C Declaration" is the most commonly used convention for all C code and some C++. In CDECL, the caller must push the parameters of the
function that is gonna be called(callee) onto the stack from right to left. So for example if we have this function in C:
void func (int a, int b){
...
...
}
int main (){
func(100,200);
int var = 300;
return 0;
}
The value of "b" and then "a" must be pushed onto the stack (right to left) before calling "func". After the call, the callee (func)
must save the previous stack frame pointer and create a new stack frame. Wait a minute! WTF? What's a stack frame? Oops! I forgot to tell you that. (:D)
Here, I will explain it now. Each function has its own stack frame. A stack frame is simply (by convention) an area that is a function's
playground for storing local variables, etc. When a function is called, after the parameters are passed, the called function must set up its own new
stack frame by executing 2 simple instructions as below:
(1) PUSH EBP
(2) MOV EBP,ESP
Line (1) saves the caller's frame pointer (EBP) onto the stack, then line (2) copies ESP into the EBP register, which from then on points to the bottom (start) of
the new stack frame. At this point both EBP and ESP hold the same value. Then, as the function goes on with its actual work, ESP will point somewhere lower
than EBP (a frame full of local variables, etc.). A CALL instruction pushes the address of the instruction right after the CALL onto
the stack and loads EIP with the address of the first instruction of the function. Here's a demonstration for you to see the whole
picture:
.
.
.
(1) PUSH 0xC8
(2) PUSH 0x64
(3) CALL func
(4) PUSH 0x12C
.
.
.
BEFORE THE CALL:
(0) (1) (2) (3)
+----------+ +----------+ +----------+ +----------+
ESP--> |0xDEADBEEF| |0xDEADBEEF| |0xDEADBEEF| |0xDEADBEEF| Higher Memory Addresses
+----------+ +----------+ +----------+ +----------+ .
| | ESP--> | 200 | | 200 | | 200 | .
+----------+ +----------+ +----------+ +----------+ .
| | | | ESP--> | 100 | | 100 | .
+----------+ +----------+ +----------+ +----------+ .
| | | | | | ESP--> |addr of(4)| .
+----------+ +----------+ +----------+ +----------+ Lower Memory Addresses
ESP = 0x7fffff50 ESP = 0x7fffff4C ESP = 0x7fffff48 ESP = 0x7fffff44
AFTER THE CALL:
func:
(1) PUSH EBP
(2) MOV EBP,ESP
.
.
.
(0) (1) AND (2)
+----------+ +----------+
|0xDEADBEEF| |0xDEADBEEF|
+----------+ +----------+
| 200 | | 200 |
+----------+ +----------+
| 100 | | 100 |
+----------+ +----------+
ESP+-> |addr of(4)| |addr of(4)| ---> You'll see the exact reason why this address must be pushed onto the stack later but I can
+----------+ +----------+ tell you that it is there for when the function is done and wants to return to the caller.
NEW ESP+-> |SAVED EBP | SAVED EBP = 0x7fffff60 *You will see
+----------+ why this value must be saved
. before going any further.
.
+----------+ +----------+
EBP |0x7fffff60| EBP |0x7fffff40|
+----------+ +----------+
ESP |0x7fffff44| ESP |0x7fffff40|
+----------+ +----------+
In the CDECL calling convention, a function's return value is put in EAX (or EDX:EAX for 64-bit values) for primitive data types, and after returning, the caller is responsible
for cleaning up the stack. So here we wrap it up in the list below:
1. Most common calling convention for all C code and some C++ code.
2. The called function (the callee) expects its parameters to be pushed onto the stack from right to left.
3. The first thing that the callee does is save the old stack frame pointer (PUSH EBP) and set up a new one (MOV EBP,ESP). This procedure is called the
"Function Prologue".
4. Returns data in the EAX or EDX:EAX registers.
5. The caller is responsible for cleaning up the stack.
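The whole CDECL call sequence from the demonstration can be replayed with a toy model in C. Again this is only a sketch: the Machine struct, the helper names, the simulated memory window and the two code addresses (FUNC_ADDR and RETURN_ADDR) are assumptions made up for this example. The stack addresses and the old EBP value are the ones from the diagrams. A handy consequence of the prologue falls out of it: inside func, the saved EBP sits at [EBP], the return address at [EBP+4], the argument a at [EBP+8] and b at [EBP+12].

```c
#include <stdint.h>
#include <string.h>

#define STACK_LIMIT 0x7fffff54u /* hypothetical: one past the highest simulated address */
#define STACK_BYTES 64u         /* hypothetical size of the simulated region */
#define FUNC_ADDR   0x00401000u /* hypothetical address of func */
#define RETURN_ADDR 0x00401018u /* hypothetical "addr of (4)" */

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp, ebp, eip;
} Machine;

uint32_t read32(Machine *m, uint32_t addr) {
    uint32_t v;
    memcpy(&v, &m->mem[addr - (STACK_LIMIT - STACK_BYTES)], sizeof v);
    return v;
}

void write32(Machine *m, uint32_t addr, uint32_t v) {
    memcpy(&m->mem[addr - (STACK_LIMIT - STACK_BYTES)], &v, sizeof v);
}

void push32(Machine *m, uint32_t v) {
    m->esp -= 4;
    write32(m, m->esp, v);
}

/* Runs lines (1)-(3) of the caller, then lines (1)-(2) of func. */
void simulate_cdecl_call(Machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->esp = 0x7fffff50u;
    m->ebp = 0x7fffff60u;            /* the caller's frame pointer */
    write32(m, m->esp, 0xDEADBEEFu);

    push32(m, 0xC8);                 /* (1) PUSH 0xC8: argument b = 200 */
    push32(m, 0x64);                 /* (2) PUSH 0x64: argument a = 100 */
    push32(m, RETURN_ADDR);          /* (3) CALL func: push the return address ... */
    m->eip = FUNC_ADDR;              /*     ... and continue at func */

    push32(m, m->ebp);               /* func (1) PUSH EBP */
    m->ebp = m->esp;                 /* func (2) MOV EBP,ESP */
}
```

After simulate_cdecl_call, both ESP and EBP are 0x7fffff40, just like in the last column of the diagram.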
STDCALL:
The only difference between STDCALL and CDECL is that in STDCALL, the callee is responsible for cleaning up the stack. This calling convention is
mainly used by Microsoft C++ code (e.g. WIN32 API). You may have seen the __stdcall declaration when using a function in the Windows
API by first finding its address in some Windows library (e.g. ZwQueryInformationProcess in ntdll.dll). In a high-level language such as C++, STDCALL is declared this way:
return-type __stdcall function-name[(argument-list)]
5.
___ ___ _____
| _ \ __|_ _|
| / _| | |
|_|_\___| |_|
The Return instruction has 2 forms:
1. Plain "RET". It behaves like a "POP EIP" (no such instruction really exists, but that is the effect): it pops whatever value is on top of the
stack and puts it into EIP. This form is used by the CDECL convention, as the caller is responsible for the stack clean-up.
2. "RET" with an immediate operand. It does exactly what number 1 does, and then increments ESP by the given value (e.g. "RET 0x08" in the previous
demonstration). Once the function has restored the previous stack frame, RET pops the value pointed to by ESP (address of (4)) into EIP and then
increments ESP by 8, which removes the arguments b and a that were pushed onto the stack before the call. If you pay close attention, you will
recognize that this action represents the STDCALL convention, where the callee is responsible for cleaning up the stack.
*** Note: In terms of exploitation, specifically buffer overflows, changing the return address(address of (4) in above demonstration) means
gaining control of the EIP register which holds the key to the program's execution path.
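The two RET forms described above can be sketched on a toy model of the stack. This is only an illustration under stated assumptions: ToyCpu,
push32, pop32, ret_near, ret_imm and esp_after_call are hypothetical names, the stack top 0x7fffff60 is an arbitrary starting value, and the
return address 0x00401018 is just a placeholder for "address of (4)".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: a few stack slots plus the two registers RET touches. */
enum { STACK_SLOTS = 16 };
#define STACK_TOP 0x7fffff60u      /* assumed address just above the highest slot */

typedef struct {
    uint32_t slot[STACK_SLOTS];    /* slot[i] models the dword at STACK_TOP - 4*(i+1) */
    uint32_t esp;
    uint32_t eip;
} ToyCpu;

static uint32_t *cell(ToyCpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_TOP);
    assert(addr >= STACK_TOP - 4u * STACK_SLOTS);
    return &c->slot[(STACK_TOP - addr) / 4 - 1];
}

/* PUSH decrements ESP by 4 and stores; POP loads and increments ESP by 4. */
static void push32(ToyCpu *c, uint32_t v) { c->esp -= 4; *cell(c, c->esp) = v; }
static uint32_t pop32(ToyCpu *c) { uint32_t v = *cell(c, c->esp); c->esp += 4; return v; }

/* Form 1: plain RET, which behaves like "POP EIP". */
static void ret_near(ToyCpu *c) { c->eip = pop32(c); }

/* Form 2: RET with an immediate, which pops EIP and then releases arg_bytes of arguments. */
static void ret_imm(ToyCpu *c, uint16_t arg_bytes) { c->eip = pop32(c); c->esp += arg_bytes; }

/* Returns ESP after a call with two pushed arguments, cleaned up either
   CDECL-style (caller adds 8) or STDCALL-style (callee executes RET 8). */
uint32_t esp_after_call(int callee_cleans) {
    ToyCpu c = { .esp = STACK_TOP, .eip = 0 };
    push32(&c, 2);                 /* argument b (rightmost goes first) */
    push32(&c, 1);                 /* argument a */
    push32(&c, 0x00401018u);       /* return address pushed by CALL (placeholder) */
    if (callee_cleans) {
        ret_imm(&c, 8);            /* STDCALL: RET 0x08 */
    } else {
        ret_near(&c);              /* CDECL: plain RET ... */
        c.esp += 8;                /* ... then the caller runs ADD ESP,8 */
    }
    assert(c.eip == 0x00401018u);  /* either way, execution resumes after the call */
    return c.esp;
}
```

Calling esp_after_call(0) and esp_after_call(1) both give back 0x7fffff60: the stack ends up balanced either way, and the only difference is
which side of the call releases the 8 bytes of arguments.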
6.
__ __ _____ __
| \/ |/ _ \ \ / /
| |\/| | (_) \ V /
|_| |_|\___/ \_/
A MOV instruction simply copies data from source to destination (you already have this running in the background of your brain since we saw it
when explaining the stack and calling conventions). We can move data in 3 different ways:
1. Register to Register
2. Memory to Register / Register to Memory
3. Immediate to Register / Immediate to Memory
As you may have guessed, the MOV instruction can't move data from memory to memory. Memory operands in most assembly instructions are written in
a form called r/m32, which will be explained shortly.
Now let's take a look at a very simple piece of code:
example1.c sub: main:
---------------------------------------+-----------------------------+------------------------------+
int sub(){ | 00401000 push ebp | 00401010 push ebp |
return 0xbeef; | 00401001 mov ebp,esp | 00401011 mov ebp,esp |
} | 00401003 mov eax,0xBEEF | 00401013 call sub(401000h) |
int main(){ | 00401008 pop ebp | 00401018 mov eax,0xF00D |
sub(); | 00401009 ret | 0040101D pop ebp |
return 0xf00d; | | 0040101E ret |
} | | |
---------------------------------------+-----------------------------+------------------------------+
*** Note: This piece of code is compiled without any optimization or security protection. Your assembly instructions may look different, but don't
worry, because this example serves an educational purpose by keeping things as simple as possible. You will see more complicated and up-to-date
instructions as you go along in this book.
If we assume that the first thing that's gonna start executing is main(), this piece of code is gonna call the function sub(), then sub() is
gonna return the hex value 0xBEEF, and main() is not gonna use it in any way; it just returns 0xF00D and exits.
In the assembly code, we assume the entry point of our program is main(). The first 2 instructions are the function prologue we discussed
before. First it saves the previous stack frame (PUSH EBP). This is needed because main() is not actually the first function that is called when
the program starts executing. There are tons of them before it; you can check it in gdb if you're interested, but for now we assume main() is the
entry point. Then it creates its own stack frame (MOV EBP,ESP). After executing those 2 instructions, the stack should look like this:
+----------+
|Saved EIP | --> Return to whoever called main()
+----------+
ESP -> |SAVED EBP | --> Save the previous stack frame
+----------+
| |
+----------+
| |
+----------+
EIP = 00401013
EBP = 7fffff50
ESP = 7fffff50
Now when the CALL instruction executes, the address of the very next instruction after the call instruction in main() is gonna get
pushed onto the stack, which in this case is 0x00401018 (MOV EAX,0xF00D), and EIP will point to the first instruction in sub(), which is at
0x00401000 (PUSH EBP). When that happens, the stack will look like this:
+----------+
|Saved EIP | --> Return to whoever called main()
+----------+
|SAVED EBP | --> Save the previous stack frame
+----------+
ESP -> | 18104000 | --> Address of the next instruction after the call. Pay attention that this address must be in Little-Endian format since it's
+----------+ saved in memory. Also as a side effect of a call instruction, ESP gets decremented by 4.
| |
+----------+
EIP = 00401000
EBP = 7fffff50
ESP = 7fffff4C
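The byte order shown in the diagram above (18 10 40 00 for the address 0x00401018) can be reproduced with a few lines of C. This is a minimal
sketch; store_le32 and load_le32 are hypothetical helper names, and they spell out by hand what an x86 CPU does on its own whenever it writes a
32-bit value to memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value the way x86 lays it out in memory: least significant byte first. */
void store_le32(uint8_t *dst, uint32_t value) {
    assert(dst != NULL);
    for (int i = 0; i < 4; i++) {
        dst[i] = (uint8_t)(value >> (8 * i));   /* byte i goes to address dst + i */
    }
}

/* Read the value back by reassembling the bytes, lowest address first. */
uint32_t load_le32(const uint8_t *src) {
    assert(src != NULL);
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}
```

Storing 0x00401018 into a 4-byte buffer with store_le32 leaves the bytes 18 10 40 00 in it, from the lowest address to the highest, which is
exactly how the return address appears in the stack diagram.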
The only thing that sub() does is return 0xBEEF. As mentioned before, the EAX register is mostly used for the function's return value.
After executing the function's prologue (PUSH EBP and MOV EBP,ESP), the hex value 0xBEEF is gonna be put in EAX. Here's what the stack will look like:
+----------+
|Saved EIP | --> Return to whoever called main()
+----------+
|SAVED EBP | --> Save the previous stack frame
+----------+
| 18104000 | --> Address of the next instruction after the call. Pay attention that this address must be in Little-Endian format since it's
+----------+ saved in memory.
ESP -> | 50ffff7f | --> Previous EBP (stack frame) is pushed onto the stack and ESP gets decremented by 4 as a side effect of the PUSH instruction.
+----------+ This address is also saved in memory in Little-Endian format.
| |
+----------+
EIP = 00401008
EBP = 7fffff48
ESP = 7fffff48
EAX = 0000BEEF
Now that the return value of the function sub() has been put in the EAX register, it's time to get back to main(). The next instruction to execute is
the POP EBP instruction. As mentioned before, a POP instruction takes whatever value ESP currently points at and puts it in the register given as
its operand. ESP currently has the value 0x7FFFFF48, which points to the value 50FFFF7F (Little-Endian). So after executing the POP EBP
instruction, the stack will look like this:
+----------+
|Saved EIP | --> Return to whoever called main()
+----------+
|SAVED EBP | --> Save the previous stack frame
+----------+
ESP --> | 18104000 | --> Address of the next instruction after the call. We will use this to return the execution to main. As a side effect of the POP
+----------+ instruction, ESP is incremented by 4.
| 50ffff7f | --> Previous EBP (stack frame) is popped off the stack and put in the EBP register. This value is not actually
+----------+ wiped from memory, but the program has nothing more to do with it, so it's not our concern.
| |
+----------+
EIP = 00401009
EBP = 7fffff50
ESP = 7fffff4C
EAX = 0000BEEF
Now that we are back in our previous stack frame after popping the saved EBP into the EBP register, it's time to go back to main(). When RET is
executed, whatever value ESP currently points at is gonna be popped off the stack and appear in the EIP register. So here
is the stack after executing the RET instruction:
+----------+
|Saved EIP | --> Return address to whoever called main()
+----------+
ESP --> |SAVED EBP | --> ESP points here after executing the RET instruction.
+----------+
| 18104000 | --> Address of the next instruction after the call. RET has just popped it into EIP, and as a side effect ESP is incremented by 4.
+----------+ The value is still sitting in memory, but it is no longer part of the stack.
|undefined |
+----------+
| |
+----------+
EIP = 00401018
EBP = 7fffff50
ESP = 7fffff50
EAX = 0000BEEF
EIP points to the instruction just after the CALL sub() instruction, which is MOV EAX,0xF00D. After executing it, the EAX register will hold
the value 0xF00D and the stack will remain the same. Now what's gonna happen when main() reaches its own POP EBP and RET instructions? The same thing
we saw in sub(). POP EBP pops the saved EBP (the stack frame from before main() was called) off the stack and EBP is reset to that value. Then RET
puts the saved EIP value into EIP, increments ESP by 4, and returns to whatever function called main() (typically the C runtime's startup code, not the kernel).
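The whole walk-through above can be replayed on a toy model to double-check the register values in the diagrams. This is a sketch under stated
assumptions, not how a real CPU or debugger works: TraceCpu, slot_at, push_dword, pop_dword, run_sub and run_main are hypothetical names,
CALLER_RET is a made-up return address into whoever called main(), and the EBP value on entry (0x7fffff60) is also just an assumed number.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_STACK_HIGH 0x7fffff60u  /* assumed: address just above the modelled slots */
#define CALLER_RET       0x00400500u  /* hypothetical return address into main()'s caller */
enum { TRACE_SLOTS = 16 };

typedef struct {
    uint32_t mem[TRACE_SLOTS];        /* mem[i] models the dword at TRACE_STACK_HIGH - 4*(i+1) */
    uint32_t eax, ebp, esp, eip;
} TraceCpu;

static uint32_t *slot_at(TraceCpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < TRACE_STACK_HIGH);
    assert(addr >= TRACE_STACK_HIGH - 4u * TRACE_SLOTS);
    return &c->mem[(TRACE_STACK_HIGH - addr) / 4 - 1];
}
static void push_dword(TraceCpu *c, uint32_t v) { c->esp -= 4; *slot_at(c, c->esp) = v; }
static uint32_t pop_dword(TraceCpu *c) { uint32_t v = *slot_at(c, c->esp); c->esp += 4; return v; }

/* sub() from example1.c; CALL has already pushed the return address. */
static void run_sub(TraceCpu *c) {
    push_dword(c, c->ebp);            /* push ebp */
    c->ebp = c->esp;                  /* mov ebp,esp */
    assert(c->esp == 0x7fffff48u);    /* matches the diagram: EBP = ESP = 7fffff48 */
    c->eax = 0xBEEF;                  /* mov eax,0xBEEF */
    c->ebp = pop_dword(c);            /* pop ebp */
    c->eip = pop_dword(c);            /* ret */
}

/* main() from example1.c; reports EAX as it was right after sub() returned. */
TraceCpu run_main(uint32_t *eax_after_sub) {
    TraceCpu c = { .ebp = 0x7fffff60u, .esp = 0x7fffff54u };  /* assumed state on entry */
    *slot_at(&c, c.esp) = CALLER_RET; /* saved EIP of whoever called main() */
    push_dword(&c, c.ebp);            /* push ebp */
    c.ebp = c.esp;                    /* mov ebp,esp */
    assert(c.esp == 0x7fffff50u);
    push_dword(&c, 0x00401018u);      /* call sub: push the return address ... */
    c.eip = 0x00401000u;              /* ... and jump to sub() */
    run_sub(&c);
    assert(c.eip == 0x00401018u && c.esp == 0x7fffff50u && c.ebp == 0x7fffff50u);
    *eax_after_sub = c.eax;
    c.eax = 0xF00D;                   /* mov eax,0xF00D */
    c.ebp = pop_dword(&c);            /* pop ebp */
    c.eip = pop_dword(&c);            /* ret: ESP goes UP by 4 */
    return c;
}
```

After run_main() finishes, EAX holds 0xF00D, the value captured right after sub() is 0xBEEF, and ESP ends at 0x7fffff58, 4 bytes above the
slot that held main()'s return address, because RET increments ESP as it pops.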
Well, that was fairly easy, but it was a fairly good example for you to understand how calling and returning from calls work. Before we jump to our
next example, let me introduce you to R/M32:
Whenever you see the term R/M32 in Intel's manual or the like, it means the operand can be either a 32-bit register or a 32-bit value in memory.
The memory form locates the value using a register pointing to a memory location, plus an optional offset and an optional second register with a
scale multiplier. I guess you may be doing your WTF gesture now (:D). What that means is that you specify a register that points to a memory
location your program needs, and based on that address you may add some offset to reach the exact value you want. For example, imagine that a
called function wants to move some value from the previous stack frame into a register to work with it. That actually happens every time a
function accesses the parameters passed to it. As mentioned before, right before a call instruction, the parameters passed to the function must
be pushed onto the stack from right to left. So if we assume that the function's prologue has executed and the very first instruction after it
returns the function's parameter (just for the sake of simplicity), that instruction would be as follows:
(00401003) mov eax, [ebp + 8] --> Take EBP, add 8 bytes to it, go to that memory address and take whatever is in there and put it in EAX.
+----------+
|Func Param| --> [EBP + 8]: This value is pushed onto the stack just before the call since it is the function's parameter.
+----------+
|SAVED EIP | --> [EBP + 4]: Return address to whoever called the function.
+----------+
ESP --> |SAVED EBP | --> [EBP]: The previous stack frame, saved by the prologue. Both EBP and ESP point here.
+----------+
| |
+----------+
| |
+----------+
EIP = 00401003
EBP = 7fffff50
ESP = 7fffff50
EAX = some value (before) ---> EAX = Func Param (after)
Now one thing you may have noticed is the brackets. So here's a rule which applies 99% of the time you see a register inside brackets:
A register (plus an optional index, scale and displacement) inside brackets simply means: go to the memory address that the expression evaluates
to, get whatever actual value is in there, and do whatever is asked with it. We need the content at that memory address, not the address itself.
So if we sum up the memory form of R/M32, it would be:
[Base + index*scale + displacement]
Where Base is a register such as EAX, EBX, ESP, EBP, etc., index is another register which is multiplied by a scale (1, 2, 4 or 8), and
displacement is a constant offset. In the above example, we only used Base plus displacement, which happens to be the most common R/M32 form you
will see. Remember that all of these parts in the brackets are optional, which means that you can even put just a hardcoded memory address inside
the brackets (e.g. [7FFFFF58]).
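The address arithmetic behind the brackets can be written out in C. This is only a sketch: effective_address and load_from_frame are
hypothetical names, and the small frame array stands in for the three stack slots of the example above (saved EBP, saved EIP, function
parameter) instead of real memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address computed for [base + index*scale + displacement], wrapping at 32 bits. */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale, int32_t displacement) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);  /* the only scales x86 allows */
    return base + index * scale + (uint32_t)displacement;
}

/* Model of "mov eax, [ebp + displacement]".
   frame[0] is the dword at EBP, frame[1] the one at EBP+4, and so on (assumed layout). */
uint32_t load_from_frame(const uint32_t *frame, size_t count, uint32_t ebp, int32_t displacement) {
    uint32_t addr = effective_address(ebp, 0, 1, displacement);   /* no index register used */
    assert(addr >= ebp && (addr - ebp) % 4 == 0);
    size_t i = (addr - ebp) / 4;
    assert(i < count);
    return frame[i];                  /* the content at the address, not the address itself */
}
```

With EBP = 0x7fffff50, effective_address(0x7fffff50, 0, 1, 8) gives 0x7fffff58, and load_from_frame with a displacement of 8 returns the third
slot of the frame, which is where the function's parameter lives.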
Here are some new instructions for you to continue to the next example:
7.
_ ___ ___
/_\ | \| \
/ _ \| |) | |) |
/_/ \_\___/|___/
Fairly easy, right? It adds the source to the destination and puts the final value in the destination. For example:
add eax,0x10 ---> adds the decimal value 16 (hex 10) to EAX and updates EAX with the result.
8.
___ _ _ ___
/ __| | | | _ )
\__ \ |_| | _ \
|___/\___/|___/