Comparing changes

base repository: 0xADE1A1DE/AssemblyLine
base: v1.0.1
...
head repository: 0xADE1A1DE/AssemblyLine
compare: main

Commits on Sep 9, 2021

  1. Support prefetch instructions

    This commit adds support for
    
      * prefetcht0
      * prefetcht1
      * prefetcht2
      * prefetchnta
    
    instructions
    mlq committed Sep 9, 2021
    3f7bd57

Commits on Sep 10, 2021

  1. Merge pull request #1 from mlq/feature/prefetch

    Support prefetch instructions
    davidywu9 authored Sep 10, 2021
    cd39fc3
  2. 5ae401a
  3. 9613cfb
  4. updated Changelog

    davidywu9 committed Sep 10, 2021
    831f706
  5. 6147dbd

Commits on Sep 12, 2021

  1. 8c3d323
  2. cb4ca94

Commits on Sep 13, 2021

  1. 2a5cb47
  2. baa7caf
  3. 63a3425
  4. af5f108

Commits on Sep 14, 2021

  1. 3a6bc92
  2. updated comments

    davidywu9 committed Sep 14, 2021
    a78dd4c
  3. Update README.md

    davidywu9 authored Sep 14, 2021
    5b87329
  4. Update README.md

    davidywu9 authored Sep 14, 2021
    069f47c
  5. Update README.md

    davidywu9 authored Sep 14, 2021
    4fb3d4e
  6. 363025d
  7. added support for ror

    dderjoel committed Sep 14, 2021
    30438fb
  8. Update README.md

    davidywu9 authored Sep 14, 2021
    0e27044
  9. Update README.md

    davidywu9 authored Sep 14, 2021
    99da9b7
  10. b16a228
  11. a02c05f
  12. 7324faa
  13. cleaned up scripts

    dderjoel committed Sep 14, 2021
    e1c1d9e
  14. ignoring make artifacts

    dderjoel committed Sep 14, 2021
    a2a3b82
  15. fixed typos, wording

    dderjoel committed Sep 14, 2021
    4dc93b0

Commits on Sep 15, 2021

  1. Fix rdtsc and add rdtscp

    mlq committed Sep 15, 2021
    058e36a
  2. Merge pull request #1 from daviduwu9/ror

    added support for ror
    davidywu9 authored Sep 15, 2021
    7da1cdc
  3. eb260ed
  4. Update README.md

    davidywu9 authored Sep 15, 2021
    4245255
  5. 63f7f92
  6. Update README.md

    davidywu9 authored Sep 15, 2021
    59a799e
  7. Merge pull request #5 from mlq/feature/rdtsc

    Fix rdtsc and add rdtscp
    dderjoel authored Sep 15, 2021
    3510ce4
  8. adding tests for rdtsc and rdtscp

    Signed-off-by: Joel <rootjdev@gmail.com>
    dderjoel committed Sep 15, 2021
    f58b8eb
  9. 4abe169
  10. ebf15e2
  11. fix rdtsc(p)

    dderjoel committed Sep 15, 2021
    cf08ac5
  12. fad87cb

Commits on Sep 16, 2021

  1. 6ea373c
  2. sorted source files

    dderjoel committed Sep 16, 2021
    2e1ee98
  3. added library versioning

    dderjoel committed Sep 16, 2021
    04af206
  4. updating changelog

    dderjoel committed Sep 16, 2021
    1e78d59
  5. release 1.0.3

    dderjoel committed Sep 16, 2021
    81f7e7a
  6. bumped version in Makefile

    dderjoel committed Sep 16, 2021
    8f3a508
  7. Merge pull request #7 from daviduwu9/main

    release 1.0.3
    dderjoel authored Sep 16, 2021
    db60743

Commits on Sep 17, 2021

  1. 585956e
  2. Create LICENSE

    javali7 authored Sep 17, 2021
    c502d7d
  3. Update LICENSE

    javali7 authored Sep 17, 2021
    73884a4
  4. Fix rdtsc and add rdtscp

    mlq authored and davidywu9 committed Sep 17, 2021
    fa5e2f3
Showing with 85,106 additions and 1,766 deletions.
  1. +178 −0 .clang-format
  2. +46 −0 .clang-tidy
  3. +27 −0 .github/workflows/c-check.yml
  4. +38 −0 .github/workflows/clang-format-check.yml
  5. +18 −2 .gitignore
  6. +223 −29 Changelog
  7. +198 −10 LICENSE
  8. +269 −12 Makefile.am
  9. +143 −187 README.md
  10. +9 −0 TROUBLESHOOTING.md
  11. +4 −0 _config.yml
  12. +32 −0 action.yml
  13. +10 −0 assemblyline.pc.in
  14. +5 −0 compile_flags.txt
  15. +47 −6 configure.ac
  16. +31 −0 data/completion/_asmline
  17. +40 −0 data/completion/asmline
  18. +148 −0 man/asmline.1
  19. +172 −0 man/libassemblyline.3
  20. +0 −27 src/Makefile.am
  21. +67 −0 src/README.md
  22. +227 −134 src/assembler.c
  23. +6 −6 src/assembler.h
  24. +187 −18 src/assemblyline.c
  25. +176 −21 src/assemblyline.h
  26. +132 −21 src/common.h
  27. +259 −176 src/encoder.c
  28. +7 −7 src/encoder.h
  29. +177 −35 src/enums.h
  30. +21 −28 src/instr_parser.c
  31. +2 −8 src/instr_parser.h
  32. +74 −30 src/instruction_data.h
  33. +369 −0 src/instructions.c
  34. +28 −35 src/instructions.h
  35. +107 −79 src/parser.c
  36. +1 −1 src/parser.h
  37. +108 −53 src/prefix.c
  38. +10 −13 src/prefix.h
  39. +187 −74 src/reg_parser.c
  40. +11 −8 src/reg_parser.h
  41. +57 −0 src/registers.c
  42. +11 −34 src/registers.h
  43. +0 −163 src/supported_instructions.h
  44. +95 −52 src/tokenizer.c
  45. +2 −2 src/tokenizer.h
  46. +171 −0 test/MOV_REG_IMM.asm
  47. +0 −63 test/Makefile.am
  48. +101 −0 test/adc.asm
  49. +96 −1 test/add.asm
  50. +35 −10 test/al_nasm_compare.sh
  51. +102 −0 test/and.asm
  52. +0 −64 test/asm_to_stdout.c
  53. +12,549 −0 test/bextr.asm
  54. +24 −11 test/check_chunk_counting.c
  55. +290 −0 test/clflush.asm
  56. +506 −0 test/cmp.asm
  57. +4 −0 test/cpuid.asm
  58. +4 −0 test/eaf/imul.eaf
  59. +11 −0 test/eaf/misc.eaf
  60. +12 −0 test/high_low_xmm.asm
  61. +837 −1 test/imul.asm
  62. +37 −14 test/invalid.c
  63. +113 −0 test/jmp.asm
  64. +47 −41 test/jump.c
  65. +4,105 −20 test/lea.asm
  66. +12 −0 test/lea_no_base.asm
  67. +18 −14 test/memory_reallocation.c
  68. +27 −5 test/mov.asm
  69. +225 −0 test/mov_reg_imm.asm
  70. +57 −0 test/mov_reg_imm32.asm
  71. +381 −0 test/movd.asm
  72. +102 −8 test/movntdqa.asm
  73. +31 −0 test/movntq.asm
  74. +585 −0 test/movq.asm
  75. +253 −14 test/movzx.asm
  76. +42 −3 test/mulx.asm
  77. +179 −0 test/neg.asm
  78. +6 −3 test/{new_instruction.asm → no_operand.asm}
  79. +1,039 −0 test/no_ptr.asm
  80. +5 −2 test/nop.asm
  81. +357 −0 test/optimization_disabled.c
  82. +333 −30 test/or.asm
  83. +274 −0 test/paddb.asm
  84. +274 −0 test/paddd.asm
  85. +308 −0 test/paddq.asm
  86. +274 −0 test/paddw.asm
  87. +84 −0 test/pand.asm
  88. +274 −0 test/pmuldq.asm
  89. +274 −0 test/pmulhuw.asm
  90. +274 −0 test/pmulhw.asm
  91. +274 −0 test/pmulld.asm
  92. +274 −0 test/pmullq.asm
  93. +274 −0 test/pmullw.asm
  94. +274 −0 test/pmuludq.asm
  95. +84 −0 test/por.asm
  96. +52 −0 test/prefetch.asm
  97. +274 −0 test/psubb.asm
  98. +274 −0 test/psubd.asm
  99. +274 −0 test/psubq.asm
  100. +274 −0 test/psubw.asm
  101. +1,288 −0 test/ptr.asm
  102. +43 −0 test/push.asm
  103. +84 −0 test/pxor.asm
  104. +4 −0 test/rdtsc.asm
  105. +4 −0 test/rdtscp.asm
  106. +16 −0 test/ror.asm
  107. +1,030 −18 test/rorx.asm
  108. +76 −31 test/run.c
  109. +438 −0 test/sal.asm
  110. +424 −0 test/sar.asm
  111. +4,383 −3 test/sarx.asm
  112. +100 −0 test/sbb.asm
  113. +63 −0 test/setcc.asm
  114. +135 −4 test/shl.asm
  115. +1,027 −0 test/shld.asm
  116. +291 −0 test/shlx.asm
  117. +144 −0 test/shr.asm
  118. +4,382 −9 test/shrx.asm
  119. +77 −0 test/sub.asm
  120. +23 −0 test/tap/call.tap
  121. +3 −0 test/tap/cmp.tap
  122. +63 −0 test/tap/compiler.sh
  123. +2 −0 test/tap/imul.tap
  124. +18 −0 test/tap/lea.tap
  125. +142 −0 test/tap/misc.tap
  126. +31 −0 test/tap/movq.tap
  127. +6 −0 test/tap/nasm_incompatible.tap
  128. +9 −0 test/tap/tap_example.tap
  129. +21 −0 test/tap/vmovdqu.tap
  130. +20 −0 test/tap/vmovupd.tap
  131. +14 −0 test/tap/xor.tap
  132. +33 −0 test/test.asm
  133. +146 −0 test/tools/asmline.sh
  134. +16 −0 test/tools/asmlineP.sh
  135. +12,291 −0 test/vaddpd.asm
  136. +11 −0 test/vector_add.asm
  137. +10 −0 test/vector_add_mem.asm
  138. +14 −0 test/vector_float_divide.asm
  139. +14 −0 test/vector_float_mul.asm
  140. +11 −0 test/vector_mul.asm
  141. +238 −0 test/vector_operations.c
  142. +11 −0 test/vector_sub.asm
  143. +771 −0 test/vmovupd.asm
  144. +12,291 −0 test/vperm2i128.asm
  145. +12,291 −0 test/vsubpd.asm
  146. +9 −0 test/xabort.asm
  147. +786 −0 test/xchg.asm
  148. +3 −1 test/xor.asm
  149. +0 −11 test/xorMI.asm
  150. +0 −1 test/xorRM.asm
  151. +0 −11 tools/Makefile.am
  152. +314 −0 tools/README.md
  153. +466 −102 tools/asmline.c
178 changes: 178 additions & 0 deletions .clang-format
@@ -0,0 +1,178 @@
---
Language: Cpp
# BasedOnStyle: LLVM
AccessModifierOffset: -2
AlignAfterOpenBracket: Align
AlignArrayOfStructures: None
AlignConsecutiveMacros: None
AlignConsecutiveAssignments: None
AlignConsecutiveBitFields: None
AlignConsecutiveDeclarations: None
AlignEscapedNewlines: Right
AlignOperands: Align
AlignTrailingComments: true
AllowAllArgumentsOnNextLine: true
AllowAllConstructorInitializersOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortEnumsOnASingleLine: true
AllowShortBlocksOnASingleLine: Never
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: All
AllowShortLambdasOnASingleLine: All
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: MultiLine
AttributeMacros:
- __capability
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
  AfterCaseLabel: false
  AfterClass: false
  AfterControlStatement: Never
  AfterEnum: false
  AfterFunction: false
  AfterNamespace: false
  AfterObjCDeclaration: false
  AfterStruct: false
  AfterUnion: false
  AfterExternBlock: false
  BeforeCatch: false
  BeforeElse: false
  BeforeLambdaBody: false
  BeforeWhile: false
  IndentBraces: false
  SplitEmptyFunction: true
  SplitEmptyRecord: true
  SplitEmptyNamespace: true
BreakBeforeBinaryOperators: None
BreakBeforeConceptDeclarations: true
BreakBeforeBraces: Attach
BreakBeforeInheritanceComma: false
BreakInheritanceList: BeforeColon
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
BreakConstructorInitializers: BeforeColon
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: true
ColumnLimit: 80
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DeriveLineEnding: true
DerivePointerAlignment: false
DisableFormat: false
EmptyLineAfterAccessModifier: Never
EmptyLineBeforeAccessModifier: LogicalBlock
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros:
- foreach
- Q_FOREACH
- BOOST_FOREACH
IfMacros:
- KJ_IF_MAYBE
IncludeBlocks: Preserve
IncludeCategories:
  - Regex: '^"(llvm|llvm-c|clang|clang-c)/'
    Priority: 2
    SortPriority: 0
    CaseSensitive: false
  - Regex: '^(<|"(gtest|gmock|isl|json)/)'
    Priority: 3
    SortPriority: 0
    CaseSensitive: false
  - Regex: '.*'
    Priority: 1
    SortPriority: 0
    CaseSensitive: false
IncludeIsMainRegex: '(Test)?$'
IncludeIsMainSourceRegex: ''
IndentAccessModifiers: false
IndentCaseLabels: false
IndentCaseBlocks: false
IndentGotoLabels: true
IndentPPDirectives: None
IndentExternBlock: AfterExternBlock
IndentRequires: false
IndentWidth: 2
IndentWrappedFunctionNames: false
InsertTrailingCommas: None
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: true
LambdaBodyIndentation: Signature
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Auto
ObjCBlockIndentWidth: 2
ObjCBreakBeforeNestedBlockParam: true
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: true
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PenaltyIndentedWhitespace: 0
PointerAlignment: Right
PPIndentWidth: -1
ReferenceAlignment: Pointer
ReflowComments: true
ShortNamespaceLines: 1
SortIncludes: CaseSensitive
SortJavaStaticImport: Before
SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeCaseColon: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceAroundPointerQualifiers: Default
SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyBlock: false
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: Never
SpacesInConditionalStatement: false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInLineCommentPrefix:
  Minimum: 1
  Maximum: -1
SpacesInParentheses: false
SpacesInSquareBrackets: false
SpaceBeforeSquareBrackets: false
BitFieldColonSpacing: Both
Standard: Latest
StatementAttributeLikeMacros:
- Q_EMIT
StatementMacros:
- Q_UNUSED
- QT_REQUIRE_VERSION
TabWidth: 8
UseCRLF: false
UseTab: Never
WhitespaceSensitiveMacros:
- STRINGIZE
- PP_STRINGIZE
- BOOST_PP_STRINGIZE
- NS_SWIFT_NAME
- CF_SWIFT_NAME
...

46 changes: 46 additions & 0 deletions .clang-tidy
@@ -0,0 +1,46 @@
---

Checks: 'boost-*,bugprone-*,performance-*,readability-*,portability-*,modernize-*,clang-analyzer-*,cppcoreguidelines-*,-readability-braces-around-statements'
WarningsAsErrors: ''
HeaderFilterRegex: ''
AnalyzeTemporaryDtors: false
FormatStyle: llvm
CheckOptions:
  - key: llvm-else-after-return.WarnOnConditionVariables
    value: 'false'
  - key: modernize-loop-convert.MinConfidence
    value: reasonable
  - key: modernize-replace-auto-ptr.IncludeStyle
    value: llvm
  - key: modernize-pass-by-value.IncludeStyle
    value: llvm
  - key: google-readability-namespace-comments.ShortNamespaceLines
    value: '10'
  - key: google-readability-namespace-comments.SpacesBeforeComments
    value: '2'
  - key: cppcoreguidelines-non-private-member-variables-in-classes.IgnoreClassesWithAllMemberVariablesBeingPublic
    value: 'true'
  - key: google-readability-braces-around-statements.ShortStatementLines
    value: '1'
  - key: cert-oop54-cpp.WarnOnlyIfThisHasSuspiciousField
    value: 'false'
  - key: modernize-loop-convert.MaxCopySize
    value: '16'
  - key: cert-dcl16-c.NewSuffixes
    value: 'L;LL;LU;LLU'
  - key: cert-str34-c.DiagnoseSignedUnsignedCharComparisons
    value: 'false'
  - key: cppcoreguidelines-explicit-virtual-functions.IgnoreDestructors
    value: 'true'
  - key: modernize-use-nullptr.NullMacros
    value: 'NULL'
  - key: llvm-qualified-auto.AddConstToQualified
    value: 'false'
  - key: modernize-loop-convert.NamingStyle
    value: CamelCase
  - key: llvm-else-after-return.WarnOnUnfixable
    value: 'false'
  - key: google-readability-function-size.StatementThreshold
    value: '800'
...

27 changes: 27 additions & 0 deletions .github/workflows/c-check.yml
@@ -0,0 +1,27 @@
name: Make check

on: [ push ]

jobs:
  build:
    runs-on: ubuntu-latest

    name: Build and test

    steps:
      - uses: actions/checkout@v3

      - name: install nasm
        run: sudo apt install -y nasm

      - name: autogen
        run: ./autogen.sh

      - name: configure
        run: ./configure

      - name: make
        run: make

      - name: make check
        run: make check
38 changes: 38 additions & 0 deletions .github/workflows/clang-format-check.yml
@@ -0,0 +1,38 @@
name: Clang-format Check

on: [ push ]

jobs:
  formatting-check:
    runs-on: ubuntu-latest

    name: Expressive Formatting Check

    steps:
      - uses: actions/checkout@v2

      - name: Run clang-format style check for C/C++/Protobuf programs.
        uses: jidicula/clang-format-action@v4.6.2
        with:
          clang-format-version: '13'

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: C/C++ Lint Action
        uses: cpp-linter/cpp-linter-action@v1.4.3
        id: linter
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          files-changed-only: false
          thread-comments: true
          style: file
          version: 13
          tidy-checks: ""
      - name: Fail fast?!
        if: steps.linter.outputs.checks-failed > 0
        run: |
          echo "Some files failed the linting checks!"; exit 1
20 changes: 18 additions & 2 deletions .gitignore
@@ -17,6 +17,7 @@ Thumbs.db
*.libs
*.log
*.trs
.dirstamp
assemblyline-*.tar.gz
compile_commands.json
Makefile
@@ -33,12 +34,27 @@ lib/
libtool
run_test
stamp-h1
!tools/test_*.[ch]
!tools/Makefile.[amin]
tools/*
!tools/*.[ch]

test/*
!test/*.c
!test/*.sh
!test/*.asm
!test/tools/*.sh
!test/tap
!test/eaf

man/*
!man/libassemblyline.3
!man/asmline.1

*.in
aclocal.m4
configure
assemblyline.pc
build-aux/*
m4/*
config.h.in


252 changes: 223 additions & 29 deletions Changelog
@@ -1,46 +1,240 @@
version 1.0.1-release (2021-09-09)
version 1.3.2-release (2022-11-16)

- added support for the rorx instruction
- updated documentation
- fixed bug which caused `asmline -{c,b} <NUM>` to not display the
correct chunks

- asmline {-h,--help} now emits to stdout to conform with GNU standards,
with a slightly different indentation.

version 1.0.0-release (2021-08-26)
- provide bash and zsh auto completion helper for asmline cli-tool

version 1.3.1-release (2022-11-10)

- added support for all shift instructions shr/shl/sar/sal as well as all
their respective operand encodings

- fixed operand encoding for xchg instruction when using the return register

- fixed xabort instruction missing 8-bit immediate value

- fixed xbegin instruction rel32 offset value

- initial release
version 1.3.1-rc.3 (2022-08-17)

- fixed a bug which caused a segmentation fault when writing to a file via
the -P switch

version 1.3.1-rc.2 (2022-08-09)

- added four encodings for {vmovdqu, vmovupd} xmmN {xmmN,m/128}
and two for {vmovdqu, vmovupd} m/128, xmmN

version 1.3.1-rc.1 (2022-08-09)

- added encoding for movq mv

- added support for rdpmc

version 1.3.0-release (2022-07-27)

- added support for multiple keywords ex: "jmp far word [rax]"

- added support for unconditional far jump

- added support for memory operand without register ex: "[0x7fffffff]"

- added support for push instruction with M and I operand encoding

- implemented thread safe design for global variables

- fixed missing SIB byte for jmp instruction

- fixed a bug which can lead to SIGSEGV's on certain systems using
`strtok_r`. (Thanks to @hugsy for pointing that out)

- fixed bug where the wrong return code is returned if the internal buffer
is too small and assemblyline is compiled for non-Linux systems

version 5.3.4-beta (2021-08-18)
- CI: Using GitHub Actions to validate Code Formatting

FEATURES
- added support for "[register + register]" syntax for all instructions with RM or MR operand encoding
- added support for all 'cmovcc' instructions
- CI: Using GitHub Actions to build and run checks

BUG FIXES
- additional information is provided via errno when any system call fails
- improvements were made to register opcode encoding
- missing zero byte opcode is addressed
- negative memory displacement parsing bug is fixed
- Repo: Introducing badges to show current stats for the repository

- Tests: run.c now checks for cpu-flags to run the code. (Sometimes it
fails because of missing CPU capabilities on GHActions)

version 5.1.1-beta (2021-07-26)

API CHANGES
- addressed compatibility issues with measuresuite: a chunk size of 0 can now be passed to
assemble_string_counting_chunks(); it will be treated as assemble_str()

FEATURES
- added in-code support for multi-length nop instructions (treated as macros when using nasm)
- assemblyline ignores macros within .asm files as well as in strings
version 1.2.2-release (2022-03-29)

BUG FIXES
- an error is returned when the 'short' keyword is used for branch instructions with offsets
greater than 0x7f
- added support for neg: Two's Complement Negation

- fixed a bug where shift and rotate instruction produce the wrong opcode
when immediate is zero

version 4.8.2-beta (2021-06-21)
- fixed a bug when using sib addressing with no base and a single operand

FEATURES
- added support for counting the number of instructions that break a chunk boundary of a specified size
- asmline can now count the number of instructions that break a chunk boundary

- the number of instructions that break a chunk boundary can be counted when
assembling a file.
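
The counting described above can be pictured with a small sketch. It assumes that an instruction "breaks" a chunk boundary when its first and last byte fall into different chunks of the given size; that definition, and the helper name `count_chunk_breaks`, are illustrative assumptions and not taken from libassemblyline.

```c
#include <stddef.h>

/* Hypothetical helper, not the libassemblyline implementation.
 * Given the byte length of each instruction in program order, counts the
 * instructions whose first and last byte lie in different chunks. */
size_t count_chunk_breaks(const size_t *lengths, size_t count,
                          size_t chunk_size)
{
    if (chunk_size == 0)
        return 0; /* assumption: a chunk size of 0 disables counting */

    size_t offset = 0; /* offset of the current instruction's first byte */
    size_t breaks = 0;
    for (size_t i = 0; i < count; i++) {
        if (lengths[i] > 0) {
            size_t first_chunk = offset / chunk_size;
            size_t last_chunk = (offset + lengths[i] - 1) / chunk_size;
            if (first_chunk != last_chunk)
                breaks++;
        }
        offset += lengths[i];
    }
    return breaks;
}
```

With three 3-byte instructions and a chunk size of 4, the second and third instruction each straddle a boundary, so the sketch reports two breaks.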

- adding `cpuid`-instruction



version 1.2.1-release (2022-02-23)

- Fixed bugs regarding -imm/8


version 1.2.0-release (2022-02-19)

- binary file can now be generated from libassemblyline
with asm_create_bin_file()

- reformatted man page for wide screen

- added support for all operand encodings for and, or, xor, sub, and add
instructions

- added support for all setcc instructions - set byte on condition

- added a command-line option in asmline for setting
assembly mode to NASM '$ asmline --nasm-sib-no-base' where:
in SIB addressing if there is no base register present and scale is equal to 2;
the base register will be set to the index register and the scale will be reduced to 1.
That is: "lea r15, [2*rax]" -> "lea r15, [rax+1*rax]" -> 4c 8d 3c 00

- added a command-line option in asmline for setting
assembly mode to STRICT '$ asmline --strict-sib-no-base' where:
in SIB addressing when there is no base register present the index and scale would
not change regardless of scale value.
That is: "lea r15, [2*rax]" -> "lea r15, [2*rax]" -> 4c 8d 3c 45 00 00 00 00
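
The two byte sequences above differ only in the SIB byte and the displacement. The following sketch reproduces both for `lea r15, [scale*rax]`; the function name and the `nasm_mode` flag are hypothetical and only mirror the two command-line modes, they are not part of the asmline or libassemblyline API.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs the SIB fields: scale (2 bits), index (3 bits), base (3 bits). */
static uint8_t sib_byte(uint8_t scale_bits, uint8_t index, uint8_t base)
{
    return (uint8_t)((scale_bits << 6) | ((index & 7) << 3) | (base & 7));
}

/* Hypothetical helper: encodes "lea r15, [scale*rax]" into buf (at least
 * 8 bytes). Returns the number of bytes written, or 0 for an invalid scale. */
size_t encode_lea_r15_scaled_rax(uint8_t *buf, unsigned scale, int nasm_mode)
{
    const uint8_t RAX = 0;
    const uint8_t NO_BASE = 5; /* base=101 with mod=00: disp32, no base */
    uint8_t scale_bits;
    size_t n = 0;

    switch (scale) {
    case 1: scale_bits = 0; break;
    case 2: scale_bits = 1; break;
    case 4: scale_bits = 2; break;
    case 8: scale_bits = 3; break;
    default: return 0;
    }

    buf[n++] = 0x4c; /* REX.W (64-bit operand) + REX.R (reg field is r15) */
    buf[n++] = 0x8d; /* lea r64, m */
    buf[n++] = 0x3c; /* ModRM: mod=00, reg=111, rm=100 (SIB byte follows) */

    if (nasm_mode && scale == 2) {
        /* NASM-style rewrite: [2*rax] becomes [rax + 1*rax], no disp32 */
        buf[n++] = sib_byte(0, RAX, RAX);
        return n;
    }

    /* Strict form: keep scale and index, mark "no base", add disp32 = 0 */
    buf[n++] = sib_byte(scale_bits, RAX, NO_BASE);
    for (int i = 0; i < 4; i++)
        buf[n++] = 0x00;
    return n;
}
```

The NASM-style form is four bytes shorter because a SIB byte without a base register always requires a 32-bit displacement.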


version 1.1.0-release (2022-01-21)

- added support for scaled index addressing mode
with syntax [base + scale*index + offset] or
[base + index*scale + offset]
ex: "add rax, [rcx+4*rbp+0x8]" or "add rax, [rcx+rbp*4+0x8]"
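
As an illustration of what the scaled index syntax translates to, here is a minimal sketch that encodes `add rax, [base + scale*index + disp8]` for the eight legacy registers. The helper name is hypothetical, and the sketch always emits an 8-bit displacement to keep the ModRM handling simple; it is not how assemblyline itself is structured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes "add rax, [base + scale*index + disp]" with
 * an 8-bit displacement. base and index are register numbers 0..7
 * (rax=0, rcx=1, rdx=2, rbx=3, rsp=4, rbp=5, rsi=6, rdi=7).
 * buf needs 5 bytes. Returns bytes written, or 0 on invalid input. */
size_t encode_add_rax_sib_disp8(uint8_t *buf, uint8_t base, uint8_t index,
                                unsigned scale, int8_t disp)
{
    uint8_t scale_bits;

    if (base > 7 || index > 7 || index == 4)
        return 0; /* rsp (4) cannot be used as an index register */

    switch (scale) {
    case 1: scale_bits = 0; break;
    case 2: scale_bits = 1; break;
    case 4: scale_bits = 2; break;
    case 8: scale_bits = 3; break;
    default: return 0;
    }

    buf[0] = 0x48; /* REX.W: 64-bit operand size */
    buf[1] = 0x03; /* add r64, r/m64 */
    buf[2] = 0x44; /* ModRM: mod=01 (disp8), reg=000 (rax), rm=100 (SIB) */
    buf[3] = (uint8_t)((scale_bits << 6) | (index << 3) | base);
    buf[4] = (uint8_t)disp;
    return 5;
}
```

For the example from the entry above, `add rax, [rcx+4*rbp+0x8]`, calling the helper with base 1, index 5, scale 4, and displacement 8 fills the buffer with 48 03 44 a9 08.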

- added support for MMX, SSE2, AVX, and AVX2 instruction sets

- added support for pointers: byte, word, dword, and qword

- added support for `mov [MEM], IMM` such as `mov [rsi], "0xADE1A1DE"`

BUG FIXES
- fixed segmentation fault when using count_chunk_brks with invalid file
- added different modes of register size handling for
"mov register, immediate" instruction: NASM, STRICT, SMART

- improved 32-bit register supportability:
using eax-r15b for memory addressing in vector extension now
produces the correct prefix (67H)

- improved Intel x86_64 instruction supportability:
source code is now more in line with the Intel manual, therefore
more instructions can be supported in the near future

- added a command-line option in asmline for setting
assembly mode to NASM '$ asmline -n' where:
if immediate size for mov is less than or equal to max signed 32-bit
asmline will emit code to mov to 32-bit register rather than 64-bit.
That is: "mov rax, 0x7fffffff" as "mov eax, 0x7fffffff" -> b8 ff ff ff 7f
note: rax got optimized to eax for faster immediate to register transfer and
produces a shorter instruction.

- added a command-line option in asmline for setting
assembly mode to STRICT '$ asmline -t' where:
ex: even if immediate size for mov is less than or equal to max signed 32-bit
pad the immediate to fit 64bit.
That is: "mov rax,0x7fffffff" as "mov rax,0x000000007fffffff"
-> 48 b8 ff ff ff 7f 00 00 00 00

- added a command-line option in asmline for setting
assembly mode to SMART '$ asmline -s' where:
asmline will check the immediate value for leading 0's and thus allows manual
optimizations. Immediate must be zero padded to 64-bits exactly to ensure
it will not optimize. This is currently set as default.
ex: "mov rax, 0x000000007fffffff" -> 48 b8 ff ff ff 7f 00 00 00 00
"mov rax, 0x7fffffff" -> b8 ff ff ff 7f
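
The difference between the NASM and STRICT byte sequences above can be sketched as follows. The function name and the `strict` flag are hypothetical and stand in for the `-n` and `-t` modes. SMART mode is not modelled, because it depends on how the immediate was written in the source text rather than on its value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes "mov rax, imm" into buf (at least 10 bytes)
 * and returns the number of bytes written.
 * strict == 0: use the short "mov eax, imm32" form when imm fits into a
 *              positive signed 32-bit value (NASM-like behaviour).
 * strict != 0: always emit the full "mov rax, imm64" form. */
size_t encode_mov_rax_imm(uint8_t *buf, uint64_t imm, int strict)
{
    size_t n = 0;

    if (!strict && imm <= 0x7fffffffu) {
        buf[n++] = 0xb8; /* mov eax, imm32; upper half of rax is zeroed */
        for (int i = 0; i < 4; i++)
            buf[n++] = (uint8_t)(imm >> (8 * i)); /* little-endian */
        return n;
    }

    buf[n++] = 0x48; /* REX.W: 64-bit operand size */
    buf[n++] = 0xb8; /* mov rax, imm64 */
    for (int i = 0; i < 8; i++)
        buf[n++] = (uint8_t)(imm >> (8 * i)); /* little-endian */
    return n;
}
```

The short form is valid because writing a 32-bit register clears the upper 32 bits of the corresponding 64-bit register, so both encodings leave the same value in rax.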

- assemblyline functions without asm_ prefix are now deprecated

- simplified the process for adding new instructions

- improvements to autotools, including use of pkg-config. That enables
other autotools-based projects to include assemblyline as such:
add to their `configure.ac`:
```
AC_SUBST([LIBASSEMBLYLINE_CFLAGS])
AC_SUBST([LIBASSEMBLYLINE_LIBS])
PKG_CHECK_MODULES([LIBASSEMBLYLINE], [assemblyline >= 1.1.0])
```

- tools/asmline dynamically links to libassemblyline.so

- tools/asmline's -r[=LEN] option now runs with six arguments, which point
to heap-memory. Each argument points to LEN (defaults to ten) 64-bit
elements. It now also features --rand, which initializes those pointers to
random data.

- added `asmline 1` man-page

- added man-pages for all the functions from libassemblyline (same content
as libassemblyline)


version 1.0.5-release (2021-09-29)

- FIX another bug where 'mov reg, imm' results in a wrong instruction


version 1.0.4-release (2021-09-27)

- FIX bug where 'mov reg, imm' is interpreted as movntdqa

- updated documentation

- implemented register size optimization for mov instruction
to match nasm assembly style

- added man page

- added -v -h options to asmline


version 1.0.3-release (2021-09-16)

- FIX rdtsc

- ADD rdtscp,

- ADD ror (ri encoding only for now)

- code maintenance

- added library versioning


version 1.0.2-release (2021-09-09)

- added support for the prefetch instruction

- added support for all registers (prefetch)


version 1.0.1-release (2021-09-09)

- added support for the rorx instruction

- updated documentation


version 1.0.0-release (2021-08-26)

- initial release

208 changes: 198 additions & 10 deletions LICENSE
@@ -1,13 +1,201 @@
Copyright 2021 University of Adelaide
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

http://www.apache.org/licenses/LICENSE-2.0
1. Definitions.

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2022 The University of Adelaide

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
281 changes: 269 additions & 12 deletions Makefile.am
@@ -1,19 +1,276 @@
SUBDIRS = src test tools
# run 'make check' from projects' root to have libassemblyline being compiled on change
# check_LTLIBRARIES = libassemblyline.la
EXTRA_DIST = autogen.sh
## Copyright 2022 University of Adelaide

## Licensed under the Apache License, Version 2.0 (the "License");
## you may not use this file except in compliance with the License.
## You may obtain a copy of the License at

## http://www.apache.org/licenses/LICENSE-2.0

## Unless required by applicable law or agreed to in writing, software
## distributed under the License is distributed on an "AS IS" BASIS,
## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
## See the License for the specific language governing permissions and
## limitations under the License.

# to tell pkg-config what to place where
pkgconfig_DATA = assemblyline.pc
pkgconfigdir = $(libdir)/pkgconfig

ACLOCAL_AMFLAGS = -I m4 --install
AM_CFLAGS = -Wall -Wextra -std=gnu99
AM_CPPFLAGS = -I./src
CLEANFILES = assemblyline-*.tar.gz \
config.h~ \
configure~

##############################################################################################
##############################################################################################
############################### LIB ##############################################
##############################################################################################
##############################################################################################

# this ensures that the lib is rebuilt (if necessary) on make check
lib_LTLIBRARIES = libassemblyline.la
libassemblyline_la_SOURCES =
libassemblyline_la_LIBADD = src/libassemblyline.la
libassemblyline_la_SOURCES = \
src/assembler.c \
src/assembler.h \
src/assemblyline.c \
src/common.h \
src/encoder.c \
src/encoder.h \
src/enums.h \
src/instr_parser.c \
src/instr_parser.h \
src/instruction_data.h \
src/instructions.c \
src/instructions.h \
src/parser.c \
src/parser.h \
src/prefix.c \
src/prefix.h \
src/reg_parser.c \
src/reg_parser.h \
src/registers.h \
src/registers.c \
src/tokenizer.c \
src/tokenizer.h

include_HEADERS = src/assemblyline.h


# from 7.3 https://www.gnu.org/software/libtool/manual/html_node/Versioning.html#Versioning
# -version-info accepts ‘current[:revision[:age]]’

# 1. If the library SOURCE CODE has changed at all since the last update: revision++
# 2. If ANY INTERFACES have been ADDED since the last public release: current++, revision=0, age++.
# 3. If ANY INTERFACES have been REMOVED or CHANGED since the last public release: current++, revision=0, age=0.

# Hints
# 1. Programs using the previous version may use the new version as drop-in replacement, and programs using the new version can also work with the previous one. In other words, no recompiling nor relinking is needed. In this case, bump revision only, don’t touch current nor age.
# 2. Programs using the previous version may use the new version as drop-in replacement, but programs using the new version may use APIs not present in the previous one. In other words, a program linking against the new version may fail with “unresolved symbols” if linking against the old version at runtime: set revision to 0, bump current and age.
# 3. Programs may need to be changed, recompiled, and relinked in order to use the new version. Bump current, set revision and age to 0.

# in short, if only patch from configure.ac is bumped, bump the middle number below. If more is changed, read the above
libassemblyline_la_LDFLAGS = -version-info 3:5:2
##############################################################################################
##############################################################################################
############################### BINS ##############################################
##############################################################################################
##############################################################################################

bin_PROGRAMS = tools/asmline
LDADD = libassemblyline.la

# completion --start--
if ENABLE_BASH_COMPLETION
bashcompletiondir = $(BASH_COMPLETION_DIR)
dist_bashcompletion_DATA = data/completion/asmline
endif
if ENABLE_ZSH_COMPLETION
zshcompletiondir = $(ZSH_COMPLETION_DIR)
dist_zshcompletion_DATA = data/completion/_asmline
endif
# completion --end--

##############################################################################################
##############################################################################################
############################### MANS ##############################################
##############################################################################################
##############################################################################################

link_man = \
man/asm_create_instance.3 \
man/asm_destroy_instance.3 \
man/asm_assemble_str.3 \
man/asm_assemble_file.3 \
man/asm_assemble_string_counting_chunks.3 \
man/asm_set_chunk_size.3 \
man/asm_set_debug.3 \
man/asm_get_offset.3 \
man/asm_set_offset.3 \
man/asm_get_buffer.3 \
man/asm_get_code.3 \
man/asm_create_bin_file.3 \
man/asm_mov_imm.3 \
man/asm_sib_index_base_swap.3 \
man/asm_set_all.3

$(link_man): man/libassemblyline.3
test -e $@ || $(LN_S) libassemblyline.3 $@

dist_man1_MANS = man/asmline.1
dist_man3_MANS = man/libassemblyline.3 $(link_man)

##############################################################################################
##############################################################################################
############################### TEST ##############################################
##############################################################################################
##############################################################################################

# runner for 'asm'-tests
TEST_EXTENSIONS = .asm
ASM_LOG_COMPILER = ./test/al_nasm_compare.sh

# get the tap-driver
TEST_EXTENSIONS += .tap
TAP_LOG_DRIVER = env AM_TAP_AWK='$(AWK)' $(SHELL) \
$(top_srcdir)/build-aux/tap-driver.sh
TAP_LOG_COMPILER = ./test/tap/compiler.sh

# add TAP tests here
TEST_TAP = \
test/tap/call.tap \
test/tap/cmp.tap \
test/tap/imul.tap \
test/tap/lea.tap \
test/tap/misc.tap \
test/tap/movq.tap \
test/tap/nasm_incompatible.tap \
test/tap/vmovupd.tap \
test/tap/vmovdqu.tap \
test/tap/xor.tap

# those are the tests, where asmline is expected to fail, for the suite to be ok.
# eaf stands for 'expected asmline to fail'
TEST_EXTENSIONS += .eaf
EAF_LOG_DRIVER = env AM_TAP_AWK='$(AWK)' $(SHELL) \
$(top_srcdir)/build-aux/tap-driver.sh

EAF_LOG_COMPILER = ./test/tap/compiler.sh --no-nasm
TEST_EAF = \
test/eaf/imul.eaf \
test/eaf/misc.eaf


# In case you want a test to fail for the suite to pass, add the test here
XFAIL_TESTS= $(TEST_EAF) \
test/tap/nasm_incompatible.tap

# add SH-tests here
TEST_SH = test/tools/asmline.sh \
test/tools/asmlineP.sh


# add .c -tests here
TEST_C= \
test/check_chunk_counting \
test/invalid \
test/jump \
test/memory_reallocation \
test/optimization_disabled \
test/run \
test/vector_operations

CLEANFILES = config.h~ \
configure~ \
assemblyline-*.tar.gz
# add .asm-tests here
TEST_ASM = \
test/MOV_REG_IMM.asm \
test/adc.asm \
test/adcx.asm \
test/add.asm \
test/adox.asm \
test/and.asm \
test/bextr.asm \
test/bzhi.asm \
test/clc.asm \
test/clflush.asm \
test/cmp.asm \
test/cpuid.asm \
test/high_low_xmm.asm \
test/imul.asm \
test/jmp.asm \
test/lea.asm \
test/lea_no_base.asm \
test/mov.asm \
test/mov_reg_imm.asm \
test/mov_reg_imm32.asm \
test/movd.asm \
test/movntdqa.asm \
test/movntq.asm \
test/movq.asm \
test/movzx.asm \
test/mulx.asm \
test/neg.asm \
test/negative_mem_disp.asm \
test/no_ptr.asm \
test/no_operand.asm \
test/nop.asm \
test/not.asm \
test/or.asm \
test/paddb.asm \
test/paddd.asm \
test/paddq.asm \
test/paddw.asm \
test/pand.asm \
test/pmuldq.asm \
test/pmulhuw.asm \
test/pmulhw.asm \
test/pmulld.asm \
test/pmullq.asm \
test/pmullw.asm \
test/pmuludq.asm \
test/por.asm \
test/prefetch.asm \
test/psubb.asm \
test/psubd.asm \
test/psubq.asm \
test/psubw.asm \
test/ptr.asm \
test/push.asm \
test/pxor.asm \
test/rdtsc.asm \
test/rdtscp.asm \
test/ror.asm \
test/rorx.asm \
test/sal.asm \
test/sar.asm \
test/sarx.asm \
test/sbb.asm \
test/setc.asm \
test/setcc.asm \
test/seto.asm \
test/shl.asm \
test/shld.asm \
test/shlx.asm \
test/shr.asm \
test/shrd.asm \
test/shrx.asm \
test/sub.asm \
test/test.asm \
test/vaddpd.asm \
test/vector_add.asm \
test/vector_add_mem.asm \
test/vector_float_divide.asm \
test/vector_float_mul.asm \
test/vector_mul.asm \
test/vector_sub.asm \
test/vmovupd.asm \
test/vperm2i128.asm \
test/vsubpd.asm \
test/xabort.asm \
test/xchg.asm \
test/xor.asm \
test/zero_byte_rbp.asm

distclean-local:
rm -rf autom4te.cache/
# if needed, add utility programs that should be built for the tests to check_PROGRAMS
check_PROGRAMS = $(bin_PROGRAMS) $(TEST_C)

TESTS = $(TEST_EAF) $(TEST_TAP) $(TEST_SH) $(TEST_ASM) $(TEST_C)
330 changes: 143 additions & 187 deletions README.md

Large diffs are not rendered by default.

9 changes: 9 additions & 0 deletions TROUBLESHOOTING.md
@@ -0,0 +1,9 @@
# ldconfig Issue
If you see the following message the first time you use assemblyline:
` error while loading shared libraries: libassemblyline.so.1: cannot open shared object file: No such file or directory`
please run `$ sudo ldconfig` to update the shared library cache.
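
The `.so.1` suffix in that message follows from the `-version-info 3:5:2` setting in `Makefile.am`. The sketch below shows the mapping, assuming the usual Linux libtool convention (major = current - age, file suffix `major.age.revision`). The names `so_version` and `so_version_from_libtool` are made up for illustration and are not part of AssemblyLine.

```c
#include <assert.h>

/* Hypothetical helper type: the three numbers in a Linux shared-object
 * file name such as libfoo.so.MAJOR.MINOR.PATCH. */
struct so_version {
    int major; /* also the number in the soname, e.g. libfoo.so.MAJOR */
    int minor;
    int patch;
};

/* Maps libtool's current:revision:age triple to the Linux file suffix.
 * Assumption: Linux naming, where major = current - age, minor = age,
 * patch = revision. Other platforms use different schemes. */
struct so_version so_version_from_libtool(int current, int revision, int age)
{
    struct so_version v;

    assert(current >= 0 && revision >= 0 && age >= 0 && age <= current);
    v.major = current - age;
    v.minor = age;
    v.patch = revision;
    return v;
}
```

Under that assumption, `3:5:2` gives `libassemblyline.so.1.2.5` with the soname `libassemblyline.so.1`, which is the name the loader reports as missing until `ldconfig` refreshes its cache.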


# Old CC versions
We use `__attribute__((deprecated("TEXt")))` in `assemblyline.{c,h}`. GCC >= `4.6.4` and Clang >= `3.4.1` support it.

4 changes: 4 additions & 0 deletions _config.yml
@@ -0,0 +1,4 @@
theme: jekyll-theme-cayman
title: "AssemblyLine"
description: The lightweight in-memory assembler.
show_downloads: true
32 changes: 32 additions & 0 deletions action.yml
@@ -0,0 +1,32 @@
name: 'AssemblyLine'
description: 'Will install dependencies to build AssemblyLine, build and install AssemblyLine.'

runs:
  using: "composite"
  steps:
    - uses: actions/checkout@v3
      with:
        repository: 0xADE1A1DE/AssemblyLine
        ref: v1.3.2

    - name: Install dependencies
      shell: bash
      run: sudo apt install -y autoconf curl gcc libtool make nasm pkg-config tar

    - name: Run autogen
      shell: bash
      run: ./autogen.sh

    - name: Run Configure
      shell: bash
      run: ./configure CFLAGS=-O3

    - name: Build
      shell: bash
      run: make all

    - name: install
      shell: bash
      run: |
        sudo make install
        sudo ldconfig
10 changes: 10 additions & 0 deletions assemblyline.pc.in
@@ -0,0 +1,10 @@
prefix=@prefix@
exec_prefix=@exec_prefix@
libdir=@libdir@
includedir=@includedir@

Name: Assemblyline
Description: A C library and binary for generating machine code of x86_64 assembly language and executing on the fly.
Version: @VERSION@
Libs: -L${libdir} -lassemblyline
Cflags: -I${includedir}/
5 changes: 5 additions & 0 deletions compile_flags.txt
@@ -0,0 +1,5 @@
-I./src
-I.
-Wall
-Wextra
-std=gnu99
53 changes: 47 additions & 6 deletions configure.ac
@@ -2,10 +2,14 @@
# Process this file with autoconf to produce a configure script.

AC_PREREQ([2.69])
AC_INIT([assemblyline],[1.0.1],[yval@cs.adelaide.edu.au])
AC_INIT([assemblyline],[1.3.2],[yval@cs.adelaide.edu.au])
AC_CONFIG_HEADERS([config.h])
AC_CONFIG_SRCDIR([src/assemblyline.c])
AC_CONFIG_AUX_DIR([build-aux])

# for the Test-Anything Protocol (TAP)
AC_REQUIRE_AUX_FILE([tap-driver.sh])

AC_CONFIG_MACRO_DIR([m4])

# Checks for programs.
@@ -14,12 +18,14 @@ AC_PROG_CC
AC_PROG_CC_STDC
AC_PROG_INSTALL
AM_PROG_AR
AC_PROG_LN_S


#defines LIBTOOL variable
LT_INIT

# Checks for header files.
AC_CHECK_HEADERS([fcntl.h inttypes.h stdint.h stdlib.h string.h strings.h unistd.h])
AC_CHECK_HEADERS([fcntl.h inttypes.h stdint.h stdlib.h string.h strings.h unistd.h time.h])

# Checks for typedefs, structures, and compiler characteristics.
AC_CHECK_HEADER_STDBOOL
@@ -28,17 +34,52 @@ AC_TYPE_UINT32_T
AC_TYPE_UINT64_T
AC_TYPE_UINT8_T



# completion --start--
# bash
AC_ARG_WITH([bash-completion-dir],
AS_HELP_STRING([--with-bash-completion-dir[=PATH]],
[Install the bash auto-completion script in this directory. @<:@default=yes@:>@]),
[],
[with_bash_completion_dir=yes])

if test "x$with_bash_completion_dir" = "xyes"; then
PKG_CHECK_MODULES([BASH_COMPLETION], [bash-completion >= 2.0],
[BASH_COMPLETION_DIR="`pkg-config --variable=completionsdir bash-completion`"],
[BASH_COMPLETION_DIR="$datadir/bash-completion/completions"])
else
BASH_COMPLETION_DIR="$with_bash_completion_dir"
fi
AC_SUBST([BASH_COMPLETION_DIR])
AM_CONDITIONAL([ENABLE_BASH_COMPLETION],[test "x$with_bash_completion_dir" != "xno"])
# end bash
# start zsh
AC_ARG_WITH([zsh-completion-dir],
AS_HELP_STRING([--with-zsh-completion-dir[=PATH]],
[Install the zsh auto-completion script in this directory. @<:@default=yes@:>@]),
[],
[with_zsh_completion_dir=yes])

if test "x$with_zsh_completion_dir" = "xyes"; then
ZSH_COMPLETION_DIR="$datadir/zsh/site-functions"
else
ZSH_COMPLETION_DIR="$with_zsh_completion_dir"
fi
AC_SUBST([ZSH_COMPLETION_DIR])
AM_CONDITIONAL([ENABLE_ZSH_COMPLETION],[test "x$with_zsh_completion_dir" != "xno"])
# end zsh
# completion --end--

# Checks for library functions.
AC_FUNC_MALLOC
AC_FUNC_MMAP
AC_CHECK_FUNCS([munmap strchr strstr strtol strtoul])
AC_CHECK_FUNCS([munmap strchr strstr strtol strtoul rand])

AM_INIT_AUTOMAKE([-Wall -Werror foreign subdir-objects])

AC_CONFIG_FILES([
Makefile
src/Makefile
test/Makefile
tools/Makefile
assemblyline.pc
])
AC_OUTPUT
31 changes: 31 additions & 0 deletions data/completion/_asmline
@@ -0,0 +1,31 @@
#compdef asmline

_arguments -S -s \
'(H --rand)--rand[runs the code and initializes memory with random data _rdi--r9 can be dereferenced_.]:random heap:' \
'(H -r --return)'{-r=-,--return=-}'[runs assembled code]:number of elements:' \
'(H -p --print)'{-p,--print}'[print to stdout in ASCII-hex.]' \
'(H -P --printfile -o --object)'{-P+,--printfile+}'[write raw binary into FILE]:filename:_files' \
'(H -P --printfile -o --object)'{-o+,--object+}'[write raw binary machinecode to FILE.bin]:filename:_files' \
'(H -c --chunk)'{-c+,--chunk+}'[set (write) chunk size. Will NOP-pad every chunk]:size of chunks to pad to:' \
'(H -b --breaks)'{-b+,--breaks+}'[set (read) chunk size. Counts how many chunks break a boundary.]' \
+ '(mov)' \
"(H)--nasm-mov-imm[nasm mov imm: mov to 32-bit reg if possible.]" \
"(H)--smart-mov-imm[smart mov imm: if 64-bit padded, mov to 64-bit reg, to 32-bit otherwise.]" \
"(H)--strict-mov-imm[strict mov imm: always mov to 64-bit reg.]" \
+ '(sib_index)' \
"(H)--nasm-sib-index-base-swap[nasm swap sib; 'lea r15, \[rax+rsp\]' -> 'lea r15, \[rsp+rax\]']" \
"(H)--strict-sib-index-base-swap[no swap sib; 'lea r15, \[rax+rsp\]' as is.]" \
+ '(sib_nobase)' \
"(H)--nasm-sib-no-base[nasm scale sib; 'lea r15, \[2*rax\]' -> 'lea r15, \[rax+1*rax\]']" \
"(H)--strict-sib-no-base[no scale sib; 'lea r15, \[2*rax\]' as is.]" \
+ '(sib_total)' \
"(H sib_index sib_nobase)--nasm-sib[implies --nasm-sib-{no-base,index-base-swap}]" \
"(H sib_index sib_nobase)--strict-sib[implies --strict-sib-{no-base,index-base-swap}]" \
+ '(mode_total)' \
"(H sib_total mov)"{--nasm,-n}'[implies --nasm-{mov-imm,sib-{no-base,index-base-swap}}]' \
"(H sib_total mov)"{--strict,-t}'[implies --strict-{mov-imm,-{no-base,index-base-swap}}]' \
+ '(H)' \
'(* - input)'{-v,--version}'[Prints version information to stdout and exits.]' \
'(* - input)'{-h,--help}'[Prints usage information to stdout and exits.]' \
+ '(input)' \
':assembly file:_files -g \*.\(asm\|s\|S\)' && ret=0
40 changes: 40 additions & 0 deletions data/completion/asmline
@@ -0,0 +1,40 @@
#!/usr/bin/env bash
_asmline() {

local current="${COMP_WORDS[COMP_CWORD]}"
local options="
--breaks
--chunk
--help
--nasm
--nasm-mov-imm
--nasm-sib
--nasm-sib-index-base-swap
--nasm-sib-no-base
--object
--print
--printfile
--rand
--return
--smart-mov-imm
--strict
--strict-mov-imm
--strict-sib
--strict-sib-index-base-swap
--strict-sib-no-base
--version
-P
-b
-c
-h
-n
-o
-p
-r
-t
-v
"
mapfile -t COMPREPLY < <(compgen -W "${options}" -- "${current}")
}

complete -F _asmline asmline
148 changes: 148 additions & 0 deletions man/asmline.1
@@ -0,0 +1,148 @@
.TH ASMLINE 1 2022-01-21 GNU

.SH NAME
asmline \- assemblyline tool

.SH SYNOPSIS
.B asmline
[OPTIONS]...
path/to/file.asm

.SH DESCRIPTION
.B asmline
generates machine code from a file or stdin containing x86_64 assembly instructions. The machine code can be executed directly, without the need for an executable file format.

.SH OPTIONS

.TP
.BR \-r[=LEN] ", " \-\-return[=LEN]
Assembles the given code, then executes it with six pointer parameters to heap-allocated memory. Each pointer points to an array of LEN 64-bit elements which can be dereferenced in the asm-code, where LEN defaults to 10. After execution, it prints out the contents of the return (rax) register and frees the heap-memory.

.TP
.BR \-\-rand
Implies -r and additionally initializes the memory with random data.
.br
-r=11 can be used to alter LEN.

.TP
.BR \-p ", " \-\-print
The corresponding machine code will be printed to stdout in hex form. Output is similar to `objdump`: Byte-wise delimited by space and linebreaks after 7 bytes. If -c is given, the chunks are delimited by '|' with each chunk on one line.

.TP
.BR \-P ", " \-\-printfile " " \fIFILENAME
The corresponding machine code will be printed to \fIFILENAME\fR in binary form. Can be set to '/dev/stdout' to write to stdout.

.TP
.BR \-c ", " \-\-chunk " " \fICHUNK_SIZE>1
Sets a given \fICHUNK_SIZE\fR boundary in bytes. Nop padding will be used to ensure no instruction opcode will cross the specified \fICHUNK_SIZE\fR boundary.

.TP
.BR \-b ", " \-\-breaks " " \fICHUNK_BOUNDARY>1

Given a \fICHUNK_BOUNDARY\fR size in bytes, asmline counts the instructions whose opcode crosses a boundary of that size.
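
The counting described for \-b can be sketched as follows. This is an illustration under stated assumptions, not asmline's actual implementation: instructions are laid out back to back from offset 0, an instruction "breaks" a boundary when its first and last byte fall into different chunks, and the name count_chunk_breaks is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of AssemblyLine.
 * lens[i] is the encoded length in bytes of the i-th instruction.
 * Returns how many instructions straddle a chunk_size-byte boundary. */
int count_chunk_breaks(const size_t *lens, size_t n, size_t chunk_size)
{
    int breaks = 0;
    size_t offset = 0;

    assert(chunk_size > 1);
    for (size_t i = 0; i < n; i++) {
        if (lens[i] == 0)
            continue; /* nothing emitted, nothing can cross */

        size_t first = offset / chunk_size;                 /* chunk of first byte */
        size_t last = (offset + lens[i] - 1) / chunk_size;  /* chunk of last byte */

        if (first != last)
            breaks++;
        offset += lens[i];
    }
    return breaks;
}
```

For instruction lengths 5, 10 and 4 bytes, a boundary of 8 is crossed by the second and third instruction, while a boundary of 16 is crossed only by the third.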

.TP
.BR \-o ", " \-\-object " " \fIFILENAME
Generates a binary file from path/to/file.asm called \fIFILENAME\fR.bin in the current directory.

.TP
.BR \-\-nasm\-mov\-imm
Enables nasm-style mov-immediate register-size handling: if the immediate for mov is less than or equal to the maximum signed 32-bit value, assemblyline will emit code to mov to the 32-bit register rather than the 64-bit one.
.br
\fBThat is:\fR "mov rax, 0x7fffffff" as "mov eax, 0x7fffffff" -> b8 ff ff ff 7f
.br
\fBNOTE:\fR rax is optimized to eax, which gives a faster immediate-to-register transfer and a shorter instruction.

.TP
.BR \-\-strict\-mov\-imm
Enables strict mov-imm thereby disabling nasm-style mov-immediate register-size handling.
.br
\fBex:\fR even if the immediate for mov is less than or equal to the maximum signed 32-bit value,
.br
assemblyline will pad the immediate to 64 bits.
.br
\fBThat is:\fR "mov rax,0x7fffffff" as "mov rax,0x000000007fffffff"
.br
-> 48 b8 ff ff ff 7f 00 00 00 00

.TP
.BR \-\-smart\-mov\-imm
Assemblyline checks the immediate value for leading zeros, which allows manual optimization: an immediate must be zero-padded to exactly 64 bits to ensure it is not shortened. This is currently the default.
.br
\fBex:\fR "mov rax, 0x000000007fffffff" -> 48 b8 ff ff ff 7f 00 00 00 00
.br
"mov rax, 0x7fffffff" -> b8 ff ff ff 7f

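The two byte sequences shown for the mov-immediate options can be reproduced with a short sketch. It covers only "mov rax, imm" and only the NASM and STRICT behaviours as described above. SMART mode depends on how the immediate is written in the source text, so it is not modelled here. The names mov_mode and encode_mov_rax_imm are hypothetical and not part of AssemblyLine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode names for this sketch only. */
enum mov_mode { MOV_NASM, MOV_STRICT };

/* Encodes "mov rax, imm" into out (which must hold at least 10 bytes)
 * and returns the number of bytes written. */
size_t encode_mov_rax_imm(enum mov_mode mode, uint64_t imm, uint8_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    if (mode == MOV_NASM && imm <= 0x7fffffffu) {
        out[n++] = 0xb8;                 /* mov eax, imm32 */
        for (int i = 0; i < 4; i++)      /* little-endian 32-bit immediate */
            out[n++] = (uint8_t)(imm >> (8 * i));
    } else {
        out[n++] = 0x48;                 /* REX.W prefix */
        out[n++] = 0xb8;                 /* mov rax, imm64 */
        for (int i = 0; i < 8; i++)      /* little-endian 64-bit immediate */
            out[n++] = (uint8_t)(imm >> (8 * i));
    }
    return n;
}
```

With imm = 0x7fffffff, the NASM branch writes the 5 bytes b8 ff ff ff 7f and the STRICT branch writes the 10 bytes 48 b8 ff ff ff 7f 00 00 00 00, matching the examples above.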

.TP
.BR \-\-nasm\-sib\-index\-base\-swap
In SIB addressing, if the index register is esp or rsp, the base and index registers will be swapped, because the stack pointer register cannot be used as a scaled index in SIB. This is currently the default.
.br
\fBThat is:\fR "lea r15, [rax+rsp]" interpreted as "lea r15, [rsp+rax]"
.br
-> 4c 8d 3c 04
.br
("lea r15, [rsp+rsp]" will produce an error because
.br
base and index cannot be swapped)

.TP
.BR \-\-strict\-sib\-index\-base\-swap
In SIB addressing the base and index registers will not be swapped even if the index register is esp or rsp.
.br
\fBThat is:\fR "lea r15, [rax+rsp]" will be interpreted as "lea r15, [rax+riz]"
.br
-> 4c 8d 3c 20
.br
\fBNOTE:\fR riz is a pseudo-register evaluated by GCC to constant 0 and therefore cannot be used in assemblyline as a string, i.e. assembling "lea r15, [rax+riz]" is invalid. "lea r15, [rsp+rsp]" will produce an error (invalid instruction).


.TP
.BR \-\-nasm\-sib\-no\-base
In SIB addressing, if there is no base register present and the scale is equal to 2, the base register will be set to the index register and the scale will be reduced to 1.
.br
\fBThat is:\fR "lea r15, [2*rax]" -> "lea r15, [rax+1*rax]"
.br
-> 4c 8d 3c 00

.TP
.BR \-\-strict\-sib\-no\-base
In SIB addressing, when there is no base register present, the index and scale are left unchanged regardless of the scale value.
.br
\fBThat is:\fR "lea r15, [2*rax]" -> "lea r15, [2*rax]"
.br
-> 4c 8d 3c 45 00 00 00 00

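The last opcode byte in each SIB example above (04, 20, 00 and 45) is the SIB byte itself, which packs the scale into bits 7-6, the index into bits 5-3 and the base into bits 2-0. The sketch below recomputes those four values. It is an illustration only: sib_byte, NO_INDEX and NO_BASE are hypothetical names, and the ModRM byte, REX prefix and displacement are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Low three bits of the x86-64 register numbers used in the examples. */
enum { REG_RAX = 0, REG_RSP = 4 };

/* Hypothetical markers for this sketch:
 * index field 100b means "no index" (what riz stands for),
 * base field 101b with mod=00 means "no base, disp32 follows". */
enum { NO_INDEX = 4, NO_BASE = 5 };

/* Builds a SIB byte from scale (1, 2, 4 or 8), index and base. */
uint8_t sib_byte(unsigned scale, unsigned index, unsigned base)
{
    unsigned ss;

    switch (scale) {
    case 1: ss = 0; break;
    case 2: ss = 1; break;
    case 4: ss = 2; break;
    case 8: ss = 3; break;
    default:
        assert(0 && "scale must be 1, 2, 4 or 8");
        ss = 0;
        break;
    }
    return (uint8_t)((ss << 6) | ((index & 7u) << 3) | (base & 7u));
}
```

Here sib_byte(1, REG_RAX, REG_RSP) corresponds to [rsp+rax], sib_byte(1, NO_INDEX, REG_RAX) to [rax+riz], sib_byte(1, REG_RAX, REG_RAX) to [rax+1*rax], and sib_byte(2, REG_RAX, NO_BASE) to [2*rax], which is why the last form needs the four extra displacement bytes.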
.TP
.BR \-\-nasm\-sib
Equivalent to \fB--nasm-sib-no-base\fR \fB--nasm-sib-index-base-swap\fR.
.br

.TP
.BR \-\-strict\-sib
Equivalent to \fB--strict-sib-no-base\fR \fB--strict-sib-index-base-swap\fR.
.br

.TP
.BR \-n ", " \-\-nasm
Equivalent to \fB--nasm-mov-imm\fR \fB--nasm-sib-index-base-swap\fR \fB--nasm-sib-no-base\fR.
.br

.TP
.BR \-t ", " \-\-strict
Equivalent to \fB--strict-mov-imm\fR \fB--strict-sib-index-base-swap\fR \fB--strict-sib-no-base\fR.

.TP
.BR \-s ", " \-\-smart
Equivalent to \fB--smart-mov-imm\fR.
.br

.TP
.BR \-h ", " \-\-help
Prints usage information to stdout and exits.
.TP
.BR \-v ", " \-\-version
Prints version information to stdout and exits.

.SH SEE ALSO
.B libassemblyline(3)
172 changes: 172 additions & 0 deletions man/libassemblyline.3
@@ -0,0 +1,172 @@
.TH ASSEMBLYLINE 3 2022-01-21 GNU

.SH NAME
assemblyline \- C library functions for generating machine code of Intel x86_64 assembly language without invoking another compiler, assembler or linker

.SH DESCRIPTION
This library provides a method to generate machine code from Intel x86_64 assembly instructions on-the-fly. Machine code can be written to a buffer passed as argument for \fBasm_create_instance(3)\fR or an internal dynamically allocated buffer within assemblyline. An external buffer must be greater than 20 bytes. An error will be thrown if the written length exceeds (buffer length - 20). There is no size limit when using an internal buffer.
.br
\fBNOTE:\fR assemblyline does not check for mismatched operand sizes/types when using pointers,
.br
i.e. mov qword [rbp], al will be interpreted as mov qword [rbp], rax

.SH SYNOPSIS
.TP
.BR #include " "<assemblyline.h>
.TP
.BI "assemblyline_t asm_create_instance(uint8_t *" buffer ", int " len );
Allocates an instance of assemblyline_t and attaches a pointer to a memory \fIbuffer\fR where machine code will be written to. Buffer length will be specified by \fIlen\fR.
.br
\fBNOTE:\fR \fIbuffer\fR could also be set to NULL for internal memory allocation. In this case \fIlen\fR would be irrelevant and could be set to any number.

.TP
.BI "int asm_destroy_instance(assemblyline_t " instance );
Frees all memory associated with \fIinstance\fR. Returns EXIT_SUCCESS or EXIT_FAILURE.

.TP
.BI "int asm_assemble_str(assemblyline_t " al ", const char *" assembly_str );
Assembles the given string \fIassembly_str\fR containing valid x64 assembly code with instance \fIal\fR. It writes the corresponding machine code to the memory location specified by the buffer associated with \fIal\fR. Returns EXIT_SUCCESS or EXIT_FAILURE.

.TP
.BI "int asm_assemble_file(assemblyline_t " al ", char *" asm_file );
Assembles the given file path \fIasm_file\fR containing valid x64 assembly code with instance \fIal\fR. It writes the corresponding machine code to the memory location specified by the buffer associated with \fIal\fR. Returns EXIT_SUCCESS or EXIT_FAILURE.

.TP
.BI "int asm_assemble_string_counting_chunks(assemblyline_t " al ", char *" str ", int " chunk_size ", int *" dest );
Assembles the given null-terminated string \fIstr\fR with instance \fIal\fR. It counts the number of instructions that break the chunk boundary of size \fIchunk_size\fR and saves it to \fIdest\fR. It does not nop-pad by default; whether padding happens depends on instance \fIal\fR (you can nop-pad and count different chunk breaks). Returns EXIT_SUCCESS or EXIT_FAILURE.

.br
\fBNOTE:\fR you cannot pass a const char* as \fIstr\fR; it will segfault, because the string will be altered.

.TP
.BI "int asm_assemble_file_counting_chunks(assemblyline_t " al ", char *" asm_file ", int " chunk_size ", int *" dest );
Assembles the given file \fIasm_file\fR with instance \fIal\fR. It counts the number of instructions that break the chunk boundary of size \fIchunk_size\fR and saves it to \fIdest\fR. It does not nop-pad by default; whether padding happens depends on instance \fIal\fR (you can nop-pad and count different chunk breaks).

.TP
.BI "void asm_set_chunk_size(assemblyline_t " al ", size_t " chunk_size );
Sets a given chunk size boundary \fIchunk_size\fR in bytes with instance \fIal\fR. When called before asm_assemble_str() or asm_assemble_file(), assemblyline will ensure via nop padding that no instruction opcode crosses the specified \fIchunk_size\fR boundary.
.br
\fBNOTE:\fR \fIchunk_size\fR must be greater than 2 in order to be classified as a valid memory chunk boundary size.

.TP
.BI "void asm_set_debug(assemblyline_t " al ", bool " debug );
Sets the debug flag \fIdebug\fR to true or false with instance \fIal\fR. When \fIdebug\fR is set to true, the machine code is printed to stdout in hexadecimal.

.TP
.BI "int asm_get_offset(assemblyline_t " al );
Returns the offset associated with \fIal\fR.

.TP
.BI "void asm_set_offset(assemblyline_t " al ", int " offset );
Sets a memory \fIoffset\fR to specify the exact location in the memory block for writing machine code with instance \fIal\fR.
.br
\fBNOTE:\fR \fIoffset\fR could be set to 0 for the resulting memory block.

.TP
.BI "void *asm_get_code(assemblyline_t " al );
Returns the buffer associated with \fIal\fR as type void* for easy typecasting to any function pointer format.

.TP
.BI "uint8_t *asm_get_buffer(assemblyline_t " al );
Returns the buffer associated with \fIal\fR (DEPRECATED: use \fBasm_get_code(3)\fR instead).

.TP
.BI "int asm_create_bin_file(assemblyline_t " al ", char *" file_name );
Generates a binary file \fIfile_name\fR from assembled machine code up to the memory offset of the current instance \fIal\fR. Returns EXIT_SUCCESS or EXIT_FAILURE.

.TP
.BI "void asm_mov_imm(assemblyline_t " al ", enum asm_opt " option );
Setting \fIoption\fR to STRICT disables nasm-style mov-immediate register-size handling: even if the immediate for mov is less than or equal to the maximum signed 32-bit value, assemblyline pads the immediate to 64 bits.
.br
\fBThat is:\fR "mov rax, 0x7fffffff" is assembled as "mov rax, 0x000000007fffffff"
.br
-> 48 b8 ff ff ff 7f 00 00 00 00

.br
Setting \fIoption\fR to NASM enables nasm-style mov-immediate register-size handling: if the immediate for mov is less than or equal to the maximum signed 32-bit value, assemblyline emits a mov to the 32-bit register rather than the 64-bit register.
.br
\fBThat is:\fR "mov rax, 0x7fffffff" is assembled as "mov eax, 0x7fffffff" -> b8 ff ff ff 7f

.br
When \fIoption\fR is set to SMART, assemblyline checks the immediate value for leading zeros, which lets the caller choose the encoding manually. This is currently the default.
.br
\fBex:\fR "mov rax, 0x000000007fffffff" -> 48 b8 ff ff ff 7f 00 00 00 00
.br
"mov rax, 0x7fffffff" -> b8 ff ff ff 7f

.br
.br
Setting \fIoption\fR to any other value has no effect.

.TP
.BI "void asm_sib_index_base_swap(assemblyline_t " al ", enum asm_opt " option );
Since the stack pointer register is non-scalable in SIB, Nasm will swap the base and index register if the stack pointer register is used as the index.

.br
Setting \fIoption\fR to STRICT disables Nasm SIB handling.
.br
\fBThat is:\fR "lea r15, [rax+rsp]" will be interpreted as "lea r15, [rax+riz]"
.br
-> 4c 8d 3c 20
.br
\fBNOTE:\fR riz is a pseudo-register evaluated by GCC to the constant 0, so it cannot be used in an assembly string passed to assemblyline: assembling "lea r15, [rax+riz]" is invalid. "lea r15, [rsp+rsp]" will also produce an error (invalid instruction).

.br
Setting \fIoption\fR to NASM enables Nasm SIB handling. This is currently set as default.
.br
\fBThat is:\fR "lea r15, [rax+rsp]" will be interpreted as "lea r15, [rsp+rax]"
.br
-> 4c 8d 3c 04
.br
"lea r15, [rsp+rsp]" will produce an error because
.br
base and index cannot be swapped

.br
Setting \fIoption\fR to SMART or any other value has no effect.

.TP
.BI "void asm_sib_no_base(assemblyline_t " al ", enum asm_opt " option );
In SIB, when no base register is present and the scale is equal to 2, NASM sets the base to the index register and reduces the scale by 1. \fBex:\fR [2*rax] -> [rax+1*rax]

.br
Setting \fIoption\fR to STRICT disables Nasm SIB handling with no base.
.br
\fBThat is:\fR "lea r15, [2*rax]" will be interpreted as is
.br
-> 4c 8d 3c 45 00 00 00 00

.br
Setting \fIoption\fR to NASM enables Nasm SIB handling with no base. This is currently set as default.
.br
\fBThat is:\fR "lea r15, [2*rax]" will be interpreted as "lea r15, [rax+1*rax]"
.br
-> 4c 8d 3c 00

.br
Setting \fIoption\fR to SMART or any other value has no effect.

.TP
.BI "void asm_sib(assemblyline_t " al ", enum asm_opt " option );
Setting \fIoption\fR to STRICT is equivalent to calling both \fBasm_sib_index_base_swap(al,STRICT)\fR and \fBasm_sib_no_base(al,STRICT)\fR.

.br
Setting \fIoption\fR to NASM is equivalent to calling both \fBasm_sib_index_base_swap(al,NASM)\fR and \fBasm_sib_no_base(al,NASM)\fR.

.br
Setting \fIoption\fR to SMART or any other value has no effect.

.TP
.BI "void asm_set_all(assemblyline_t " al ", enum asm_opt " option );
Setting \fIoption\fR to STRICT is equivalent to calling \fBasm_sib_index_base_swap(al,STRICT)\fR, \fBasm_sib_no_base(al,STRICT)\fR, and \fBasm_mov_imm(al,STRICT)\fR.

.br
Setting \fIoption\fR to NASM is equivalent to calling \fBasm_sib_index_base_swap(al,NASM)\fR, \fBasm_sib_no_base(al,NASM)\fR, and \fBasm_mov_imm(al,NASM)\fR.

.br
Setting \fIoption\fR to SMART is equivalent to calling \fBasm_mov_imm(al,SMART)\fR.

.br
Setting \fIoption\fR to any other value has no effect.

.SH SEE ALSO
.B asmline(1)
27 changes: 0 additions & 27 deletions src/Makefile.am

This file was deleted.

67 changes: 67 additions & 0 deletions src/README.md
@@ -0,0 +1,67 @@
# How to add new instructions

1. Get the instruction opcode layout and operand encoding format (please refer to the [Intel manual](https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf)).
1. Add the new instruction to the asm\_instr enumerator set found in [enums.h](/src/enums.h).
1. Add a new entry to INSTR\_TABLE[] in [instructions.c](/src/instructions.c) while maintaining alphabetical order.
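
Alphabetical order in step 3 matters because `asm_build_index_tables()` in `src/assemblyline.c` records, for each letter, the index of the first table entry whose name starts with that letter, and lookups start from there. The following is a minimal, self-contained sketch of that idea, not the library's code: `toy_table`, `build_first_letter_index`, and `lookup` are hypothetical names, and the real table also contains entries with an empty name that are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LETTERS_IN_ALPHABET 26
#define NOT_FOUND (-1)

/* hypothetical stand-in for INSTR_TABLE[]: sorted by name, NULL-terminated */
static const char *const toy_table[] = {"adc", "add",   "and", "bextr", "mov",
                                        "movzx", "mul", "xor", NULL};

/* first_letter_index[c - 'a'] is the first entry starting with letter c */
static int first_letter_index[LETTERS_IN_ALPHABET];

static void build_first_letter_index(void) {
  for (int c = 0; c < LETTERS_IN_ALPHABET; c++)
    first_letter_index[c] = NOT_FOUND;
  for (int i = 0; toy_table[i] != NULL; i++) {
    int letter = toy_table[i][0] - 'a';
    // only the first occurrence of each letter is recorded
    if (first_letter_index[letter] == NOT_FOUND)
      first_letter_index[letter] = i;
  }
}

/* scans only the run of entries that share the first letter of name */
static int lookup(const char *name) {
  if (name[0] < 'a' || name[0] > 'z')
    return NOT_FOUND;
  int i = first_letter_index[name[0] - 'a'];
  if (i == NOT_FOUND)
    return NOT_FOUND;
  for (; toy_table[i] != NULL && toy_table[i][0] == name[0]; i++) {
    if (strcmp(toy_table[i], name) == 0)
      return i;
  }
  return NOT_FOUND;
}
```

Call `build_first_letter_index()` once, then `lookup("mul")` scans only the entries that start with `m`. If an entry were inserted out of order (say `"mul"` after `"xor"`), the scan would stop at the first non-`m` entry and never reach it, which is why the table has to stay sorted.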

#### Instruction table format:
```c
struct INSTR_TABLE[] {

/* null-terminated string representation of an instruction, ex: "mov".
* Subsequent entries for the same instruction with a different operand
* encoding are placed contiguously after the first instance of the
* instruction and have the empty string ('\0') as their name.
*/
char instr_name[MAX_INSTR_LEN];

// asm_instr enumerator for uniquely identifying an instruction
int name;

/* contains the valid operand formats for an instruction that map to the same
* operand encoding (at most 2 for a single operand encoding)
* ex: rr (instruction reg,reg) and rm (instruction reg,[mem]) both map to RM.
* When both operand formats are set to NA, ex: {NA,NA}, the entry cannot be
* reached during the parsing phase; it can only be reached during encoding by
* incrementing the INSTR_TABLE[key] index using key++, so ordering is
* important for this type of entry.
*/
int opd_format[VALID_OPERAND_FORMATS];

/* operand encoding format as an enumerator (determines how instruction
* operands will be encoded) in assemblyline the 'I' character op/en will be
* ignored unless it is standalone ex: MI -> M , RMI -> RM , I -> I
*/
operand_encoding encode_operand;

/* enumerator for defining the semantic type of an instruction
* where special encoding is required ( currently, only applicable for
* SHIFT, DATA_TRANSFER, and CONTROL_FLOW type instructions) else set this to 'OTHER'
*/
instr_type type;

/* 'i' index of opcode[i] when a byte changes in the opcode depending
* on the register size for the instruction
* (set this value to NA if not applicable to the instruction)
*/
int op_offset_i;

/* used for instructions with a single register operand denoted as '/digit'
* in the intel manual section 3.1.1.1
* (set this value to NA if not applicable to the instruction)
*/
int single_reg_r;

// number of bytes in the opcode[MAX_OPCODE_LEN] field
int instr_size;

/* opcode layout for an instruction ex: {REX,0x0f+rd,0xa9,REG}
* REX, VEX, and REG are placeholders for prefix and register values. For the
* encoding of the VEX prefix please refer to the intel manual section 2.3.5 as
* well as common.h '#define VEX(vvvv,L,pp,mmmmm,WIG)' and enums.h
* 'opcode_encoding'.
* '+rd' refers to '+rb, +rw, +rd, +ro' in intel manual section 3.1.1.1
*/
unsigned int opcode[MAX_OPCODE_LEN];
}
```
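
To make the `REX` placeholder and the `+rd` notation in the `opcode` field concrete, here is a small sketch that hand-encodes the two `mov` forms quoted in the man page of this change set (`48 b8 ff ff ff 7f 00 00 00 00` and `b8 ff ff ff 7f`). It assumes the standard x86-64 encodings `B8+rd id` and `REX.W B8+rd io`; the function names are hypothetical and are not part of AssemblyLine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes "mov r32, imm32" (opcode B8+rd followed by a 4-byte immediate).
 * reg is the register number 0..15 (0 = eax, 15 = r15d).
 * Returns the number of bytes written to dest. */
static size_t encode_mov_r32_imm32(uint8_t *dest, unsigned reg, uint32_t imm) {
  size_t n = 0;
  if (reg >= 8)
    dest[n++] = 0x41; // REX.B supplies the 4th bit of the register number
  dest[n++] = (uint8_t)(0xb8 + (reg & 7)); // '+rd': low 3 bits added to opcode
  for (int i = 0; i < 4; i++)
    dest[n++] = (uint8_t)(imm >> (8 * i)); // immediate is little-endian
  return n;
}

/* Encodes "mov r64, imm64" (REX.W, opcode B8+rd, 8-byte immediate).
 * Returns the number of bytes written to dest. */
static size_t encode_mov_r64_imm64(uint8_t *dest, unsigned reg, uint64_t imm) {
  size_t n = 0;
  dest[n++] = (uint8_t)(0x48 | (reg >= 8 ? 0x01 : 0x00)); // REX.W (+ REX.B)
  dest[n++] = (uint8_t)(0xb8 + (reg & 7));
  for (int i = 0; i < 8; i++)
    dest[n++] = (uint8_t)(imm >> (8 * i));
  return n;
}
```

With `reg = 0` and `imm = 0x7fffffff` the 64-bit form produces 10 bytes and the 32-bit form 5 bytes, which is the size difference the `asm_mov_imm()` options choose between.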
361 changes: 227 additions & 134 deletions src/assembler.c

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions src/assembler.h
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -26,12 +26,12 @@
* writes nop instruction with length @param nop_pad_len at pointer location
* @param buff
*/
int nop_padding(uint8_t *buf, int nop_pad_len);
unsigned int nop_padding(uint8_t *buf, unsigned int nop_pad_len);

/**
* assembles the prefix, opcode, memory displacement, and immediate of a @param
* single_instr at pointer location @param ptr
* assembles the prefix, opcode, memory displacement, and immediate of a
* @param instruc at pointer location @param ptr
*/
int assemble_asm(struct instr *single_instr, uint8_t *dest);
unsigned int assemble_asm(struct instr *instruc, uint8_t *dest);

#endif
#endif
205 changes: 187 additions & 18 deletions src/assemblyline.c
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -28,33 +28,63 @@
#include <sys/mman.h>
#include <sys/stat.h>

/**
* called when an instance of @param al is created and maps the index of
* INSTR_TABLE[] where the first occurrence of each letter of the alphabet to
* instr_index_table for more efficient instruction lookup
*/
static void asm_build_index_tables() {
// INSTR_TABLE index starts at the SKIP entry
int i = 2;
char previous_char = 'a' - 1;
while (INSTR_TABLE[++i].name != NA) {
if (INSTR_TABLE[i].instr_name[0] != '\0') {
if (previous_char != INSTR_TABLE[i].instr_name[0])
instr_table_index[INSTR_TABLE[i].instr_name[0] - 'a'] = i;
previous_char = INSTR_TABLE[i].instr_name[0];
}
}
// create an index table from OPD_FORMAT_TABLE
i = 0;
previous_char = '\0';
while (OPD_FORMAT_TABLE[++i].val != opd_error) {
if (previous_char != OPD_FORMAT_TABLE[i].str[0]) {
opd_format_table_index[OPD_FORMAT_TABLE[i].str[0] - 'a'] = i;
previous_char = OPD_FORMAT_TABLE[i].str[0];
}
}
}

assemblyline_t asm_create_instance(uint8_t *buffer, int len) {

assemblyline_t al = malloc(sizeof(struct assemblyline));
al->offset = 0;
al->assembly_opt = DEFAULT;
// allocate buffer internally if not directly given
if (buffer == NULL) {
al->external = false;
al->buffer_len = MEM_BUFFER + BUFFER_TOLERANCE;
al->buffer = mmap(NULL, sizeof(uint8_t) * al->buffer_len,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
if (al->buffer == MAP_FAILED) {
int *ret = mmap(NULL, sizeof(uint8_t) * al->buffer_len,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
if (ret == MAP_FAILED) { // NOLINT
fprintf(stderr, "failed to allocate internal memory buffer\n");
perror("Error: ");
free(al);
return NULL;
}
al->buffer = (uint8_t *)ret;
} else {
al->external = true;
al->buffer_len = len;
al->buffer = buffer;
}
al->assembly_mode = ASSEMBLE;
al->chunk_size = none;
al->chunk_size = NONE;
al->chunk_size++;
al->debug = false;
al->finalized = false;
asm_build_index_tables();
return al;
}

@@ -69,13 +99,18 @@ int asm_destroy_instance(assemblyline_t instance) {

// checks the minimum buffer length requirement 20 bytes at least
static int check_buffer_len(int buffer_len) {

FAIL_IF_MSG(buffer_len < BUFFER_TOLERANCE,
"insufficient buffer size: ensure length of buffer is at least "
"20 bytes\n");
return EXIT_SUCCESS;
}

int assemble_str(assemblyline_t al, const char *assembly_str) {
return asm_assemble_str(al, assembly_str);
}

int asm_assemble_str(assemblyline_t al, const char *assembly_str) {

al->finalized = false;
// check minimum buffer length requirement
@@ -89,6 +124,11 @@ int assemble_str(assemblyline_t al, const char *assembly_str) {

int assemble_string_counting_chunks(assemblyline_t al, char *str,
int chunk_size, int *dest) {
return asm_assemble_string_counting_chunks(al, str, chunk_size, dest);
}

int asm_assemble_string_counting_chunks(assemblyline_t al, char *str,
int chunk_size, int *dest) {
al->assembly_mode = CHUNK_COUNT;
if (chunk_size < 2)
al->assembly_mode = ASSEMBLE;
@@ -101,23 +141,50 @@ int assemble_string_counting_chunks(assemblyline_t al, char *str,
return EXIT_SUCCESS;
}

int assemble_file(assemblyline_t al, char *asm_file) {
static void *asm_mmap_file(char *asm_file, size_t *str_len) {
// open file for reading
int fd = open(asm_file, O_RDONLY, S_IRUSR | S_IRUSR);
FAIL_SYS(fd == -1, "failed to open file\n");

// NOLINTNEXTLINE
FAIL_SYS(fd == -1, "failed to open file\n", MAP_FAILED);
struct stat file_stat;
FAIL_SYS(fstat(fd, &file_stat), "failed to get file stats\n");

// NOLINTNEXTLINE
FAIL_SYS(fstat(fd, &file_stat), "failed to get file stats\n", MAP_FAILED);
// map file contents to a string
size_t str_len = file_stat.st_size;
const char *assembly_str =
mmap(NULL, file_stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
*str_len = file_stat.st_size;
void *str = mmap(NULL, *str_len, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
FAIL_SYS(assembly_str == MAP_FAILED, "failed to allocate memory via mmap\n");
int exit_status = assemble_str(al, assembly_str);
return str;
}

int asm_assemble_file_counting_chunks(assemblyline_t al, char *asm_file,
int chunk_size, int *dest) {

size_t str_len = 0;
char *str = asm_mmap_file(asm_file, &str_len);
// NOLINTNEXTLINE
FAIL_SYS(str == MAP_FAILED, "mmap failed to read file\n", EXIT_FAILURE);
int exit = asm_assemble_string_counting_chunks(al, str, chunk_size, dest);
// free mmap memory used for reading file
FAIL_SYS(munmap((void *)str, str_len) == -1, "munmap failed\n", EXIT_FAILURE);
return exit;
}

int assemble_file(assemblyline_t al, char *asm_file) {
return asm_assemble_file(al, asm_file);
}

int asm_assemble_file(assemblyline_t al, char *asm_file) {

size_t str_len = 0;
const char *str = asm_mmap_file(asm_file, &str_len);
// NOLINTNEXTLINE
FAIL_SYS(str == MAP_FAILED, "mmap failed to read file\n", EXIT_FAILURE);
int exit = asm_assemble_str(al, str);
// free mmap memory used for reading file
FAIL_SYS(munmap((void *)assembly_str, str_len) == -1,
"Error: failed to free memory\n");
return exit_status;
FAIL_SYS(munmap((void *)str, str_len) == -1, "munmap failed\n", EXIT_FAILURE);
return exit;
}

void asm_set_chunk_size(assemblyline_t al, size_t chunk_size) {
@@ -136,6 +203,108 @@ int asm_get_offset(assemblyline_t al) { return al->offset; }

void asm_set_offset(assemblyline_t al, int offset) { al->offset = offset; }

uint8_t *asm_get_buffer(assemblyline_t al) { return al->buffer; }
uint8_t __attribute__((deprecated("use asm_get_code instead"))) *
asm_get_buffer(assemblyline_t al) {
return al->buffer;
}

void *asm_get_code(assemblyline_t al) { return (void *)al->buffer; }

int asm_create_bin_file(assemblyline_t al, const char *file_name) {

void *buffer = asm_get_code(al);
int len = asm_get_offset(al);
FILE *write_ptr = fopen(file_name, "wb");

FAIL_IF_MSG(write_ptr == NULL, "failed to create binary file")

fwrite(buffer, sizeof(uint8_t), len, write_ptr);

fclose(write_ptr);

return EXIT_SUCCESS;
}

void asm_mov_imm(assemblyline_t al, enum asm_opt option) {

switch (option) {
case NASM:
al->assembly_opt |= NASM_MOV_IMM;
al->assembly_opt &= ~SMART_MOV_IMM;
break;
case STRICT:
al->assembly_opt &= ~(NASM_MOV_IMM | SMART_MOV_IMM);
break;
case SMART:
al->assembly_opt |= SMART_MOV_IMM;
al->assembly_opt &= ~NASM_MOV_IMM;
break;
default:
return;
}
}

void asm_sib(assemblyline_t al, enum asm_opt option) {

switch (option) {
case NASM:
asm_sib_index_base_swap(al, NASM);
asm_sib_no_base(al, NASM);
break;
case STRICT:
asm_sib_index_base_swap(al, STRICT);
asm_sib_no_base(al, STRICT);
break;
default:
return;
}
}

void asm_sib_index_base_swap(assemblyline_t al, enum asm_opt option) {

switch (option) {
case NASM:
al->assembly_opt |= NASM_SIB_INDEX_BASE_SWAP;
break;
case STRICT:
al->assembly_opt &= ~NASM_SIB_INDEX_BASE_SWAP;
break;
default:
return;
}
}

void asm_sib_no_base(assemblyline_t al, enum asm_opt option) {

switch (option) {
case NASM:
al->assembly_opt |= NASM_SIB_NO_BASE;
break;
case STRICT:
al->assembly_opt &= ~NASM_SIB_NO_BASE;
break;
default:
return;
}
}

void asm_set_all(assemblyline_t al, enum asm_opt option) {

switch (option) {
case NASM:
asm_mov_imm(al, NASM);
asm_sib_index_base_swap(al, NASM);
asm_sib_no_base(al, NASM);
break;
case STRICT:
asm_mov_imm(al, STRICT);
asm_sib_index_base_swap(al, STRICT);
asm_sib_no_base(al, STRICT);
break;
case SMART:
asm_mov_imm(al, SMART);
break;
default:
return;
}
}
197 changes: 176 additions & 21 deletions src/assemblyline.h
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -22,10 +22,16 @@
#include <stdint.h>
#include <unistd.h>

// different assembly options for mov immediate and SIB
enum asm_opt { STRICT, NASM, SMART };

// TODO:
#define DEFAULT (SMART_MOV_IMM | NASM_SIB_INDEX_BASE_SWAP | NASM_SIB_NO_BASE)

typedef struct assemblyline *assemblyline_t;

/**
* Allocates an instance of assemblyline_t and attaches a pointer to a memory
* allocates an instance of assemblyline_t and attaches a pointer to a memory
* buffer @param buffer where machine code will be written to. Buffer length
* will be specified by @param len.
* NOTE: @param buffer could also be set to NULL for internal memory
@@ -35,74 +41,223 @@ typedef struct assemblyline *assemblyline_t;
assemblyline_t asm_create_instance(uint8_t *buffer, int len);

/**
* frees all memory associated with @param instance
* frees all memory associated with @param instance. Returns EXIT_SUCCESS or
* EXIT_FAILURE.
*/
int asm_destroy_instance(assemblyline_t instance);

/**
* Assembles the given string @param assembly_str containing valid x64 assembly
* assembles the given string @param assembly_str containing valid x64 assembly
* code with instance @param al It writes the corresponding machine code to the
* memory location specified by the buffer attached to @param al. Returns
* EXIT_SUCCESS or EXIT_FAILURE.
* (DEPRECATED: use asm_assemble_str() instead)
*/
int __attribute__((deprecated("use asm_assemble_str instead")))
assemble_str(assemblyline_t al, const char *assembly_str);

/**
* assembles the given string @param assembly_str containing valid x64 assembly
* code with instance @param al It writes the corresponding machine code to the
* memory location specified by al->buffer.
* memory location specified by buffer attached to @param al. Returns
* EXIT_SUCCESS or EXIT_FAILURE.
*/
int assemble_str(assemblyline_t al, const char *assembly_str);
int asm_assemble_str(assemblyline_t al, const char *assembly_str);

/**
* Assembles the given file path @param asm_file containing valid x64 assembly
* assembles the given file path @param asm_file containing valid x64 assembly
* code with instance @param al It writes the corresponding machine code to the
* memory location specified by al->buffer.
* memory location specified by buffer attached to @param al. Returns
* EXIT_SUCCESS or EXIT_FAILURE.
* (DEPRECATED: use asm_assemble_file() instead)
*/
int assemble_file(assemblyline_t al, char *asm_file);
int __attribute__((deprecated("use asm_assemble_file instead")))
assemble_file(assemblyline_t al, char *asm_file);

/**
* Assembles the given null-terminated @param string with instance @param al.
* assembles the given file path @param asm_file containing valid x64 assembly
* code with instance @param al It writes the corresponding machine code to the
* memory location specified by buffer attached to @param al. Returns
* EXIT_SUCCESS or EXIT_FAILURE.
*/
int asm_assemble_file(assemblyline_t al, char *asm_file);

/**
* assembles the given null-terminated @param string with instance @param al.
* It counts the number of instructions that break the chunk boundary of size
* @param chunk_size and saves it to @param dest It does not nop-pad
* necessarily, depends on the @param al instance (you can nop-pad and count
* different chunk breaks).
* different chunk breaks). Returns EXIT_SUCCESS or EXIT_FAILURE.
* NOTE: you cannot pass const char* as @param string, it will segfault, because
* string will be altered.
* (DEPRECATED: use asm_assemble_string_counting_chunks() instead)
*/
int assemble_string_counting_chunks(assemblyline_t al, char *string,
int chunk_size, int *dest);
int __attribute__((
deprecated("use asm_assemble_string_counting_chunks instead")))
assemble_string_counting_chunks(assemblyline_t al, char *string, int chunk_size,
int *dest);

/**
* Sets a given chunk size boundary @param chunk_size in bytes with instance
* assembles the given null-terminated @param string with instance @param al.
* It counts the number of instructions that break the chunk boundary of size
* @param chunk_size and saves it to @param dest It does not nop-pad
* necessarily, depends on the @param al instance (you can nop-pad and count
* different chunk breaks). Returns EXIT_SUCCESS or EXIT_FAILURE.
* NOTE: you cannot pass const char* as @param string, it will segfault, because
* string will be altered.
*/
int asm_assemble_string_counting_chunks(assemblyline_t al, char *string,
int chunk_size, int *dest);

/**
* assembles the given @param asm_file with instance @param al.
* It counts the number of instructions that break the chunk boundary of size
* @param chunk_size and saves it to @param dest It does not nop-pad
* necessarily, depends on the @param al instance (you can nop-pad and count
* different chunk breaks). Returns EXIT_SUCCESS or EXIT_FAILURE.
*/
int asm_assemble_file_counting_chunks(assemblyline_t al, char *asm_file,
int chunk_size, int *dest);

/**
* sets a given chunk size boundary @param chunk_size in bytes with instance
* @param al. When called before assemble_str() or assemble_file() assemblyline
* will ensure no instruction opcode will cross the specified @param chunk_size
* will ensure no instruction opcode will cross the specified @param chunk_size
* boundary via nop padding.
* NOTE: @param chunk_size must be greater than 2 in order to
* be classified as a valid memory chunk boundary
*/
void asm_set_chunk_size(assemblyline_t al, size_t chunk_size);

/**
* Set debug flag @param debug to true or false with instance @param al. When is
* set debug flag @param debug to true or false with instance @param al. When is
* set @param debug to true machine code represented in hexadecimal will be
* printed to stdout.
*/
void asm_set_debug(assemblyline_t al, bool debug);

/**
* Returns the offset associated with @param al
* returns the offset associated with @param al
*/
int asm_get_offset(assemblyline_t al);

/**
* Sets a memory offset @param offset to specify exact location in memory block
* sets a memory offset @param offset to specify exact location in memory block
* for writing machine code with instance @param al.
* NOTE: @param offset could be set to 0 for the resulting memory block.
*/
void asm_set_offset(assemblyline_t al, int offset);

/**
* Returns the buffer associated with @param al
* returns the buffer associated with @param al
* (DEPRECATED: use asm_get_code() instead)
*/
uint8_t *asm_get_buffer(assemblyline_t al);
uint8_t __attribute__((deprecated("use asm_get_code instead"))) *
asm_get_buffer(assemblyline_t al);

/**
* Returns the buffer associated with @param al as type void* for easy
* returns the buffer associated with @param al as type void* for easy
* typecasting to any function pointer format.
*/
void *asm_get_code(assemblyline_t al);

/**
* Generates a binary file @param file_name from assembled machine code up to
* the memory offset of the current instance @param al. Returns EXIT_SUCCESS or
* EXIT_FAILURE.
*/
int asm_create_bin_file(assemblyline_t al, const char *file_name);

/**
* Nasm optimizes a `mov rax, IMM` to `mov eax, imm`, iff imm is <= 0x7fffffff
* for all destination registers. The following three methods allow the user to
* specify this behavior.
*
* setting @param option to STRICT disables nasm-style mov-immediate handling.
* ex: even if immediate size for mov is less than or equal to max signed 32 bit
* assemblyline will pad the immediate to fit 64bit.
* That is:
* "mov rax, 0x7fffffff" as "mov rax, 0x000000007fffffff"
* -> 48 b8 ff ff ff 7f 00 00 00 00
*
* setting @param option to NASM enables nasm-style mov-immediate handling.
* ex: if immediate size for mov is less than or equal to max signed 32 bit
* assemblyline will emit code to mov to eax rather than rax.
* That is: "mov rax, 0x7fffffff" as "mov eax, 0x7fffffff"
* -> b8 ff ff ff 7f
*
* setting @param option to SMART, Assemblyline will check the immediate value
* for leading 0's and thus allows manual optimizations. This is currently set
* as default.
* ex: "mov rax, 0x000000007fffffff" -> 48 b8 ff ff ff 7f 00 00 00 00
* "mov rax, 0x7fffffff" -> b8 ff ff ff 7f
*
* setting @param option to any other value results in an no-operation function.
*/
void asm_mov_imm(assemblyline_t al, enum asm_opt option);

/**
* Since the stack pointer register is non-scalable in SIB, Nasm will swap the
* base and index register if the stack pointer register is used as index.
*
* setting @param option to STRICT disables Nasm SIB handling.
* That is:
* "lea r15, [rax+rsp]" will be interpreted as is
* -> 4c 8d 3c 20
*
* setting @param option to NASM enables Nasm SIB handling. This is currently
* set as default.
* That is: "lea r15, [rax+rsp]" will be interpreted as "lea r15,
* [rsp+rax]"
* -> 4c 8d 3c 04
*
* setting @param option to SMART or any other value results in an no-operation
* function.
*/
void asm_sib_index_base_swap(assemblyline_t al, enum asm_opt option);

/**
* In SIB, when no base register is present and the scale is equal to 2
* NASM will set the base to the index register and reduce the scale by 1.
* ex: [2*rax] -> [rax+1*rax]
*
* setting @param option to STRICT disables Nasm SIB handling with no base.
* That is:
* "lea r15, [2*rax]" will be interpreted as is
* -> 4c 8d 3c 45 00 00 00 00
*
* setting @param option to NASM enables Nasm SIB handling with no base.
* This is currently set as default.
* "lea r15, [2*rax]" will be interpreted as "lea r15, [rax+1*rax]"
* -> 4c 8d 3c 00
*
* setting @param option to SMART or any other value results in an no-operation
* function.
*/
void asm_sib_no_base(assemblyline_t al, enum asm_opt option);

/**
* setting @param option to STRICT is equivalent to calling both
* asm_sib_index_base_swap(al,STRICT) and asm_sib_no_base(al,STRICT)
*
* setting @param option to NASM is equivalent to calling both
* asm_sib_index_base_swap(al,NASM) and asm_sib_no_base(al,NASM)
*
* setting @param option to any other value results in an no-operation function
*/
void asm_sib(assemblyline_t al, enum asm_opt option);

/**
* setting @param option to STRICT is equivalent to calling
* asm_sib_index_base_swap(al,STRICT), asm_sib_no_base(al,STRICT), and
* asm_mov_imm(al,STRICT)
*
* setting @param option to NASM is equivalent to calling
* asm_sib_index_base_swap(al,NASM), asm_sib_no_base(al,NASM), and
* asm_mov_imm(al,NASM)
*
* setting @param option to SMART is equivalent to calling asm_mov_imm(al,SMART)
*
* setting @param option to any other value results in an no-operation function
*/
void asm_set_all(assemblyline_t al, enum asm_opt option);

#endif
153 changes: 132 additions & 21 deletions src/common.h
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -17,51 +17,123 @@
/*defines all preprocessors used in assemblyline*/
#ifndef COMMON_H
#define COMMON_H
#include <stdio.h>

#define LETTERS_IN_ALPHABET 26
// max length of filtered assembly string
#define FILTERED_STR_LEN 100
// default length of internal buffer
#define MEM_BUFFER 6000
// actual writable buffer size = MEM_BUFFER - BUFFER_TOLERANCE
#define BUFFER_TOLERANCE 20

#define _GNU_SOURCE 1

// used when 0 cannot denote none
#define NA (-1)
// denotes an error during assembly
#define ASM_ERROR (-1)
// denotes an error during instruction look up
#define INSTR_ERROR (-2)

// base for string to unsigned long conversions
#define RADIX_16 16
#define RADIX_10 10
#define STR_HEX_64 18
// used to determine register size
#define BIT_MASK 0b01100000000
#define BIT_32 0b01100000000
#define BIT_16 0b01000000000

// number of bits in a byte
#define BIT_8 8
// immediate ranges
#define NEG64BIT 0xffffffffffffffff
#define NEG32BIT 0xffffffff00000000
#define NEG8BIT 0xffffffffffffff00
#define NEG80BIT 0xffffffffffffff80
#define NEG80_32BIT 0xffffff80
#define MASK_4BIT 0xf
#define MAX_BYTE_IDENTIFIER 0xe0
#define MAX_SIGNED_8BIT 0x7f
#define MAX_UNSIGNED_8BIT 0xff
#define MAX_UNSIGNED_16BIT 0xffff
#define MAX_SIGNED_32BIT 0x7fffffff
#define MAX_UNSIGNED_32BIT 0xffffffff
#define MOVI 6
#define NEG32BIT_CHECK 0x80000000
// check if a number is at least 32 bits
#define X32BIT_CHECK 0x10000000
#define NEG8BIT_CHECK 0x80

// set register length to 1 byte
#define SET_BYTE ~(reg16 | reg32 | reg64)
// used for setting prefix
#define SET_WORD ~(reg32 | reg64)
#define SET_DWORD ~(reg64)

#define C5H 0xc5
#define C4H 0xc4
#define NONE 0x0
#define NO_BYTE 0x100
#define NO_REG_MEM 0x25

// RXB bits in front of mmmm corresponds to !(rex_r), !(rex_x), and !(rex_b)
// Obtained from get_rex_prefix()
#define RXB 0x0
// mmmmm constants (VEX must be 3-byte)
#define X0F 0b00001000000000
#define X0F38 0b00010000000000
#define X0F3A 0b00011000000000
// just a place holder for reference
#define W 0x0
#define W0 0b000000000
// most significant bit will depend on the size of m operand (default 64 bit)
#define W1 0b100000000
// most significant bit will depend on the size of m operand (default 64 bit)
#define W0_W1 0b100000001
// WIG constant to specify we could switch between 3 and 2 byte hex
#define WIG 0b1

// pp constant
#define X66 0b010
#define XF3 0b100
#define XF2 0b110

// vvvv to specify which register to encode
#define NDS 0b01000000
#define NDD 0b00100000
#define DDS 0b00010000
// no register specifier
#define NNN 0b11110000
#define CLEARvvvv 0b1111000

// L constant
#define LZ 0x0
#define B128 0x0
#define B256 0b1000

// VEX settings (Will be shifted right one bit to remove WIG)
// use NONE if not present
#define VEX(vvvv, L, pp, mmmmm, WIG) \
((VEX) | (RXB) | (mmmmm) | (W) | (vvvv) | (L) | (pp) | (WIG))

#define R_VEX 0b10000000
#define M_VEX 0b00100000

// used for getting register type, length, and value
#define REX_MASK 0b111
#define REG_MASK 0b00000011111
#define MODE_MASK 0b11110000000
#define MODE_CLEAR 0b00001111111
#define VALUE_MASK 0b00000000111
#define VVVV_MASK 0b00000001111

// process WRXB bit of the rex prefix
#define REG_RB 0b1000
#define REX_W_RXB 0b1001111

// register string length
#define REGISTER_LEN 5
// max register string length
#define REGISTER_LEN 6
// all generic registers have at most 7 variations
#define NUM_OF_REGISTERS 7

// used in instruction_data.h
#define MAX_REG_LEN 5
#define NUM_OF_OPD 3
#define MAX_REG_LEN 6
#define NUM_OF_OPD 4
#define OPD_FORMAT_LEN 4
#define INSTRUCTION_CHAR_LEN 15

@@ -72,34 +144,73 @@
#define OPERAND_FORMAT_LEN 7

// used only in tokenizer
#define IN_RANGE(var, lower, upper) ((var >= lower) && (var <= upper))
#define IN_RANGE(var, lower, upper) (((var) >= (lower)) && ((var) <= (upper)))
#define DO_NOT_PAD(reduce, set, mask) \
(reduce) &= (mask); \
(set) = true;

// keyword length
#define DWORD_LEN 5
#define BYTE_LEN 4
#define SHORT_LEN 5
#define LONG_LEN 4
#define SHORT_LEN 5
#define FAR_LEN 3

// mod values for the ModR/M Byte
#define MOD8 0b1000000
#define MOD8 0b01000000
#define MOD16 0b10000000
#define MOD24 0b11000000

#define SIB 0b000000000
#define SIB2 0b01000000
#define SIB4 0b10000000
#define SIB8 0b11000000

// scaled indexed addressing
#define SIB_CONST 0x24
#define NO_BASE 0b101

// operand position
#define FIRST_OPERAND 0
#define SECOND_OPERAND 1
#define THIRD_OPERAND 2

// fail condition preprocessor
#define FOURTH_OPERAND 3

// assembly mode options
#define NASM_MOV_IMM 0b00001
#define NASM_SIB_INDEX_BASE_SWAP 0b00100
#define NASM_SIB_NO_BASE 0b01000
#define SMART_MOV_IMM 0b00010
#define GET_EN 0b11111100000000000000000

// check instruction attributes
#define TYPE(key, instr_type) (INSTR_TABLE[(key)].type == (instr_type))
#define NAME(key, instr_name) (INSTR_TABLE[(key)].name == (instr_name))

// various length nop instructions
#define NOP 0x90
#define NOP2 0x66, 0x90
#define NOP3 0x0f, 0x1f, 0x00
#define NOP4 0x0f, 0x1f, 0x40, 0x00
#define NOP5 0x0f, 0x1f, 0x44, 0x00, 0x00
#define NOP6 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00
#define NOP7 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00
#define NOP8 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
#define NOP9 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
#define NOP10 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
#define NOP11 0x66, 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00

// fail conditions
#define FAIL_IF(EXP) \
if (EXP) { \
return EXIT_FAILURE; \
}

#define FAIL_SYS(EXP, MSG) \
#define FAIL_SYS(EXP, MSG, RET) \
if (EXP) { \
fprintf(stderr, "assembyline: " MSG); \
perror("error: "); \
return EXIT_FAILURE; \
perror("error "); \
return RET; \
}

#define FAIL_IF_ERR(EXP) \
@@ -119,4 +230,4 @@
return EXIT_FAILURE; \
}

#endif
#endif
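
Note on the IN_RANGE change above: the v1.0.1 macro pastes its arguments in without parentheses, so an argument that contains an operator of lower precedence than `>=` is regrouped after expansion. The sketch below reproduces both macro bodies from the diff under the hypothetical names IN_RANGE_OLD and IN_RANGE_NEW; the two wrapper functions are illustrative only and do not appear in the repository.

```c
/* v1.0.1 form: arguments are substituted without parentheses */
#define IN_RANGE_OLD(var, lower, upper) ((var >= lower) && (var <= upper))
/* form on main: every argument is parenthesized */
#define IN_RANGE_NEW(var, lower, upper) \
  (((var) >= (lower)) && ((var) <= (upper)))

/* Hypothetical caller that passes a conditional expression as `var`.
 * The old macro expands to
 *   ((use_first ? a : b >= 0) && (use_first ? a : b <= 9))
 * so when use_first is nonzero the comparison is never applied to a. */
int digit_check_old(int use_first, int a, int b) {
  return IN_RANGE_OLD(use_first ? a : b, 0, 9);
}

/* Same call through the new macro: the selected value is compared
 * against both bounds, as intended. */
int digit_check_new(int use_first, int a, int b) {
  return IN_RANGE_NEW(use_first ? a : b, 0, 9);
}
```

With a plain variable as the first argument, which is how a tokenizer would typically call it, both forms agree; the difference only shows up for compound arguments.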
435 changes: 259 additions & 176 deletions src/encoder.c

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions src/encoder.h
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -23,18 +23,18 @@
#include "instruction_data.h"

/**
* determines opcode offset of @param instruc for different register size
* determines opcode offset of @param instrc for different register size
*/
void encode_offset(struct instr *instruc);
void encode_offset(struct instr *instrc);

/**
* determines register and prefix opcode of @param instruc
* determines register and prefix opcode of @param instrc
*/
void encode_operands(struct instr *instruc);
int encode_operands(struct instr *instrc);

/**
* preprocess the immediate for @param instruc
* preprocess the immediate for @param instrc
*/
void encode_imm(struct instr *instruc);
void encode_imm(struct instr *instrc);

#endif
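
Note on the encode_operands signature change above: returning int instead of void lets a caller propagate an encoding failure. The sketch below shows one way that could look, combined with the FAIL_IF macro from src/common.h. Everything suffixed with _sketch is hypothetical, and the assumption that failure is reported as EXIT_FAILURE is not confirmed by the diff, which only shows the new return type.

```c
#include <stdlib.h>

/* same body as FAIL_IF in src/common.h */
#define FAIL_IF(EXP) \
  if (EXP) { \
    return EXIT_FAILURE; \
  }

/* hypothetical stand-in for struct instr */
struct instr_sketch {
  int operand_count;
};

/* hypothetical stand-in for encode_operands(): rejects more than
 * four operands (NUM_OF_OPD is 4 on main) */
static int encode_operands_sketch(struct instr_sketch *instrc) {
  return instrc->operand_count > 4 ? EXIT_FAILURE : EXIT_SUCCESS;
}

/* hypothetical caller: with an int return value the failure can be
 * forwarded instead of being silently ignored */
int assemble_one_sketch(struct instr_sketch *instrc) {
  FAIL_IF(encode_operands_sketch(instrc) == EXIT_FAILURE);
  return EXIT_SUCCESS;
}
```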
212 changes: 177 additions & 35 deletions src/enums.h
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -24,34 +24,30 @@ typedef enum { BEGIN, FIRST_CH, SPACE_FOUND } filter_op;
// with instruction opcode)
typedef enum {

none = 0,
SIB = 0x24,
REG = 256,
REX,
VEX,
EVEX,
W0,
MEM,
NO_PREFIX = 399
} op_encoding;
// used in opcode layout to denote a dynamic byte
REG = 0b00100000000000000000,
REX = 0b01000000000000000000,
VEX = 0b10000000000000000000,
// this denotes the presence of an 8-bit immediate
ib = 0b0100000000000000000000,
rd = 0b1000000000000000000000
} opcode_encoding;

// only used for determining what prefix to use based on registers
typedef enum {

rex = 0x40,
rex_ = 0x40,
rex_w = 0x48,
rex_b = 0x01,
rex_r = 0x04,
rex_x = 0x02,
evex = 0x67
rex_b = 0x01
} prefix_encoding;

typedef enum { CHUNK_COUNT, CHUNK_FITTING, ASSEMBLE } ASM_MODE;

// describes how operands are encoded
typedef enum {

NA = -1,
MR = 500,
RM,
RVM,
@@ -60,7 +56,8 @@ typedef enum {
I,
O,
D,
S
S,
B
} operand_encoding;

// describes operand layout ex: ri = instruction register, constant
@@ -79,13 +76,32 @@ typedef enum {
rri,
rmi,
rrm,
rmr
rmr,
vr,
rv,
vv,
yy,
vi,
vm,
mv,
ym,
my,
vvv,
yyy,
mri,
mrr,
vvm,
yym,
vvvi,
vvmi,
yyyi,
yymi,

} operand_format;

// unique identifier for each instruction
typedef enum {

ASM_ERROR = -1,
EOI,
LABEL,
SKIP,
@@ -94,6 +110,7 @@ typedef enum {
add,
adox,
and,
bextr,
bzhi,
call,
clc,
@@ -115,6 +132,12 @@ typedef enum {
cmovnc,
cmovne,
cmovng,
cmovnge,
cmovnl,
cmovnle,
cmovno,
cmovnp,
cmovns,
cmovnz,
cmovo,
cmovp,
@@ -123,7 +146,11 @@ typedef enum {
cmovs,
cmovz,
cmp,
cpuid,
cvtdq2pd,
cvtpd2dq,
dec,
divpd,
imul,
inc,
ja,
@@ -148,54 +175,161 @@ typedef enum {
lfence,
mfence,
mov,
movd,
movntdqa,
movntq,
movq,
movzx,
mulpd,
mulx,
neg,
nop,
not,
// clang-format off
or,
// clang-format on
paddb,
paddd,
paddq,
paddw,
pand,
pandn,
pmuldq,
pmulhrsw,
pmulhuw,
pmulhw,
pmulld,
pmullw,
pmuludq,
pop,
por,
prefetchnta,
prefetcht0,
prefetcht1,
prefetcht2,
psrldq,
psubb,
psubd,
psubq,
psubw,
punpcklqdq,
push,
pxor,
rcr,
rdpmc,
rdtsc,
rdtscp,
ret,
ror,
rorx,
sal,
sar,
sarx,
sbb,
seta,
setae,
setb,
setbe,
setc,
sete,
setg,
setge,
setl,
setle,
setna,
setnae,
setnb,
setnbe,
setnc,
setne,
setng,
setnge,
setnl,
setnle,
setno,
setnp,
setns,
setnz,
seto,
setp,
setpe,
setpo,
sets,
setz,
sfence,
shl,
shld,
shlx,
shr,
shrd,
shrx,
sub,
test,
vaddpd,
vdivpd,
vmovdqu,
vmovupd,
vmulpd,
vpaddb,
vpaddd,
vpaddq,
vpaddw,
vpand,
vpandn,
vperm2f128,
vperm2i128,
vpermd,
vpmuldq,
vpmulhrsw,
vpmulhuw,
vpmulhw,
vpmulld,
vpmullw,
vpmuludq,
vpor,
vpsubb,
vpsubd,
vpsubq,
vpsubw,
vpxor,
vsubpd,
xabort,
xbegin,
xchg,
xend,
xor
} asm_instr;

// used to catagorize instruction based on their functionality
// used to categorize instruction based on their functionality
typedef enum {

DATA_TRANSFER,
LOGICAL,
ARITHMETIC,
SHIFT,
SHIFT_S,
BIT,
CONTROL_FLOW,
OTHER,
ASSEMBLYLINE,
// enforce a different mode of processing constant operand
DATA_TRANSFER = 0b000001,
/* to ensure shift instruction such as "shr REG, 1" does not
* assemble the predefined constant 1 operand. Rather use the special
* instruction for shr REG, 1
*/
SHIFT = 0b000010,
/* for control flow instructions constant operand is handled
* differently due to not having an associated register
*/
CONTROL_FLOW = 0b000100,
// SSE and vector extension instructions
VECTOR = 0b001000,
// AVX 256 instruction
VECTOR_AVX = 0b011000,
// this is a test used to bypass old implementation
VECTOR_EXT = 0b101000,
// used to encode instructions with both an I and M operand encoding
OPERATION = 0b1000000,
PAD_ALWAYS = 0b1000010,
// operand can only be a byte
BYTE_OPD = 0b1000011,
// instructions that do not require special encodings
OTHER
} instr_type;

// register bit size and catagory(ext denotes extended x64 registers)
// register bit size and category (ext denotes extended x64 registers)
typedef enum {

reg8 = 0b00000000000,
@@ -207,16 +341,15 @@ typedef enum {
ext32 = 0b01110000000,
reg64 = 0b10000000000,
ext64 = 0b10010000000,
mmx64 = 0b10100000000,
mmx64 = 0b10100000000
} bit_mode;

// register representation (converted from string)
typedef enum {

reg_error = 0b1000000,
reg_none = 0b100000,

// 8bitregisters
// 8-bit registers
al = 0b00000,
cl = 0b00001,
dl = 0b00010,
@@ -225,7 +358,7 @@ typedef enum {
bpl = 0b00101,
sil = 0b00110,
dil = 0b00111,

// 8-bit extended registers
r8b = 0b01000,
r9b = 0b01001,
r10b = 0b01010,
@@ -234,7 +367,7 @@ typedef enum {
r13b = 0b01101,
r14b = 0b01110,
r15b = 0b01111,

// 64-bit vector registers
mm0 = 0b10000,
mm1 = 0b10001,
mm2 = 0b10010,
@@ -243,7 +376,16 @@ typedef enum {
mm5 = 0b10101,
mm6 = 0b10110,
mm7 = 0b10111,
// 64-bit extended vector registers
mm8 = 0b11000,
mm9 = 0b11001,
mm10 = 0b11010,
mm11 = 0b11011,
mm12 = 0b11100,
mm13 = 0b11101,
mm14 = 0b11110,
mm15 = 0b11111

} asm_reg;

#endif
#endif
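
Note on the asm_reg values above: each register is a 5-bit pattern whose low three bits match the 3-bit register number used in ModR/M fields, and whose fourth bit is set for the extended registers (r8b to r15b, mm8 to mm15) that need a REX extension bit. The sketch below is one plausible reading of how VALUE_MASK and REG_RB from src/common.h apply to these values. The helper names are hypothetical, and the constants are written in hex because `0b` literals are a compiler extension before C23.

```c
/* copies of the masks from src/common.h, written in hex */
#define VALUE_MASK_SKETCH 0x07 /* 0b00000000111 */
#define REG_RB_SKETCH 0x08     /* 0b1000 */

/* a few asm_reg values copied from the enum above */
enum asm_reg_sketch {
  SK_AL = 0x00,  /* 0b00000 */
  SK_DIL = 0x07, /* 0b00111 */
  SK_R9B = 0x09, /* 0b01001 */
  SK_MM1 = 0x11, /* 0b10001 */
  SK_MM9 = 0x19  /* 0b11001 */
};

/* 3-bit register number that goes into a ModR/M reg or r/m field */
unsigned reg_number(unsigned reg) { return reg & VALUE_MASK_SKETCH; }

/* nonzero when the register needs an extension bit in the prefix */
int reg_is_extended(unsigned reg) { return (reg & REG_RB_SKETCH) != 0; }
```

Under this reading, r9b and mm9 share register number 1 with cl and mm1 and differ only in the extension bit.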
49 changes: 21 additions & 28 deletions src/instr_parser.c
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -17,59 +17,52 @@
/*implements functions for parsing instruction name
and operand format of instruction*/
#include "instr_parser.h"
#include "common.h"
#include "instruction_data.h"
#include "instructions.h"
#include <stdio.h>
#include <string.h>

operand_format get_opd_format(char *opd_en) {

int i = NA;
if (opd_en[0] != '\0')
i = opd_format_table_index[opd_en[0] - 'a'] - 1;
// find the correct operand format enum given the corresponding string
while (OPD_FORMAT_TABLE[++i].val != opd_error) {
if (strcmp(opd_en, OPD_FORMAT_TABLE[i].str) == 0)
if (!strcasecmp(opd_en, OPD_FORMAT_TABLE[i].str))
return OPD_FORMAT_TABLE[i].val;
}
// operand format not found
fprintf(stderr, "unknown operand format: \"%s\"\n", opd_en);
return opd_error;
}

int str_to_instr_key(char *instruction, operand_format opd_index) {
int str_to_instr_key(char *instruction, operand_format opd_layout) {

int i = 2;
int i = 0;
// set index of INSTR_TABLE[] to the first letter of instruction
if (IN_RANGE(instruction[0], 'a', 'z'))
i = instr_table_index[instruction[0] - 'a'] - 1;
else
return INSTR_ERROR;
// search for instruction entry in INSTR_TABLE[]
while (INSTR_TABLE[++i].name != NA) {
if (INSTR_TABLE[i].instr_name[0] != '\0') {
// compare instruction strings
if (strcmp(instruction, INSTR_TABLE[i].instr_name) == 0) {
asm_instr found_instr = INSTR_TABLE[i].name;
if (!strcmp(instruction, INSTR_TABLE[i].instr_name)) {
int found_instr = INSTR_TABLE[i].name;
while (INSTR_TABLE[i].name == found_instr) {
// compare operand formats
if (INSTR_TABLE[i].opd_format[0] == opd_index)
return i;
else if (INSTR_TABLE[i].opd_format[1] == opd_index)
if (INSTR_TABLE[i].opd_format[0] == opd_layout ||
INSTR_TABLE[i].opd_format[1] == opd_layout)
return i;
i++;
}
// operand format not found
return EOI;
// operand format is not found for instruction string
return INSTR_ERROR;
}
}
}
return EOI;
}

int to_special_instr_key(int key) {
// converts special instructions to their corresponding key
switch (INSTR_TABLE[key].name) {
case sub:
case sbb:
case add:
case adc:
case xor:
if (INSTR_TABLE[key].encode_operand == M)
return key + 1;
default:
return EOI;
}
// INSTR_TABLE entry is not found for instruction string
return INSTR_ERROR;
}
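
Note on str_to_instr_key above: the lookup jumps to the first table entry for the instruction's initial letter, scans forward until the mnemonic matches, then walks the consecutive entries for that instruction until one accepts the operand layout. The sketch below mirrors that shape on a tiny hypothetical table. The table contents, the FMT_* values, first_index() and the value -1 for INSTR_ERROR_SKETCH are all assumptions made for illustration, not the repository's data.

```c
#include <string.h>

#define INSTR_ERROR_SKETCH (-1) /* assumed value; the diff shows only the name */

enum { FMT_RR, FMT_RI, FMT_UNUSED };

struct entry_sketch {
  int name;        /* instruction id; -1 terminates the table */
  const char *str; /* mnemonic */
  int fmt[2];      /* operand layouts accepted by this entry */
};

/* hypothetical table, sorted by mnemonic like INSTR_TABLE[] */
static const struct entry_sketch TABLE[] = {
    {0, "adc", {FMT_RR, FMT_UNUSED}},
    {0, "adc", {FMT_RI, FMT_UNUSED}},
    {1, "add", {FMT_RR, FMT_RI}},
    {2, "bzhi", {FMT_RR, FMT_UNUSED}},
    {-1, "", {FMT_UNUSED, FMT_UNUSED}},
};

/* index of the first entry starting with c, or -1 if there is none */
static int first_index(char c) {
  switch (c) {
  case 'a': return 0;
  case 'b': return 3;
  default: return -1;
  }
}

int lookup_sketch(const char *instr, int fmt) {
  if (instr[0] < 'a' || instr[0] > 'z')
    return INSTR_ERROR_SKETCH;
  int i = first_index(instr[0]);
  if (i < 0)
    return INSTR_ERROR_SKETCH;
  for (; TABLE[i].name != -1; i++) {
    if (strcmp(instr, TABLE[i].str) != 0)
      continue;
    /* mnemonic found: try every consecutive entry with the same id */
    int found = TABLE[i].name;
    for (; TABLE[i].name == found; i++) {
      if (TABLE[i].fmt[0] == fmt || TABLE[i].fmt[1] == fmt)
        return i;
    }
    /* mnemonic exists but no entry accepts this operand layout */
    return INSTR_ERROR_SKETCH;
  }
  /* no entry for this mnemonic */
  return INSTR_ERROR_SKETCH;
}
```

Returning an error value for both failure paths, as the new code does with INSTR_ERROR, avoids conflating "not found" with the EOI key that v1.0.1 returned.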
10 changes: 2 additions & 8 deletions src/instr_parser.h
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -31,12 +31,6 @@ operand_format get_opd_format(char *opd_en);
* an operand_format enum representation @param opd_index and returns the index
* key to the matching INSTR_TABLE[] entry.
*/
int str_to_instr_key(char *instruction, operand_format opd_index);

/**
* takes a INSTR_TABLE[] index key @param key and returns the new INSTR_TABLE[]
* index key
*/
int to_special_instr_key(asm_instr key);
int str_to_instr_key(char *instruction, operand_format opd_layout);

#endif
104 changes: 74 additions & 30 deletions src/instruction_data.h
@@ -1,5 +1,5 @@
/**
* Copyright 2021 University of Adelaide
* Copyright 2022 University of Adelaide
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -21,62 +21,106 @@

#include "common.h"
#include "enums.h"
#include "instructions.h"
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// contains the memory context for assembling an asm program
struct assemblyline {
// points to a memory buffer location containing the first instruction
uint8_t *buffer;
int buffer_len;
// size of assembly program in bytes (could be manually adjusted)
// size of assembly program in bytes (could be manually adjusted) and offset
// of -1 denotes assembly parsing error
int offset;
size_t chunk_size;
bool external;
bool external : 1;
ASM_MODE assembly_mode;
bool debug;
bool finalized;
uint8_t assembly_opt;
bool debug : 1;
bool finalized : 1;
};

// prefix and register byte values
struct prefix {
unsigned int reg;
unsigned int rex;
// [W|R][vvvv][L][pp]
uint8_t vvvv : 4;
bool is_w0 : 1;
bool is_67H : 1;
bool is_66H : 1;
// fix later if possible
unsigned int sib;
};

struct operand {
// pointer to operand in instruction string
char *ptr;
// stores the string representation of register
char str[MAX_REG_LEN];
// enum representation of register
asm_reg reg;
// stores the 2nd register in a memory reference
char sib[MAX_REG_LEN];
// enum representation of 2nd register in
// a memory reference
asm_reg index;
// operand type could be: r, m, or i
char type;
};

// stores keywords used in assemblyline
union keywords {
struct {
uint8_t is_short : 1;
uint8_t is_long : 1;
uint8_t is_far : 1;
uint8_t is_byte : 1;
uint8_t is_word : 1;
uint8_t is_dword : 1;
};
uint8_t is_keyword;
};

// internal representation of an assembly instruction
struct instr {
// connects instr to INSTR_TABLE[]
asm_instr key;
// stores components assembly instruction into buffer
int key;
// stores components of assembly instruction into buffer
char instruction[INSTRUCTION_CHAR_LEN];
char *operand[NUM_OF_OPD];
char op_cpy[NUM_OF_OPD][MAX_REG_LEN];
char op_mem_cpy[NUM_OF_OPD][MAX_REG_LEN];
char opd_type[OPD_FORMAT_LEN];
// operand registers represented as asm_reg enum
asm_reg opd[NUM_OF_OPD];
asm_reg opd_mem[NUM_OF_OPD];
// keywords for assemblyline
bool is_short;
bool is_long;
bool is_byte;
// stores operands represented as strings
struct operand opd[NUM_OF_OPD];
// bitmap for keywords
union keywords keyword;
// enable or disable nasm register optimization
uint8_t assembly_opt;
// constants and memory displacement
bool imm;
bool reduced_imm;
bool imm : 1;
bool reduced_imm : 1;
unsigned long cons;
bool zero_byte;
bool mem_disp;
bool sib;
bool zero_byte : 1;
// when operand is a memory reference M
bool mem_disp : 1;
bool mem_value : 1;
bool is_sib_const : 1;
bool is_sib : 1;
bool no_base : 1;
uint8_t mem_index : 3;
uint32_t mem_offset;
uint32_t mem_const;
// displacement for modRM64_m variable based on
// value of op_en and size of mem_disp
int mod_disp;
int sib_disp;
// uses operand_encoding to get value
// operand and prefix
unsigned int reg_hex;
unsigned int prefix_hex;
unsigned int vex_prefix_hex;
unsigned int w0_hex;
unsigned int mem_hex;
// operand and prefix values
struct prefix hex;
// offset for opcode determined by register size
int op_offset;
int rd_offset;
unsigned int rd_offset;
};

#endif
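
Note on union keywords above: overlaying the one-bit flags with a single uint8_t means all keywords can be cleared, or tested for "any keyword present", through one byte-wide access. The sketch below copies the union under a hypothetical name and adds two hypothetical helpers. It assumes a C11 compiler (anonymous struct) that accepts uint8_t bit-fields, as GCC and Clang do. The position of each flag inside the byte is implementation-defined, so the sketch only tests the byte for zero or nonzero.

```c
#include <stdint.h>

/* copy of union keywords from the diff, under a hypothetical name */
union keywords_sketch {
  struct {
    uint8_t is_short : 1;
    uint8_t is_long : 1;
    uint8_t is_far : 1;
    uint8_t is_byte : 1;
    uint8_t is_word : 1;
    uint8_t is_dword : 1;
  };
  uint8_t is_keyword; /* all flags viewed as one byte */
};

/* hypothetical helper: clear every flag with a single store */
union keywords_sketch keywords_clear(void) {
  union keywords_sketch k;
  k.is_keyword = 0;
  return k;
}

/* hypothetical helper: nonzero if any keyword flag is set */
int keywords_any(union keywords_sketch k) { return k.is_keyword != 0; }
```

This replaces the three separate bool members (is_short, is_long, is_byte) that struct instr carried in v1.0.1.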