ELF Parser in Lean #18

shigoel · 2024-02-22T15:49:32Z

We need an ELF binary file parser in Lean, specifically for an Arm (Aarch64) machine, so that we can read in the program, data, and other relevant sections, and render them into a form suitable for use by LNSym. We intend to verify programs in LNSym that have been obtained directly from the ELF binary.

Here are some references that can come in useful when developing the ELF parser:

Tool Interface Standards (TIS) Executable and Linking Format (ELF) Specification
Wikipedia: Executable and Linkable Format (ELF)
ELF for the Arm 64-bit Architecture (AArch64)
HOL-Light/s2n-bignum ELF loader code
Linux elf.h
Linux x86 specific elf.h
Linux Arm specific elf.h
Linux AArch64 specific elf.h
LLVM Elf Header
Chromium Linker Relocation Info:
Has better information than Linux headers on Arm relocation types
Goloang ELF debugger
Has better information than Linux headers on Arm relocation types
ACL2 Exec Loader Documentation
ELF File Man Page
ReadELF man page

jimgrundy · 2024-02-22T16:38:04Z

Also this paper

The Missing Link: Explaining ELF Static Linking, Semantically

tenedor · 2024-03-19T19:46:54Z

@shigoel / others, is anyone working on this yet? I'm interested in helping.

shigoel · 2024-03-20T19:32:17Z

@tenedor Hi Daniel -- no, there isn't anyone working on this. It'd be great to have your help!

tenedor · 2024-03-20T20:43:42Z

@shigoel Sounds good, I'll take a stab at it! This isn't an area I've worked in before but hopefully I can still be useful. Thanks for all the links to resources on ELF.

Are you looking for just an implementation of the parser at this stage, or does it need to be a verified implementation?

shigoel · 2024-03-20T21:51:01Z

@tenedor Thank you, Daniel!!

Just a parser implementation would do, to begin with. It may be interesting to keep verification in mind during development though, so that we can imagine what a potential proof would look like and structure code appropriately.

The paper Jim linked above does a pretty deep dive into ELF Linking, and may be of interest after some basics are lined up.

tenedor · 2024-03-20T22:16:09Z

@shigoel Great! Okay, here are some scoping questions:

It sounds like at some point this parser may be useful as a standalone project, but I'm going to focus on the LNSym use case for now.

What parts of ELF parsing are important for LNSym? E.g. do you just care about running programs (so, executables and shared libraries need to be parsed), or do you care about linking or additional capabilities ELF supports?

When you say that the parser should render the parsed binary "into a form suitable for use by LNSym", what representation do you have in mind? E.g. do you just need a parser from the binary-encoded data to nice-to-work-with data structures, or does the scope for this include mapping program segments into simulated memory, doing dynamic linking, relocation, etc.? Let me know if there's a clear hand-off point that's already written in LNSym. (I figure I should also look at how you're running the SHA256 tests.)

What level of performance do you need from an initial version? / How soon will you care about performance?

Do you have an example ELF file that I should start with? The simpler the better =). I figure I'll get a standalone executable working first before I consider shared libraries.

That's probably enough questions to throw at you at once! But if you have other recommendations I'll happily take them. I can also find time to set up a call if that's useful. Either way, I'll try to get some basic code to you soon so we can start iterating on that.

jimgrundy · 2024-03-20T23:21:41Z

I'd suggest starting small with the ability to parse (deserialize) an ELF binary (executables and shared libraries) into easy to use data-structures. But, as Shilpi says, think about what one might prove about it and how. But, dream big, also image deserializing those data-structures back into an ELF binary. Imagine one day loading in a binary, applying a correctness preserving optimization to it, and writing it back out again as an optimized executable. That's a long way off, but it's fun to deam.

…

On Wed, Mar 20, 2024 at 5:16 PM Daniel Windham ***@***.***> wrote: @shigoel <https://github.com/shigoel> Great! Okay, here are some scoping questions: It sounds like at some point this parser may be useful as a standalone project, but I'm going to focus on the LNSym use case for now. What parts of ELF parsing are important for LNSym? E.g. do you just care about running programs (so, executables and shared libraries need to be parsed), or do you care about linking or additional capabilities ELF supports? When you say that the parser should render the parsed binary "into a form suitable for use by LNSym", what representation do you have in mind? E.g. do you just need a parser from the binary-encoded data to nice-to-work-with data structures, or does the scope for this include mapping program segments into simulated memory, doing dynamic linking, relocation, etc.? Let me know if there's a clear hand-off point that's already written in LNSym. (I figure I should also look at how you're running the SHA256 tests.) What level of performance do you need from an initial version? / How soon will you care about performance? Do you have an example ELF file that I should start with? The simpler the better =). I figure I'll get a standalone executable working first before I consider shared libraries. That's probably enough questions to throw at you at once! But if you have other recommendations I'll happily take them. I can also find time to set up a call if that's useful. Either way, I'll try to get some basic code to you soon so we can start iterating on that. — Reply to this email directly, view it on GitHub <#18 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AUPXYT345PVXYLVFOJPMORLYZIDD7AVCNFSM6AAAAABDVGHT7GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJQG42DEOJQHE> . You are receiving this because you commented.Message ID: ***@***.***>

tenedor · 2024-03-21T14:30:26Z

Serializing back to a file will come for free - writing the round trip deserialize + serialize from the start more than pays for itself in testing gains. But I'll leave the optimizing rewrites for the future 🙂

gleachkr · 2024-03-21T15:36:47Z

Possibly relevant: I'm currently working on an ELF model and parser in Lean.

So far, it's basically a port of the ELF model from https://github.com/rems-project/linksem, accompanied by an (unpolished) readelf clone as an way of structuring the project and sanity-checking the output. The eventual ambition is pretty close to what @jimgrundy just described, although my team was thinking more about micropatching than optimization. Though

That's a long way off, but it's fun to dream

Anyhow, I'm working on getting permission to make the repo public. Would that be of interest to you all?

(Also, I noticed in CONTRIBUTING.md that you don't want to take on any external dependencies. Would that rule out using a standalone ELF parsing library? I'm planning to release it under MIT, so it'd also be possible to fold whatever pieces you want into your own codebase.)

shigoel · 2024-03-21T19:11:26Z

@gleachkr This is exciting! Thanks for letting us know! Your repo. would definitely be of interest to us, especially if Arm64 platforms are supported. Please keep us posted.

BTW, does your library include a Lean proof of correctness of relocation for some platform, similar to the REMS' ELF work?

Re. external dependencies: we try to limit what we import because we often need to be on the bleeding edge of Lean versions (i.e., nightly releases). We found in the past that other libraries sometimes weren't bumped up to use the same lean-toolchain as us, which caused compatibility issues. However, as we noted in CONTRIBUTING.md, exceptions are indeed possible if we find a way around this issue.

gleachkr · 2024-03-21T20:05:37Z

Wonderful! Here's the repo (please excuse the lack of polish right now): https://github.com/draperlaboratory/ELFSage. I just bumped the tool chain to catch up with you all.

BTW, does your library include a Lean proof of correctness of relocation for some platform, similar to the REMS' ELF work?

That would be in scope, but I think it's some ways off.

Would something like the ELF file structures, from https://github.com/rems-project/linksem/blob/master/src/elf_file.lem be a useful parsing output for you all? I think that's a pretty natural next step on my side.

tenedor · 2024-03-21T21:35:29Z

Sweet! @gleachkr, would you want help developing ELFSage? Incidentally I'm also based in Cambridge.

gleachkr · 2024-03-21T21:40:55Z

On Thu Mar 21, 2024 at 5:35 PM EDT, Daniel Windham wrote: Sweet! @gleachkr, would you want help developing ELFSage?

Sure thing, absolutely!

Incidentally I'm also based in Cambridge.

That's neat, small world :)

shigoel · 2024-03-21T23:41:36Z

Would something like the ELF file structures, from https://github.com/rems-project/linksem/blob/master/src/elf_file.lem be a useful parsing output for you all? I think that's a pretty natural next step on my side.

Indeed -- that looks great!

Glad @gleachkr and @tenedor can join forces!

tenedor · 2024-03-22T15:14:04Z

@gleachkr great - how do you want to coordinate on ways I can help? I'm happy to get on a call (or meet in person) if that's convenient.

gleachkr · 2024-03-22T17:14:47Z

A meetup sounds good (the geographical coincidence kinda calls out for it). My email is [email protected], if you want to send me a note maybe we can take it from there.

tenedor · 2024-03-22T18:02:28Z

Great, I sent you an email.

…

On Fri, Mar 22, 2024, 13:15 Graham Leach-Krouse ***@***.***> wrote: A meetup sounds good (the geographical coincidence kinda calls out for it). My email is ***@***.***, if you want to send me a note maybe we can take it from there. — Reply to this email directly, view it on GitHub <#18 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALOGU77FVGZQKNHNFBUWATYZRRJ3AVCNFSM6AAAAABDVGHT7GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJVGU2DGNJZGI> . You are receiving this because you were mentioned.Message ID: ***@***.***>

gleachkr · 2024-04-15T14:55:24Z

Update: we've now got a stable-ish implementation of something corresponding to the rems-project elf32_file and elf64_file, here:

https://github.com/draperlaboratory/ELFSage/blob/deb16f7f9cdb6d02bbd76c4aeb288b95e024ab25/ELFSage/Types/File.lean#L91

It's essentially:

structure ELF64File where
  /-- The ELF Header -/
  file_header           :  ELF64Header
  /-- The Segments of the file -/
  interpreted_segments  : List (ELF64ProgramHeaderTableEntry × InterpretedSegment)
  /-- The Sections of the file -/
  interpreted_sections  : List (ELF64SectionHeaderTableEntry × InterpretedSection)
  /-- Bits and Bobs: binary contents not covered by any section or segment -/
  bits_and_bobs         : List (Nat × ByteArray)

With the fields implemented as: ELF64Header, ELF64ProgramHeaderTableEntry, InterpretedSegment, ELF64SectionHeaderTableEntry, and InterpretedSection.

Hopefully it's all pretty uncontroversial and boring-in-a-good-way, but we thought it might be a good idea to check in to make sure that this is going in a direction that looks useful to you, before anything gets too locked in.

tenedor · 2024-04-19T15:28:50Z

Hi @shigoel! What do you see as the integration path for using ELFSage in LNSym?

Do you expect any challenges integrating the current ELFSage data structures into LNSym?
What's the shortest path to a first useful integration?
What interfaces do you need ELFSage to keep stable?
What additional capabilities would you most want from ELFSage?
How does LNSym handle Lean version upgrades?

shigoel · 2024-04-19T20:53:10Z

@gleachkr elf32_file and elf64_file (and several other structures I looked at) look great! Thank you. Those will be crucial for the behavior we need in LNSym.

shigoel · 2024-04-19T20:54:22Z

@tenedor I'm temporarily working on another project right now, and unfortunately, I don't have the bandwidth to actively work on LNSym. I'll be back on LNSym in July, and I'm looking forward to properly trying out ELFSage then.

But I'll answer your "shortest path" question. An immediately useful capability for us (which I suspect already exists in ELFSage) would be to read in a given executable, and get all the bytes corresponding to a given symbol in any section (.text, .rodata, etc.). This would replace our clumsy way of representing programs right now: here's an example. This interface would be one of the most important to keep stable.

tenedor · 2024-04-20T00:55:35Z

Okay thanks @shigoel, good to know! Is LMSym paused until July, or is there someone else we should coordinate with?

…

On Fri, Apr 19, 2024, 16:54 Shilpi Goel ***@***.***> wrote: @tenedor <https://github.com/tenedor> I'm temporarily working on another project right now, and unfortunately, I don't have the bandwidth to actively work on LNSym. I'll be back on LNSym in July, and I'm looking forward to properly trying out ELFSage then. But I'll answer your "shortest path" question. An immediately useful capability for us (which I suspect already exists in ELFSage) would be to read in a given executable, and get all the bytes corresponding to a given symbol in any section (.text, .rodata, etc.). This would replace our clumsy way of representing programs right now: here's an example <https://github.com/leanprover/LNSym/blob/bdfadd596f64940993650e6378543a9ddebd27a1/Tests/SHA512ProgramTest.lean#L16>. This interface would be one of the most important to keep stable. — Reply to this email directly, view it on GitHub <#18 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALOGU2ACEZ26REXA6SW2QDY6GABJAVCNFSM6AAAAABDVGHT7GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRXGI3TAOJYGU> . You are receiving this because you were mentioned.Message ID: ***@***.***>

shigoel · 2024-04-22T15:40:57Z

@tenedor LNSym's paused till July unfortunately, especially for such changes that have no precedence so far. We may still commit some code meanwhile, if and when we find some pockets of time.

tenedor · 2024-04-22T15:53:59Z

@shigoel okay, we'll keep our focus on Draper's ELF needs til you're back. Good luck with your other project!

shigoel · 2024-04-24T02:49:52Z

@tenedor Thank you so much, Daniel!

shigoel · 2024-07-15T18:28:43Z

Hi @gleachkr and @tenedor! You may have already received a notification for this, but thought I'd ping here just in case: I've opened a PR in ELFSage to bump its Lean toolchain, as a first step towards using ELFSage with LNSym. Thanks!

shigoel · 2024-07-18T20:43:45Z

Update: Another PR opened in ELFSage!

tenedor · 2024-07-18T23:37:51Z

Hi @shigoel, great to hear you're back! I'm tied up working on other projects and I'm not sure when I'll get a chance to return to ELFSage, but I know @gleachkr has been working regularly on ELFSage.

shigoel · 2024-07-23T23:46:48Z

Update: another minor PR opened. Thanks!

shigoel · 2024-07-31T20:55:33Z

PR bumping toolchain.

bollu · 2024-08-09T12:57:03Z

Heya @gleachkr , gentle nudge on draperlaboratory/ELFSage#11 -- this fixes a stack overflow we witness from the latest ElfSage.

gleachkr · 2024-08-10T12:19:50Z

Thanks for the nudge! I've been on vacation, but will be back as of this Monday.

shigoel · 2024-09-11T16:50:59Z

Hi @gleachkr! Minor ELFSage PR to bump Lean's toolchain:
draperlaboratory/ELFSage#12

shigoel · 2024-10-08T01:19:59Z

Hi again @gleachkr! Another PR to bump Lean's toolchain
draperlaboratory/ELFSage#15

shigoel added the help wanted Extra attention is needed label Feb 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ELF Parser in Lean #18

ELF Parser in Lean #18

shigoel commented Feb 22, 2024

jimgrundy commented Feb 22, 2024

tenedor commented Mar 19, 2024

shigoel commented Mar 20, 2024

tenedor commented Mar 20, 2024

shigoel commented Mar 20, 2024

tenedor commented Mar 20, 2024

jimgrundy commented Mar 20, 2024 via email

tenedor commented Mar 21, 2024

gleachkr commented Mar 21, 2024

shigoel commented Mar 21, 2024

gleachkr commented Mar 21, 2024 •

edited

Loading

tenedor commented Mar 21, 2024

gleachkr commented Mar 21, 2024 via email

shigoel commented Mar 21, 2024

tenedor commented Mar 22, 2024

gleachkr commented Mar 22, 2024

tenedor commented Mar 22, 2024 via email

gleachkr commented Apr 15, 2024

tenedor commented Apr 19, 2024

shigoel commented Apr 19, 2024

shigoel commented Apr 19, 2024

tenedor commented Apr 20, 2024 via email

shigoel commented Apr 22, 2024

tenedor commented Apr 22, 2024

shigoel commented Apr 24, 2024

shigoel commented Jul 15, 2024

shigoel commented Jul 18, 2024

tenedor commented Jul 18, 2024

shigoel commented Jul 23, 2024

shigoel commented Jul 31, 2024

bollu commented Aug 9, 2024

gleachkr commented Aug 10, 2024

shigoel commented Sep 11, 2024

shigoel commented Oct 8, 2024

ELF Parser in Lean #18

ELF Parser in Lean #18

Comments

shigoel commented Feb 22, 2024

jimgrundy commented Feb 22, 2024

tenedor commented Mar 19, 2024

shigoel commented Mar 20, 2024

tenedor commented Mar 20, 2024

shigoel commented Mar 20, 2024

tenedor commented Mar 20, 2024

jimgrundy commented Mar 20, 2024 via email

tenedor commented Mar 21, 2024

gleachkr commented Mar 21, 2024

shigoel commented Mar 21, 2024

gleachkr commented Mar 21, 2024 • edited Loading

tenedor commented Mar 21, 2024

gleachkr commented Mar 21, 2024 via email

shigoel commented Mar 21, 2024

tenedor commented Mar 22, 2024

gleachkr commented Mar 22, 2024

tenedor commented Mar 22, 2024 via email

gleachkr commented Apr 15, 2024

tenedor commented Apr 19, 2024

shigoel commented Apr 19, 2024

shigoel commented Apr 19, 2024

tenedor commented Apr 20, 2024 via email

shigoel commented Apr 22, 2024

tenedor commented Apr 22, 2024

shigoel commented Apr 24, 2024

shigoel commented Jul 15, 2024

shigoel commented Jul 18, 2024

tenedor commented Jul 18, 2024

shigoel commented Jul 23, 2024

shigoel commented Jul 31, 2024

bollu commented Aug 9, 2024

gleachkr commented Aug 10, 2024

shigoel commented Sep 11, 2024

shigoel commented Oct 8, 2024

gleachkr commented Mar 21, 2024 •

edited

Loading