issues/questions encountered while installing on an Orin Nano #113

Open
j-baker opened this issue Jul 24, 2023 · 2 comments

j-baker commented Jul 24, 2023

Hi! Firstly, wanted to say thank you so much for this invaluable project. The activation energy on something like this must have been huge and I'm very glad it exists. I've had a go at smashing my way through installing on an Orin Nano with a dsboard-ornx-lan, which is a bit non-standard but hopefully good enough.

There are a few main issues/questions that I've ended up hitting, and though I've generally worked around them, I figured I'd check in here to see if these had been hit before/had obvious solutions.

Firmware flash

Here, building the firmware went well, but flashing consistently blows up when trying to write. I've attached a flash log. Googling, it sounds like this is because 'the configuration is wrong', but it's hard for me to know which parts might differ and cause the flash to fail. Have you seen this error before? The board provides some DTB overlays which I put in the right places.

log.txt

However, I already have EFI on my dev board so I eventually figured it was probably optional. Actually getting into the EFI boot menu sadly required a USB-to-RS232/UART/TTL/whatever connector (this weekend has been Amazon same-day-delivery heavy).

What about DTB overlays?

So, clearly, from the above, I haven't succeeded at flashing. But, it's not obvious to me how the flashing would affect kernel runtime if I then just nuke the system partition. As far as I can tell, support in jetpack-nixos for DTB overlays is limited to the flashing parts of the code - not entirely clear what those overlays actually apply to. Is that right? It seems as though my carrier board has some DTB overlays that are supposed to be applied. Should I be trying to jam these in somewhere else? I assume parts of my jetson are presently non-functional due to the absence, though have not yet found the reason. Or maybe, since the flash failed, are they still the right DTBs?

USB ports not working

It's possible that I'm just suffering from #111 combined with not having managed to flash this thing yet - this is an Orin Nano 8GB not an Orin NX, but I think it is the Nano devkit. Basically, no USB device (keyboards, mass storage, etc.) shows up until the kernel has booted. A serial connection works around the keyboard issue.
In the end, I discovered a pretty hacky solution: if you serve your ISO from an HTTP endpoint, you can point the Jetson's HTTP boot at it. It can boot enough of NixOS from there that it can then pick up the USB drive for the rest of the boot. Wild! It made sense until I realised it was mounting the USB drive and didn't work without it!

Some missing kernel modules

This was a weird one. I found that LUKS pulled in some kernel modules that aren't supported (blowfish in particular).
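One possible way to avoid pulling in an unavailable module is to override the list of crypto modules that NixOS adds to the initrd for LUKS. This is only a sketch, not necessarily what was done here: it assumes the modules come from the NixOS option boot.initrd.luks.cryptoModules (whose default list includes blowfish in the NixOS versions I know of), and the replacement list below is an assumption that must match the cipher your LUKS volume actually uses.

```nix
{ ... }:
{
  # A plain definition replaces the option's default list, so "blowfish" is
  # no longer requested. The modules listed here are an assumption: keep
  # whatever your LUKS cipher and hash need and your kernel provides.
  boot.initrd.luks.cryptoModules = [ "aes" "xts" "sha256" ];
}
```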

A broken kernel module

This was pretty brutal to deal with. It turns out that LUKS interacts poorly with NVIDIA's hardware encryption module (tegra_se_nvhost): depending on how you configure LUKS, it causes hangs, kernel panics and FS corruption. Generally no bueno. Thankfully, their module is seemingly the only thing on the internet that prints "Couldn't get free cmdbuf". In the end I disabled the module by setting initcall_blacklist=tegra_se_module_init on the kernel command line. With it gone, LUKS seemed happy, although presumably with less performance. On this one - do you know who I would best follow up with? NVIDIA? OE4T?
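For anyone wanting to reproduce the workaround on NixOS, a minimal sketch is to add the parameter through the standard boot.kernelParams option. The parameter string is the one quoted above; whether it is appropriate for your kernel and module version is an assumption to check.

```nix
{ ... }:
{
  # Prevents the tegra_se module's init function from running at boot,
  # which disables the hardware crypto engine driver.
  boot.kernelParams = [ "initcall_blacklist=tegra_se_module_init" ];
}
```

After a rebuild and reboot, the parameter should show up in /proc/cmdline.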

This was a bit of an essay but hopefully there are some useful bits either for project maintainers or for people who hit similar issues.

Specific questions are:

  • Should I be worried that I was unable to flash?
  • How would I go about debugging a failed flash?
  • What do I do with my DTBs?
  • Where do I follow up with regards to a broken hardware kernel module?

danielfullmer (Collaborator) commented Jul 25, 2023

The first 3 questions are likely all related:

Is there a BSP package for this board? How do they intend for you to flash the device normally (i.e. without jetpack-nixos)? Typically, vendors provide a package containing a set of files you're supposed to copy on top of the official NVIDIA flashing tools. That package often contains a .conf file for your board as well as some device trees that are used instead of the standard devkit device trees. Additionally, when the flash.sh script is run, it typically uses a different .conf file than the devkit one:
e.g.
./flash.sh xavier-agx-devkit
turns into something like:
./flash.sh custom-agx-board
(assuming there is a config file named custom-agx-board.conf)

The flashing script created by jetpack-nixos has a couple of options that are relevant here:
hardware.nvidia-jetpack.flashScriptOverrides.postPatch can be set to add additional files to the "flash tools" directory where ./flash.sh is run from. This setting can just be some bash commands to copy from the BSP overlay package onto the temporary "flash tools" directory.
hardware.nvidia-jetpack.flashScriptOverrides.targetBoard should be set to a string that matches a .conf file present in the flash tools directory. Normally this is set automatically for you when carrierBoard = "devkit", but since you're not using the devkit you should set this yourself.
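As a rough sketch of how those two options might be combined: the vendorBsp path and the custom-agx-board name below are placeholders (the latter reused from the example above), and the sketch assumes the postPatch commands run with the flash tools directory as the working directory.

```nix
{ ... }:
let
  # Hypothetical: a local directory holding the unpacked vendor BSP overlay,
  # laid out the same way as the flash tools directory.
  vendorBsp = ./vendor-bsp;
in
{
  hardware.nvidia-jetpack.flashScriptOverrides = {
    # Copy the vendor files on top of the flash tools directory.
    # --no-preserve=mode keeps the copies writable, since files coming
    # from the Nix store are read-only.
    postPatch = ''
      cp -r --no-preserve=mode ${vendorBsp}/. .
    '';

    # Must match a .conf file present in the flash tools directory after
    # the copy, e.g. custom-agx-board.conf (placeholder name).
    targetBoard = "custom-agx-board";
  };
}
```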

Jetpack-nixos does use hardware.deviceTree.overlays, if it's set, to apply device tree overlays at build time on top of the device trees built as part of the kernel. However, that would be best used for additional customization on top of what's provided by the vendor.
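For reference, a minimal sketch of what such an overlay entry could look like with the standard NixOS option. The overlay name and the .dts file are hypothetical; you would substitute an overlay source you actually have.

```nix
{ ... }:
{
  hardware.deviceTree.overlays = [
    {
      # Hypothetical name and file, for illustration only.
      name = "my-carrier-tweak";
      dtsFile = ./my-carrier-tweak.dts;
    }
  ];
}
```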

On the last question, I may have encountered a similar issue with the tegra_se stuff (#108). (I also seem to recall some issue with LUKS encryption on Orins, but I wasn't able to reproduce that well enough to open an issue.) Do you have more details on exactly what settings / configuration cause the kernel panic? If so, could you also open an issue here so we can track it? I think it's probably worth following up on the NVIDIA forums, especially if the issue is reproducible on the standard Ubuntu distro / L4T.

j-baker (Author) commented Jul 25, 2023

Many thanks for the detailed reply. Ack, I filed #114. The BSP contains:

  1. Some DTBs to move into the kernel subdirectory. These replace pre-existing files.
  2. Some DTBs to move into the boot subdirectory. These replace pre-existing files.
  3. A replacement kernel image and a copy of nvgpu.ko.
  4. Some pinmux dtsi files.
  5. It then does some sedding in some pre-existing dts files, to e.g. replace cvb_eeprom_read_size = <0x100>; with 0x0.
  6. They then have you run sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" --showlogs --network usb0 jetson-orin-nano-devkit internal.

It sounds like:

  • For 1, 2, 4, 5, I can use flashScriptOverrides to apply the adjustments. It additionally sounds like (based on 6) I might want to use the undocumented initrd method of flashing?
  • It looks like they do actually want you to use the Orin Nano devkit profile, and presumably achieve this by overwriting what that profile means. I don't know what the flash_t234_qspi bit means, though.
  • For the kernel image (path kernel/Image) & nvgpu.ko, it sounds like I should be emailing them in order to exercise my GNU GPL rights, yes? And then I can presumably use standard Nix mechanisms to apply those patches to the kernel you build?
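A rough sketch of how items 1, 2, 4 and 5 might be expressed through flashScriptOverrides follows. Every directory and file name in it is a placeholder, since the BSP's actual layout isn't spelled out above, and it assumes postPatch runs inside the flash tools directory. Only the sed substitution and the jetson-orin-nano-devkit board name come from the vendor instructions.

```nix
{ ... }:
let
  # Hypothetical: directory holding the unpacked vendor BSP.
  vendorBsp = ./vendor-bsp;
in
{
  hardware.nvidia-jetpack.flashScriptOverrides = {
    # The vendor's command keeps the devkit profile name and relies on the
    # overwritten files to change what it means.
    targetBoard = "jetson-orin-nano-devkit";

    postPatch = ''
      # Items 1, 2 and 4: copy the replacement DTBs and pinmux dtsi files
      # over the stock ones. Source and destination paths are placeholders.
      cp -r --no-preserve=mode ${vendorBsp}/kernel-dtbs/. PLACEHOLDER_KERNEL_DIR/
      cp -r --no-preserve=mode ${vendorBsp}/boot-dtbs/. PLACEHOLDER_BOOT_DIR/
      cp -r --no-preserve=mode ${vendorBsp}/pinmux/. PLACEHOLDER_PINMUX_DIR/

      # Item 5: the vendor's sed edit. The target file is a placeholder.
      sed -i 's/cvb_eeprom_read_size = <0x100>;/cvb_eeprom_read_size = <0x0>;/' \
        PLACEHOLDER_FILE.dts
    '';
  };
}
```

This does not cover item 3 (the replacement kernel image and nvgpu.ko) or the initrd flashing method from item 6.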
