Triaging Issues

Juan Cruz Viotti edited this page Jan 16, 2017 · 9 revisions

This document aims to serve as an ever-evolving list of pointers and advice on how to debug users' issues.

General guidelines:

  • Always ask for the operating system, version, and architecture
  • Always ask what Etcher version they are running
  • Try to reproduce the issue yourself

Drive doesn't boot

If the issue can't be identified, ask the user to upload the image file somewhere for further investigation, and save it on resin.io's Google Drive, so we ensure we don't lose access to it.

These require special treatment that we don't do at the moment. See https://github.com/resin-io/etcher/issues/210

  • Check if the image includes a partition table

Some images don't include a partition table. We've seen this on some VMware images. See https://github.com/resin-io/etcher/issues/553
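A quick way to check an image for a partition table is to look for the well-known on-disk signatures. The sketch below is not part of Etcher; the function name is hypothetical and it assumes 512-byte sectors. It only looks for the MBR boot signature (0x55 0xAA at offset 510) and the GPT header signature ("EFI PART" at the start of the second sector). A bare filesystem boot sector can also carry the 0x55 0xAA signature, so treat an "mbr" result as a hint rather than proof.

```python
SECTOR_SIZE = 512  # assumption: the image uses 512-byte logical sectors


def describe_partition_table(image_path):
    """Return 'gpt', 'mbr', or 'none' based on the first two sectors."""
    with open(image_path, "rb") as image:
        header = image.read(2 * SECTOR_SIZE)

    # An MBR (including the protective MBR of a GPT disk) ends with 0x55 0xAA.
    has_mbr_signature = header[510:512] == b"\x55\xaa"
    # A GPT header lives in the second sector and starts with "EFI PART".
    has_gpt_signature = header[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART"

    if has_mbr_signature and has_gpt_signature:
        return "gpt"
    if has_mbr_signature:
        return "mbr"
    return "none"


# Example (hypothetical path):
#   describe_partition_table("/path/to/image.img")
```

A result of "none" suggests the image has no partition table the target machine could boot from.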

  • Was the image downloaded completely?

Ask the user to calculate a checksum of the local file and compare it with the one provided by the publisher, if there is one.
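If the user has Python available, the checksum comparison can be sketched as follows. The function names are hypothetical, and SHA-256 is only an assumed default; use whichever algorithm the publisher lists.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash the file in chunks so that large images don't need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_checksum, algorithm="sha256"):
    """Compare the local checksum with the publisher's, ignoring case and whitespace."""
    return file_checksum(path, algorithm) == published_checksum.strip().lower()


# Example (hypothetical path and checksum):
#   matches_published("/path/to/image.img", "<checksum from the download page>")
```

A mismatch points at an incomplete or corrupted download rather than at Etcher.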

VBScript errors

  • The only place where we use VBScript is on drivelist

This usually indicates a coding bug in our Windows drive detection script. Take note of the VBScript error stack trace.

Uncaught errors during writing/validation

If the issue can't be identified, or it can be consistently reproduced with a certain image, ask the user to upload the file somewhere for further investigation, and save it on resin.io's Google Drive, so we ensure we don't lose access to it.

  • The etcher-image-write module exposes a nice CLI (see bin/cli.js) when installed globally (e.g. npm install -g etcher-image-write). Ask the user to try to reproduce the issue that way, so we can narrow it down further.

  • Make sure the user provides a screenshot of the uncaught error along with the full stack trace

  • Narrow the issue by identifying at which point the error usually happens (e.g. at the beginning of the write process, right after clicking "Flash", towards the end of the validation phase)

    • If the issue happens right after clicking flash, before the elevation dialog was shown:

      • The process of elevating the child process is a complex one. An error at this point usually indicates a coding bug in the elevation logic.
    • If the issue happens right after clicking flash, after the elevation dialog was shown:

      • If on GNU/Linux or OS X, the error might reside in the initial unmounting routine
        • Ask the user to manually unmount the drive first to confirm. If manual unmounting fails as well, the issue is reproducible outside Etcher; otherwise, it might indeed be an unmounting issue in our application
      • If on Windows, the error might reside in the routine that cleans up the drive (wipes its partition table)
        • Ask the user to try to wipe it manually (see the clean command of diskpart.exe)
      • It can be an error when spawning the writer process
        • We display the exact command that we run in DevTools. Check that the command has no obvious issues (e.g. quoting, special characters). If nothing looks wrong, ask the user to open a terminal emulator with administrator/sudo permissions and run the command manually, to see if that reveals any more information. Since the child process communicates with an IPC server, ask the user to run the command without closing the main Etcher window; otherwise the IPC server will be closed
    • If the issue happens right before finishing the write process

    • If the issue happens right after starting the validation process

    • If the issue happens before finishing the validation process

      • It might be an unmounting issue (confirm by asking the user to disable "unmounting after success" in the settings)

Reproducible validation errors

  • This might indicate a bug in our validation routine (unlikely though, since it has been battle-tested for a while)
  • Check if the drive is getting mounted in the middle of the validation phase, causing the operating system to write dummy files like .DS_Store. This usually happens with images that contain partitions the OS can recognize (like FAT). To confirm, ask the user to try another image that their OS can't read directly
  • Ask the user to reproduce with the https://github.com/resin-io-modules/etcher-image-write CLI
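When a validation error is reproducible, it can help to know where the drive contents first differ from the image. The sketch below is not Etcher's validation routine; it is a hypothetical helper that compares the image with the start of the drive byte by byte. The device path in the example is an assumption, and reading a raw device normally requires administrator/sudo permissions.

```python
def first_mismatch_offset(image_path, device_path, chunk_size=1024 * 1024):
    """Return the byte offset of the first difference between the image and
    the start of the device, or None if the device begins with an exact copy."""
    offset = 0
    with open(image_path, "rb") as image, open(device_path, "rb") as device:
        while True:
            expected = image.read(chunk_size)
            if not expected:
                return None  # reached the end of the image without a mismatch
            actual = device.read(len(expected))
            if actual != expected:
                for index, (want, got) in enumerate(zip(expected, actual)):
                    if want != got:
                        return offset + index
                # No differing byte in the overlap: the device ended early.
                return offset + len(actual)
            offset += len(expected)


# Example (hypothetical paths):
#   first_mismatch_offset("/path/to/image.img", "/dev/sdX")
```

If the returned offset changes between runs on the same image, something is probably writing to the drive during validation, which would be consistent with the mounting scenario described above.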

Drive not detected

  • This usually means that a removable drive was incorrectly detected to be a system drive by https://github.com/resin-io-modules/drivelist. Ask the user to run the corresponding platform script inside scripts/ and confirm by checking that the drive in question is reported as system: true

    • In OS X, ask the user for the output of diskutil list, diskutil info /dev/diskN, and mount. The OS X script is a bash script, so you can probably figure out the root cause easily
    • In GNU/Linux, ask the user for the output of the following commands. The GNU/Linux script is also a bash script, so it should be easy to figure out what's going on
      • lsblk -d --output NAME
      • df --output=source,target
      • lsblk -b -d /dev/<device> --output SIZE,RO,RM,MODEL
      • udevadm info --query=property --export --export-prefix=UDEV_ --name=/dev/<device>
    • In Windows, this is way trickier, since the script is a VBScript program. Read the script, inject Wscript.Echo calls to output whatever you think would help with debugging, and ask the user to run your modified copy
  • The drive might be a non-removable drive

Ask the user to enable unsafe mode. For safety purposes we will not attempt to interpret a non-removable drive as a removable drive using any kind of heuristic.
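On GNU/Linux, the lsblk output requested above already shows whether the kernel reports the drive as removable (the RM column). The sketch below parses a pasted copy of that output. It is a hypothetical helper, not part of drivelist, and it assumes the default column layout with a header line followed by a single device line.

```python
def parse_lsblk_row(output):
    """Parse the output of
    `lsblk -b -d /dev/<device> --output SIZE,RO,RM,MODEL`."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and one device line")

    # MODEL may contain spaces, so split off only the first three columns.
    fields = lines[1].split(None, 3)
    if len(fields) < 3:
        raise ValueError("expected at least the SIZE, RO and RM columns")

    size, read_only, removable = fields[:3]
    model = fields[3].strip() if len(fields) > 3 else ""
    return {
        "size_bytes": int(size),
        "read_only": read_only == "1",
        "removable": removable == "1",
        "model": model,
    }


# Example with made-up output:
#   parse_lsblk_row("       SIZE RO RM MODEL\n15931539456  0  1 STORAGE DEVICE\n")
```

If "removable" comes back False for the drive in question, the drive is being reported as non-removable and the unsafe mode advice above applies.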
