Uninitialized Nodes in Instruction Set Tree #74

Open
fpedd opened this issue Sep 6, 2021 · 2 comments

@fpedd
Collaborator

fpedd commented Sep 6, 2021

When uncommenting

// std::cout << iset.print() << std::endl;

the tree structure of the instruction set/instruction decoder gets printed.

However, some nodes in the compressed instruction set tree are printed as "uninitialized" (arrows <----- inserted by me):

...
MODE 1: 	ISA16_RISCV[default: 16]:
		ISA16_RISCV[16]:
			@0x0 Node[1:0]
				@0x0 Node[15:13]
					@0x0 Uninitialized Node <-----
					@0x1 Instruction: c.fld
					@0x2 Instruction: c.lw
					@0x3 Instruction: c.flw
					@0x5 Instruction: c.fsd
					@0x6 Instruction: c.sw
					@0x7 Instruction: c.fsw
				@0x1 Node[15:13]
					@0x0 Uninitialized Node <-----
					@0x1 Instruction: c.jal
					@0x2 Instruction: c.li
					@0x3 Uninitialized Node <-----
					@0x4 Node[11:10]
						@0x0 Node[12:12]
							@0x0 Instruction: c.srli
						@0x1 Node[12:12]
							@0x0 Instruction: c.srai
						@0x2 Instruction: c.andi
						@0x3 Node[6:5]
							@0x0 Node[12:12]
								@0x0 Instruction: c.sub
							@0x1 Node[12:12]
								@0x0 Instruction: c.xor
							@0x2 Node[12:12]
								@0x0 Instruction: c.or
							@0x3 Node[12:12]
								@0x0 Instruction: c.and
					@0x5 Instruction: c.j
					@0x6 Instruction: c.beqz
					@0x7 Instruction: c.bnez
				@0x2 Node[15:13]
					@0x0 Node[12:12]
						@0x0 Instruction: c.slli
					@0x1 Instruction: c.fldsp
					@0x2 Instruction: c.lwsp
					@0x3 Instruction: c.flwsp
					@0x4 Node[12:12]
						@0x0 Uninitialized Node <-----
						@0x1 Uninitialized Node <-----
					@0x5 Instruction: c.fsdsp
					@0x6 Instruction: c.swsp
					@0x7 Instruction: c.fswsp
...

A node gets printed as uninitialized when this condition evaluates to false:

if (reader_ != nullptr && subs_ != nullptr)
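To make the print condition concrete, here is a minimal, hypothetical sketch. It is not the ETISS implementation: `SketchReader`, `SketchNode`, and `describe` are names invented for illustration, and only the member names `reader_` and `subs_` and the condition itself are taken from the line above. It only shows that the label depends on both pointers being set; it makes no claim about how or when ETISS populates them.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a bit-field reader, e.g. bits [15:13].
struct SketchReader {
    unsigned hi;
    unsigned lo;
};

// Hypothetical decode-tree node (assumption, NOT the ETISS class).
struct SketchNode {
    const SketchReader* reader_ = nullptr;           // bits this node switches on
    const std::vector<SketchNode>* subs_ = nullptr;  // child table indexed by those bits

    // Mirrors the quoted condition: both members must be set,
    // otherwise the node is reported as uninitialized.
    std::string describe() const {
        if (reader_ != nullptr && subs_ != nullptr) {
            return "Node[" + std::to_string(reader_->hi) + ":" +
                   std::to_string(reader_->lo) + "]";
        }
        return "Uninitialized Node";
    }
};
```

In this sketch a node with only `reader_` set, or only `subs_` set, is reported the same way as a node with neither.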

The second uninitialized node

@0x1 Node[15:13]
	@0x0 Uninitialized Node <-----

corresponds to the c.addi instruction. The binary I am compiling (with an rv32gc compiler) uses this instruction multiple times, so I would expect running it to produce some sort of error. However, the binary runs without any issues despite these "uninitialized instructions".

What is happening here? Why are those nodes printed as uninitialized? Why does the binary run anyway? Any help is appreciated! :)

PS: I am mainly asking because I am working on something else, where some instructions/nodes of a RISC-V instruction set extension are also printed as "uninitialized". However, those uninitialized instructions cause some trouble and I am trying to understand why and where the underlying issue is.

@wysiwyng
Contributor

Please check whether this instruction is actually executed. I am pretty confident it is (otherwise ETISS would complain, as you already noted). You can do that, e.g., by using the PrintInstruction plugin, or by placing a breakpoint somewhere here:

imm += imm_0;
and running ETISS with a debugger.

The instruction tree printing stuff has some issues, but usually these don't mean the decoder is not working. @rafzi might know more as to why the instruction tree prints do not work as expected.

@fpedd
Collaborator Author

fpedd commented Sep 15, 2021

Some more information:

Compiling the following main.c with an rv32gcv toolchain:

#include <stdlib.h>
#include <stdio.h>

int main()
{
    asm("addi a1, a1, 1");
    asm("c.addi a1, 1");
    printf("hello world!\n");
}

and dumping the binary with riscv32-unknown-elf-objdump -h -S riscv_example.elf > riscv_example.lst gives:

0000008c <main>:
#include <stdlib.h>
#include <stdio.h>

int main()
{
      8c:	1141                	addi	sp,sp,-16
      8e:	c606                	sw	ra,12(sp)
      90:	c422                	sw	s0,8(sp)
      92:	0800                	addi	s0,sp,16
    asm("addi a1, a1, 1");
      94:	0585                	addi	a1,a1,1
    asm("c.addi a1, 1");
      96:	0585                	addi	a1,a1,1
    printf("hello world!\n");
      98:	67b1                	lui	a5,0xc
      9a:	e6878513          	addi	a0,a5,-408 # be68 <__DTOR_END__+0x1a>
      9e:	135010ef          	jal	ra,19d2 <puts>
      a2:	4781                	li	a5,0
}
      a4:	853e                	mv	a0,a5
      a6:	40b2                	lw	ra,12(sp)
      a8:	4422                	lw	s0,8(sp)
      aa:	0141                	addi	sp,sp,16
      ac:	8082                	ret

Most of the instructions are 16-bit compressed instructions (for some reason the disassembly does not show them with their compressed mnemonics). Because the assembler converts "normal" instructions to their compressed equivalents whenever compressed support is available, the addi inline assembly instruction at address 0x94 is also emitted in compressed form.
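The instruction width can also be checked from the raw encoding alone. In the standard RISC-V length encoding, a 32-bit instruction has its two lowest bits set to 0b11, and any other value in those bits marks a 16-bit compressed encoding. A minimal sketch, where the helper name `isCompressed` is an assumption made for illustration:

```cpp
#include <cstdint>

// Returns true when the two lowest bits mark a 16-bit (compressed) encoding.
// 32-bit RISC-V instructions have bits [1:0] == 0b11.
constexpr bool isCompressed(std::uint32_t raw) {
    return (raw & 0x3u) != 0x3u;
}
```

Applied to the listing above, `isCompressed(0x0585)` and `isCompressed(0x1141)` are true, while `isCompressed(0xe6878513)` and `isCompressed(0x135010ef)` are false.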

Running this with the PrintInstruction plugin enabled gives:

...
0x000000000000008c: c.addi # 0x0x1141 [UNKNOWN PARAMETERS]
0x000000000000008e: c.swsp # 0x0xc606 [UNKNOWN PARAMETERS]
0x0000000000000090: c.swsp # 0x0xc422 [UNKNOWN PARAMETERS]
0x0000000000000092: c.addi4spn # 0x0x0800 [UNKNOWN PARAMETERS]
0x0000000000000094: c.addi # 0x0x0585 [UNKNOWN PARAMETERS]
0x0000000000000096: c.addi # 0x0x0585 [UNKNOWN PARAMETERS]
0x0000000000000098: c.lui # 0x0x67b1 [UNKNOWN PARAMETERS]
0x000000000000009a: addi # 0x0xe6878513 [UNKNOWN PARAMETERS]
0x000000000000009e: jal # 0x0x135010ef [UNKNOWN PARAMETERS]
...

I also set a breakpoint with the target gdb at one of the inline addi instructions and examined the memory at the program counter, which supports the claim that a "compressed add immediate" is indeed executed:

(gdb) x $pc
0x94 <main+8>: 0x05850585
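gdb prints a full 32-bit word here, so the value covers two consecutive 16-bit instructions. Assuming a little-endian target, the low halfword is the instruction at 0x94 and the high halfword is the one at 0x96. A small sketch of that split, where `splitWord` is a hypothetical helper name:

```cpp
#include <cstdint>
#include <utility>

// Splits a little-endian 32-bit word into
// (halfword at addr, halfword at addr + 2).
inline std::pair<std::uint16_t, std::uint16_t> splitWord(std::uint32_t word) {
    return {static_cast<std::uint16_t>(word & 0xFFFFu),
            static_cast<std::uint16_t>(word >> 16)};
}
```

For 0x05850585 both halves are 0x0585, which agrees with the two identical encodings at 0x94 and 0x96 in the objdump listing.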

With the CoreDSL description of the c.addi instruction:

C.ADDI {
encoding:b000 | imm[5:5]s | rs1[4:0] | imm[4:0]s | b01;
args_disass: "{name(rs1)}, {imm:#05x}";
X[rs1] <= X[rs1]'s + imm;
}

the encoding 0x0585 -> 0b 0000 0101 1000 0101 -> 0b 000 0 01011 00001 01 matches the c.addi instruction, with register a1 -> x11 -> 0b01011 and an immediate value of 1.
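The same field split can be checked mechanically. The sketch below extracts the fields in the order given by the CoreDSL encoding above (bits [15:13], bit 12, bits [11:7], bits [6:2], bits [1:0]) and sign-extends the 6-bit immediate. The struct and function names are assumptions made for illustration and are not part of ETISS or CoreDSL.

```cpp
#include <cstdint>

// Hypothetical container for the fields of a c.addi-shaped encoding.
struct CAddiFields {
    unsigned funct3;  // bits [15:13], b000 for c.addi
    unsigned rs1;     // bits [11:7], source and destination register
    int imm;          // sign-extended 6-bit immediate {bit 12, bits [6:2]}
    unsigned op;      // bits [1:0], b01 for c.addi
};

inline CAddiFields decodeCAddi(std::uint16_t raw) {
    CAddiFields f{};
    f.funct3 = (raw >> 13) & 0x7u;
    f.rs1 = (raw >> 7) & 0x1Fu;
    // Reassemble imm[5] (bit 12) and imm[4:0] (bits [6:2]).
    const unsigned imm6 = (((raw >> 12) & 0x1u) << 5) | ((raw >> 2) & 0x1Fu);
    // Sign-extend from 6 bits: values with bit 5 set are negative.
    f.imm = (imm6 & 0x20u) ? static_cast<int>(imm6) - 64 : static_cast<int>(imm6);
    f.op = raw & 0x3u;
    return f;
}
```

For 0x0585 this yields funct3 = 0, rs1 = 11 (a1), imm = 1 and op = 1. For 0x1141 from the function prologue it yields rs1 = 2 (sp) and imm = -16, matching `addi sp,sp,-16` in the listing.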

So I am fairly certain that a "compressed add immediate" is executed.

Coming back to the instruction tree and following the encoding from above, b000 | imm[5:5]s | rs1[4:0] | imm[4:0]s | b01, through it:

	@0x0 Node[1:0]
		@0x0 Node[15:13]
			@0x0 Uninitialized Node
			@0x1 Instruction: c.fld
			@0x2 Instruction: c.lw
			@0x3 Instruction: c.flw
			@0x5 Instruction: c.fsd
			@0x6 Instruction: c.sw
			@0x7 Instruction: c.fsw
		@0x1 Node[15:13]
			@0x0 Uninitialized Node <-----

the node for c.addi nevertheless is printed as uninitialized.
