Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem with generator on 64 bit compters #65

Open
picca opened this issue Feb 7, 2018 · 14 comments
Open

problem with generator on 64 bit compters #65

picca opened this issue Feb 7, 2018 · 14 comments

Comments

@picca
Copy link

picca commented Feb 7, 2018

Hello, I am using generator and coroutine on an i386 computer in order to build my hkl project.
It works fine.

The problem arise when I switch to 64 bits computer and can be reproduce using the current debian unstable source package from here

https://packages.debian.org/source/unstable/hkl

(the problematic version is this one https://tracker.debian.org/news/930678)

It generates this kind of failure via a Segfault.

https://buildd.debian.org/status/fetch.php?pkg=hkl&arch=amd64&ver=5.0.0.2447-1&stamp=1518020479&raw=0

I tryed to debug the problem via gdb and I got this

Program received signal SIGSEGV, Segmentation fault.
generator_new_ (fn=fn@entry=0xaaaaaaae65b8 <trajectory_gen_generator__>, retsize=48, retsize@entry=40) at generator/generator.c:38
38      generator/generator.c: No such file or directory.
(gdb) bt
#0  0x0000aaaaaaae50d8 in generator_new_ (fn=fn@entry=0xaaaaaaae65b8 <trajectory_gen_generator__>, retsize=48, retsize@entry=40) at generator/generator.c:38
#1  0x0000aaaaaaae6b78 in trajectory_gen (tconfig=...) at hkl2.c:246
#2  0x0000aaaaaaae6c2c in Trajectory_solve (tconfig=..., gconfig=..., sconfig=..., move=43690) at hkl2.c:288
#3  0x0000aaaaaaabf324 in main () at sirius.c:161

If we look at the generator line,we get this

gen->base = base;

I do not understand why it work great on 32 bits computer and not on 64 bits computer.

can you help me find the problem. I am ready to test this until it works on 64 bits computer also :))

Cheers

Frederic

@picca
Copy link
Author

picca commented Feb 7, 2018

here the valgrind output on the arm64 porter box from debian

(sid_arm64-dchroot)picca@amdahl:~/hkl/Documentation/figures$ ../../libtool --mode=execute valgrind  --leak-check=full --show-leak-kinds=all --main-stacksize=1000000 sirius
==7368== Memcheck, a memory error detector
==7368== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7368== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7368== Command: /home/picca/hkl/Documentation/figures/.libs/sirius
==7368==
==7368== Invalid write of size 8
==7368==    at 0x1430D8: generator_new_ (generator.c:38)
==7368==    by 0x144B77: trajectory_gen (hkl2.c:246)
==7368==    by 0x144C2B: Trajectory_solve (hkl2.c:288)
==7368==    by 0x11D323: main (sirius.c:161)
==7368==  Address 0x23c0 is not stack'd, malloc'd or (recently) free'd
==7368==
==7368==
==7368== Process terminating with default action of signal 11 (SIGSEGV)
==7368==  Access not within mapped region at address 0x23C0
==7368==    at 0x1430D8: generator_new_ (generator.c:38)
==7368==    by 0x144B77: trajectory_gen (hkl2.c:246)
==7368==    by 0x144C2B: Trajectory_solve (hkl2.c:288)
==7368==    by 0x11D323: main (sirius.c:161)
==7368==  If you believe this happened as a result of a stack
==7368==  overflow in your program's main thread (unlikely but
==7368==  possible), you can try to increase the size of the
==7368==  main thread stack using the --main-stacksize= flag.
==7368==  The main thread stack size used in this run was 1048576.
==7368==
==7368== HEAP SUMMARY:
==7368==     in use at exit: 45,052 bytes in 234 blocks
==7368==   total heap usage: 278 allocs, 44 frees, 52,804 bytes allocated
==7368==
==7368== 64 bytes in 4 blocks are possibly lost in loss record 1 of 8
==7368==    at 0x4844BBC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 184 bytes in 1 blocks are possibly lost in loss record 2 of 8
==7368==    at 0x4846EE8: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 688 bytes in 45 blocks are still reachable in loss record 3 of 8
==7368==    at 0x4844BBC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 1,104 bytes in 13 blocks are possibly lost in loss record 4 of 8
==7368==    at 0x4846CDC: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 4,096 bytes in 1 blocks are still reachable in loss record 5 of 8
==7368==    at 0x4846EE8: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 8,192 bytes in 1 blocks are definitely lost in loss record 6 of 8
==7368==    at 0x4844C84: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 12,976 bytes in 138 blocks are still reachable in loss record 7 of 8
==7368==    at 0x4846CDC: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== 17,748 bytes in 31 blocks are still reachable in loss record 8 of 8
==7368==    at 0x4844C84: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==7368==
==7368== LEAK SUMMARY:
==7368==    definitely lost: 8,192 bytes in 1 blocks
==7368==    indirectly lost: 0 bytes in 0 blocks
==7368==      possibly lost: 1,352 bytes in 18 blocks
==7368==    still reachable: 35,508 bytes in 215 blocks
==7368==                       of which reachable via heuristic:
==7368==                         newarray           : 1,536 bytes in 16 blocks
==7368==         suppressed: 0 bytes in 0 blocks
==7368==
==7368== For counts of detected and suppressed errors, rerun with: -v
==7368== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)
Segmentation fault

@picca
Copy link
Author

picca commented Feb 7, 2018

here my generator

struct Trajectory {
	enum trajectory_e tag;
	union {
		struct {double h0; double k0; double l0;
			double h1; double k1; double l1;
			uint n; struct Mode mode;} hklfromto;
	};
};

#define TrajectoryHklFromTo(h0_, k0_, l0_, h1_, k1_, l1_, n_, mode_) {.tag=TRAJECTORY_HKL_FROM_TO, .hklfromto={h0_, k0_, l0_, h1_, k1_, l1_, n_, .mode=mode_}}

extern generator_declare(trajectory_gen, struct Engine, struct Trajectory, tconfig);

@picca
Copy link
Author

picca commented Feb 7, 2018

And indeed te code which call the generator

HklGeometryList *Trajectory_solve(struct Trajectory tconfig,
				  struct Geometry gconfig,
				  struct Sample sconfig,
				  uint move)
{
	const struct Engine *econfig;
	HklGeometryList *solutions = hkl_geometry_list_new();
	generator_t(struct Engine) gen = trajectory_gen(tconfig);

	HklGeometry *geometry = newGeometry(gconfig);
	HklEngineList *engines = newEngines(gconfig);
	HklSample *sample = newSample(sconfig);
	HklDetector *detector = hkl_detector_factory_new(HKL_DETECTOR_TYPE_0D);
	HklTrajectoryStats *stats = hkl_trajectory_stats_new(Trajectory_len(tconfig));

	hkl_engine_list_init(engines, geometry, detector, sample);

	while((econfig = generator_next(gen)) != NULL){
		/* Engine_fprintf(stdout, *econfig); */
		HklGeometryList *geometries = Engine_solve(engines, *econfig);
		if(NULL != geometries){
			const HklGeometryListItem *solution;

			hkl_trajectory_stats_add(stats, geometries);
			solution = hkl_geometry_list_items_first_get(geometries);
			if(move)
				hkl_engine_list_select_solution(engines, solution);

			hkl_geometry_list_add(solutions,
					      hkl_geometry_list_item_geometry_get(solution));
			/* hkl_geometry_list_fprintf(stdout, geometries); */
			hkl_geometry_list_free(geometries);
		}
	}

	/* hkl_trajectory_stats_fprintf(stdout, stats); */

	hkl_trajectory_stats_free(stats);
	hkl_detector_free(detector);
	hkl_sample_free(sample);
	hkl_engine_list_free(engines);
	hkl_geometry_free(geometry);
	generator_free(gen);

	return solutions;
}

@dgibson
Copy link
Collaborator

dgibson commented Feb 8, 2018

Ok. The first thing to know is that I wrote generator most to see if I could. I was quite surprised how well it worked out, but I don't know that I'd really call it ready for serious use. But let's see what I can work out..

I developed and tested the module mostly on x86_64, so it's certainly not broken on all 64-bit systems in all circumstances.

My guess is simply that you're running out of space on the generator's stack - 64-bit code will typically consume more stack space, because of the 64-bit pointers it needs to store there. Currently generator always allocates a fixed - and tiny (~8kiB) - stack for each generator, meaning you're pretty limited in what you can do inside the actual generator function. Your return type is also a reasonably sized structure, so if you're manipulating similar structures within your generator code that could consume stack pretty quickly.

I had some fairly ambitious plans to improve the stack handling in coroutine and generator, but I didn't get very far with it before being distracted by other projects and I've never had a chance to go back to it.

So first step is to see if that actually is the problem: go into generator.c and change DEFAULT_STACK_SIZE to something much larger (say a couple of megabytes) and see if the problem goes away. If it does, we can work together on improving the out of the box behaviour.

@picca
Copy link
Author

picca commented Feb 8, 2018

Thanks a lot for your reply,

I can confirm that using 4 times the current stack_size (8192*4), make the problem vanish for my use case. Now it is time to improve the out of box behaviour ;)

@picca
Copy link
Author

picca commented Feb 8, 2018

It vanish on the arm64 architecture, but even if I increase a lot the stack size, on amd64 it still faill

@picca
Copy link
Author

picca commented Feb 8, 2018

the stack trace is a bit different

#0  0x000055555559311f in trajectory_gen_generator__ (ret=0x557be730) at hkl2.c:246
#1  0x00007ffff6aefd70 in  () at /lib/x86_64-linux-gnu/libc.so.6
#2  0x0000000000000000 in  ()

@picca
Copy link
Author

picca commented Feb 8, 2018

the sefgault arise when I define the generator.

generator_def(trajectory_gen, struct Engine, struct Trajectory, tconfig)
{
	switch(tconfig.tag){
	case TRAJECTORY_HKL_FROM_TO:
	{
		uint i;
		double dh = (tconfig.hklfromto.h1 - tconfig.hklfromto.h0) / (tconfig.hklfromto.n);
		double dk = (tconfig.hklfromto.k1 - tconfig.hklfromto.k0) / (tconfig.hklfromto.n);
		double dl = (tconfig.hklfromto.l1 - tconfig.hklfromto.l0) / (tconfig.hklfromto.n);
		for(i=0; i<tconfig.hklfromto.n + 1; ++i){
			double h = i * dh + tconfig.hklfromto.h0;
			double k = i * dk + tconfig.hklfromto.k0;
			double l = i * dl + tconfig.hklfromto.l0;

			struct Engine econfig = EngineHkl(h, k, l, tconfig.hklfromto.mode);
			generator_yield(econfig);
		}
	}
	break;
	}
}

@picca
Copy link
Author

picca commented Feb 8, 2018

Here the valgrind output on amd64

(sid_amd64-dchroot)picca@barriere:~/hkl$ ./libtool --mode=execute valgrind  --track-origins=yes Documentation/figures/sirius
==21449== Memcheck, a memory error detector
==21449== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21449== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==21449== Command: /home/picca/hkl/Documentation/figures/.libs/sirius
==21449==
==21449== Warning: client switching stacks?  SP change: 0x1fff0001b8 --> 0x6c72e08
==21449==          to suppress, use: --max-stackframe=137308459952 or greater
==21449== Warning: client switching stacks?  SP change: 0x6c72d88 --> 0x1fff0001c0
==21449==          to suppress, use: --max-stackframe=137308460088 or greater
==21449== Conditional jump or move depends on uninitialised value(s)
==21449==    at 0x121A2B: perm_r (hkl-geometry.c:1213)
==21449==    by 0x121C2D: hkl_geometry_list_multiply_from_range (hkl-geometry.c:1281)
==21449==    by 0x11CC04: hkl_engine_set.part.7 (hkl-pseudoaxis-private.h:581)
==21449==    by 0x126778: hkl_engine_set (hkl-pseudoaxis-private.h:560)
==21449==    by 0x126778: hkl_engine_pseudo_axis_values_set (hkl-pseudoaxis.c:207)
==21449==    by 0x147664: Engine_solve (hkl2.c:233)
==21449==    by 0x1478C4: Trajectory_solve (hkl2.c:300)
==21449==    by 0x11CEE1: main (sirius.c:161)
==21449==  Uninitialised value was created by a stack allocation
==21449==    at 0x121BB1: hkl_geometry_list_multiply_from_range (hkl-geometry.c:1266)

The stack switch information worried me.

@dgibson
Copy link
Collaborator

dgibson commented Feb 11, 2018

Ok. So, the first step is altering generator to use coroutine_stack_alloc() instead of coroutine_stack_init(). The former is safer, more flexible, and if it fails is much more likely to fail with a relatively clear SEGV, instead of corrupting unrelated memory.

The complication is that the generator code makes a temporary allocation at the "bottom" of the stack area (that is, at the far end of the buffer from where the stack itself will first occupy). That's used to pass the initial parameters into the generator code. So far, this requires manually managing the stack buffer, because coroutine_stack_*() doesn't provide a built in way to make such an allocation.

So we either need to extend coroutine with a way to make such temporary allocations, or use a different means of passing the arguments in.

Let's worry about x86_64 first - there may be further complications with arm64.

The "switching stack" warnings are kind of expected - the generator code does indeed switch stacks, that's how it works. I have some valgrind calls which are supposed to tell it what's going on, but apparently they're not really supported any more, so we might just have to live with valgrind not understanding this code.

@picca
Copy link
Author

picca commented Feb 17, 2018

Hello, I can not be of a great help here except if you want me to test patches :),

have a good weekend

@picca
Copy link
Author

picca commented Feb 22, 2018

Just for information it works with gcc6, on an amd64 computer

:~/hkl/hkl$ gcc --version
gcc (Debian 6.3.0-18) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So there is something different with gcc-7

@dgibson
Copy link
Collaborator

dgibson commented Feb 23, 2018

My guess would just be that gcc7 is doing some better optimizations that result in less stack usage, and so we're not hitting the overrun that we did before.

@picca
Copy link
Author

picca commented Aug 4, 2018

Hello, I reveive this today

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=889878#26

so it seems that this fantastic guy found the culprite :))

picca pushed a commit to picca/ccan that referenced this issue Nov 8, 2018
It seems that this patch solve our problem with generator on all
arch supported by Debian.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants