Run 3 Long Jobs with multi-threaded PhoSim 3.6 #420
Comments
Yes, it would seem a validation is in order. There was nothing in the announcement regarding the implementation of multi-threading, so it will be interesting to see how the total workload is divided between threads (per photon? per source? something else?) and how the total execution time changes for some of our lengthiest visits. Heather has already offered to install the new code at SLAC and NERSC, so we should be able to start working with v3.6 soon. The work on checkpointing is coming along, and I would like to get that project to a stage where it can be integrated into a workflow quickly. Note that DMTCP advertises full support for multi-threaded applications, so this work will hopefully still apply with phoSim v3.6. |
Excellent! I got the impression that the parallelization was per source, but it'd be good to check this in the v3.6 documentation. I had a quick look but there's no PIN about multithreading on the wiki, and the walkthrough has not been updated yet. Which source files should we be reading to understand how things work, John?
|
it's multithreaded on a per-source basis (the photon level doesn’t work, as there is too much thread divergence and inefficiency). this is fine because, for example, the background is made up of thousands of sources.
all you have to do is pass “-t N”, where N is the number of threads. i will update the wiki documentation about this in a bit.
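As a rough illustration of what "per source" division means, here is a minimal Python sketch. It is not phoSim code (phoSim's raytrace is not written in Python); `simulate_source`, `run_per_source`, and the catalog contents are hypothetical names and values made up for the example. The only thing taken from the thread is the idea that whole sources, not individual photons, are handed to N worker threads.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_source(source):
    """Hypothetical stand-in for ray tracing every photon of one source."""
    name, n_photons = source
    # Placeholder "work": pretend each photon is traced and counted.
    return name, sum(1 for _ in range(n_photons))


def run_per_source(sources, n_threads):
    """Hand whole sources to a pool of n_threads workers (the N in "-t N")."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return dict(pool.map(simulate_source, sources))


if __name__ == "__main__":
    # Made-up catalog: two bright objects plus many faint background sources.
    sources = [("star", 5000), ("galaxy", 8000)]
    sources += [(f"background_{i}", 100) for i in range(1000)]
    counts = run_per_source(sources, n_threads=4)
    print(len(counts), "sources simulated,", sum(counts.values()), "photons")
```

Because a typical visit contains thousands of sources, the pool always has work to hand out, which is the point made above about the background. The sketch shows only how the work is divided; Python threads will not reproduce the speedup of a compiled multithreaded code.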
|
Hi John, Have you tested it with 48 or more threads? Cori Phase II at NERSC supports up to 272 hardware threads (4 per core), so it'd be interesting to see if we can leverage that. |
phosim v3.6 is now available on Cori: |
yeah, i think en-hsin did tests with either 24 or 48 threads on a cluster here at Purdue. personally, i usually just do 4 or 8 on my laptop. at some point there will be diminishing returns from the non-threaded setup, but that might be around 48 anyways, is my guess.
john
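The diminishing returns mentioned above can be illustrated with Amdahl's law, which bounds the speedup when some fraction of the run (here, the non-threaded setup) stays serial. This is a generic sketch, not a measurement of phoSim: the parallel fraction `p = 0.98` is a hypothetical value chosen only to show the shape of the curve.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Ideal speedup when only parallel_fraction of the run uses n_threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    p = 0.98  # hypothetical parallel fraction, not a phoSim measurement
    for n in (1, 4, 8, 24, 48, 272):
        print(f"{n:>3} threads -> {amdahl_speedup(p, n):5.1f}x")
```

With any fixed serial fraction the speedup is capped at 1 / (1 - p), so adding threads beyond some point buys very little. Where that point falls for phoSim would have to be measured.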
|
Hi John, If it is per source, do you mean they are done one-by-one and then (in principle) added later at nearby positions? Let's say there was a galaxy and a star: do you do anything to make sure that BF (brighter-fatter) still works? If you added all the light from the galaxy and then the star afterwards, the effect would be lost, right? Just trying to understand exactly what you mean. -Chris |
chris-
it's even better than that. so say you have two bright sources that are overlapping like you are imagining and then you do 2 threads. what will happen is that it will be simulating photons from both at the same time on two different cores, but whenever an electron is collected it will update the collected electron image *while* it's going. the other thread will then get to see the e-field from those new electrons *during* its simulation. so it really shouldn’t make any difference whatsoever, even in the case of brighter-fatter.
we have redone *all* the thousands of integration tests with 4 threads instead of the usual 1 and i haven’t noticed any changes in results, so we should all be ok. (in fact, given the speedups, we probably will always run multi-threaded validation runs from now on). but if anyone notices anything strange please let me know.
john
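The shared-image behaviour described above can be sketched in Python as follows. This is a toy model under stated assumptions, not phoSim's implementation: `ElectronImage`, `simulate_source`, and `run_overlapping` are hypothetical names, and the use of a lock is an assumption made for the example (the thread does not say how phoSim synchronizes updates). It shows two threads depositing electrons into the same pixel, with each deposit seeing the charge already collected by both sources.

```python
import threading


class ElectronImage:
    """Shared map of collected electrons per pixel (toy model, not phoSim code)."""

    def __init__(self, n_pixels):
        self._counts = [0] * n_pixels
        self._lock = threading.Lock()

    def collect(self, pixel):
        """Record one electron and return how many the pixel held beforehand."""
        with self._lock:
            before = self._counts[pixel]
            self._counts[pixel] = before + 1
            return before

    def snapshot(self):
        """Return a copy of the current per-pixel counts."""
        with self._lock:
            return list(self._counts)


def simulate_source(image, pixel, n_photons, seen):
    """Deposit n_photons electrons, logging the charge seen at each deposit."""
    for _ in range(n_photons):
        # A real simulation would use this count to compute the local field.
        seen.append(image.collect(pixel))


def run_overlapping(n_photons_a, n_photons_b):
    """Simulate two overlapping sources on one pixel with two threads."""
    image = ElectronImage(n_pixels=4)
    seen_a, seen_b = [], []
    threads = [
        threading.Thread(target=simulate_source, args=(image, 1, n_photons_a, seen_a)),
        threading.Thread(target=simulate_source, args=(image, 1, n_photons_b, seen_b)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image.snapshot(), seen_a, seen_b
```

Each value logged in `seen_a` or `seen_b` counts electrons from both sources, which is the property that matters for brighter-fatter: neither source is simulated against an image that ignores the other.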
|
That's great. Thanks. |
Hi John, I am interested in learning a bit about the control flow of the new phoSim. Is there a document or flow chart that might provide an overview?
|
Tom, i don’t have a document, but basically the multithreading happens only in the core raytrace calculation and doesn’t have anything to do with the overall phosim workflow.
you will want to still run phosim with the condor option and then use Glenn’s script to convert it to NERSC or SLAC job submission commands. the only difference is you use the “-t N” option for the phosim invocation to send the signal to the jobs to be threaded. Glenn was just testing to see if it still works with his script, but if not we will let you know and update with the new version.
john
On Dec 2, 2016, at 5:00 PM, Tom Glanzman wrote:
Hi John,
I am interested in learning a bit about the control flow of the new phoSim. Is there a document or flow chart that might provide an overview?
* Tom
———————————
John R. Peterson
Assoc. Professor of Physics and Astronomy
Department of Physics and Astronomy
Purdue University
525 Northwestern Ave.
West Lafayette, IN 47906
(765) 494-5193
|
Some Twinkles validation of the new phoSim version 3.6.0 has been produced. Using exactly the same configuration as for the main Twinkles workflow task (TW-phoSim-r3), I have created two new workflow tasks: TW-phoSim-r3-MT (phoSim v3.6.0 running with four threads) and TW-phoSim-r3-noMT (phoSim v3.6.0 with no multi-threading). A total of ten visits were processed with TW-phoSim-r3-MT and five visits with TW-phoSim-r3-noMT.
Links to the workflow: TW-phoSim-r3
While the multi-threaded phoSims were running, I was able to confirm the ongoing creation of up to four extra execution threads using a combination of tools (ps and top via lsrun, and farmrtmweb). I have not attempted to test different numbers of threads.
Timing results:
Average wall clock time ratio (v3.5.3/v3.6.0) = 4.1
Note that when running large productions, seemingly random jobs will sometimes exhibit unusual CPU and/or wall-clock times. This can happen for various reasons, such as transient I/O bottlenecks to a needed storage server, competing jobs on the batch host hogging critical resources, or other transient outages.
Part of the reason the v3.6.0 job efficiency took a hit is that threads are continually being created and killed during phoSim execution. Sometimes, not all four execution threads are fully utilized for short periods of time. Part of this is likely due to the overhead of thread management, and part may be due to phoSim design. This loss of efficiency is offset by the reduction in total CPU time, which I find slightly mysterious. In any event, the net savings in wall-clock time is significant. Congratulations to the phoSim team! (Some raw timing data comparing these 20 runs appear in this Google sheet)
Data Product Comparison:
Could these differences be attributed simply to random number seeds? Or to other changes/features in the v3.6.0 release? Others with an interest in the details of the differences are invited to take a look for themselves.
The files are at SLAC, e.g., for visit "000000": TW-phoSim-r3: TW-phoSim-r3-MT: TW-phoSim-r3-noMT
Note that the visit index, "000000", may be replaced by "000001" through "000009" for TW-phoSim-r3-MT, and by "000001" through "000004" for TW-phoSim-r3-noMT.
There was one configuration hiccup associated with v3.6.0. A new dependency on the phoSim installation's data/sky directory suddenly appeared and required the 'sky' directory to be placed adjacent to the (staged) copy of the SEDs. Perhaps John and Co. could comment on whether this is a bug or a feature?
Please feel free to add comments to this issue thread. |
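For anyone repeating the timing comparison above, the two quantities discussed (the wall-clock ratio between versions and how well a multi-threaded job uses its threads) can be computed as in this minimal sketch. The function names and the example inputs are hypothetical and chosen only for illustration; they are not taken from the Google sheet. Defining efficiency as CPU time divided by wall-clock time times thread count is an assumption about what "job efficiency" means here.

```python
def wall_clock_ratio(wall_old, wall_new):
    """Ratio of old to new wall-clock time; values above 1 mean the new run is faster."""
    return wall_old / wall_new


def cpu_efficiency(cpu_seconds, wall_seconds, n_threads):
    """Fraction of the available thread-time that was spent doing CPU work."""
    return cpu_seconds / (wall_seconds * n_threads)


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    print("wall-clock ratio:", wall_clock_ratio(410.0, 100.0))
    print("efficiency with 4 threads:", cpu_efficiency(300.0, 100.0, 4))
```

A wall-clock ratio above the thread count, such as the reported 4.1 with four threads, cannot come from threading alone. It is consistent with the observation above that total CPU time also dropped in v3.6.0.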
@LSSTDESC/twinkles PhoSim v3.6 is out! The release email from the PhoSim team is pasted in below for our records. Congratulations, and a big thank you, to @johnrpeterson et al :-)
@TomGlanzman what timetable do you suggest we follow for running the remaining Twinkles 1 Run 3 "long jobs" with PhoSim 3.6, with the same commands and configuration as you used in Run 3.1, 3.2, and 3.3? I guess we'll need to check that the results from v3.6 match those from v3.5 in some of the short visits...