Global Task List

Must-Haves for Paper

Thread-creation must have load-balancing included.
Fix the Shenango reference
Use the proper mechanism for the exclusive-thread-exit instead of the current hacks (reclaiming on demand). See board picture for ideas.
Maintain a count of blocked threads and use numOccupied - numBlocked, rather than numOccupied alone, to determine load. This corrects for the CoreLoadEstimator, which takes up a thread slot but is blocked most of the time.
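
A minimal sketch of the blocked-thread adjustment, assuming per-core atomic counters. The struct CoreLoadCounters and the function effectiveLoad are hypothetical names for illustration, not existing Arachne code; only numOccupied and numBlocked come from the note above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-core counters. The field names follow the task note, but
// the layout is an assumption, not Arachne's actual data structure.
struct CoreLoadCounters {
    std::atomic<uint32_t> numOccupied{0};  // thread slots currently in use
    std::atomic<uint32_t> numBlocked{0};   // occupied slots whose thread is blocked
};

// Effective load: occupied slots minus the ones that are blocked.
inline uint32_t effectiveLoad(const CoreLoadCounters& counters) {
    uint32_t occupied = counters.numOccupied.load(std::memory_order_relaxed);
    uint32_t blocked = counters.numBlocked.load(std::memory_order_relaxed);
    // The two loads are not a single atomic snapshot, so clamp to zero
    // rather than letting the unsigned subtraction wrap around.
    return blocked > occupied ? 0 : occupied - blocked;
}
```

With this shape, a core whose only occupant is a mostly blocked estimator thread reports a load of 0 instead of 1. Whether the counters should live in one word so they can be read together is left open here.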

Finish new Core Arbiter implementation and replace measurements in paper.
- Always allocate the hypertwin to the same process.
- When yielding cores, always yield in the same order we received a core.
Investigate and try to optimize thread exit performance
RAMCloud latency numbers - why are they so much higher than we expect?
Measure the cost of the random number generator
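
One possible way to measure the random number generator cost is a tight timing loop, sketched below. The xorshift64 generator is a stand-in chosen for illustration, and averageRandomCostNs is a hypothetical helper; substitute whichever generator the runtime actually uses.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in generator (Marsaglia xorshift64). This is an assumption for the
// sketch; the generator used by the runtime may differ.
inline uint64_t xorshift64(uint64_t& state) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

// Returns the average cost, in nanoseconds, of one generator call measured
// over `iterations` calls. The accumulated value is written to `sink` so the
// compiler cannot discard the loop.
inline double averageRandomCostNs(uint64_t iterations, uint64_t* sink) {
    uint64_t state = 88172645463325252ULL;  // any nonzero seed works
    uint64_t accumulator = 0;
    auto start = std::chrono::steady_clock::now();
    for (uint64_t i = 0; i < iterations; i++) {
        accumulator ^= xorshift64(state);
    }
    auto stop = std::chrono::steady_clock::now();
    *sink = accumulator;
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

The result includes loop overhead, so it is an upper bound on the per-call cost. A cycle counter would give finer resolution than steady_clock if the cost turns out to be only a few nanoseconds.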

Fix TLS issues in Arachne
Ensure that empty thread contexts which a core is polling on do not get swapped during thread migration.
Revisit the bug fixes that Peter implemented and see if there are cleaner solutions.
Run the synthetic application without the power of two choices; just pick a core at random.
- This should provide evidence for the power of two choices.
- Expect 99th-percentile latency to get worse.
Make all of the paper experiments easy to re-run without source code modification or arcane knowledge.
Start CoreArbiter Server (setuid binary) in Arachne runtime if it is not already started.
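
For the synthetic-application experiment above that replaces the power of two choices with a single random choice, the two placement policies being compared can be sketched as follows. The function names and the use of a per-core load vector are assumptions for illustration, not the Arachne implementation; both functions require a non-empty load vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Baseline for the experiment: pick one core uniformly at random, ignoring load.
inline size_t chooseRandomCore(const std::vector<uint32_t>& load,
                               std::mt19937_64& rng) {
    std::uniform_int_distribution<size_t> pick(0, load.size() - 1);
    return pick(rng);
}

// Power of two choices: sample two cores at random and take the less loaded
// one. Ties keep the first sample.
inline size_t choosePowerOfTwo(const std::vector<uint32_t>& load,
                               std::mt19937_64& rng) {
    std::uniform_int_distribution<size_t> pick(0, load.size() - 1);
    size_t first = pick(rng);
    size_t second = pick(rng);
    return load[second] < load[first] ? second : first;
}
```

Switching the synthetic application between these two functions with the same seed and workload would isolate the effect of the second sample on tail latency.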

Integrate Arachne into Redis and/or build other applications on Arachne?
Insert Core Arbiter / Reimplement Arachne under Go / Java
Cluster scheduler that interacts with the core arbiter? // Replace Borg
Analyze all sources of latency in Arachne's operations.
Do latency & throughput comparisons against Golang's work-stealing load balancer, assuming that threads are created by a central dispatcher.

Resolve memory barrier issues around thread wakeup. // Exchange has mfence semantics, trust in gcc
Design & implement a new CorePolicy API and architecture.
Clean up Peter's bug fix for reducing the contexts scanned; one option is simply to remove that optimization until we can come up with something cleaner.
Re-run RAMCloud write throughput with the core arbiter running.
Integrate Arachne into memcached
- Currently done with video processing running in the background.
Fix the Arachne-RAMCloud-YCSB integration.
- Re-run YCSB benchmarks and put results into the paper.
Revise the paper to incorporate the memcached results
Refactor CoreEstimator to take a CoreList as an argument, and only do estimation across the cores specified therein. Estimation should be deferred if data for the specific CoreList was not previously collected.
Run microbenchmarks with both idle and active hypertwins
Expand graphs and tables in paper
Investigate why memcached-A-NoArbiter performs worse than memcached. Add the graph requested.
Investigate why microbenchmarks got slower.
Decide what to do with Cilk -- e.g. find an older compiler that doesn't remove thread creations?
Check whether x264 is running with a nice value set; if so, rerun experiments without nice.
- Make sure we mention this in the paper.
- Rerun without nice.
Add a new line in Table 2 for thread exit turnaround time: the time from when one thread exits to when the next thread starts.
Make sure Figure 2 is consistent with Table 2.