Caffe integration attempt #23
base: master
Conversation
Should compile against https://github.com/immars/caffe/tree/mydev
Hi Lisen, many thanks for your contributions! I have a few questions about your code (apologies in advance for naive questions; I haven't read your code carefully yet).
Best
@immars mshadow-ps is a library that implements asynchronous copying and communication for GPU threads.
Using mshadow-ps might help you handle some of the communication/computation overlap, and it unifies the multi-card implementation and the multi-node code in one version. You can find a walkthrough in https://github.com/dmlc/mshadow/blob/master/guide/neuralnet/nnet_ps.cu, which is an example of implementing a simple net on top of mshadow-ps.
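For reference, here is a rough sketch of the push/pull pattern used in that guide, written from memory of nnet_ps.cu; the header path, the key id, and the exact method signatures are approximations, not a verified copy of the mshadow-ps API.

```cpp
// Rough sketch of the mshadow-ps push/pull pattern (based on the linked guide).
// Header path, key id, and exact signatures are approximate.
#include <mshadow/tensor.h>
#include <mshadow-ps/ps.h>   // approximate header path

using namespace mshadow;

void TrainIteration(ps::ISharedModel<gpu, float> *shared_model,
                    Tensor<gpu, 2, float> weight,
                    Tensor<gpu, 2, float> grad,
                    int devid) {
  const int kWeightKey = 0;  // illustrative key id for this weight blob
  // push the gradient; the copy/communication proceeds asynchronously,
  // overlapping with whatever computation follows
  shared_model->Push(grad, kWeightKey, devid);
  // request the updated weight back from the server (also asynchronous)
  shared_model->PullReq(weight, kWeightKey, devid);
  // block only at the point where the fresh weight is actually needed
  shared_model->PullWait(kWeightKey, devid);
}
```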
I'll be more focused on ps-lite (basically I'll split parameter_server into two repos: one is a pure interface, and the apps move into another repo). But the kv-layer class that mshadow uses doesn't change between ps-lite and the master branch of parameter_server, so it is safe to use it.
Caffe Integration attempt with parameter_server.
This pull request is created not because the code is ready to merge, but to draw attention and ask questions towards a better implementation. Thanks.
about algorithm
about implementation
- `src/app/caffe/caffe_main.cc`: 1 process per GPU device; computation driven by workers; puller/pusher/solver in different threads
- `src/app/caffe/caffe_synced.cc`: 1 process per GPU device; computation driven by the server; all workers compute a batch and those batches are accumulated into a larger batch, effectively an N-times larger batch size
- `src/app/caffe/caffe_async_share.cc`: 1 process per node, 1 thread per GPU device; weights pulled from the server are shared by the threads; computation driven by workers

`caffe_async_share` gets better performance overall w.r.t. network usage/convergence speed.
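To make the threading difference concrete, here is a rough, hypothetical sketch of the `caffe_main.cc`-style layout (1 process per GPU, with puller/pusher/solver in separate threads); every name and the synchronization below are illustrative, not the actual code in this PR.

```cpp
// Hypothetical sketch of the caffe_main.cc-style layout (1 process per GPU):
// three threads cooperate through shared host-side buffers.
// All names are illustrative, not the actual implementation.
#include <atomic>
#include <thread>

struct SharedBuffers { /* host-side weight and diff buffers */ };

// Stubs standing in for the real parameter-server and Caffe calls.
void PullWeightsFromServer(SharedBuffers*)   { /* blocking pull of fresh weights */ }
void PushDiffToServer(SharedBuffers*)        { /* blocking push of accumulated diffs */ }
void RunForwardBackwardOnGpu(SharedBuffers*) { /* one solver iteration on this GPU */ }

void RunWorker(SharedBuffers* buf, std::atomic<bool>* stop) {
  std::thread puller([=] {   // keeps host weights fresh from the server
    while (!stop->load()) PullWeightsFromServer(buf);
  });
  std::thread pusher([=] {   // ships accumulated diffs back to the server
    while (!stop->load()) PushDiffToServer(buf);
  });
  std::thread solver([=] {   // drives the computation on this GPU
    while (!stop->load()) RunForwardBackwardOnGpu(buf);
  });
  solver.join(); pusher.join(); puller.join();
}
```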
about usage:
- `script/caffe_local.sh` for local testing (not with `caffe_async_share`)
- `script/caffe_lan.sh` for launching in a cluster
- `script/caffe_kill_lan.sh` for (ungracefully) shutting down in a cluster

`caffe_lan.sh` usage: `script/caffe_lan.sh {conf_file}`. See `./conf/` for example conf_files setting up workers/servers.

Pending questions about code
- How should hooks around `push` and `pull` be done? e.g. the `vectorChanged` / `vectorGetting` interfaces introduced in my code. The server needs to update weights after a worker's diff is `push`ed, and to synchronize weights from GPU back to host memory just before a worker `pull`s. (See the first sketch after this list.)
- What is the intended usage of `SharedParameter`, its subclasses, and channels in the cluster?
- `VVector` was added for a simpler implementation: all parameters live in one un-dividable vector (so only 1 server is supported as a result), but I guess that could be replaced by `KVVector` to support multiple servers; I just don't know how. (See the second sketch after this list.)
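For the first question, here is a minimal sketch of what such hooks could look like; the names `vectorChanged` / `vectorGetting` come from this PR, while the surrounding class and signatures are invented purely for illustration.

```cpp
// Illustrative only: one possible shape for push/pull hooks on the server side.
// vectorChanged / vectorGetting are the hook names from this PR; everything
// else (class name, signatures) is invented for the sketch.
#include <cstddef>

class CaffeServerHooks {
 public:
  // Called after a worker's diff has been pushed and applied to `weights`,
  // e.g. to run the solver update on the accumulated diff.
  virtual void vectorChanged(float* weights, std::size_t len) = 0;

  // Called just before a worker pulls, e.g. to copy the latest weights
  // from GPU memory back to the host buffer that will be sent out.
  virtual void vectorGetting(float* weights, std::size_t len) = 0;

  virtual ~CaffeServerHooks() {}
};
```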
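For the last question, the general idea behind moving away from one un-dividable vector is to split the flat parameter blob into per-key segments that can be served by different servers. The toy sketch below shows only that partitioning idea; it is not the `KVVector` API.

```cpp
// Toy sketch: split a flat parameter vector into keyed segments so that each
// key could be owned by a different server. This is NOT the KVVector API,
// just the partitioning idea behind it.
#include <cstdint>
#include <vector>

struct Segment { std::uint64_t key; std::size_t offset; std::size_t len; };

std::vector<Segment> PartitionParams(std::size_t total_len, std::size_t num_segments) {
  std::vector<Segment> segs;
  std::size_t base = total_len / num_segments, rem = total_len % num_segments;
  std::size_t offset = 0;
  for (std::uint64_t k = 0; k < num_segments; ++k) {
    std::size_t len = base + (k < rem ? 1 : 0);  // spread the remainder evenly
    segs.push_back({k, offset, len});            // key k -> [offset, offset + len)
    offset += len;
  }
  return segs;  // each key (and its range) can then map to a server
}
```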