@shabnamsadegh I have another idea for how to implement this that could also benefit the core batch writers:
We could implement an AsyncBatchWriter class in kipoi.writers that takes another batch writer and makes it asynchronous. batch_write() would push each batch onto a queue, and a separate process would run a loop that pulls batches off the queue and immediately calls batch_writer.batch_write().
```python
class AsyncBatchWriter(BatchWriter):

    def __init__(self, batch_writer, max_queue_size=100):
        """
        Args:
          max_queue_size: maximal queue size. If it gets larger,
            batch_write needs to wait till it can write to the queue again.
        """
        self.batch_writer = batch_writer
        self.max_queue_size = max_queue_size
        # start the process and instantiate the queue
        self.queue = ...
        self.process = ...

    def batch_write(self, batch):
        """Write a single batch of data

        Args:
          batch: one batch of data (nested numpy arrays with the same axis-0 shape)
        """
        if self.queue.size() > self.max_queue_size:
            # display a warning and wait till the queue is small enough
            ...
        self.queue.put(batch)

    def close(self):
        """Close the file"""
        # stop the process,
        # make sure the queue is empty,
        # close the file
        self.batch_writer.close()
```
With this approach we would just need to add that class to kipoi.writers and then change this line of code to:
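Purely as an illustration (the actual line in kipoi-veff is not shown here, and the writer class and arguments below are assumed examples), wrapping the existing writer could look roughly like this:

```python
# Hypothetical call site -- writer class, file name, and variable names are placeholders.
from kipoi.writers import TsvBatchWriter

writer = AsyncBatchWriter(TsvBatchWriter("predictions.tsv"), max_queue_size=100)
# ... the prediction loop then calls writer.batch_write(batch) exactly as before ...
writer.close()
```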
- [x] buffer writes - #21 (e.g. don't write predictions to disk on every batch but only every now and then; see the sketch after this list)
- [ ] use asynchronous writes
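For the write-buffering idea from the first item, a minimal sketch could look like the following. It assumes batches are flat dicts of numpy arrays and is not the actual #21 implementation:

```python
# Illustrative sketch only; the actual write-buffer PR (#21) may differ.
import numpy as np


class BufferedBatchWriter:
    """Accumulate batches in memory and flush them to the wrapped writer every `buffer_size` batches."""

    def __init__(self, batch_writer, buffer_size=50):
        self.batch_writer = batch_writer
        self.buffer_size = buffer_size
        self._buffer = []

    def batch_write(self, batch):
        self._buffer.append(batch)
        if len(self._buffer) >= self.buffer_size:
            self._flush()

    def _flush(self):
        if not self._buffer:
            return
        # concatenate the buffered batches along axis 0 and write them in one go
        # (assumes flat dicts; real batches may be nested)
        merged = {k: np.concatenate([b[k] for b in self._buffer], axis=0)
                  for k in self._buffer[0]}
        self.batch_writer.batch_write(merged)
        self._buffer = []

    def close(self):
        self._flush()  # write whatever is still buffered
        self.batch_writer.close()
```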
Here is the main loop performing the predictions and writes:
https://github.com/kipoi/kipoi-veff/blob/master/kipoi_veff/snv_predict.py#L620-L658
- [ ] set up some standardized benchmarks to test the overhead
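Such a benchmark could start from something as simple as timing synthetic batches through a given writer, e.g. roughly like this (all names and numbers are placeholders, not an agreed-upon setup):

```python
# Rough illustration of a possible micro-benchmark.
import time
import numpy as np


def time_writer(writer, n_batches=200, batch_size=256):
    """Feed synthetic prediction batches to a writer and return the elapsed seconds."""
    batch = {"preds": np.random.rand(batch_size, 10)}
    start = time.time()
    for _ in range(n_batches):
        writer.batch_write(batch)
    writer.close()
    return time.time() - start
```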
Tasks
- Follow this notebook: https://github.com/kipoi/kipoi-veff/blob/write_buffer/notebooks/code-profiling.ipynb
- Finish the code on the write buffer PR by speeding up the writing so that it takes a minimal amount of time.