- master
- v0.2.22 (2015-06-12)
- added Context.wholeTextFiles()
- improved RDD.first() and RDD.take(n)
- added fileio.TextFile
- v0.2.21 (2015-06-07)
- added doc strings and created Sphinx documentation
- implemented allowLocal in Context.runJob()
- v0.2.19 (2015-06-04)
- new IPython demo notebook at docs/demo.ipynb (https://github.com/svenkreiss/pysparkling/blob/master/docs/demo.ipynb)
- parallelize() can take an iterator (used in zip() now for lazy loading)
- v0.2.16 (2015-05-31)
- add values(), union(), zip(), zipWithUniqueId(), toLocalIterator()
- improve aggregate() and fold()
- add stats(), sampleStdev(), sampleVariance(), stdev(), variance()
- make cache() and persist() do something useful
- better partitioning in parallelize()
- logo
- fix foreach()
- v0.2.10 (2015-05-27)
- fix fileio.codec import
- support http://
- v0.2.8 (2015-05-26)
- parallelized text file reading (and made it lazy)
- parallelized take() and takeSample() that only compute the required data partitions
- add example: access Human Microbiome Project
- v0.2.6 (2015-05-21)
- factor out fileio.fs and fileio.codec modules
- merge WholeFile into File
- improved handling of compressed files (backwards incompatible)
- fileio interface changed to dump() and load() methods. Added make_public() for S3.
- factor file related operations into fileio submodule
- v0.2.2 (2015-05-18)
- compressions: .gz, .bz2
- v0.2.0 (2015-05-17)
- proper handling of partitions
- custom serializers, deserializers (for functions and data separately)
- more tests for parallelization options
- distributed jobs now execute a chain of map() operations on workers without sending intermediate results back to the master
- a few more methods for RDDs implemented
- v0.1.1 (2015-05-12)
- implemented a few more RDD methods
- changed handling of context in RDD
- v0.1.0 (2015-05-09)
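
The v0.2.8 and v0.2.0 entries above mention two related ideas: take() computing only the partitions it needs, and a chain of map() operations running on a partition without materializing intermediate results. The following is a minimal pure-Python sketch of those ideas, not pysparkling's actual implementation. The names make_partition, chained_map, take and computed are hypothetical and exist only for this illustration.

```python
from itertools import islice

computed = []  # records which partitions were actually evaluated


def make_partition(index, values):
    """Return a lazy partition: nothing runs until it is iterated."""
    def generate():
        computed.append(index)
        for value in values:
            yield value
    return generate()


def chained_map(partition, functions):
    """Apply a chain of map functions lazily, element by element."""
    for function in functions:
        partition = map(function, partition)
    return partition


def take(partitions, n):
    """Collect the first n elements, touching only the partitions needed."""
    result = []
    for partition in partitions:
        result.extend(islice(partition, n - len(result)))
        if len(result) >= n:
            break
    return result


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
functions = [lambda x: x + 1, lambda x: x * 10]

# Partitions are built lazily, so the third one is never created here.
partitions = (chained_map(make_partition(i, values), functions)
              for i, values in enumerate(data))
first_four = take(partitions, 4)

print(first_four)  # [20, 30, 40, 50]
print(computed)    # [0, 1]
```

Because each partition is a generator and the map() chain is composed before iteration, every element passes through all functions in one pass, and the third partition is never evaluated.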