This repository has been archived by the owner on Dec 10, 2024. It is now read-only.

Commit

Add benchmark, remove extra import
brad-richardson committed Jul 7, 2024
1 parent 7b56b5c commit d23f1da
Showing 2 changed files with 17 additions and 2 deletions.
17 changes: 16 additions & 1 deletion README.md
@@ -44,10 +44,25 @@ cargo run --release -- --input your.osm.pbf --output ./parquet
5. Test with `cd test && ./prepare.sh && python3 validate.py`


## Benchmarks
osm-pbf-parquet prioritizes transcode speed over file size, file count, or preserving ordering.
| | Time (wall) | Output size | File count |
| - | - | - | - |
| osm-pbf-parquet | 33 minutes | 234GB | 3,253 |
| [osm-parquetizer](https://github.com/adrianulbona/osm-parquetizer) | 196 minutes | 285GB | 3 |
| [osm2orc](https://github.com/mojodna/osm2orc) | 385 minutes | 110GB | 1 |
Test system:
```
i5-9400 (6 CPU, 32GB memory)
Ubuntu 24.04
OpenJDK 17
Rust 1.79.0
```


## License
Distributed under the MIT License. See `LICENSE` for more information.


## Acknowledgments
* [osmpbf](https://github.com/b-r-u/osmpbf) and [osm2gzip](https://github.com/b-r-u/osm2gzip) for reading PBF data
* [osm2orc](https://github.com/mojodna/osm2orc) for schema and processing ideas
2 changes: 1 addition & 1 deletion src/sink.rs
@@ -1,7 +1,7 @@
use std::fs::File;
use std::sync::{Arc, Mutex};

-use osmpbf::{DenseNode, Node, RelMemberType, Relation, TagIter, Way};
+use osmpbf::{DenseNode, Node, RelMemberType, Relation, Way};
use parquet::arrow::ArrowWriter;
use parquet::basic::Compression;
// use parquet::data_type::DataType;