Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stream processing #35

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

mkieselmann
Copy link

While trying to merge a large number of GeoJson files in streaming mode I encountered 2 bugs.

  1. The merge file did not contain all content of the separate files.
    This seems to be related to parallel stream processing. At least serializing the processing of the separate file streams fixes it.

  2. The contents of the first file argument were missing in the merged file.
    This was related to the way command line arguments are parsed. -s was parsed as key=value which leads to the fact that the first file was interpreted as value for the s option.

I've tested the changes on macOS with node v13.5.0.

@zingi
Copy link

zingi commented Jan 23, 2020

I can confirm that I also had issues while trying to merge several (30+) geojson files.
The output geojson only containd a fraction of what it should have.

Furthermore I got a memory leak warning:

(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:30:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:64:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:86:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)

I tried the fixed code of this pull-request and it solved the issue.

@zingi
Copy link

zingi commented Jan 24, 2020

Just for the sake of good written code, you could remove the variable declaration, because the variable is actually unused:

function mergeFeatureCollectionStream (inputs) {
    const out = geojsonStream.stringify();
    const streams = inputs.map(file => fs.createReadStream(file));
    new StreamConcat(streams);
        .pipe(geojsonStream.parse())
        .pipe(out);
    return out;
}

@mkieselmann
Copy link
Author

Thanks for the suggestion. I have updated the code.

@kevinduke10
Copy link

I can confirm that I also had issues while trying to merge several (30+) geojson files.
The output geojson only containd a fraction of what it should have.

Furthermore I got a memory leak warning:

(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:30:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:64:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:86:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)

I tried the fixed code of this pull-request and it solved the issue.

Just for the sake of good written code, you could remove the variable declaration, because the variable is actually unused:

function mergeFeatureCollectionStream (inputs) {
    const out = geojsonStream.stringify();
    const streams = inputs.map(file => fs.createReadStream(file));
    new StreamConcat(streams);
        .pipe(geojsonStream.parse())
        .pipe(out);
    return out;
}

Same issue here with 1.2GB in 72 files. This fix cleared it right up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants