GeoJson InputFormat #81

Closed
randallwhitman opened this issue May 18, 2015 · 4 comments

@randallwhitman
Contributor

The Spatial Framework for Hadoop has an InputFormat for Esri GeoServices REST JSON, but not for GeoJSON. Supporting GeoJSON might in fact mean two InputFormats, as with Esri JSON: one for the Enclosed variant and one for the Unenclosed variant.

Developing an InputFormat would include a custom InputFormat and a custom RecordReader.

Currently, Enclosed JSON (i.e. a file that is one complete, valid Esri REST JSON document) is treated as non-splittable: when used as input to a MapReduce job, the whole file is processed by a single Mapper. It would likely be possible to make Enclosed JSON splittable, for both Esri JSON and GeoJSON.
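To make the Unenclosed variant concrete, here is a minimal sketch of how a reader could delimit concatenated top-level JSON objects by tracking brace depth while skipping over string contents. `TopLevelJsonScanner` and `splitRecords` are hypothetical names for illustration, not classes in this framework, and the sketch uses only the Java standard library rather than the Hadoop RecordReader API. It assumes scanning starts at a known record boundary, so it does not by itself solve resynchronizing from an arbitrary split offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Spatial Framework for Hadoop. */
final class TopLevelJsonScanner {

    /** Returns each complete top-level JSON object found in text, in order. */
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        int depth = 0;             // current brace nesting depth
        int start = -1;            // index of the '{' that opened the current record
        boolean inString = false;  // true while inside a JSON string literal
        boolean escaped = false;   // true right after a backslash inside a string

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                // Braces inside strings must not affect the depth counter.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    records.add(text.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String input =
            "{\"type\":\"Feature\",\"properties\":{\"note\":\"a } brace\"},\"geometry\":null}\n"
          + "{\"type\":\"Feature\",\"properties\":{\"type\":\"Feature\"},\"geometry\":null}";
        for (String record : splitRecords(input)) {
            System.out.println(record);
        }
    }
}
```

A real RecordReader would do this incrementally over a byte stream rather than over an in-memory String; the String form is only to keep the sketch short.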

A GeoJSON InputFormat might be a good opportunity for a community contribution.

@randallwhitman
Contributor Author

Pathological case, where a feature's properties contain a nested object that itself looks like the start of a feature:
...}{"type":"Feature","properties":{"type":"Feature"},"geometry":{...
split as:
... | ...}{"type":"Feature","properties": | {"type":"Feature"},"geometry":{... | ...
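The following sketch illustrates why this case is awkward. A reader that resynchronizes by searching for the text `{"type":"Feature"` at or after its split offset would latch onto the nested properties object. One possible guard, shown here purely as an assumption and not as what this framework does, is to accept a candidate only when the previous non-whitespace character is `}` or there is none, since outside string literals valid JSON never places `}` directly before `{` within a single document. `SplitResync`, `naiveStart` and `guardedStart` are hypothetical names.

```java
/** Hypothetical illustration of split resynchronization; not framework code. */
final class SplitResync {

    // Assumed marker for the start of a GeoJSON feature record (no whitespace variations).
    static final String MARKER = "{\"type\":\"Feature\"";

    /** Naive approach: first textual occurrence of the marker at or after offset. */
    static int naiveStart(String text, int offset) {
        return text.indexOf(MARKER, offset);
    }

    /**
     * Guarded approach: accept a marker only if the previous non-whitespace
     * character is '}' or the marker is at the very start of the input.
     * Note that this looks backwards, possibly before the split offset.
     */
    static int guardedStart(String text, int offset) {
        int i = text.indexOf(MARKER, offset);
        while (i >= 0) {
            int j = i - 1;
            while (j >= 0 && Character.isWhitespace(text.charAt(j))) {
                j--;
            }
            if (j < 0 || text.charAt(j) == '}') {
                return i;
            }
            i = text.indexOf(MARKER, i + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        String text =
            "{\"type\":\"Feature\",\"properties\":{\"type\":\"Feature\"},\"geometry\":null}"
          + "{\"type\":\"Feature\",\"properties\":{},\"geometry\":null}";
        // Split offset chosen to fall just before the nested object, as in the case above.
        int split = text.indexOf("\"properties\":") + "\"properties\":".length();
        System.out.println("naive:   " + naiveStart(text, split));
        System.out.println("guarded: " + guardedStart(text, split));
    }
}
```

The guard is still a heuristic: a string value containing `}{"type":"Feature"` would fool it, so a robust reader would need additional validation of the candidate record.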

@randallwhitman
Contributor Author

I have renamed the tests of Unenclosed JSON for symmetric naming between EsriJSON and GeoJSON. Consistent symmetric naming would cover product code as well as tests. Blithely renaming UnenclosedJsonInputFormat to UnenclosedEsriJsonInputFormat would be a backward-compatibility problem.

Proposal:

  • Introduce UnenclosedEsriJsonInputFormat equivalent to current UnenclosedJsonInputFormat, so we would have UnenclosedEsriJsonInputFormat and UnenclosedGeoJsonInputFormat;
  • Retain, but deprecate, UnenclosedJsonInputFormat as a synonym of UnenclosedEsriJsonInputFormat, for backward compatibility.
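The shape of that proposal can be sketched as follows. The base class here is a stand-in with a made-up `formatName` method so the snippet compiles on its own; the real classes would extend the Hadoop input-format types, which are omitted. Only the subclass-as-deprecated-synonym pattern is the point.

```java
/**
 * Stand-in for the proposed Esri JSON input format. The real class would
 * extend a Hadoop input format; formatName() is hypothetical and exists
 * only to give this sketch observable behavior.
 */
class UnenclosedEsriJsonInputFormat {
    String formatName() {
        return "unenclosed-esri-json";
    }
}

/**
 * Old name kept as an empty subclass so existing job configurations that
 * reference it keep working.
 *
 * @deprecated use {@link UnenclosedEsriJsonInputFormat} instead.
 */
@Deprecated
class UnenclosedJsonInputFormat extends UnenclosedEsriJsonInputFormat {
}

final class AliasDemo {
    public static void main(String[] args) {
        // Code written against the old name behaves exactly like the new class.
        UnenclosedEsriJsonInputFormat viaOldName = new UnenclosedJsonInputFormat();
        System.out.println(viaOldName.formatName());
    }
}
```

Because the deprecated class adds no members, the two names cannot drift apart, and callers get a compiler warning nudging them to the new name.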

Comments? @ErikHoel @climbage @smambrose

@randallwhitman
Contributor Author

This enhancement is feature complete and passes unit tests. No integration testing so far.

@randallwhitman
Contributor Author

The test failures in Travis did not show up in my working directory.
I reproduced them in a clean-directory clone: resources had been omitted from the last commit.
