Output with Enclosed InputFormats #177

Open
randallwhitman opened this issue Sep 25, 2020 · 4 comments

Comments

@randallwhitman
Contributor

See Esri/gis-tools-for-hadoop#83

@randallwhitman randallwhitman added this to the v2.2 milestone Sep 25, 2020
@randallwhitman
Contributor Author

randallwhitman commented Nov 16, 2020

With larger data, the output would be expected to span multiple files. In that case, it's not clear how the file(s) could be enclosed at all. Maybe each file of the collection could have Enclosed format? Does the InputFormat have the info when an output file is started? Maybe a custom OutputFormat would be needed?

@randallwhitman randallwhitman changed the title Enclosed InputFormats with Hive-2.3 Output with Enclosed InputFormats Nov 16, 2020
@randallwhitman
Contributor Author

randallwhitman commented Dec 29, 2020

> maybe each file of the collection could have Enclosed format

I think that can be done by extending, implementing, or overriding some or all of:

  • FileOutputFormat // HiveIgnoreKeyTextOutputFormat

    • RecordWriter getHiveRecordWriter(JobConf jc, Path outPath, Class<? extends Writable> valueClass, boolean isCompressed, Properties tableProperties, Progressable progress) throws IOException
    • RecordWriter<K, V> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException
    • OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException
  • RecordWriter<K,V> // LineRecordWriter

    • void close(TaskAttemptContext context) throws IOException, InterruptedException
    • void write(K key, V value) throws IOException, InterruptedException
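As a rough illustration of the per-file enclosure idea, the sketch below shows only the header/separator/footer bookkeeping that a custom RecordWriter's write and close would need. It deliberately uses plain java.io instead of the Hadoop or Hive classes listed above, so the class name EnclosedJsonWriter and the minimal {"features":[...]} envelope are assumptions for illustration, not the project's actual API or exact output layout.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Hadoop, Hive, or this project): wraps the
 * records written to ONE output stream in a single enclosing JSON object.
 * A real RecordWriter could delegate its write/close calls to logic like this.
 */
class EnclosedJsonWriter implements Closeable {
    // Assumed envelope: a bare "features" array with no other top-level fields.
    private static final byte[] HEADER = "{\"features\":[\n".getBytes(StandardCharsets.UTF_8);
    private static final byte[] SEPARATOR = ",\n".getBytes(StandardCharsets.UTF_8);
    private static final byte[] FOOTER = "\n]}\n".getBytes(StandardCharsets.UTF_8);

    private final OutputStream out;
    private boolean started = false;
    private boolean closed = false;

    EnclosedJsonWriter(OutputStream out) {
        this.out = out;
    }

    /** Writes one already-serialized JSON record (for example, one feature). */
    void write(String recordJson) throws IOException {
        if (closed) {
            throw new IOException("writer already closed");
        }
        // The first record opens the enclosure; later records are comma-separated.
        out.write(started ? SEPARATOR : HEADER);
        started = true;
        out.write(recordJson.getBytes(StandardCharsets.UTF_8));
    }

    /** Closes the enclosure so the file is a complete JSON document. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (!started) {
            out.write(HEADER); // an empty file still gets a valid envelope
        }
        out.write(FOOTER);
        out.close();
    }
}
```

Because each writer instance owns exactly one stream, every file of a multi-file output would end up with its own complete enclosure, which is the behavior proposed above.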

@randallwhitman
Contributor Author

See also -
https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
The abstract base class appears to have been introduced in Hive-0.11, which is probably OK if we stop supporting older versions.

@randallwhitman
Contributor Author

com.esri.json.hadoop.Enclosed{Esri,Geo}JsonOutputFormat and/or com.esri.hadoop.hive.json.EnclosedEachJsonHiveOutputFormat

@randallwhitman randallwhitman modified the milestones: v2.2, v2.3 Dec 30, 2020