Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding CombineInputFileFormat; only single use case so far #3

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

thomasstorm
Copy link
Collaborator

No description provided.

@thomasstorm thomasstorm self-assigned this Jul 15, 2019
Comment on lines -150 to 153
if (inputFile == null) {
for (Path inputPath : inputPaths) {
inputFile = CalvalusProductIO.copyFileToLocal(inputPath, getConfiguration());
setInputFile(inputFile);
if (inputFile == null) {
setInputFile(inputFile);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, it seems the logic has unintentionally changed. The test for null was located before the second assignment of the copyFileToLocal before, and will never be true now.

/**
* @author thomas
*/
public class CombineFileInputFormat extends InputFormat {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a relation to org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat ?

* Creates a single split from a given pattern
*/
@Override
public List<InputSplit> getSplits(JobContext context) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about our other methods to determine inputs, in particular those using the geo-inventory? I know that PatternBasedInputFormat needs refactoring and decomposition but I think the other ways to determine inputs are required.
Thinking of how to refactor PatternBasedInputFormat it may be good to distinguish the way the inputs shall be determined (geo-inventory, opensearch query, path pattern, ...) by different classes as they have different parameters anyway, and whichever parameter is specified the client could automatically select the right class. Then, we could either derive a class for CombineFileSplit generation from each of them, or we make this a parameter. In any case, the old PatternBasedInputFormat could delegate the getSplits() call to the new implementations to keep backwards compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants