-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fixed-width column support #220
Conversation
Note I will fix the "DESIRED UNITS: BLAH" comments I left in |
f3cb1c1
to
e8d19ef
Compare
e8d19ef
to
76762f0
Compare
5c1e76a
to
6aee4ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your level of test coverage is impressive. My main concern is testing (or testing they are invalid) other ways CsvSpecs might be configured in relationship to hasFixedWidthColumns, along w/ documentation in those regards.
* See {@link Builder#fixedColumnWidths}. | ||
*/ | ||
@Default | ||
public List<Integer> fixedColumnWidths() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to add a check that if fixedColumnWidths is specified, hasFixedWidthColunms must be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check added, and triggered in the test checkParametersIncompatibleWithDelimitedMode
* | ||
* @param fixedColumnWidths The caller-specified widths of the columns. | ||
*/ | ||
Builder fixedColumnWidths(Iterable<Integer> fixedColumnWidths); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If specified, do all of the column lengths need to be set? All but the last?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All but the last need to be set. The last is a placeholder. Made explicit in comments for CsvSpecs#fixedColumnWidths
and also tested in CsvReaderTest#finalFixedColumnWidthEntryIsPlaceholder
/** | ||
* True if the input is organized into fixed width columns rather than delimited by a delimiter. | ||
*/ | ||
Builder hasFixedWidthColumns(boolean hasFixedWidthColumns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we expect will happen if ignoreSurroundingSpaces() == false
? Is that a legit configuration? Should we disallow it?
What about quote / trim behavior? I'm assuming that "
(or quote char) inside the fixed width are included in the resulting string. In which case, maybe we want to assert that if we are in fixed width mode, quote()
should not be set (or, at least, not set differently than the default).
Mainly, I wonder if we need to consider the scope of CsvSpecs wrt hasFixedWidthColumns; I'm happy to err on the side of being more strict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignoreSurroundingSpaces() == false
is legit, though perhaps not very useful. Now tested in CsvReaderTest#fixedColumnsMayIncludeOrExcludeSurroundingSpaces
quote / trim / etc now validated for only being set in the right mode. See CsvReaderTest#checkParametersIncompatibleWithFixedWidthMode
and its partner CsvReaderTest#checkParametersIncompatibleWithDelimitedMode
+ "beta Not Active 4y235d kubernetes.io/metadata.name=beta\n"; | ||
|
||
final CsvSpecs specs = defaultCsvBuilder().hasFixedWidthColumns(true) | ||
.ignoreSurroundingSpaces(true).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignoreSurroundingSpaces defaults to true; is false a legit value? If not, we should err if it is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed a bunch of places where I was redundantly setting it to true. Setting it to false does work but is not IMO very useful. Unit test in CsvReaderTest#fixedColumnsMayIncludeOrExcludeSurroundingSpaces
6e762db
to
1ff6927
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Will approve once spotless is corrected and tests pass.
This PR depends on #219 and should not be merged until that one is.
Until then the changes in this PR will look like they also include the changes from #219, but that will go away once the other PR has been merged.
This PR resolves #212
Perhaps the most controversial idea here is the flag that controls how characters are counted. Should they be UTF-32 code points or the number of characters Java would see as
String.length
? There's no difference until you go outside the Unicode Basic Multilingual Plane. But once outside that plane, Unicode code points require two Javachar
rather than one.The question is, what does the user intend? Should a string like
🥰😻🧡💓💕💖
represent a width of six (because it is six Unicode code points) or a width of twelve (because the Java string representing it would have length 12).I think in some sense the question has no right answer, so we have a flag
CsvSpecs.useUtf32CountingConvention
to make it go one way or the other.Also I need to add a couple more tests that I forgot to do (e.g. for explicitly specifying column widths rather than inferring them, and a file that has no column headers, etc).