Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/396 handle xlsx files #421

Closed
wants to merge 9 commits into from

Conversation

A-M-A-X
Copy link
Contributor

@A-M-A-X A-M-A-X commented Aug 10, 2023

Implemented xlsx interpreter and sheet-picker blocks.
Test cases if applicable will be in a different PR.

@A-M-A-X A-M-A-X changed the title WIP: Feature/396 handle xlsx files Feature/396 handle xlsx files Aug 10, 2023
@rhazn rhazn self-requested a review August 10, 2023 16:37
@A-M-A-X
Copy link
Contributor Author

A-M-A-X commented Aug 11, 2023

Please let me know which classes I should write test cases for and to which extend or if this is not required at this stage. @rhazn

Copy link
Member

@georg-schwarz georg-schwarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I know I wasn't tagged but still wanted to leave you a few comments ;-)

Comment on lines 31 to 38
addSheet(sheet: Sheet) {
this.sheets.push(sheet);
}

addNewSheet(data: string[][], sheetName?: string) {
const sheetNameOrDefault = sheetName ?? `Sheet${this.sheets.length + 1}`;
this.addSheet(new Sheet(data, sheetNameOrDefault));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation currently allows sheets with duplicate names. I'd try to avoid this, or is this intended behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check to prevent duplicate names in workbook

PrimitiveValuetypes.Text,
);
const sheet = workbook.getSheetByName(sheetName);
assert(sheet instanceof Sheet);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather sheet !== undefined (would be more expressive of your intention imo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed assert completly as already checked within function getSheetByName().

const workSheet = workBookFromFile.Sheets[workSheetName];
assert(workSheet !== undefined);

const workSheetDataArray = Array.prototype.map.call<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That line looks weird to me, why go over the prototype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be quite frank, because I did't manage to do it differently eg calling map directly without failing lint and or type checks. Maybe I can have another look with @rhazn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please bring this up in our sync, we can do it live.

) => string[],
],
string[][]
>(workSheet, (x: xlsx.CellObject[]): string[] => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use speaking variable names to indicate to readers whats happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to row

],
string[][]
>(workSheet, (x: xlsx.CellObject[]): string[] => {
return x.map<string>((y: xlsx.CellObject) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here: better naming of variables please (e.g., cell)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to cell

@georg-schwarz georg-schwarz deleted the branch jvalue:dev August 15, 2023 14:27
@georg-schwarz
Copy link
Member

Uhm, weird. Didn't mean to close this PR!

@georg-schwarz georg-schwarz reopened this Aug 15, 2023
Copy link
Contributor

@rhazn rhazn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

Regarding the PR itself: Please make sure to always at least mention the related task, otherwise it is very hard to navigate. You can do so by just adding a 'closes #' or 'relates to #' in the description where number is the issue id.

Regarding test cases, hopefully @georg-schwarz can help me out here, do you know from the top of your head where we are right now with tests, should we add tests for the new blocks or the iotype?

@@ -134,6 +135,12 @@ export class DebugLogVisitor implements IoTypeVisitor<void> {

this.log('... (omitted in peek mode)');
}
visitWorkbook(workbook: Workbook): void {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should

  • have the same empty return with minimal debug granularity as the visitTable function has
  • Log the number of worksheets in the workbook by default
  • use the log function consistently, never console.log
  • log the peek comment (as visit table does)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

  • empty return for minimal debug granularity
  • Logging the number of worksheets in the workbook
  • consistently using log function
  • added peek functionality for bigger workbooks


export class Workbook implements IOTypeImplementation<IOType.WORKBOOK> {
public readonly ioType = IOType.WORKBOOK;
private sheets: Sheet[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be assigned immediately and you do not need a constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed accordingly.

this.sheets = [];
}

getSheets(): ReadonlyArray<Sheet> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why readonly array? I'd just reuse the Sheet[].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed accordingly.

return this.sheets;
}

getSheetByName(sheetName: string): Sheet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't crash the interpreter with an assert if the sheet does not exist, I'd return Sheet or undefined and use https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/find

return sheet;
}

addSheet(sheet: Sheet) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why addSheet and addNewSheet with different semantic? addSheet lets you add sheets with duplicate names, addNewSheet does not. This has to be one function, just remove the current addSheet and rename the addNewSheet to addSheet after.

Also, please add return types. These functions should return the workbook itself to enable fluent interfaces (https://en.wikipedia.org/wiki/Fluent_interface).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed as suggested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the general approach I am not convinced a name should be the property of a sheet. You run into multiple problems using that approach:

  • You have complex code because you need to filter
  • You have the problem of duplicate names and what to do with them

Why not make the name to sheet mapping a property of a workbook? Keep sheets as they are, nameless. Instead, track sheets in the workbook as objects containing the name and the actual sheet. Then you can have an index (from the position of the object in the array), the sheet and the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it as a map<string, workbook> where string is the name of the workbook.
As discussed with the downside not beeing able to reference the sheet by position within the workbook e.g. read from sheet 5.

);

this.docs.description =
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.";
"Interprets an input file as a XLSX-file and outputs a `Workbook` containing `Sheet`s.";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested

{
code: blockExample,
description:
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.",
"Interprets an input file as a XLSX-file and outputs a `Workbook` containing `Sheet`s.",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested

file: BinaryFile,
context: ExecutionContext,
): Promise<R.Result<Workbook>> {
context.logger.logDebug(`reading from xlsx file`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
context.logger.logDebug(`reading from xlsx file`);
context.logger.logDebug(`Reading from XLSX file`);

const workSheet = workBookFromFile.Sheets[workSheetName];
assert(workSheet !== undefined);

const workSheetDataArray = Array.prototype.map.call<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please bring this up in our sync, we can do it live.

@f3l1x98 f3l1x98 deleted the branch jvalue:dev August 28, 2023 08:58
@f3l1x98 f3l1x98 closed this Aug 28, 2023
@georg-schwarz
Copy link
Member

@A-M-A-X seems like the dev branch got deleted by mistake, leading to an automation to close this PR. Can you retarget on main and reopen?

@A-M-A-X
Copy link
Contributor Author

A-M-A-X commented Aug 28, 2023

@georg-schwarz not sure if I am able to. I guess I am missing rights to do so.

@rhazn
Copy link
Contributor

rhazn commented Aug 28, 2023

You can create a new PR from your branch to main and keep this one closed.

@A-M-A-X
Copy link
Contributor Author

A-M-A-X commented Aug 29, 2023

opened new PR #433

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants