Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/396 handle xlsx files #421

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions libs/execution/src/lib/debugging/debug-log-visitor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
import { internalValueToString } from '@jvalue/jayvee-language-server';

import { Logger } from '../logger';
import { Workbook } from '../types';
import { FileSystem } from '../types/io-types/filesystem';
import { BinaryFile } from '../types/io-types/filesystem-node-file-binary';
import { TextFile } from '../types/io-types/filesystem-node-file-text';
Expand Down Expand Up @@ -134,6 +135,12 @@ export class DebugLogVisitor implements IoTypeVisitor<void> {

this.log('... (omitted in peek mode)');
}
visitWorkbook(workbook: Workbook): void {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should

  • have the same empty return with minimal debug granularity as the visitTable function has
  • Log the number of worksheets in the workbook by default
  • use the log function consistently, never console.log
  • log the peek comment (as visit table does)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

  • empty return for minimal debug granularity
  • Logging the number of worksheets in the workbook
  • consistently using log function
  • added peek functionality for bigger workbooks

this.log(`WorkSheets in WorkBook:`);
workbook
.getSheets()
.forEach((sheet) => console.log(`WorkSheet: ${sheet.getSheetName()}`));
}

private log(text: string): void {
this.logger.logDebug(`[${this.logPrefix}] ${text}`);
Expand Down
1 change: 1 addition & 0 deletions libs/execution/src/lib/types/io-types/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ export * from './filesystem';
export * from './filesystem-inmemory';
export * from './io-type-implementation';
export * from './none';
export * from './workbook';
export * from './sheet';
export * from './table';
export * from './filesystem-node-file-binary';
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import { type TextFile } from './filesystem-node-file-text';
import { type None } from './none';
import { type Sheet } from './sheet';
import { type Table } from './table';
import { type Workbook } from './workbook';

export interface IOTypeImplementation<T extends IOType = IOType> {
ioType: T;
Expand All @@ -20,6 +21,7 @@ export interface IOTypeImplementation<T extends IOType = IOType> {
export interface IoTypeVisitor<R = unknown> {
visitTable(table: Table): R;
visitSheet(sheet: Sheet): R;
visitWorkbook(workbook: Workbook): R;
visitNone(none: None): R;
visitFileSystem(fileSystem: FileSystem): R;
visitBinaryFile(binaryFile: BinaryFile): R;
Expand Down
6 changes: 5 additions & 1 deletion libs/execution/src/lib/types/io-types/sheet.ts
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,17 @@ export class Sheet implements IOTypeImplementation<IOType.SHEET> {
public readonly ioType = IOType.SHEET;
private numberOfRows: number;
private numberOfColumns: number;
constructor(private data: string[][]) {
constructor(private data: string[][], private sheetName?: string) {
this.numberOfRows = data.length;
this.numberOfColumns = data.reduce((prev, curr) => {
return curr.length > prev ? curr.length : prev;
}, 0);
}

getSheetName(): string {
return this.sheetName ?? '';
}

getData(): ReadonlyArray<ReadonlyArray<string>> {
return this.data;
}
Expand Down
47 changes: 47 additions & 0 deletions libs/execution/src/lib/types/io-types/workbook.ts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the general approach I am not convinced a name should be the property of a sheet. You run into multiple problems using that approach:

  • You have complex code because you need to filter
  • You have the problem of duplicate names and what to do with them

Why not make the name to sheet mapping a property of a workbook? Keep sheets as they are, nameless. Instead, track sheets in the workbook as objects containing the name and the actual sheet. Then you can have an index (from the position of the object in the array), the sheet and the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it as a map<string, workbook> where string is the name of the workbook.
As discussed with the downside not beeing able to reference the sheet by position within the workbook e.g. read from sheet 5.

Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
// SPDX-FileCopyrightText: 2023 Friedrich-Alexander-Universitat Erlangen-Nurnberg
//
// SPDX-License-Identifier: AGPL-3.0-only

import { strict as assert } from 'assert';

import { IOType } from '@jvalue/jayvee-language-server';

import { IOTypeImplementation, IoTypeVisitor } from './io-type-implementation';
import { Sheet } from './sheet';

export class Workbook implements IOTypeImplementation<IOType.WORKBOOK> {
public readonly ioType = IOType.WORKBOOK;
private sheets: Sheet[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be assigned immediately and you do not need a constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed accordingly.

constructor() {
this.sheets = [];
}

getSheets(): ReadonlyArray<Sheet> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why readonly array? I'd just reuse the Sheet[].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed accordingly.

return this.sheets;
}

getSheetByName(sheetName: string): Sheet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't crash the interpreter with an assert if the sheet does not exist, I'd return Sheet or undefined and use https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/find

const sheet = this.sheets.filter(
(sheet) => sheet.getSheetName() === sheetName,
)[0];
assert(sheet instanceof Sheet);
return sheet;
}

addSheet(sheet: Sheet) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why addSheet and addNewSheet with different semantic? addSheet lets you add sheets with duplicate names, addNewSheet does not. This has to be one function, just remove the current addSheet and rename the addNewSheet to addSheet after.

Also, please add return types. These functions should return the workbook itself to enable fluent interfaces (https://en.wikipedia.org/wiki/Fluent_interface).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed as suggested

this.sheets.push(sheet);
}

addNewSheet(data: string[][], sheetName?: string) {
const sheetNameOrDefault = sheetName ?? `Sheet${this.sheets.length + 1}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this but the name should have an empty space.

Suggested change
const sheetNameOrDefault = sheetName ?? `Sheet${this.sheets.length + 1}`;
const sheetNameOrDefault = sheetName ?? `Sheet ${this.sheets.length + 1}`;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we will keep this without the gap to stay in line with common default naming for worksheets in eg. excel or sheets.

if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (this.getSheetByName), reuse your own api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed as suggested

this.sheets.some((sheet) => sheet.getSheetName() === sheetNameOrDefault)
)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent returns without doing anything is suboptimal. Please log a clear error (e.g., did not add sheet X to workbook, sheet with name X already exists).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed accordingly.

this.addSheet(new Sheet(data, sheetNameOrDefault));
}

acceptVisitor<R>(visitor: IoTypeVisitor<R>): R {
return visitor.visitWorkbook(this);
}
}
4 changes: 4 additions & 0 deletions libs/extensions/tabular/exec/src/extension.ts
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,10 @@ import { CellWriterExecutor } from './lib/cell-writer-executor';
import { ColumnDeleterExecutor } from './lib/column-deleter-executor';
import { CSVInterpreterExecutor } from './lib/csv-interpreter-executor';
import { RowDeleterExecutor } from './lib/row-deleter-executor';
import { SheetPickerExecutor } from './lib/sheet-picker-executor';
import { TableInterpreterExecutor } from './lib/table-interpreter-executor';
import { TableTransformerExecutor } from './lib/table-transformer-executor';
import { XLSXInterpreterExecutor } from './lib/xlsx-interpreter-executor';

export class TabularExecExtension implements JayveeExecExtension {
getBlockExecutors(): BlockExecutorClass[] {
Expand All @@ -25,6 +27,8 @@ export class TabularExecExtension implements JayveeExecExtension {
TableInterpreterExecutor,
CSVInterpreterExecutor,
TableTransformerExecutor,
XLSXInterpreterExecutor,
SheetPickerExecutor,
];
}
}
39 changes: 39 additions & 0 deletions libs/extensions/tabular/exec/src/lib/sheet-picker-executor.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
// SPDX-FileCopyrightText: 2023 Friedrich-Alexander-Universitat Erlangen-Nurnberg
//
// SPDX-License-Identifier: AGPL-3.0-only

import * as R from '@jvalue/jayvee-execution';
import {
AbstractBlockExecutor,
BlockExecutorClass,
ExecutionContext,
Sheet,
Workbook,
implementsStatic,
} from '@jvalue/jayvee-execution';
import { IOType, PrimitiveValuetypes } from '@jvalue/jayvee-language-server';

@implementsStatic<BlockExecutorClass>()
export class SheetPickerExecutor extends AbstractBlockExecutor<
IOType.WORKBOOK,
IOType.SHEET
> {
public static readonly type = 'SheetPicker';

constructor() {
super(IOType.WORKBOOK, IOType.SHEET);
}

// eslint-disable-next-line @typescript-eslint/require-await
async doExecute(
workbook: Workbook,
context: ExecutionContext,
): Promise<R.Result<Sheet | null>> {
const sheetName = context.getPropertyValue(
'sheetName',
PrimitiveValuetypes.Text,
);
const sheet = workbook.getSheetByName(sheetName);
return R.ok(sheet);
}
}
61 changes: 61 additions & 0 deletions libs/extensions/tabular/exec/src/lib/xlsx-interpreter-executor.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
// SPDX-FileCopyrightText: 2023 Friedrich-Alexander-Universitat Erlangen-Nurnberg
//
// SPDX-License-Identifier: AGPL-3.0-only

import { strict as assert } from 'assert';

import * as R from '@jvalue/jayvee-execution';
import {
AbstractBlockExecutor,
BinaryFile,
BlockExecutorClass,
ExecutionContext,
Workbook,
implementsStatic,
} from '@jvalue/jayvee-execution';
import { IOType } from '@jvalue/jayvee-language-server';
import * as xlsx from 'xlsx';

@implementsStatic<BlockExecutorClass>()
export class XLSXInterpreterExecutor extends AbstractBlockExecutor<
IOType.FILE,
IOType.WORKBOOK
> {
public static readonly type = 'XLSXInterpreter';

constructor() {
super(IOType.FILE, IOType.WORKBOOK);
}

async doExecute(
file: BinaryFile,
context: ExecutionContext,
): Promise<R.Result<Workbook>> {
context.logger.logDebug(`reading from xlsx file`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
context.logger.logDebug(`reading from xlsx file`);
context.logger.logDebug(`Reading from XLSX file`);

const workBookFromFile = xlsx.read(file.content, { dense: true });
const workbook = new Workbook();
for (const workSheetName of workBookFromFile.SheetNames) {
const workSheet = workBookFromFile.Sheets[workSheetName];
assert(workSheet !== undefined);

const workSheetDataArray = Array.prototype.map.call<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That line looks weird to me, why go over the prototype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be quite frank, because I did't manage to do it differently eg calling map directly without failing lint and or type checks. Maybe I can have another look with @rhazn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please bring this up in our sync, we can do it live.

xlsx.WorkSheet,
[
callbackfn: (
value: xlsx.CellObject[],
index: number,
array: xlsx.WorkSheet[],
) => string[],
],
string[][]
>(workSheet, (row: xlsx.CellObject[]): string[] => {
return row.map<string>((cell: xlsx.CellObject) => {
return cell.v?.toString() ?? '';
});
});

workbook.addNewSheet(workSheetDataArray, workSheetName);
}
return Promise.resolve(R.ok(workbook));
}
}
4 changes: 4 additions & 0 deletions libs/extensions/tabular/lang/src/extension.ts
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,10 @@ import { CellWriterMetaInformation } from './lib/cell-writer-meta-inf';
import { ColumnDeleterMetaInformation } from './lib/column-deleter-meta-inf';
import { CSVInterpreterMetaInformation } from './lib/csv-interpreter-meta-inf';
import { RowDeleterMetaInformation } from './lib/row-deleter-meta-inf';
import { SheetPickerMetaInformation } from './lib/sheet-picker-meta-inf';
import { TableInterpreterMetaInformation } from './lib/table-interpreter-meta-inf';
import { TableTransformerMetaInformation } from './lib/table-transformer-meta-inf';
import { XLSXInterpreterMetaInformation } from './lib/xlsx-interpreter-meta-inf';

export class TabularLangExtension implements JayveeLangExtension {
getBlockMetaInf(): Array<ConstructorClass<BlockMetaInformation>> {
Expand All @@ -26,6 +28,8 @@ export class TabularLangExtension implements JayveeLangExtension {
TableInterpreterMetaInformation,
CSVInterpreterMetaInformation,
TableTransformerMetaInformation,
XLSXInterpreterMetaInformation,
SheetPickerMetaInformation,
];
}
}
44 changes: 44 additions & 0 deletions libs/extensions/tabular/lang/src/lib/sheet-picker-meta-inf.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
// SPDX-FileCopyrightText: 2023 Friedrich-Alexander-Universitat Erlangen-Nurnberg
//
// SPDX-License-Identifier: AGPL-3.0-only

import {
BlockMetaInformation,
IOType,
PrimitiveValuetypes,
} from '@jvalue/jayvee-language-server';

export class SheetPickerMetaInformation extends BlockMetaInformation {
constructor() {
super(
// How the block type should be called:
'SheetPicker',
// Property definitions:
{
sheetName: {
type: PrimitiveValuetypes.Text,
docs: {
description: 'The name of the sheet to select.',
},
},
},
// Input type:
IOType.WORKBOOK,

// Output type:
IOType.SHEET,
);

this.docs.description =
'Selects one `Sheet` from a `Workbook` based on its `sheetName`. If no sheet matches the name, no output is created and the execution of the pipeline is aborted.';
this.docs.examples = [
{
code: `block AgencySheetPicker oftype SheetPicker {
sheetName: "AgencyNames";
}`,
description:
'Tries to pick the sheet `AgencyNames` from the provided `Workbook`. If `AgencyNames` exists it is passed on as `Sheet`, if it does not exist the execution of the pipeline is aborted.',
},
];
}
}
33 changes: 33 additions & 0 deletions libs/extensions/tabular/lang/src/lib/xlsx-interpreter-meta-inf.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
// SPDX-FileCopyrightText: 2023 Friedrich-Alexander-Universitat Erlangen-Nurnberg
//
// SPDX-License-Identifier: AGPL-3.0-only

import { BlockMetaInformation, IOType } from '@jvalue/jayvee-language-server';

export class XLSXInterpreterMetaInformation extends BlockMetaInformation {
constructor() {
super(
// How the block type should be called:
'XLSXInterpreter',
// Property definitions:
{},
// Input type:
IOType.FILE,

// Output type:
IOType.WORKBOOK,
);

this.docs.description =
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.";
"Interprets an input file as a XLSX-file and outputs a `Workbook` containing `Sheet`s.";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested

this.docs.examples = [
{
code: blockExample,
description:
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Interprets an input file as a xlsx-file and outputs a `Workbook` containing `Sheet`'s.",
"Interprets an input file as a XLSX-file and outputs a `Workbook` containing `Sheet`s.",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested

},
];
}
}
const blockExample = `block AgencyXLSXInterpreter oftype XLSXInterpreter {
}`;
1 change: 1 addition & 0 deletions libs/language-server/src/lib/ast/model-util.ts
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@ export enum IOType {
TEXT_FILE = 'TextFile',
FILE_SYSTEM = 'FileSystem',
SHEET = 'Sheet',
WORKBOOK = 'Workbook',
TABLE = 'Table',
}

Expand Down
Loading