Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop in Parser when input object has empty strings as keys and flatten is used #76

Open
mikkopiu opened this issue May 29, 2024 · 1 comment

Comments

@mikkopiu
Copy link

Description

The base Parser (or BaseParser) from @json2csv/plainjs will enter an infinite loop when attempting to parse an array of objects where at least one contains empty string keys like:

[
  {
    "a": {
      "": {
        "b": 1
      }
    }
  },
  {
    "anything": 2
  }
]

and flatten is used, so the headers end up like "a..b".

The current regular expression

const rePropName = RegExp(
  // Match anything that isn't a dot or bracket.
  '[^.[\\]]+' +
    '|' +
    // Or match property names within brackets.
    '\\[(?:' +
    // Match a non-string expression.
    '([^"\'][^[]*)' +
    '|' +
    // Or match strings (supports escaping characters).
    '(["\'])((?:(?!\\2)[^\\\\]|\\\\.)*?)\\2' +
    ')\\]' +
    '|' +
    // Or match "" as the space between consecutive dots or empty brackets.
    '(?=(?:\\.|\\[\\])(?:\\.|\\[\\]|$))',
  'g',
);

ends up forever testing true against strings like a..b, causing castPath's while loop:

  while ((match = rePropName.exec(value))) {
    result.push(match[3] ?? match[1]?.trim() ?? match[0]);
  }

to never finish until the result array blows past whatever the runtime's Array size limit is, throwing RangeError: Invalid array length (V8's error message).


Obviously this is an extreme edge case and you might wonder why one would even have empty strings as keys but that's the data I've found myself with 😅

This seems to have regressed in v7.0.4 with the introduction of the above regular expression rePropName in packages/plainjs/src/utils.ts, based on some unit tests that pass with v7.0.3. Unfortunately the expression is a bit too complex for me to confidently point to a certain solution.


As a sidenote, it also seems that the above regular expression would only work with .-separated paths but I haven't tested that enough yet to be sure.

Reproduction

Simple test script
import { Parser } from '@json2csv/plainjs';
import { flatten } from '@json2csv/transforms';
const parser = new Parser({
  delimiter: ';',
  transforms: [flatten({ arrays: true, objects: true, separator: '.' })],
});
parser.parse([
  {
    a: { '': { b: 1 } },
  },
  {
    anything: 2, // This can be literally anything, even an empty object
  },
]);

which when run, will enter the infinite loop and soon crash after maxing out the V8 Array size limit (1 GB):

Stack trace
/my/dir/node_modules/@json2csv/plainjs/dist/mjs/utils.js:21
        result.push((_c = (_a = match[3]) !== null && _a !== void 0 ? _a : (_b = match[1]) === null || _b === void 0 ? void 0 : _b.trim()) !== null && _c !== void 0 ? _c : match[0]);
               ^

RangeError: Invalid array length
    at Array.push (<anonymous>)
    at castPath (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/utils.js:21:16)
    at getProp (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/utils.js:30:56)
    at Object.value (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:65:37)
    at JSON2CSVParser.processCell (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:133:44)
    at file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:118:71
    at Array.map (<anonymous>)
    at JSON2CSVParser.processRow (file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/BaseParser.js:118:47)
    at file:///my/dir/node_modules/@json2csv/plainjs/dist/mjs/Parser.js:62:48
    at Array.map (<anonymous>)

Node.js v20.11.0

Workaround

An extremely silly workaround using null terminators and find-and-replace

As a temporary workaround, one silly solution that happens to work in my case is to use null terminators and a custom formatter; with all else equal in the repro above, we can change the Parser configuration to be:

const parser = new Parser({
  delimiter: ';',
  transforms: [flatten({ objects: true, arrays: true, separator: '\0' })],
  formatters: {
    header: (h) => h.replaceAll('\0\0', '."".').replaceAll('\0', '.'),
  },
})

which will work around the issue of only supporting . as a separator without accidentally replacing non-separator .s while also working around this issue by making the empty string sections explicit as ."". (instead of ..).

Entirely silly but appears to work. 🤡

@GervinFung
Copy link

GervinFung commented Oct 3, 2024

Similar error was thrown when a header ends with . for example, Header 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants