Make ShapedArray.description's maxScalarCountPerLine user-customizable #1168

xanderdunn · 2020-12-24T01:45:21Z

Here is ShapedArray's fileprivate func description( indentLevel: Int, edgeElementCount: Int, maxScalarLength: Int, maxScalarCountPerLine: Int, summarizing: Bool ) -> String.

Is there any reason this is marked fileprivate? It's currently accessible only via the public func description( lineWidth: Int = 80, edgeElementCount: Int = 3, summarizing: Bool = false ) where the maxScalarCountPerLine is calculated for me:

let maxScalarCountPerLine = Swift.max(1, lineWidth / maxScalarLength)

Calculating the maxScalarCountPerLine independently for each Tensor leads to this problem:

=== Feature 0:
input:
[ -0.022060618,   0.024561103,  -0.025651768,   -0.04885944,   0.012175075,   0.006922609,    -0.0516627,
  -0.019092154,   0.024305645,  -0.028501112,  -0.047275346,   0.014285761,    0.00435431,  -0.052575804,
   -0.01609808,   0.023822624,  -0.031269953,   -0.04550122,   0.016227337,  0.0016748396,   -0.05326795,
  -0.013095862,    0.02311485,   -0.03394214,  -0.043547418,   0.017988473,  -0.001100175,  -0.053735107,
  -0.010103013,   0.022186458,  -0.036502086,          -0.0,           0.0, -0.0039545433,  -0.053974554]
output:
[       -0.0,         0.0,        -0.0,        -0.0,         0.0,        -0.0,        -0.0,        -0.0,        -0.0,
        -0.0,        -0.0,         0.0,         0.0,        -0.0,         0.0,        -0.0,        -0.0,         0.0,
         0.0,         0.0,        -0.0,        -0.0,         0.0,         0.0,         0.0,         0.0,        -0.0,
        -0.0,         0.0,         0.0,        -0.0,  -1.6888539, -0.99011576,         0.0,         0.0]
target:
[        -0.0,          0.0,         -0.0,         -0.0,          0.0,          0.0,         -0.0,         -0.0,
          0.0,         -0.0,         -0.0,          0.0,          0.0,         -0.0,         -0.0,          0.0,
         -0.0,         -0.0,          0.0,          0.0,         -0.0,         -0.0,          0.0,         -0.0,
         -0.0,          0.0,         -0.0,         -0.0,         -0.0,          0.0,         -0.0, -0.041425332,
  0.019558901,         -0.0,         -0.0]

Each Tensor has a different number of scalars per line, so the values are visually shifted in each of the three descriptions. This makes it difficult to visually inspect the values at the same position in each tensor.
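
The differing row lengths follow from the formula quoted above. As a small worked sketch: the longest printed scalars in the three vectors are 13, 11, and 12 characters wide. Assuming a lineWidth of 100 (the thread does not state the value, but 100 is consistent with the 7, 9, and 8 scalars per line shown), the per-line counts come out as follows. `scalarCountPerLine` is a hypothetical stand-in for the one-line computation, not a swift-apis function.

```swift
// Hypothetical stand-in for the formula quoted in the issue.
func scalarCountPerLine(lineWidth: Int, maxScalarLength: Int) -> Int {
  return max(1, lineWidth / maxScalarLength)
}

// Longest printed scalar found in each of the three vectors above.
let longestScalars = [
  ("input", "-0.0039545433"),
  ("output", "-0.99011576"),
  ("target", "-0.041425332"),
]

// lineWidth = 100 is an assumption; the issue does not state it.
for (name, scalar) in longestScalars {
  let perLine = scalarCountPerLine(lineWidth: 100, maxScalarLength: scalar.count)
  print("\(name): width \(scalar.count), \(perLine) scalars per line")
}
```

Because each tensor's longest scalar differs by only a character or two, integer division lands on a different column count for each one.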

I would like to be able to force the max number of scalars per line for each Tensor so that the values are more readily visually comparable.


dan-zheng · 2020-12-24T02:00:30Z

Here's the PR that added Tensor pretty-printing: swiftlang/swift#23837. The Swift implementation of ShapedArray.description is largely based on the PyTorch pretty-printing implementation, which is a simpler version of NumPy's: TF-419.

The original goal with Tensor pretty-printing was to closely match the output of PyTorch. Does your example print better in existing n-d array libraries, like NumPy or PyTorch? I wonder if PyTorch printing exposes enough knobs to achieve what you'd like to do, without using unreasonably unsafe or private APIs.

xanderdunn · 2020-12-24T02:14:56Z

Thanks @dan-zheng. I copied the above linked swift-apis code into my project:

extension String {
  /// Returns a string of the specified length, padded with whitespace to the left.
  func leftPadded(toLength length: Int) -> String {
    return repeatElement(" ", count: max(0, length - count)) + self
  }
}

public extension ShapedArray {

  func vectorDescription(
    indentLevel: Int,
    edgeElementCount: Int,
    maxScalarLength: Int,
    maxScalarCountPerLine: Int,
    summarizing: Bool
  ) -> String {
    // Get scalar descriptions.
    func scalarDescription(_ element: Element) -> String {
      let description = String(describing: element)
      return description.leftPadded(toLength: maxScalarLength)
    }

    var scalarDescriptions: [String] = []
    if summarizing && count > 2 * edgeElementCount {
      scalarDescriptions += prefix(edgeElementCount).map(scalarDescription)
      scalarDescriptions += ["..."]
      scalarDescriptions += suffix(edgeElementCount).map(scalarDescription)
    } else {
      scalarDescriptions += map(scalarDescription)
    }

    // Combine scalar descriptions into lines, based on the scalar count per line.
    let lines = stride(
      from: scalarDescriptions.startIndex,
      to: scalarDescriptions.endIndex,
      by: maxScalarCountPerLine
    ).map { i -> ArraySlice<String> in
      let upperBound = Swift.min(
        i.advanced(by: maxScalarCountPerLine),
        scalarDescriptions.count)
      return scalarDescriptions[i..<upperBound]
    }

    // Return lines joined with separators.
    let lineSeparator = ",\n" + String(repeating: " ", count: indentLevel + 1)
    return lines.enumerated().reduce(into: "[") { result, entry in
      let (i, line) = entry
      result += line.joined(separator: ", ")
      result += i != lines.count - 1 ? lineSeparator : ""
    } + "]"
  }

  func description(
    indentLevel: Int,
    edgeElementCount: Int,
    maxScalarLength: Int,
    maxScalarCountPerLine: Int,
    summarizing: Bool
  ) -> String {
    // Handle scalars.
    if let scalar = scalar {
      return String(describing: scalar)
    }

    // Handle vectors, which have special line-width-sensitive logic.
    if rank == 1 {
      return vectorDescription(
        indentLevel: indentLevel,
        edgeElementCount: edgeElementCount,
        maxScalarLength: maxScalarLength,
        maxScalarCountPerLine: maxScalarCountPerLine,
        summarizing: summarizing)
    }

    // Handle higher-rank tensors.
    func elementDescription(_ element: Element) -> String {
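      // The recursive call is commented out in this copy, so nested elements
      // use their default `description` instead of:
      //   element.description(
      //     indentLevel: indentLevel + 1,
      //     edgeElementCount: edgeElementCount,
      //     maxScalarLength: maxScalarLength,
      //     maxScalarCountPerLine: maxScalarCountPerLine,
      //     summarizing: summarizing)
      return element.description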
    }

    var elementDescriptions: [String] = []
    if summarizing && count > 2 * edgeElementCount {
      elementDescriptions += prefix(edgeElementCount).map(elementDescription)
      elementDescriptions += ["..."]
      elementDescriptions += suffix(edgeElementCount).map(elementDescription)
    } else {
      elementDescriptions += map(elementDescription)
    }

    // Return lines joined with separators.
    let lineSeparator =
      "," + String(repeating: "\n", count: rank - 1)
      + String(repeating: " ", count: indentLevel + 1)
    return elementDescriptions.enumerated().reduce(into: "[") { result, entry in
      let (i, elementDescription) = entry
      result += elementDescription
      result += i != elementDescriptions.count - 1 ? lineSeparator : ""
    } + "]"
  }
}

And this achieved what I wanted:

=== Variable 0:
input:
[-0.046002183, 0.015716469, 0.0024140144, -0.05310176, -0.013912535, 0.023329472, -0.033225544, -0.044096183,
  0.017527763, -0.00033658138,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,        0.0,
  -0.0031709874, -0.05393206, -0.007940362, 0.021374973, -0.038286343, -0.03978186, 0.020576812, -0.006072663,
  -0.0540048, -0.005004768, 0.020078853, -0.04061756, -0.037398703, 0.021796772, -0.009024683, -0.053848244,
  -0.002125753, 0.01857983, -0.04279759]
output:
[      -0.0,       -0.0,        0.0,        0.0,       -0.0,        0.0,       -0.0,        0.0,
         0.0,       -0.0, -0.2371707,  1.7225978, -2.3474987, -1.0847256,  1.6712375,  0.3812195,
        -0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,       -0.0,        0.0,
        -0.0,        0.0,       -0.0,        0.0,        0.0,       -0.0,       -0.0,       -0.0,
         0.0,       -0.0,       -0.0]
target:
[      -0.0,        0.0,        0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,
         0.0,       -0.0, -0.053630464, -0.010915402, 0.022460626, -0.035817537, -0.042018704, 0.019151034,
        -0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,        0.0,       -0.0,
        -0.0,       -0.0,        0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,
        -0.0,        0.0,       -0.0]

The corresponding values are all lined up visually now. Given that it works as-is, I was wondering why it needed to be fileprivate.
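
For readers without TensorFlow installed, the same idea can be sketched on plain `[Float]` arrays. This is a minimal illustration, not the swift-apis implementation: `formatVector` and `sharedScalarLength` are hypothetical names. Left-padding never truncates, so any scalar longer than `maxScalarLength` widens its column; that is consistent with the `input` rows above still drifting slightly. Computing one shared width across all vectors is one way to avoid that.

```swift
// Hypothetical standalone helper; not part of swift-apis.
func formatVector(
  _ values: [Float],
  maxScalarLength: Int,
  maxScalarCountPerLine: Int
) -> String {
  precondition(maxScalarCountPerLine > 0, "need at least one scalar per line")
  // Left-pad each scalar; text longer than maxScalarLength is kept as-is.
  let cells = values.map { value -> String in
    let text = String(describing: value)
    return String(repeating: " ", count: max(0, maxScalarLength - text.count)) + text
  }
  // Split the padded scalars into rows of at most maxScalarCountPerLine.
  let rows = stride(from: 0, to: cells.count, by: maxScalarCountPerLine).map { start -> String in
    cells[start..<min(start + maxScalarCountPerLine, cells.count)].joined(separator: ", ")
  }
  return "[" + rows.joined(separator: ",\n ") + "]"
}

// Hypothetical helper: the widest printed scalar across several vectors.
func sharedScalarLength(_ vectors: [[Float]]) -> Int {
  return vectors.joined().map { String(describing: $0).count }.max() ?? 3
}

let input: [Float] = [-0.5, 0.25, 1.5, -2.0, 0.125]
let target: [Float] = [0.0, -1.0, 0.0, 3.0, 0.0]
let width = sharedScalarLength([input, target])
print(formatVector(input, maxScalarLength: width, maxScalarCountPerLine: 3))
print(formatVector(target, maxScalarLength: width, maxScalarCountPerLine: 3))
```

With both the width and the column count fixed, the scalar at a given index lands at the same row and column in every printout.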

I'm not seeing any options in PyTorch's set_printoptions that would achieve this. Nothing stands out in the NumPy implementation that would achieve it either. No worries if this is outside the scope of the intended API.

dan-zheng · 2020-12-24T05:57:09Z

Nice, thanks for sharing your code snippet! Could you please complete the example by showing the invocation of ShapedArray.description(...) used to print the array contents?

I would describe your change as "adding maxScalarCountPerLine as a customizable argument to ShapedArray.description(...)" - would you agree with this more pointed description? If so, I might recommend changing the PR title to be more specific along those lines. Currently, the title sounds like "changing private API to be public", which sounds scarier.

Supporting this change tentatively sounds good to me (I haven't thought about it super hard). Do you have some intuition why maxScalarCountPerLine should be user-customizable instead of always computing it from other arguments (maxScalarLength, edgeElementCount)? Feel free to start a PR with tests for review!

xanderdunn · 2020-12-24T11:59:41Z

Each ShapedArray was printed with:

print(
  myTensor[TensorRange.ellipsis, variableIndex].array.description(
    indentLevel: 0,
    edgeElementCount: 50,
    maxScalarLength: 10,
    maxScalarCountPerLine: 8,
    summarizing: true))

Oh yes, the idea of opening currently private API is incidental. The core idea is supporting maxScalarCountPerLine in ShapedArray's description. Thanks, I changed the title.

maxScalarCountPerLine should be user-customizable so that tensors can be printed with the same number of rows and columns, thus making different tensors visually comparable, like in my above two examples. I think right now maxScalarCountPerLine is a function of lineWidth and maxScalarLength:

public func description(
  lineWidth: Int = 80,
  edgeElementCount: Int = 3,
  summarizing: Bool = false
) -> String {
  let maxScalarLength = scalars.lazy.map { String(describing: $0).count }.max() ?? 3
  let maxScalarCountPerLine = Swift.max(1, lineWidth / maxScalarLength)
  // (rest of the function not quoted here)
}

where only the lineWidth is user-customizable. This makes it difficult or impossible to control the number of columns that are printed for a given tensor, which makes it difficult to visually compare tensors.
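
One possible shape for the requested knob, sketched under assumptions: an optional parameter that defaults to the current computed value, so existing callers see no change. `resolveScalarCountPerLine` is a hypothetical helper, not an existing or proposed swift-apis signature, and the extra guard against a zero scalar length is an addition that the quoted formula does not have.

```swift
// Hypothetical helper: an explicit column count wins; otherwise fall back to
// the line-width formula quoted above.
func resolveScalarCountPerLine(
  lineWidth: Int = 80,
  maxScalarLength: Int,
  maxScalarCountPerLine: Int? = nil
) -> Int {
  if let requested = maxScalarCountPerLine {
    // Never print fewer than one scalar per line.
    return max(1, requested)
  }
  // The inner max(1, ...) avoids division by zero; it is not in the original formula.
  return max(1, lineWidth / max(1, maxScalarLength))
}

// Computed from lineWidth when no override is given.
print(resolveScalarCountPerLine(maxScalarLength: 13))
// The caller's choice is used regardless of scalar width.
print(resolveScalarCountPerLine(maxScalarLength: 13, maxScalarCountPerLine: 8))
print(resolveScalarCountPerLine(maxScalarLength: 11, maxScalarCountPerLine: 8))
```

Passing the same override for every tensor gives each printout the same number of columns, whatever the width of its longest scalar.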

This was born out of a desire to understand my loss values. Printing a snippet of my model's output alongside input and target allows me to understand how close the outputs really are to the targets in absolute value for a given valLoss.

dan-zheng · 2020-12-24T12:16:42Z

Thanks for the context! Feel free to open a PR, if you'd like to upstream your func description(...) changes.

xanderdunn · 2020-12-24T23:13:22Z

Thanks @dan-zheng, I'll plan to open a pull request with some unit tests after Christmas.

dan-zheng · 2020-12-24T23:14:27Z

Thank you! Take your time - Santa and we are patiently waiting for you after Xmas.

brettkoonce · 2020-12-25T02:49:10Z

🎅

dan-zheng · 2020-12-25T08:13:01Z

🎅

tbh GitHub needs to enable more than eight reaction emoji 🙎🏻‍♂️

xanderdunn changed the title ~~Make ShapedArray description(indentLevel:edgeElementCount:maxScalarLength:maxScalarCountPerLine:summarizing:) Public~~ Add maxScalarCountPerLine to ShapedArray.description Dec 24, 2020

xanderdunn changed the title ~~Add maxScalarCountPerLine to ShapedArray.description~~ Make ShapedArray.description's maxScalarCountPerLine user-customizable Dec 24, 2020

dan-zheng assigned xanderdunn Dec 24, 2020

dan-zheng added the enhancement and help wanted labels Dec 24, 2020
