SNOW-1799603: add guidance and example on processing Arrow results WithHigherPrecision #1240

Closed
datbth opened this issue Nov 12, 2024 · 9 comments
Labels: enhancement (The issue is a request for improvement or a new feature), status-triage_done (Initial triage done, will be further handled by the driver team)

datbth commented Nov 12, 2024

  1. What version of GO driver are you using?
    v1.12.0

  2. What operating system and processor architecture are you using?
    Ubuntu 24.04.1 LTS

  3. What version of GO are you using?
    go version go1.23.1 linux/amd64

  4. Server version: 8.42.2

  5. What did you do?

    • Usage:
      • Use WithArrowBatches and WithHigherPrecision
      • Query NUMBER with scale > 0 and fetch as Arrow
    • Code:
	db, err := sql.Open("snowflake", connUrl)
	require.NoError(t, err)
	defer db.Close()

	conn, err := db.Conn(ctx)
	require.NoError(t, err)

	ctx = sf.WithArrowBatches( // enable arrow downloader (snowflakeChunkDownloader)
		sf.WithHigherPrecision( // retrieve numbers with high precision
			ctx,
		),
	)

	query := `
		SELECT 123.12345
		UNION ALL
		SELECT CAST(234.23456 AS NUMBER(38, 6))
	`

	var rows driver.Rows
	queryErr := conn.Raw(func(c any) error {
		queryRows, queryErr := c.(driver.QueryerContext).QueryContext(ctx, query, nil)
		if queryErr != nil {
			return queryErr
		}
		rows = queryRows
		return nil
	})
	require.NoError(t, queryErr)
	defer rows.Close()

	arrowBatches, arrowBatchesErr := rows.(sf.SnowflakeRows).GetArrowBatches()
	require.NoError(t, arrowBatchesErr)

	for n := range arrowBatches {
		records, fetchErr := arrowBatches[n].Fetch()
		require.NoError(t, fetchErr)
		recordsArray := *records
		for i := range recordsArray {
			record := recordsArray[i]
			log.Printf(
				"(%s) %v",
				record.Column(0).DataType().Name(),
				record.Column(0).String(),
			)
		}
	}
  6. What did you expect to see?
  • Expected output: the Arrow array is an array.Decimal128. Printf output:
(decimal128) [123.12345 234.23456]
  • Actual output: the Arrow array is an array.Int32. Printf output:
(int32) [123123450 234234560]
  7. Can you set logging to DEBUG and collect the logs?

    https://community.snowflake.com/s/article/How-to-generate-log-file-on-Snowflake-connectors

    Before sharing any information, please be sure to review the log and remove any sensitive
    information.

@datbth datbth added the bug Erroneous or unexpected behaviour label Nov 12, 2024
@github-actions github-actions bot changed the title Arrow result of NUMBER data with scale > 0 is array.Int instead of array.Decimal SNOW-1799603: Arrow result of NUMBER data with scale > 0 is array.Int instead of array.Decimal Nov 12, 2024
@datbth datbth changed the title SNOW-1799603: Arrow result of NUMBER data with scale > 0 is array.Int instead of array.Decimal SNOW-1799603: Arrow result of NUMBER data is array.Int instead of array.Decimal Nov 12, 2024
@datbth datbth changed the title SNOW-1799603: Arrow result of NUMBER data is array.Int instead of array.Decimal SNOW-1799603: GetArrowBatches returns array.Int instead of array.Decimal for NUMBER data Nov 13, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Nov 13, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Nov 13, 2024
sfc-gh-dszmolka (Contributor) commented

hi - thank you for raising this issue with us. We'll take a look.

sfc-gh-dszmolka (Contributor) commented

so if we call .Schema() on the record:

schema:
  fields: 1
    - 123.12345: type=int32
           metadata: ["logicalType": "FIXED", "precision": "38", "scale": "6", "charLength": "0", "byteLength": "4", "finalType": "T"]

we see that the Arrow schema, coming from the server, dictates int32. Why? This has been discussed in detail in #1219 (comment), bullet point 2.

(The values themselves (123123450, 234234560) are the expected behaviour of WithHigherPrecision: it returns the raw unscaled integers, and the client can reconstruct the logical values using the scale that is also provided in the schema.)

Considering this is all expected behaviour, let us know if you need any further help on the matter; otherwise I would like to close it.
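
For anyone landing here looking for how to consume these batches: below is a minimal sketch, not part of the driver, of reconstructing the logical decimal values from the raw integers plus the "scale" field metadata. It assumes the Apache Arrow Go module (the v15 import path is used here) and an int32 column as in the schema above; the helper names scaleOf and printFixedColumn are hypothetical.

	import (
		"fmt"
		"math/big"
		"strconv"

		"github.com/apache/arrow/go/v15/arrow"
		"github.com/apache/arrow/go/v15/arrow/array"
	)

	// scaleOf reads the Snowflake "scale" entry from the Arrow field metadata,
	// defaulting to 0 when it is absent or unparsable.
	func scaleOf(field arrow.Field) int {
		if idx := field.Metadata.FindKey("scale"); idx >= 0 {
			if s, err := strconv.Atoi(field.Metadata.Values()[idx]); err == nil {
				return s
			}
		}
		return 0
	}

	// printFixedColumn rebuilds each logical value as rawInt / 10^scale.
	func printFixedColumn(rec arrow.Record, col int) {
		scale := scaleOf(rec.Schema().Field(col))
		denom := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil)
		ints := rec.Column(col).(*array.Int32) // physically int32, per the schema above
		for i := 0; i < ints.Len(); i++ {
			v := new(big.Rat).SetFrac(big.NewInt(int64(ints.Value(i))), denom)
			fmt.Println(v.FloatString(scale)) // e.g. 123123450 with scale 6 -> 123.123450
		}
	}

In the repro above, calling printFixedColumn(record, 0) inside the fetch loop would print 123.123450 and 234.234560 instead of the raw integers.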

@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team and removed bug Erroneous or unexpected behaviour status-triage Issue is under initial triage labels Nov 13, 2024

datbth commented Nov 13, 2024

hmm I see.

I thought the server would choose the minimal size but still keep the data class. But this is really surprising and hard to use. Now any meaningful processing/consumption of the NUMBER data requires casting/converting the data, which is not straightforward and also not efficient.

I guess I can't really push for a change here. But perhaps at least these details should be included in the docs. E.g.

  • Disclaimer: NUMBER data in Arrow-format results can be arrow.Decimal or arrow.Int
  • Some pointers/guidelines on how to process such Arrow numbers accurately.

sfc-gh-pfus (Collaborator) commented

Hi @datbth! Arrow batches mode is a low-level mode, which is more performant because it does fewer operations. On the other hand, it is useful for people who either prefer a columnar approach or just pass Arrow to another service - the latter especially appreciate that we don't do any unnecessary processing.

We mention this compression in our docs: https://github.com/snowflakedb/gosnowflake/blob/master/doc.go#L692
We could add an example for processing, but we can't promise a timeline.
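
For illustration of the hand-off case, here is a hedged sketch (again assuming the Apache Arrow Go module; toDecimal128 is a hypothetical helper, not a driver API) of widening such an int32 FIXED column into a canonical arrow.Decimal128 array before passing the data on, with precision and scale taken from the field metadata as shown earlier:

	import (
		"github.com/apache/arrow/go/v15/arrow"
		"github.com/apache/arrow/go/v15/arrow/array"
		"github.com/apache/arrow/go/v15/arrow/decimal128"
		"github.com/apache/arrow/go/v15/arrow/memory"
	)

	// toDecimal128 re-types the column: the unscaled integer values are kept
	// as-is, only the declared Arrow type becomes Decimal128(precision, scale).
	func toDecimal128(ints *array.Int32, precision, scale int32) *array.Decimal128 {
		dt := &arrow.Decimal128Type{Precision: precision, Scale: scale}
		b := array.NewDecimal128Builder(memory.DefaultAllocator, dt)
		defer b.Release()
		for i := 0; i < ints.Len(); i++ {
			if ints.IsNull(i) {
				b.AppendNull()
				continue
			}
			b.Append(decimal128.FromI64(int64(ints.Value(i))))
		}
		return b.NewDecimal128Array()
	}

Note that this copies the column, so it trades some of the performance benefit of the raw batches mode for a self-describing decimal type.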

@sfc-gh-dszmolka sfc-gh-dszmolka changed the title SNOW-1799603: GetArrowBatches returns array.Int instead of array.Decimal for NUMBER data SNOW-1799603: add guidance and example on processing Arrow results WithHigherPrecision Nov 14, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the enhancement The issue is a request for improvement or a new feature label Nov 14, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka removed the question Issue is a usage/other question rather than a bug label Nov 14, 2024

datbth commented Nov 14, 2024

Thank you.

We mention this compression in our docs: https://github.com/snowflakedb/gosnowflake/blob/master/doc.go#L692

I see it now. Sorry for having missed it earlier.
But still, that line looks like 1::decimal(38,0) can be returned as 1 (arrow.Int8), rather than something like 1::decimal(38,0) can be returned as 100 (arrow.Int8 with logical scale 2).

or just pass Arrow to another service - the latter especially find it good that we don't do any unnecessary processing

I might be missing something here, but could you give some examples of services that can process such an arrow.Int32 with such a custom logical type? As far as I know, it isn't a canonical Arrow type, is it?
For example, I couldn't find anywhere in these docs that Int32 can have "precision" or "scale":

If it's indeed a "custom" type that is only used/produced by Snowflake, it should be documented accordingly.

sfc-gh-pfus (Collaborator) commented

But still, that line looks like 1::decimal(38,0) can be returned as 1 (arrow.Int8), rather than something like 1::decimal(38,0) can be returned as 100 (arrow.Int8 with logical scale 2).
I don't get it. Why would 1::decimal(38, 0) be converted to have scale 2? That's not quite possible; I think the backend should always use scale = 0 in that case.

As for the second part - it is not a canonical format. It is just an arrow.Int32, but with scale. I added a small doc improvement: https://github.com/snowflakedb/gosnowflake/pull/1246/files


datbth commented Nov 18, 2024

I don't get it. Why would 1::decimal(38, 0) be converted to have scale 2? That's not quite possible; I think the backend should always use scale = 0 in that case.

I see. My point was that the doc did not say in which cases the Int data would have a non-zero scale.
But your new doc commit above clears that up.

sfc-gh-dszmolka (Contributor) commented

Documentation update merged with #1246. If you don't have any further questions in the context of this issue, I would suggest closing it.


datbth commented Nov 19, 2024

okay thank you

@datbth datbth closed this as completed Nov 19, 2024