SNOW-1799603: add guidance and example on processing Arrow results WithHigherPrecision #1240

Closed
datbth opened this issue Nov 12, 2024 · 9 comments
Labels: enhancement (The issue is a request for improvement or a new feature), status-triage_done (Initial triage done, will be further handled by the driver team)

datbth commented Nov 12, 2024

  1. What version of GO driver are you using?
    v1.12.0

  2. What operating system and processor architecture are you using?
    Ubuntu 24.04.1 LTS

  3. What version of GO are you using?
    go version go1.23.1 linux/amd64

  4. Server version: 8.42.2

  5. What did you do?

    • Usage:
      • Use WithArrowBatches and WithHigherPrecision
      • Query NUMBER with scale > 0 and fetch as Arrow
    • Code:
	db, err := sql.Open("snowflake", connUrl)
	require.NoError(t, err)
	defer db.Close()

	conn, err := db.Conn(ctx)
	require.NoError(t, err)

	ctx = sf.WithArrowBatches( // enable arrow downloader (snowflakeChunkDownloader)
		sf.WithHigherPrecision( // retrieve numbers with high precision
			ctx,
		),
	)

	query := `
		SELECT 123.12345
		UNION ALL
		SELECT CAST(234.23456 AS NUMBER(38, 6))
	`

	var rows driver.Rows
	queryErr := conn.Raw(func(c any) error {
		queryRows, queryErr := c.(driver.QueryerContext).QueryContext(ctx, query, nil)
		if queryErr != nil {
			return queryErr
		}
		rows = queryRows
		return nil
	})
	require.NoError(t, queryErr)
	defer rows.Close()

	arrowBatches, arrowBatchesErr := rows.(sf.SnowflakeRows).GetArrowBatches()
	require.NoError(t, arrowBatchesErr)

	for n := range arrowBatches {
		records, fetchErr := arrowBatches[n].Fetch()
		require.NoError(t, fetchErr)
		recordsArray := *records
		for i := range recordsArray {
			record := recordsArray[i]
			log.Printf(
				"(%s) %v",
				record.Column(0).DataType().Name(),
				record.Column(0).String(),
			)
		}
	}
  6. What did you expect to see?
  • Expected output: the Arrow array is an array.Decimal128. Printf output:
(decimal128) [123.12345 234.23456]
  • Actual output: the Arrow array is an array.Int32. Printf output:
(int32) [123123450 234234560]
  7. Can you set logging to DEBUG and collect the logs?

    https://community.snowflake.com/s/article/How-to-generate-log-file-on-Snowflake-connectors

    Before sharing any information, please be sure to review the log and remove any sensitive
    information.

@datbth datbth added the bug Erroneous or unexpected behaviour label Nov 12, 2024
@github-actions github-actions bot changed the title Arrow result of NUMBER data with scale > 0 is array.Int instead of array.Decimal SNOW-1799603: Arrow result of NUMBER data with scale > 0 is array.Int instead of array.Decimal Nov 12, 2024
@datbth datbth changed the title SNOW-1799603: Arrow result of NUMBER data with scale > 0 is array.Int instead of array.Decimal SNOW-1799603: Arrow result of NUMBER data is array.Int instead of array.Decimal Nov 12, 2024
@datbth datbth changed the title SNOW-1799603: Arrow result of NUMBER data is array.Int instead of array.Decimal SNOW-1799603: GetArrowBatches returns array.Int instead of array.Decimal for NUMBER data Nov 13, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Nov 13, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Nov 13, 2024
sfc-gh-dszmolka (Contributor) commented

hi - thank you for raising this issue with us. We'll take a look.

sfc-gh-dszmolka (Contributor) commented

so if we call .Schema() on the record:

schema:
  fields: 1
    - 123.12345: type=int32
           metadata: ["logicalType": "FIXED", "precision": "38", "scale": "6", "charLength": "0", "byteLength": "4", "finalType": "T"]

we see that the Arrow schema, coming from the server, dictates int32. Why? This has been discussed in detail in #1219 (comment), bullet point 2.

(The values themselves (123123450, 234234560) are the expected behaviour of WithHigherPrecision: it returns the raw unscaled integers, and the client can reconstruct the logical values using the scale that is also provided in the schema.)

Considering this is all expected behaviour, let us know if you need any further help on the matter; otherwise I would like to close it.
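
For anyone landing here looking for how to consume these batches: below is a minimal sketch, not part of the driver, of reconstructing the logical decimal values from the raw integers plus the "scale" field metadata. It assumes the Apache Arrow Go module (the v15 import path is used here) and an int32 column as in the schema above; the helper names scaleOf and printFixedColumn are hypothetical.

	import (
		"fmt"
		"math/big"
		"strconv"

		"github.com/apache/arrow/go/v15/arrow"
		"github.com/apache/arrow/go/v15/arrow/array"
	)

	// scaleOf reads the Snowflake "scale" entry from the Arrow field metadata,
	// defaulting to 0 when it is absent or unparsable.
	func scaleOf(field arrow.Field) int {
		if idx := field.Metadata.FindKey("scale"); idx >= 0 {
			if s, err := strconv.Atoi(field.Metadata.Values()[idx]); err == nil {
				return s
			}
		}
		return 0
	}

	// printFixedColumn rebuilds each logical value as rawInt / 10^scale.
	func printFixedColumn(rec arrow.Record, col int) {
		scale := scaleOf(rec.Schema().Field(col))
		denom := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil)
		ints := rec.Column(col).(*array.Int32) // physically int32, per the schema above
		for i := 0; i < ints.Len(); i++ {
			v := new(big.Rat).SetFrac(big.NewInt(int64(ints.Value(i))), denom)
			fmt.Println(v.FloatString(scale)) // e.g. 123123450 with scale 6 -> 123.123450
		}
	}

In the repro above, calling printFixedColumn(record, 0) inside the fetch loop would print 123.123450 and 234.234560 instead of the raw integers.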

@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug status-triage_done Initial triage done, will be further handled by the driver team and removed bug Erroneous or unexpected behaviour status-triage Issue is under initial triage labels Nov 13, 2024

datbth commented Nov 13, 2024

hmm I see.

I thought the server would choose the minimal size but still keep the data class. But this is really surprising and hard to use. Now any meaningful processing/consumption of the NUMBER data requires casting/converting the data, which is not straightforward and also not efficient.

I guess I can't really push for a change here. But perhaps at least these details should be included in the docs. E.g.

  • Disclaimer: NUMBER data in Arrow-format results can be arrow.Decimal or arrow.Int
  • Some pointers/guidelines on how to process such Arrow numbers accurately.

sfc-gh-pfus (Collaborator) commented

Hi @datbth! Arrow batches mode is a low-level mode, which is more performant because it does fewer operations. On the other hand, it is useful for people who either prefer a columnar approach or just pass Arrow to another service - the latter especially appreciate that we don't do any unnecessary processing.

We mention this compression in our docs: https://github.com/snowflakedb/gosnowflake/blob/master/doc.go#L692
We could add an example for processing, but we can't promise a timeline.
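
For illustration of the hand-off case, here is a hedged sketch (again assuming the Apache Arrow Go module; toDecimal128 is a hypothetical helper, not a driver API) of widening such an int32 FIXED column into a canonical arrow.Decimal128 array before passing the data on, with precision and scale taken from the field metadata as shown earlier:

	import (
		"github.com/apache/arrow/go/v15/arrow"
		"github.com/apache/arrow/go/v15/arrow/array"
		"github.com/apache/arrow/go/v15/arrow/decimal128"
		"github.com/apache/arrow/go/v15/arrow/memory"
	)

	// toDecimal128 re-types the column: the unscaled integer values are kept
	// as-is, only the declared Arrow type becomes Decimal128(precision, scale).
	func toDecimal128(ints *array.Int32, precision, scale int32) *array.Decimal128 {
		dt := &arrow.Decimal128Type{Precision: precision, Scale: scale}
		b := array.NewDecimal128Builder(memory.DefaultAllocator, dt)
		defer b.Release()
		for i := 0; i < ints.Len(); i++ {
			if ints.IsNull(i) {
				b.AppendNull()
				continue
			}
			b.Append(decimal128.FromI64(int64(ints.Value(i))))
		}
		return b.NewDecimal128Array()
	}

Note that this copies the column, so it trades some of the performance benefit of the raw batches mode for a self-describing decimal type.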

@sfc-gh-dszmolka sfc-gh-dszmolka changed the title SNOW-1799603: GetArrowBatches returns array.Int instead of array.Decimal for NUMBER data SNOW-1799603: add guidance and example on processing Arrow results WithHigherPrecision Nov 14, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the enhancement The issue is a request for improvement or a new feature label Nov 14, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka removed the question Issue is a usage/other question rather than a bug label Nov 14, 2024

datbth commented Nov 14, 2024

Thank you.

We mention this compression in our docs: https://github.com/snowflakedb/gosnowflake/blob/master/doc.go#L692

I see it now. Sorry for having missed it earlier.
But still, that line looks like 1::decimal(38,0) can be returned as 1 (arrow.Int8), rather than something like 1::decimal(38,0) can be returned as 100 (arrow.Int8 with logical scale 2).

or just pass Arrow to another service - the latter especially find it good that we don't do any unnecessary processing

I might be missing something here, but could you give some examples of services that can process such an arrow.Int32 with such a custom logical type? As far as I know, it isn't a canonical Arrow type, is it?
For example, I couldn't find anywhere in these docs that Int32 can have "precision" or "scale":

If it's indeed a "custom" type that is only used/produced by Snowflake, it should be documented accordingly.

sfc-gh-pfus (Collaborator) commented

But still, that line looks like 1::decimal(38,0) can be returned as 1 (arrow.Int8), rather than something like 1::decimal(38,0) can be returned as 100 (arrow.Int8 with logical scale 2).
I don't get it. Why would 1::decimal(38, 0) be converted to have scale 2? That's not quite possible; I think the backend should always use scale = 0 in that case.

As for the second part - it is not a canonical format. It is just an arrow.Int32, but with scale. I added a small doc improvement: https://github.com/snowflakedb/gosnowflake/pull/1246/files


datbth commented Nov 18, 2024

I don't get it. Why would 1::decimal(38, 0) be converted to have scale 2? That's not quite possible; I think the backend should always use scale = 0 in that case.

I see. My point was that the doc did not say in which cases the Int data would have a non-zero scale.
But your new doc commit above clears that up.

sfc-gh-dszmolka (Contributor) commented

Documentation update merged with #1246. If you don't have any further questions in the context of this issue, I would suggest closing it.


datbth commented Nov 19, 2024

okay thank you

@datbth datbth closed this as completed Nov 19, 2024