The `Cursor.fetch_dataframe` method doesn't respect case-sensitivity #238

georgesittas · 2024-10-17T21:57:24Z

Driver version

2.1.3

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.76169

Client Operating System

macos

Python version

3.12.6

Problem description

The Cursor.fetch_dataframe method always lowercases column names, regardless of the enable_case_sensitive_identifier setting. This is unexpected: the resulting DataFrame's columns are effectively case-insensitive even when that flag is set to true.

One example where this can be problematic is demonstrated below:

from redshift_connector import connect

conn = connect(<connection options>)
cursor = conn.cursor()

cursor.execute('SET enable_case_sensitive_identifier TO true')
cursor.execute('WITH t AS (SELECT 1 AS "C", 2 AS "c") SELECT * FROM t')

# cursor.fetch_dataframe()
#    c  c
# 0  1  2

# cursor.fetch_dataframe().to_dict()
# <stdin>:1: UserWarning: DataFrame columns are not unique, some columns will be omitted.
# {'c': {0: 2}}
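The collapse can also be reproduced without a Redshift connection. The snippet below is only a minimal sketch: `returned_names` and `row` are hypothetical stand-ins for the names and values the query above returns, and the list comprehension is assumed to mirror the effect of the driver's lower() call rather than quoting its code.

```python
# Hypothetical stand-ins for what the query above returns when
# enable_case_sensitive_identifier is true.
returned_names = ["C", "c"]
row = (1, 2)

# Assumption: this mirrors the effect of the driver's lower() call.
lowered_names = [name.lower() for name in returned_names]

# When values are keyed by column name, a later duplicate overwrites
# the earlier one, so the value of "C" is lost.
by_lowered_name = dict(zip(lowered_names, row))
by_returned_name = dict(zip(returned_names, row))

print(lowered_names)
print(by_lowered_name)
print(by_returned_name)
```

Keying by the names as returned keeps both columns, while keying by the lowercased names keeps only the last one, which matches the to_dict() result shown above.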

Possible solutions

I see that there is an open PR related to this issue, but I don't think it solves it. I believe the correct fix is to remove the lower() call in line 526 altogether. The columns produced by the driver would then match those returned by Redshift, and so respect the case-sensitivity configuration value. I plan to open a PR with this fix soon.
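Until a fix lands, one possible workaround is to read the column names from the cursor's description directly. The following is a rough sketch, not the driver's implementation: `columns_and_rows` and `_StubCursor` are hypothetical names, and the bytes check is an assumption made in case a driver version reports names as bytes. It relies only on the DB-API 2.0 convention that the first item of each description entry is the column name.

```python
def columns_and_rows(cursor):
    """Return (column names, rows) with the names left exactly as reported."""
    columns = []
    for entry in cursor.description:
        name = entry[0]  # DB-API 2.0: the first item is the column name
        # Assumption: some driver versions may report names as bytes.
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        columns.append(name)  # note: no lower() call
    rows = [tuple(r) for r in cursor.fetchall()]
    return columns, rows


class _StubCursor:
    """Hypothetical stand-in for a cursor, so the sketch runs offline."""

    description = [
        ("C", 23, None, None, None, None, None),
        ("c", 23, None, None, None, None, None),
    ]

    def fetchall(self):
        return [(1, 2)]


columns, rows = columns_and_rows(_StubCursor())
print(columns, rows)
```

With a real cursor, the result could then be passed to pandas as `pandas.DataFrame(rows, columns=columns)`, which keeps "C" and "c" as distinct columns.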

P.S.: The line of interest was introduced in the first commit of this repo (!) and hasn't changed since. My hunch is that this was an oversight: Redshift's documentation at the date of that commit made no mention of the enable_case_sensitive_identifier flag, so the driver was probably never updated to take the flag into account after it was introduced.


Brooke-white · 2024-10-28T15:46:25Z

Hi @georgesittas, thank you for raising this issue and the corresponding PR. The team is taking a look at the PR and will get back to you.

georgesittas · 2024-10-28T15:47:20Z

Hey @Brooke-white, appreciate the response 👍

This was referenced Oct 17, 2024

fix(cursor, fetch_dataframe): use column names in cursor's description as is #239 (Open)

Fix: ensure Redshift's _fetch_native_df respects case-sensitivity TobikoData/sqlmesh#3266 (Merged)
