
[SPARK-55830][SQL] Fix JDBC predicate pushdown dropping driver properties#55408

Draft · yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-55830-jdbc-predicates



What changes were proposed in this pull request?

When using spark.read.jdbc() with predicates, custom JDBC driver properties (like socketFactory, cloudSqlInstance) were silently dropped, causing connection failures. Without predicates, the same properties worked fine.

Root Cause

In JDBCOptions, the constructor this(url, table, parameters: Map[String, String]) did:

this(CaseInsensitiveMap(parameters ++ Map(JDBC_URL -> url, JDBC_TABLE_NAME -> table)))

When parameters is a CaseInsensitiveMap (which it is in the predicate code path), parameters ++ Map(...) calls the inherited Map.++ (because the static type is Map[String, String]). The inherited Map.++ iterates this using CaseInsensitiveMap.iterator, which returns lowercased keys from keyLowerCasedMap. This creates a plain HashMap with lowercased keys, losing the original case.
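The dispatch issue described above can be reproduced with the standard library alone. Below is a minimal sketch: `ToyCIMap` is an illustrative stand-in for Spark's `CaseInsensitiveMap` (not the real class), with an iterator that exposes lowercased keys and a `++` defined as an overload, so which `++` runs depends on the receiver's static type:

```scala
// Illustrative stand-in for CaseInsensitiveMap; names and values are made up.
class ToyCIMap(val originalMap: Map[String, String])
    extends Map[String, String] {
  private val lowerCased = originalMap.map { case (k, v) => (k.toLowerCase, v) }

  def get(key: String): Option[String] = lowerCased.get(key.toLowerCase)
  // Like CaseInsensitiveMap, the iterator yields lowercased keys.
  def iterator: Iterator[(String, String)] = lowerCased.iterator
  def removed(key: String): Map[String, String] =
    new ToyCIMap(originalMap.filter { case (k, _) => k.toLowerCase != key.toLowerCase })
  def updated[V1 >: String](key: String, value: V1): Map[String, V1] =
    originalMap.updated(key, value)

  // An overload (not an override), so it is selected by the receiver's STATIC type.
  def ++(other: Map[String, String]): ToyCIMap =
    new ToyCIMap(originalMap ++ other)
}

val params = new ToyCIMap(Map("socketFactory" -> "com.example.Factory"))
val extras = Map("url" -> "jdbc:postgresql://host/db")

// Static type Map => the inherited Map.++ runs and iterates lowercased keys.
val viaMapType: Map[String, String] = params
val broken = viaMapType ++ extras  // keys are now "socketfactory", "url"

// Static type ToyCIMap => the overload runs and originalMap keeps its case.
val fixed = params ++ extras
```

This mirrors the constructor bug: with the parameter statically typed as `Map[String, String]`, the generic `++` rebuilds the map from the lowercasing iterator.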

JDBC drivers expect case-sensitive property names (e.g., socketFactory, not socketfactory), so asConnectionProperties returns properties with wrong-cased keys, and Properties.getProperty("socketFactory") returns null.
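Because `java.util.Properties` is backed by a `Hashtable`, its lookups are exactly case-sensitive, which is why the lowercased keys are invisible to the driver. A tiny sketch (key and value are illustrative):

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("socketfactory", "com.example.Factory") // wrong case stored
// A driver asking for the original spelling finds nothing:
val original = props.getProperty("socketFactory")  // null
val lowered  = props.getProperty("socketfactory")  // "com.example.Factory"
```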

Fix

Wrap parameters in CaseInsensitiveMap first, then use CaseInsensitiveMap.++ which preserves original key case via the updated() method:

this(CaseInsensitiveMap(parameters) ++ Map(JDBC_URL -> url, JDBC_TABLE_NAME -> table))

Applied to both JDBCOptions and JdbcOptionsInWrite.

How was this patch tested?

Added a unit test in JDBCSuite that simulates the predicate code path: it creates JDBCOptions from a CaseInsensitiveMap ++ Properties.asScala merge and verifies that asConnectionProperties preserves the original key case.
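The real test constructs JDBCOptions inside Spark's JDBCSuite; as a stdlib-only sketch of the conversion the test relies on (values illustrative), note that `Properties.asScala` itself preserves key case, so any case loss in the real code path must come from the subsequent map merge:

```scala
import scala.jdk.CollectionConverters._
import java.util.Properties

val props = new Properties()
props.setProperty("socketFactory", "com.example.Factory")

// Merging a plain Scala Map with the converted Properties keeps the
// original spelling of the Properties keys.
val merged = Map("url" -> "jdbc:postgresql://host/db") ++ props.asScala
```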

Does this PR introduce any user-facing change?

Yes. Custom JDBC driver properties (e.g., socketFactory, cloudSqlInstance) now work correctly when using spark.read.jdbc() with predicates.

Was this patch authored or co-authored using generative AI tooling?

Yes

…ties

When using spark.read.jdbc() with predicates, custom JDBC driver properties
(like socketFactory, cloudSqlInstance) were silently dropped because
CaseInsensitiveMap.iterator returns lowercased keys. The JDBCOptions
constructor called the inherited Map.++ (due to static typing as
Map[String, String]), which iterated via iterator and lost the original
key case. JDBC drivers expect case-sensitive property names, so this
caused connection failures.

The fix wraps parameters in CaseInsensitiveMap first, then uses
CaseInsensitiveMap.++ which preserves original key case via the
updated() method and originalMap.

Closes #XXXXX

### Was this patch authored or co-authored using generative AI tooling?
Yes