
Commit 8ed531e

saurabh500, Saurabh Singh (SQL Drivers), and Copilot authored
FEAT: Improved Connection String handling in Python (#307)
### Work Item / Issue Reference

> [AB#39896](https://sqlclientdrivers.visualstudio.com/c6d89619-62de-46a0-8b46-70b92a84d85e/_workitems/edit/39896)

> GitHub Issue: #306

-------------------------------------------------------------------

### Summary

Fixes #306 by introducing a connection string parser for mssql-python.

### PR Title Guide

FEAT: Improve connection string parsing, honor the allow list, and improve error messaging.

# Connection String Allowlist Feature - Review Guide

## Overview

This PR introduces a comprehensive connection string validation and allowlist system that validates all ODBC connection parameters before passing them to the driver, improving usability and providing better error messages.

---

## What This Changes

**Before:** Connection strings were passed directly to the ODBC driver with minimal or no validation. Unknown parameters were silently ignored by ODBC, and malformed strings caused cryptic ODBC errors.

**After:** All connection strings are parsed, validated against an allowlist of ODBC Driver 18 parameters, and reconstructed with proper escaping. Clear error messages are reported for any problems.
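The parsing step can be sketched as a single left-to-right pass. This is a minimal illustration only, assuming the rules this PR describes: `key=value` pairs separated by `;`, optional `{braced}` values in which `}}` stands for `}` and `{{` for `{`, and every problem collected and raised together. `parse_connection_string`, `_read_braced`, and this `ConnectionStringParseError` class are hypothetical stand-ins, not the code in `mssql_python/connection_string_parser.py`; the error texts only imitate the messages quoted under Behavior Changes.

```python
from typing import Dict, List, Optional, Tuple


class ConnectionStringParseError(ValueError):
    """Sketch-only exception that carries every problem found in one pass."""

    def __init__(self, errors: List[str]) -> None:
        super().__init__("; ".join(errors))
        self.errors = errors


def _read_braced(s: str, pos: int) -> Tuple[Optional[str], int]:
    """Read a {braced} value that starts at s[pos] == '{'.

    Returns (value, index just past the closing brace), or (None, len(s))
    when the closing brace is missing.
    """
    out: List[str] = []
    pos += 1  # skip the opening '{'
    while pos < len(s):
        ch = s[pos]
        nxt = s[pos + 1] if pos + 1 < len(s) else ""
        if ch == "}" and nxt == "}":  # '}}' -> '}'
            out.append("}")
            pos += 2
        elif ch == "{" and nxt == "{":  # '{{' -> '{' (as described in this PR)
            out.append("{")
            pos += 2
        elif ch == "}":  # a single '}' closes the value
            return "".join(out), pos + 1
        else:
            out.append(ch)
            pos += 1
    return None, len(s)


def parse_connection_string(s: str) -> Dict[str, str]:
    """Hypothetical single-pass parser; keys are kept exactly as written."""
    params: Dict[str, str] = {}
    seen = set()  # lower-cased keys, for case-insensitive duplicate detection
    errors: List[str] = []
    pos, n = 0, len(s)
    while pos < n:
        if s[pos] in " \t;":  # skip padding and empty segments
            pos += 1
            continue
        start = pos
        while pos < n and s[pos] not in "=;":
            pos += 1
        key = s[start:pos].strip()
        if pos >= n or s[pos] == ";":
            errors.append(f"Incomplete specification: keyword '{key}' has no value")
            continue
        pos += 1  # consume '='
        while pos < n and s[pos] in " \t":
            pos += 1
        if pos < n and s[pos] == "{":
            value, pos = _read_braced(s, pos)
            if value is None:
                errors.append(f"Unclosed braced value for keyword '{key}'")
                break  # nothing after an unclosed brace can be trusted
            while pos < n and s[pos] in " \t":
                pos += 1
            if pos < n and s[pos] != ";":
                errors.append(f"Unexpected characters after braced value for keyword '{key}'")
                while pos < n and s[pos] != ";":
                    pos += 1
        else:
            start = pos
            while pos < n and s[pos] != ";":
                pos += 1
            value = s[start:pos].strip()
        if not key:
            errors.append("Empty keyword before '='")
        elif key.lower() in seen:
            errors.append(f"Duplicate keyword '{key}'")
        else:
            seen.add(key.lower())
            params[key] = value
    if errors:
        raise ConnectionStringParseError(errors)
    return params


print(parse_connection_string("Server=localhost;PWD={a}}b{{c}"))
# {'Server': 'localhost', 'PWD': 'a}b{c'}
```

The sketch keeps keys as written; mapping synonyms such as `host` to `Server` and checking the allowlist are separate steps in the pipeline.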
---

## Data Flow

```
User Input (connection string + kwargs)
                      ↓
┌─────────────────────────────────────────────┐
│ Connection.__init__()                       │
│ _construct_connection_string()              │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│ Step 1: Parse Connection String             │
│ _ConnectionStringParser.parse()             │
│ - Tokenizes key=value pairs                 │
│ - Handles braced values: {val}              │
│ - Processes escape sequences: }} -> }       │
│ - Detects syntax errors                     │
│ Output: Dict[str, str]                      │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│ Step 2: Validate Against Allowlist          │
│ ConnectionStringAllowList.normalize_keys()  │
│ - Checks unknown parameters                 │
│ - Normalizes synonyms (host→Server)         │
│ - Blocks reserved params (Driver, APP)      │
│ - Warns about rejected params               │
│ Output: Dict[str, str] (filtered)           │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│ Step 3: Process kwargs                      │
│ - Normalize each kwarg key                  │
│ - Block reserved parameters                 │
│ - Merge into filtered params                │
│ - kwargs override connection string         │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│ Step 4: Build Connection String             │
│ _ConnectionStringBuilder.build()            │
│ - Add Driver (hardcoded)                    │
│ - Add APP=MSSQL-Python                      │
│ - Escape special characters                 │
│ - Format: key1=value1;key2=value2           │
│ Output: str (final connection string)       │
└─────────────────────────────────────────────┘
                      ↓
ODBC Driver (validated, safe connection string)
```

---

## File Structure & Review Order

### **Recommended Review Order**

Review in this sequence to understand the architecture:

#### 1. **Core Components** (understand the building blocks)

1. **`mssql_python/connection_string_parser.py`** (start here)
   - **Purpose:** Parses ODBC connection strings per the MS-ODBCSTR spec
   - **Key Classes:**
     - `ConnectionStringParseError` - Custom exception
     - `_ConnectionStringParser` - Main parser class
   - **Key Methods:**
     - `parse(connection_str)` - Main entry point
     - `_parse_value(str, pos)` - Handles braced values and escaping
   - **What to Look For:**
     - Proper handling of escape sequences (`{{`, `}}`)
     - Error collection and batch reporting
     - Edge cases: empty strings, whitespace, special characters
2. **`mssql_python/connection_string_allowlist.py`** (critical)
   - **Purpose:** Validates parameters against ODBC Driver 18 keywords
   - **Key Classes:**
     - `ConnectionStringAllowList` - Singleton allowlist manager
   - **Key Methods:**
     - `normalize_key(key)` - Maps synonyms to canonical names
     - `filter_params(params)` - Validates and filters parameters
   - **What to Look For:**
     - Completeness of the allowlist (compare with the ODBC docs)
     - Synonym mappings (host→Server, user→UID, etc.)
     - Reserved parameter handling (Driver, APP)
3. **`mssql_python/connection_string_builder.py`**
   - **Purpose:** Safely constructs connection strings with escaping
   - **Key Classes:**
     - `_ConnectionStringBuilder` - Builder with escape logic
   - **Key Methods:**
     - `add_param(key, value)` - Adds a parameter
     - `_needs_braces(value)` - Determines whether braces are needed
     - `_escape_value(value)` - Escapes special characters
     - `build()` - Constructs the final string
   - **What to Look For:**
     - Correct escaping logic (`;`, `{`, `}`, `=`)
     - Proper brace placement
     - Semicolon formatting

#### 2. **Integration** (see how it fits together)

4. **`mssql_python/connection.py`** (integration point)
   - **Modified Method:** `_construct_connection_string()`
   - **What Changed:**
     - Lines 241-303: New implementation replacing the old concatenation logic
     - Now uses a parser -> filter -> builder pipeline
     - Handles kwargs with allowlist validation
     - Raises `ValueError` for reserved parameters
   - **What to Look For:**
     - Error handling for parse failures
     - kwargs override behavior
     - Reserved parameter rejection
     - Logging of warnings
5. **`mssql_python/__init__.py`**
   - **What Changed:**
     - Added export: `ConnectionStringParseError`
   - **What to Look For:**
     - Proper exception exposure for users

#### 3. **Tests** (validate functionality)

6. **`tests/test_010_connection_string_parser.py`** (parser tests)
   - **Coverage:** ~40 test cases
   - **Test Categories:**
     - Valid parsing scenarios
     - Braced value handling
     - Escape sequence processing
     - Error detection and reporting
     - Edge cases (empty, whitespace, unicode)
   - **What to Look For:**
     - Test coverage of the MS-ODBCSTR spec
     - Error message clarity
     - Edge case handling
7. **`tests/test_011_connection_string_allowlist.py`** (allowlist tests)
   - **Coverage:** ~25 test cases
   - **Test Categories:**
     - Key normalization (synonyms)
     - Parameter filtering
     - Reserved parameter blocking
     - Unknown parameter detection
   - **What to Look For:**
     - All ODBC parameters tested
     - Synonym mappings validated
     - Security (Driver/APP blocking)
8. **`tests/test_012_connection_string_integration.py`** (integration tests)
   - **Coverage:** ~20 test cases
   - **Test Categories:**
     - End-to-end parsing -> filtering -> building
     - Real connection string scenarios
     - Error propagation from the `connect()` API
     - kwargs override behavior
   - **What to Look For:**
     - Real-world usage patterns
     - Error messages that match user expectations
     - Backward compatibility where possible
9. **`tests/test_003_connection.py`** (updated)
   - **What Changed:**
     - Updated assertions in `test_construct_connection_string()`
     - Updated assertions in `test_connection_string_with_attrs_before()`
     - Updated assertions in `test_connection_string_with_odbc_param()`
   - **What to Look For:**
     - Assertions match the new builder output format
     - No semicolons in the middle of assertion strings (the builder handles formatting)
10. **`tests/test_006_exceptions.py`** (updated)
    - **What Changed:**
      - `test_connection_error()` now expects `ConnectionStringParseError`
      - Updated error message assertions
    - **What to Look For:**
      - Proper exception type changes
      - Helpful error messages

#### 4. **Documentation** (understand design decisions)

11. **`docs/connection_string_allow_list_design.md`** (read this)
    - **Content:**
      - Design rationale and motivation
      - Architecture decisions
      - Security considerations
      - Future enhancements
    - **What to Look For:**
      - Justification for the approach
      - Trade-offs discussed
      - Security implications understood
12. **`docs/parser_state_machine.md`**
    - **Content:**
      - Detailed state machine for the parser
      - Character-by-character processing flow
      - Error handling states
      - Escape sequence examples
    - **What to Look For:**
      - Parser logic is well documented
      - Edge cases covered in examples

---

## Areas to Focus On

### 1. **Security**

- **Reserved Parameters:** Verify that users cannot set Driver or APP
- **Allowlist Completeness:** Check that all ODBC Driver 18 parameters are included
- **Escape Handling:** Ensure no injection is possible via special characters

### 2. **Error Handling**

- **Error Messages:** Are they clear and actionable?
- **Error Collection:** Are multiple errors reported together?
- **Backward Compatibility:** Do meaningful errors replace cryptic ODBC errors?

### 3. **Performance**

- **Parsing Overhead:** The parser does not split the string, which avoids allocating many intermediate strings.
- **No Regex in Hot Path:** The parser processes input character by character. Regular expressions are known to cause problems (catastrophic backtracking is a common example), so the parser avoids them.
- **Allowlist Lookup:** O(1) dict lookups, minimal overhead

### 4. **Correctness**

- **Synonym Handling:** All common aliases map correctly
- **Case Insensitivity:** Keys are normalized consistently

---

## Testing Strategy

### Test Coverage Map

| Component | Test File | Key Scenarios |
|-----------|-----------|---------------|
| Parser | `test_010_connection_string_parser.py` | Syntax, escaping, errors |
| Allowlist | `test_011_connection_string_allowlist.py` | Validation, normalization |
| Builder | `test_012_connection_string_integration.py` | Escaping, formatting |
| Integration | `test_012_connection_string_integration.py` | End-to-end flows |
| Connection | `test_003_connection.py` | Connection string construction |
| Exceptions | `test_006_exceptions.py` | Error propagation |

## Behavior Changes

Reviewers should be aware of these behavioral changes:

### 1. **Unknown Parameters Now Raise Errors**

**Before:**
```python
connect("Server=localhost;FakeParam=value")  # Silently ignored
```

**After:**
```python
connect("Server=localhost;FakeParam=value")
# Raises: ConnectionStringParseError: Unknown keyword 'FakeParam' is not recognized
```

### 2. **Malformed Strings Caught Early**

**Before:**
```python
connect("ServerLocalhost")  # ODBC error later
```

**After:**
```python
connect("ServerLocalhost")
# Raises: ConnectionStringParseError: Incomplete specification: keyword 'ServerLocalhost' has no value
```

### 3. **Reserved Parameters Blocked**

**Before:**
```python
connect("Server=localhost;Driver={SQL Server}")  # Possibly ignored
```

**After:**
```python
connect("Server=localhost;Driver={SQL Server}")
# Raises: ValueError: Connection parameter 'Driver' is reserved and controlled by the driver
```

---

## Key Concepts to Understand

### ODBC Connection String Format

```
key1=value1;key2=value2;key3={val;ue}
```

### Braced Values

Used when a value contains semicolons or other special characters:

```
PWD={my;pass;word}  -> Password is "my;pass;word"
```

### Escape Sequences

- `{{` -> `{` (escaped left brace)
- `}}` -> `}` (escaped right brace)

Example:

```
PWD={a}}b{{c}  -> Password is "a}b{c"
```

---------

Co-authored-by: Saurabh Singh (SQL Drivers) <singhsaura@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent f9acc07 commit 8ed531e

12 files changed (+2028 -134 lines changed)

eng/pipelines/build-whl-pipeline.yml

Lines changed: 3 additions & 3 deletions
```diff
@@ -340,7 +340,7 @@ jobs:
         python -m pytest -v
       displayName: 'Run Pytest to validate bindings'
       env:
-        DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+        DB_CONNECTION_STRING: 'Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
 
     # Build wheel package for universal2
     - script: |
@@ -801,7 +801,7 @@ jobs:
 
       displayName: 'Test wheel installation and basic functionality on $(BASE_IMAGE)'
       env:
-        DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+        DB_CONNECTION_STRING: 'Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
 
     # Run pytest with source code while testing installed wheel
     - script: |
@@ -856,7 +856,7 @@ jobs:
         "
       displayName: 'Run pytest suite on $(BASE_IMAGE) $(ARCH)'
       env:
-        DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+        DB_CONNECTION_STRING: 'Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
       continueOnError: true # Don't fail pipeline if tests fail
 
     # Cleanup
```
# Cleanup

mssql_python/__init__.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -25,6 +25,7 @@
     InternalError,
     ProgrammingError,
     NotSupportedError,
+    ConnectionStringParseError,
 )
 
 # Type Objects
@@ -46,6 +47,10 @@
 # Connection Objects
 from .db_connection import connect, Connection
 
+# Connection String Handling
+from .connection_string_parser import _ConnectionStringParser
+from .connection_string_builder import _ConnectionStringBuilder
+
 # Cursor Objects
 from .cursor import Cursor
 
```

mssql_python/connection.py

Lines changed: 51 additions & 25 deletions
```diff
@@ -41,6 +41,9 @@
 )
 from mssql_python.auth import process_connection_string
 from mssql_python.constants import ConstantsDDBC, GetInfoConstants
+from mssql_python.connection_string_parser import _ConnectionStringParser
+from mssql_python.connection_string_builder import _ConnectionStringBuilder
+from mssql_python.constants import _RESERVED_PARAMETERS
 
 if TYPE_CHECKING:
     from mssql_python.row import Row
@@ -242,39 +245,62 @@ def _construct_connection_string(
         self, connection_str: str = "", **kwargs: Any
     ) -> str:
         """
-        Construct the connection string by concatenating the connection string
-        with key/value pairs from kwargs.
-
+        Construct the connection string by parsing, validating, and merging parameters.
+
+        This method performs a 6-step process:
+        1. Parse and validate the base connection_str (validates against allowlist)
+        2. Normalize parameter names (e.g., addr/address -> Server, uid -> UID)
+        3. Merge kwargs (which override connection_str params after normalization)
+        4. Build connection string from normalized, merged params
+        5. Add Driver and APP parameters (always controlled by the driver)
+        6. Return the final connection string
+
         Args:
             connection_str (str): The base connection string.
             **kwargs: Additional key/value pairs for the connection string.
 
         Returns:
-            str: The constructed connection string.
+            str: The constructed and validated connection string.
         """
-        # Add the driver attribute to the connection string
-        conn_str = add_driver_to_connection_str(connection_str)
-
-        # Add additional key-value pairs to the connection string
+
+        # Step 1: Parse base connection string with allowlist validation
+        # The parser validates everything: unknown params, reserved params, duplicates, syntax
+        parser = _ConnectionStringParser(validate_keywords=True)
+        parsed_params = parser._parse(connection_str)
+
+        # Step 2: Normalize parameter names (e.g., addr/address -> Server, uid -> UID)
+        # This handles synonym mapping and deduplication via normalized keys
+        normalized_params = _ConnectionStringParser._normalize_params(parsed_params, warn_rejected=False)
+
+        # Step 3: Process kwargs and merge with normalized_params
+        # kwargs override connection string values (processed after, so they take precedence)
         for key, value in kwargs.items():
-            if key.lower() == "host" or key.lower() == "server":
-                key = "Server"
-            elif key.lower() == "user" or key.lower() == "uid":
-                key = "Uid"
-            elif key.lower() == "password" or key.lower() == "pwd":
-                key = "Pwd"
-            elif key.lower() == "database":
-                key = "Database"
-            elif key.lower() == "encrypt":
-                key = "Encrypt"
-            elif key.lower() == "trust_server_certificate":
-                key = "TrustServerCertificate"
+            normalized_key = _ConnectionStringParser.normalize_key(key)
+            if normalized_key:
+                # Driver and APP are reserved - raise error if user tries to set them
+                if normalized_key in _RESERVED_PARAMETERS:
+                    raise ValueError(
+                        f"Connection parameter '{key}' is reserved and controlled by the driver. "
+                        f"It cannot be set by the user."
+                    )
+                # kwargs override any existing values from connection string
+                normalized_params[normalized_key] = str(value)
             else:
-                continue
-            conn_str += f"{key}={value};"
-
-        log("info", "Final connection string: %s", sanitize_connection_string(conn_str))
-
+                log('warning', f"Ignoring unknown connection parameter from kwargs: {key}")
+
+        # Step 4: Build connection string with merged params
+        builder = _ConnectionStringBuilder(normalized_params)
+
+        # Step 5: Add Driver and APP parameters (always controlled by the driver)
+        # These maintain existing behavior: Driver is always hardcoded, APP is always MSSQL-Python
+        builder.add_param('Driver', 'ODBC Driver 18 for SQL Server')
+        builder.add_param('APP', 'MSSQL-Python')
+
+        # Step 6: Build final string
+        conn_str = builder.build()
+
+        log('info', "Final connection string: %s", sanitize_connection_string(conn_str))
+
         return conn_str
 
     @property
```
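The kwargs handling in the new `_construct_connection_string()` can be tried in isolation with a small stand-alone sketch. It follows the order shown in the diff (normalize each kwarg key, reject reserved keys with `ValueError`, log and skip unknown keys, let kwargs override connection-string values). `SYNONYMS`, `RESERVED`, `normalize_key`, and `merge_params` are hypothetical stand-ins: the real synonym table and `_RESERVED_PARAMETERS` live in modules not shown on this page, and the entries below cover only aliases mentioned in this PR or recognized by the old code. The diff's `log(...)` helper is replaced here by the standard `logging` module.

```python
import logging
from typing import Any, Dict, Optional

log = logging.getLogger("connection_string_sketch")

# Hypothetical synonym table (lower-cased alias -> canonical keyword).
SYNONYMS: Dict[str, str] = {
    "server": "Server", "host": "Server", "addr": "Server", "address": "Server",
    "uid": "UID", "user": "UID",
    "pwd": "PWD", "password": "PWD",
    "database": "Database",
    "encrypt": "Encrypt",
    "driver": "Driver",
    "app": "APP",
}
# Hypothetical stand-in for _RESERVED_PARAMETERS: keys only the driver may set.
RESERVED = {"Driver", "APP"}


def normalize_key(key: str) -> Optional[str]:
    """Return the canonical keyword for `key`, or None if it is unknown."""
    return SYNONYMS.get(key.strip().lower())


def merge_params(parsed: Dict[str, str], **kwargs: Any) -> Dict[str, str]:
    """Normalize parsed params, then apply kwargs on top (kwargs win)."""
    # `parsed` is assumed to be validated already, so unknown keys are kept as-is.
    merged = {normalize_key(k) or k: v for k, v in parsed.items()}
    for key, value in kwargs.items():
        canonical = normalize_key(key)
        if canonical is None:
            log.warning("Ignoring unknown connection parameter from kwargs: %s", key)
            continue
        if canonical in RESERVED:
            raise ValueError(
                f"Connection parameter '{key}' is reserved and controlled by the driver. "
                "It cannot be set by the user."
            )
        merged[canonical] = str(value)  # later assignment overrides the parsed value
    return merged


print(merge_params({"Server": "a", "uid": "sa"}, host="b", password="x"))
# {'Server': 'b', 'UID': 'sa', 'PWD': 'x'}
```

Because kwargs are applied after the parsed parameters and both sides go through the same normalization, `host="b"` replaces `Server=a` rather than producing a second server entry.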
Lines changed: 113 additions & 0 deletions
```diff
@@ -0,0 +1,113 @@
+"""
+Copyright (c) Microsoft Corporation.
+Licensed under the MIT license.
+
+Connection string builder for mssql-python.
+
+Reconstructs ODBC connection strings from parameter dictionaries
+with proper escaping and formatting per MS-ODBCSTR specification.
+"""
+
+from typing import Dict, Optional
+from mssql_python.constants import _CONNECTION_STRING_DRIVER_KEY
+
+class _ConnectionStringBuilder:
+    """
+    Internal builder for ODBC connection strings. Not part of public API.
+
+    Handles proper escaping of special characters and reconstructs
+    connection strings in ODBC format.
+    """
+
+    def __init__(self, initial_params: Optional[Dict[str, str]] = None):
+        """
+        Initialize the builder with optional initial parameters.
+
+        Args:
+            initial_params: Dictionary of initial connection parameters
+        """
+        self._params: Dict[str, str] = initial_params.copy() if initial_params else {}
+
+    def add_param(self, key: str, value: str) -> '_ConnectionStringBuilder':
+        """
+        Add or update a connection parameter.
+
+        Args:
+            key: Parameter name (should be normalized canonical name)
+            value: Parameter value
+
+        Returns:
+            Self for method chaining
+        """
+        self._params[key] = str(value)
+        return self
+
+    def build(self) -> str:
+        """
+        Build the final connection string.
+
+        Returns:
+            ODBC-formatted connection string with proper escaping
+
+        Note:
+            - Driver parameter is placed first
+            - Other parameters are sorted for consistency
+            - Values are escaped if they contain special characters
+        """
+        parts = []
+
+        # Build in specific order: Driver first, then others
+        if _CONNECTION_STRING_DRIVER_KEY in self._params:
+            parts.append(f"Driver={self._escape_value(self._params['Driver'])}")
+
+        # Add other parameters (sorted for consistency)
+        for key in sorted(self._params.keys()):
+            if key == 'Driver':
+                continue # Already added
+
+            value = self._params[key]
+            escaped_value = self._escape_value(value)
+            parts.append(f"{key}={escaped_value}")
+
+        # Join with semicolons
+        return ';'.join(parts)
+
+    def _escape_value(self, value: str) -> str:
+        """
+        Escape a parameter value if it contains special characters.
+
+        Per MS-ODBCSTR specification:
+        - Values containing ';', '{', '}', '=', or spaces should be braced for safety
+        - '}' inside braced values is escaped as '}}'
+        - '{' inside braced values is escaped as '{{'
+
+        Args:
+            value: Parameter value to escape
+
+        Returns:
+            Escaped value (possibly wrapped in braces)
+
+        Examples:
+            >>> builder = _ConnectionStringBuilder()
+            >>> builder._escape_value("localhost")
+            'localhost'
+            >>> builder._escape_value("local;host")
+            '{local;host}'
+            >>> builder._escape_value("p}w{d")
+            '{p}}w{{d}'
+            >>> builder._escape_value("ODBC Driver 18 for SQL Server")
+            '{ODBC Driver 18 for SQL Server}'
+        """
+        if not value:
+            return value
+
+        # Check if value contains special characters that require bracing
+        # Include spaces and = for safety, even though technically not always required
+        needs_braces = any(ch in value for ch in ';{}= ')
+
+        if needs_braces:
+            # Escape existing braces by doubling them
+            escaped = value.replace('}', '}}').replace('{', '{{')
+            return f'{{{escaped}}}'
+        else:
+            return value
```
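To see the builder's output format concretely, here is a stand-alone re-implementation of the escaping and ordering rules in the new file above: Driver first, remaining keys sorted, and any value containing `;`, `{`, `}`, `=`, or a space wrapped in braces with its braces doubled. `escape_value` and `build` are hypothetical free-function equivalents of `_ConnectionStringBuilder._escape_value()` and `build()`, written so the example runs without mssql-python installed. The sketch uses the literal `'Driver'` everywhere, which matches the diff on the assumption that `_CONNECTION_STRING_DRIVER_KEY` is `'Driver'`.

```python
from typing import Dict

SPECIAL = set(";{}= ")  # characters that trigger bracing in the builder above


def escape_value(value: str) -> str:
    """Brace a value if it contains special characters, doubling any braces."""
    if not value or not any(ch in SPECIAL for ch in value):
        return value
    escaped = value.replace("}", "}}").replace("{", "{{")
    return "{" + escaped + "}"


def build(params: Dict[str, str]) -> str:
    """Driver first, then the remaining keys in sorted order, joined by ';'."""
    parts = []
    if "Driver" in params:
        parts.append(f"Driver={escape_value(params['Driver'])}")
    for key in sorted(params):
        if key != "Driver":
            parts.append(f"{key}={escape_value(params[key])}")
    return ";".join(parts)


params = {
    "Server": "tcp:127.0.0.1,1433",
    "PWD": "p}w{d;",
    "Driver": "ODBC Driver 18 for SQL Server",
    "APP": "MSSQL-Python",
}
print(build(params))
# Driver={ODBC Driver 18 for SQL Server};APP=MSSQL-Python;PWD={p}}w{{d;};Server=tcp:127.0.0.1,1433
```

The output has no trailing semicolon because the parts are joined with `';'`, and values such as `tcp:127.0.0.1,1433` pass through unbraced since `:`, `,`, and `.` are not in the special set.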

0 commit comments

Comments
 (0)