1,577 changes: 1,577 additions & 0 deletions docs/connection_string_allow_list_design.md

Large diffs are not rendered by default.

215 changes: 215 additions & 0 deletions docs/parser_state_machine.md
@@ -0,0 +1,215 @@
# Connection String Parser State Machine

This document describes the state machine for the ODBC connection string parser (`_ConnectionStringParser`).

## Overview

The parser processes ODBC connection strings character-by-character, handling:
- Semicolon-separated key=value pairs
- Simple values (unquoted)
- Braced values with escape sequences: `{value}`, `}}` → `}`, `{{` → `{`
- Whitespace normalization
- Error detection and collection

## State Machine Diagram

```mermaid
stateDiagram-v2
[*] --> START: Begin parsing

START --> SKIP_WHITESPACE: Start of new segment

SKIP_WHITESPACE --> SKIP_WHITESPACE: Space, tab, semicolon
SKIP_WHITESPACE --> END: EOF
SKIP_WHITESPACE --> PARSE_KEY: Other char

PARSE_KEY --> PARSE_KEY: Any char except equals or semicolon
PARSE_KEY --> ERROR_NO_EQUALS: EOF or semicolon found
PARSE_KEY --> ERROR_EMPTY_KEY: Equals found but key is empty
PARSE_KEY --> VALIDATE_KEY: Equals found and key exists

ERROR_NO_EQUALS --> SKIP_TO_SEMICOLON: Record error
ERROR_EMPTY_KEY --> SKIP_TO_SEMICOLON: Record error

VALIDATE_KEY --> PARSE_VALUE: Key validated

PARSE_VALUE --> CHECK_VALUE_TYPE: Skip whitespace

CHECK_VALUE_TYPE --> PARSE_SIMPLE_VALUE: First char not left brace
CHECK_VALUE_TYPE --> PARSE_BRACED_VALUE: First char is left brace

PARSE_SIMPLE_VALUE --> PARSE_SIMPLE_VALUE: Any char except semicolon
PARSE_SIMPLE_VALUE --> STORE_PARAM: Semicolon or EOF

PARSE_BRACED_VALUE --> PARSE_BRACED_VALUE: Regular char
PARSE_BRACED_VALUE --> CHECK_RIGHT_BRACE: Right brace encountered
PARSE_BRACED_VALUE --> CHECK_LEFT_BRACE: Left brace encountered
PARSE_BRACED_VALUE --> ERROR_UNCLOSED_BRACE: EOF without closing brace

CHECK_RIGHT_BRACE --> PARSE_BRACED_VALUE: Double right brace (escaped)
CHECK_RIGHT_BRACE --> STORE_PARAM: Single right brace (end of value)

CHECK_LEFT_BRACE --> PARSE_BRACED_VALUE: Double left brace (escaped)
CHECK_LEFT_BRACE --> PARSE_BRACED_VALUE: Single left brace (keep as-is)

ERROR_UNCLOSED_BRACE --> SKIP_TO_SEMICOLON: Record error

STORE_PARAM --> CHECK_DUPLICATE: Parameter extracted

CHECK_DUPLICATE --> ERROR_DUPLICATE: Key seen before
CHECK_DUPLICATE --> VALIDATE_ALLOWLIST: New key

ERROR_DUPLICATE --> SKIP_TO_SEMICOLON: Record error

VALIDATE_ALLOWLIST --> CHECK_RESERVED: If allowlist provided
VALIDATE_ALLOWLIST --> SAVE_PARAM: No allowlist

CHECK_RESERVED --> ERROR_RESERVED: Driver or APP keyword
CHECK_RESERVED --> CHECK_UNKNOWN: Not reserved

CHECK_UNKNOWN --> ERROR_UNKNOWN: Unknown keyword
CHECK_UNKNOWN --> SAVE_PARAM: Known keyword

ERROR_RESERVED --> SKIP_TO_SEMICOLON: Record error
ERROR_UNKNOWN --> SKIP_TO_SEMICOLON: Record error

SAVE_PARAM --> SKIP_WHITESPACE: Continue parsing
SKIP_TO_SEMICOLON --> SKIP_WHITESPACE: Error recovery

END --> RAISE_ERRORS: If errors collected
END --> RETURN_PARAMS: No errors

RAISE_ERRORS --> [*]: Raise ConnectionStringParseError
RETURN_PARAMS --> [*]: Return dict of params
```

## States Description

### Main States

| State | Description |
|-------|-------------|
| **START** | Initial state at beginning of parsing |
| **SKIP_WHITESPACE** | Skip whitespace (spaces, tabs) and semicolons between parameters |
| **PARSE_KEY** | Extract parameter key up to '=' sign |
| **VALIDATE_KEY** | Check if key is non-empty |
| **PARSE_VALUE** | Skip whitespace after '=' before the value is read |
| **CHECK_VALUE_TYPE** | Decide between simple or braced value parsing |
| **PARSE_SIMPLE_VALUE** | Extract unquoted value up to ';' or EOF |
| **PARSE_BRACED_VALUE** | Extract braced value with escape handling |
| **STORE_PARAM** | Prepare to store the key-value pair |
| **CHECK_DUPLICATE** | Verify key hasn't been seen before |
| **VALIDATE_ALLOWLIST** | Check parameter against allowlist (if provided) |
| **CHECK_RESERVED** | Verify parameter is not reserved (Driver, APP) |
| **CHECK_UNKNOWN** | Verify parameter is recognized |
| **SAVE_PARAM** | Store the parameter in results |
| **SKIP_TO_SEMICOLON** | Error recovery: advance to next ';' |
| **END** | Parsing complete |
| **RAISE_ERRORS** | Raise `ConnectionStringParseError` with all collected errors |
| **RETURN_PARAMS** | Return parsed parameters dictionary |

### Error States

| Error State | Trigger | Error Message |
|-------------|---------|---------------|
| **ERROR_NO_EQUALS** | Key without '=' separator | "Incomplete specification: keyword '{key}' has no value (missing '=')" |
| **ERROR_EMPTY_KEY** | '=' with no preceding key | "Empty keyword found (format: =value)" |
| **ERROR_DUPLICATE** | Same key appears twice | "Duplicate keyword '{key}' found" |
| **ERROR_UNCLOSED_BRACE** | '{' without matching '}' | "Unclosed braced value starting at position {pos}" |
| **ERROR_RESERVED** | User tries to set Driver or APP | "Reserved keyword '{key}' is controlled by the driver and cannot be specified by the user" |
| **ERROR_UNKNOWN** | Key not in allowlist | "Unknown keyword '{key}' is not recognized" |

## Special Characters & Escaping

### Braced Value Escape Sequences

| Input | Parsed Output | Description |
|-------|---------------|-------------|
| `{value}` | `value` | Basic braced value |
| `{val;ue}` | `val;ue` | Semicolon allowed inside braces |
| `{val}}ue}` | `val}ue` | Escaped right brace: `}}` → `}` |
| `{val{{ue}` | `val{ue` | Escaped left brace: `{{` → `{` |
| `{a=b}` | `a=b` | Equals sign allowed inside braces |
| `{sp ace}` | `sp ace` | Spaces preserved inside braces |
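
The escape rules in this table can be sketched as a small standalone reader. This is only an illustration under stated assumptions, not the code in `connection_string_parser.py`: the function name `read_braced_value`, its return shape, and the use of `ValueError` are all hypothetical choices made for the example.

```python
from typing import Tuple


def read_braced_value(text: str, start: int) -> Tuple[str, int]:
    """Decode a braced value that begins at text[start] == '{'.

    Returns the decoded value and the index just past the closing brace.
    Raises ValueError if the input ends before a closing brace is found.
    (Hypothetical helper; name and signature are illustrative only.)
    """
    if start >= len(text) or text[start] != "{":
        raise ValueError("braced value must start with '{'")
    out = []
    i = start + 1  # skip the opening brace
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "}" and nxt == "}":
            out.append("}")  # '}}' is an escaped right brace
            i += 2
        elif ch == "}":
            return "".join(out), i + 1  # a single '}' ends the value
        elif ch == "{" and nxt == "{":
            out.append("{")  # '{{' is an escaped left brace
            i += 2
        else:
            out.append(ch)  # regular char; a lone '{' is kept as-is
            i += 1
    raise ValueError(f"Unclosed braced value starting at position {start}")
```

For example, `read_braced_value("{val}}ue}", 0)` yields the value `val}ue`, matching the third row of the table.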

### Simple Value Rules

- Read until semicolon (`;`) or end of string
- Leading whitespace after '=' is skipped
- Trailing whitespace is stripped from value
- Cannot contain semicolons; a value that needs one must use the braced form
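
A minimal sketch of these rules follows. The helper name `read_simple_value` and its return shape are assumptions made for illustration; the actual parser may structure this step differently.

```python
from typing import Tuple


def read_simple_value(text: str, start: int) -> Tuple[str, int]:
    """Read an unquoted value that begins at text[start], just after '='.

    Returns the value and the index of the terminating ';' (or len(text)
    when the value runs to the end of the string).
    (Hypothetical helper; name and signature are illustrative only.)
    """
    i = start
    while i < len(text) and text[i] in " \t":
        i += 1  # leading whitespace after '=' is skipped
    end = text.find(";", i)
    if end == -1:
        end = len(text)  # no semicolon: value runs to end of string
    return text[i:end].rstrip(), end  # trailing whitespace is stripped
```

With the input `"Server=  localhost  ;Database=mydb"` and `start=7`, the sketch returns the value `localhost` and stops at the semicolon.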

## Examples

### Valid Parsing Flow

```
Input: "Server=localhost;Database=mydb"

START → SKIP_WHITESPACE → PARSE_KEY("Server")
→ VALIDATE_KEY → PARSE_VALUE → CHECK_VALUE_TYPE
→ PARSE_SIMPLE_VALUE("localhost") → STORE_PARAM
→ CHECK_DUPLICATE → VALIDATE_ALLOWLIST → SAVE_PARAM
→ SKIP_WHITESPACE → PARSE_KEY("Database")
→ VALIDATE_KEY → PARSE_VALUE → CHECK_VALUE_TYPE
→ PARSE_SIMPLE_VALUE("mydb") → STORE_PARAM
→ CHECK_DUPLICATE → VALIDATE_ALLOWLIST → SAVE_PARAM
→ SKIP_WHITESPACE → END → RETURN_PARAMS
```

### Error Handling Flow

```
Input: "Server=localhost;Server=other" (duplicate)

START → ... → SAVE_PARAM(Server=localhost)
→ SKIP_WHITESPACE → PARSE_KEY("Server")
→ VALIDATE_KEY → PARSE_VALUE → ... → STORE_PARAM
→ CHECK_DUPLICATE → ERROR_DUPLICATE
→ SKIP_TO_SEMICOLON → SKIP_WHITESPACE → END → RAISE_ERRORS
```

### Braced Value with Escaping

```
Input: "PWD={p}}w{{d;test}"

START → SKIP_WHITESPACE → PARSE_KEY("PWD")
→ VALIDATE_KEY → PARSE_VALUE → CHECK_VALUE_TYPE
→ PARSE_BRACED_VALUE
- Read 'p'
- Read '}' → CHECK_RIGHT_BRACE
- Next is '}' → Escaped: add '}', continue
- Read 'w'
- Read '{' → CHECK_LEFT_BRACE
- Next is '{' → Escaped: add '{', continue
- Read 'd', ';', 't', 'e', 's', 't'
- Read '}' → CHECK_RIGHT_BRACE
- Next is not '}' → End of value
→ STORE_PARAM (value="p}w{d;test")
→ ... → SAVE_PARAM → SKIP_WHITESPACE → END → RETURN_PARAMS
```

## Parser Characteristics

### Key Features

1. **Error Collection**: Collects all errors before raising an exception (batch error reporting)
2. **Case-Insensitive Keys**: All keys normalized to lowercase during parsing
3. **Duplicate Detection**: Tracks seen keys to prevent duplicates
4. **Reserved Keywords**: Blocks users from setting `Driver` and `APP`
5. **Allowlist Validation**: Optional validation against allowed parameters
6. **Escape Handling**: Proper ODBC brace escape sequences (`{{`, `}}`)
7. **Error Recovery**: Skips to next semicolon after errors to continue validation
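
Features 2 through 5 can be illustrated with a single validation step. The sketch below follows the order shown in the state diagram (duplicate check first, then reserved and unknown checks only when an allowlist is given). The function name `validate_and_store`, the `RESERVED_KEYS` constant, and treating a key as "seen" once it has been stored are assumptions for this example, not confirmed details of the implementation.

```python
from typing import Dict, List, Optional, Set

RESERVED_KEYS = {"driver", "app"}  # reserved keywords named in this document


def validate_and_store(
    key: str,
    value: str,
    params: Dict[str, str],
    errors: List[str],
    allowlist: Optional[Set[str]] = None,
) -> bool:
    """Run the duplicate, reserved and unknown-keyword checks for one pair.

    Appends a message to `errors` and returns False on failure; otherwise
    stores the pair in `params` and returns True.
    (Hypothetical helper; name and signature are illustrative only.)
    """
    normalized = key.strip().lower()  # keys are case-insensitive
    if normalized in params:  # assumption: "seen" means already stored
        errors.append(f"Duplicate keyword '{key}' found")
        return False
    if allowlist is not None:
        if normalized in RESERVED_KEYS:
            errors.append(
                f"Reserved keyword '{key}' is controlled by the driver "
                "and cannot be specified by the user"
            )
            return False
        if normalized not in allowlist:
            errors.append(f"Unknown keyword '{key}' is not recognized")
            return False
    params[normalized] = value
    return True
```

Because the function records a message and returns instead of raising, the caller can keep going and report every problem at once.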

### Error Handling Strategy

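- **Non-fatal errors**: Record the error, skip to the next semicolon, and continue parsing so that all errors are collected
- **Fatal errors**: An unclosed brace is only detected at end of input, so parsing effectively stops there; the error is still recorded and reported with the others
- **Batch reporting**: All errors reported together in a single `ConnectionStringParseError`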
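
The collect-then-raise strategy can be sketched as follows. To keep the example short it handles unquoted values only, so splitting on `;` stands in for SKIP_TO_SEMICOLON; the real parser works character by character and also handles braced values. `ParseErrorSketch` is a hypothetical stand-in for `ConnectionStringParseError`, whose actual constructor is not shown in this document, and `parse_simple_pairs` is likewise an illustrative name.

```python
from typing import Dict, List


class ParseErrorSketch(Exception):
    """Hypothetical stand-in for ConnectionStringParseError."""

    def __init__(self, errors: List[str]):
        super().__init__("; ".join(errors))
        self.errors = errors  # every problem found, in input order


def parse_simple_pairs(text: str) -> Dict[str, str]:
    """Parse 'key=value;...' with unquoted values, collecting all errors."""
    params: Dict[str, str] = {}
    errors: List[str] = []
    for segment in text.split(";"):
        if not segment.strip():
            continue  # empty segment between two semicolons
        key, sep, value = segment.partition("=")
        key = key.strip()
        if not sep:
            errors.append(
                f"Incomplete specification: keyword '{key}' has no value (missing '=')"
            )
        elif not key:
            errors.append("Empty keyword found (format: =value)")
        elif key.lower() in params:
            errors.append(f"Duplicate keyword '{key}' found")
        else:
            params[key.lower()] = value.strip()
        # Moving to the next segment plays the role of SKIP_TO_SEMICOLON.
    if errors:
        raise ParseErrorSketch(errors)  # batch reporting: one exception
    return params
```

Parsing `"Server=a;Server=b;=x;Timeout"` with this sketch raises one exception whose `errors` list holds three messages (duplicate keyword, empty keyword, missing `=`), rather than stopping at the first problem.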

## References

- MS-ODBCSTR Specification: [ODBC Connection String Format](https://learn.microsoft.com/en-us/openspecs/sql_server_protocols/ms-odbcstr/)
- Implementation: `mssql_python/connection_string_parser.py`
- Tests: `tests/test_010_connection_string_parser.py`
6 changes: 3 additions & 3 deletions eng/pipelines/build-whl-pipeline.yml
@@ -340,7 +340,7 @@
python -m pytest -v
displayName: 'Run Pytest to validate bindings'
env:
- DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+ DB_CONNECTION_STRING: 'Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'

Check notice

Code scanning / devskim

Accessing localhost could indicate debug code, or could hinder scaling. Note

Do not leave debug code in production

# Build wheel package for universal2
- script: |
@@ -801,7 +801,7 @@

displayName: 'Test wheel installation and basic functionality on $(BASE_IMAGE)'
env:
- DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+ DB_CONNECTION_STRING: 'Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'

Check notice

Code scanning / devskim

Accessing localhost could indicate debug code, or could hinder scaling. Note

Do not leave debug code in production

# Run pytest with source code while testing installed wheel
- script: |
@@ -856,7 +856,7 @@
"
displayName: 'Run pytest suite on $(BASE_IMAGE) $(ARCH)'
env:
- DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+ DB_CONNECTION_STRING: 'Server=localhost;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'

Check notice

Code scanning / devskim

Accessing localhost could indicate debug code, or could hinder scaling. Note

Do not leave debug code in production
continueOnError: true # Don't fail pipeline if tests fail

# Cleanup
28 changes: 14 additions & 14 deletions eng/pipelines/pr-validation-pipeline.yml
@@ -190,7 +190,7 @@
python -m pytest -v --junitxml=test-results.xml --cov=. --cov-report=xml --capture=tee-sys --cache-clear
displayName: 'Run pytest with coverage'
env:
- DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+ DB_CONNECTION_STRING: 'Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'

Check notice

Code scanning / devskim

Accessing localhost could indicate debug code, or could hinder scaling. Note

Do not leave debug code in production
DB_PASSWORD: $(DB_PASSWORD)

- task: PublishTestResults@2
@@ -359,12 +359,12 @@
echo "SQL Server IP: $SQLSERVER_IP"

docker exec \
- -e DB_CONNECTION_STRING="Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
+ -e DB_CONNECTION_STRING="Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
-e DB_PASSWORD="$(DB_PASSWORD)" \
test-container-$(distroName) bash -c "
source /opt/venv/bin/activate
echo 'Build successful, running tests now on $(distroName)'
- echo 'Using connection string: Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
+ echo 'Using connection string: Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
python -m pytest -v --junitxml=test-results-$(distroName).xml --cov=. --cov-report=xml:coverage-$(distroName).xml --capture=tee-sys --cache-clear
"
displayName: 'Run pytest with coverage in $(distroName) container'
@@ -570,13 +570,13 @@
echo "SQL Server IP: $SQLSERVER_IP"

docker exec \
- -e DB_CONNECTION_STRING="Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
+ -e DB_CONNECTION_STRING="Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
-e DB_PASSWORD="$(DB_PASSWORD)" \
test-container-$(distroName)-$(archName) bash -c "
source /opt/venv/bin/activate
echo 'Build successful, running tests now on $(distroName) ARM64'
echo 'Architecture:' \$(uname -m)
- echo 'Using connection string: Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
+ echo 'Using connection string: Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
python main.py
python -m pytest -v --junitxml=test-results-$(distroName)-$(archName).xml --cov=. --cov-report=xml:coverage-$(distroName)-$(archName).xml --capture=tee-sys --cache-clear
"
@@ -778,12 +778,12 @@
echo "SQL Server IP: $SQLSERVER_IP"

docker exec \
- -e DB_CONNECTION_STRING="Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
+ -e DB_CONNECTION_STRING="Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
-e DB_PASSWORD="$(DB_PASSWORD)" \
test-container-rhel9 bash -c "
source myvenv/bin/activate
echo 'Build successful, running tests now on RHEL 9'
- echo 'Using connection string: Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
+ echo 'Using connection string: Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
python main.py
python -m pytest -v --junitxml=test-results-rhel9.xml --cov=. --cov-report=xml:coverage-rhel9.xml --capture=tee-sys --cache-clear
"
@@ -997,13 +997,13 @@
echo "SQL Server IP: $SQLSERVER_IP"

docker exec \
- -e DB_CONNECTION_STRING="Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
+ -e DB_CONNECTION_STRING="Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
-e DB_PASSWORD="$(DB_PASSWORD)" \
test-container-rhel9-arm64 bash -c "
source myvenv/bin/activate
echo 'Build successful, running tests now on RHEL 9 ARM64'
echo 'Architecture:' \$(uname -m)
- echo 'Using connection string: Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
+ echo 'Using connection string: Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
python -m pytest -v --junitxml=test-results-rhel9-arm64.xml --cov=. --cov-report=xml:coverage-rhel9-arm64.xml --capture=tee-sys --cache-clear
"
displayName: 'Run pytest with coverage in RHEL 9 ARM64 container'
@@ -1225,13 +1225,13 @@
echo "SQL Server IP: $SQLSERVER_IP"

docker exec \
- -e DB_CONNECTION_STRING="Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
+ -e DB_CONNECTION_STRING="Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
-e DB_PASSWORD="$(DB_PASSWORD)" \
test-container-alpine bash -c "
echo 'Build successful, running tests now on Alpine x86_64'
echo 'Architecture:' \$(uname -m)
echo 'Alpine version:' \$(cat /etc/alpine-release)
- echo 'Using connection string: Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
+ echo 'Using connection string: Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'

# Activate virtual environment
source /workspace/venv/bin/activate
@@ -1467,13 +1467,13 @@
echo "SQL Server IP: $SQLSERVER_IP"

docker exec \
- -e DB_CONNECTION_STRING="Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
+ -e DB_CONNECTION_STRING="Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes" \
-e DB_PASSWORD="$(DB_PASSWORD)" \
test-container-alpine-arm64 bash -c "
echo 'Build successful, running tests now on Alpine ARM64'
echo 'Architecture:' \$(uname -m)
echo 'Alpine version:' \$(cat /etc/alpine-release)
- echo 'Using connection string: Driver=ODBC Driver 18 for SQL Server;Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'
+ echo 'Using connection string: Server=$SQLSERVER_IP;Database=TestDB;Uid=SA;Pwd=***;TrustServerCertificate=yes'

# Activate virtual environment
source /workspace/venv/bin/activate
@@ -1574,7 +1574,7 @@
lcov_cobertura total.info --output unified-coverage/coverage.xml
displayName: 'Generate unified coverage (Python + C++)'
env:
- DB_CONNECTION_STRING: 'Driver=ODBC Driver 18 for SQL Server;Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'
+ DB_CONNECTION_STRING: 'Server=tcp:127.0.0.1,1433;Database=master;Uid=SA;Pwd=$(DB_PASSWORD);TrustServerCertificate=yes'

Check notice

Code scanning / devskim

Accessing localhost could indicate debug code, or could hinder scaling. Note

Do not leave debug code in production
DB_PASSWORD: $(DB_PASSWORD)

- task: PublishTestResults@2
3 changes: 3 additions & 0 deletions mssql_python/__init__.py
@@ -26,6 +26,9 @@
NotSupportedError,
)

+ # Connection string parser exceptions
+ from .exceptions import ConnectionStringParseError
+
# Type Objects
from .type import (
Date,