Usage Guide¶
Connection String Setup¶
To use sql_database_utils, you'll need a valid ODBC connection string to your SQL Server database.
Connection String Format¶
DRIVER={ODBC Driver 17 for SQL Server};SERVER=server_name;DATABASE=database_name;UID=username;PWD=password
Example Connection Strings¶
Local SQL Server:
conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=MyDatabase;UID=sa;PWD=password"
Azure SQL Database:
conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver.database.windows.net;DATABASE=MyDatabase;UID=user@myserver;PWD=password"
Using Windows Authentication:
conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=server_name;DATABASE=database_name;Trusted_Connection=yes"
Reading Data¶
Read Entire Table¶
Use read_table() to read an entire table into a pandas DataFrame:
import sql_database_utils as db
conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=MyDB;UID=user;PWD=password"
# Read entire table
df = db.read_table("Customers", conn_str)
print(df.head())
Read with Filter¶
Use read_with_filter() to read only rows that match a condition:
# Read orders with amount > 500
df_large_orders = db.read_with_filter(
"Orders",
conn_str,
"OrderAmount",
"> 500"
)
# Read specific customer
df_customer = db.read_with_filter(
"Customers",
conn_str,
"CustomerID",
"= 123"
)
# Read by date range
df_recent = db.read_with_filter(
"TransactionLog",
conn_str,
"TransactionDate",
">= '2024-01-01'"
)
Writing Data¶
Insert DataFrame¶
Use insert_dataframe() to insert data from a pandas DataFrame into a SQL Server table:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'CustomerID': [1, 2, 3],
'Name': ['Alice', 'Bob', 'Charlie'],
'Email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']
})
# Insert into table
db.insert_dataframe("Customers", df, conn_str)
The function will:
- Retrieve the table schema from the database
- Validate columns - ensures no extra columns exist in the DataFrame
- Auto-fill missing columns with NULL values
- Use parameterized queries to prevent SQL injection
- Insert row-by-row for safe operations
Note
For large batch inserts (>10,000 rows), consider breaking the DataFrame into smaller chunks for better performance.
Deleting Data¶
Use delete_with_filter() to delete rows matching a condition:
# Delete cancelled orders
deleted_count = db.delete_with_filter(
"Orders",
conn_str,
"OrderStatus",
"= 'Cancelled'"
)
print(f"{deleted_count} rows deleted")
# Delete old log entries
deleted_count = db.delete_with_filter(
"TransactionLog",
conn_str,
"TransactionDate",
"< '2020-01-01'"
)
print(f"Deleted {deleted_count} old transactions")
Best Practices¶
Security¶
Warning
Always use parameterized queries and avoid string concatenation for user input. The module's functions are designed to use parameterized queries internally.
Performance¶
- Reading: For large tables, use filtering to reduce memory usage
- Inserting: Insert dataframes in batches of 1,000-5,000 rows for optimal speed
- Deleting: Test your WHERE conditions carefully before executing
Error Handling¶
import sql_database_utils as db
try:
df = db.read_table("NonExistentTable", conn_str)
except Exception as e:
print(f"Error reading table: {e}")
try:
db.insert_dataframe("Customers", df, conn_str)
except Exception as e:
print(f"Error inserting data: {e}")
Examples¶
Data Pipeline Example¶
import sql_database_utils as db
import pandas as pd
conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=MyDB;UID=user;PWD=password"
# 1. Read raw data
raw_df = db.read_table("RawData", conn_str)
# 2. Process data
processed_df = raw_df[raw_df['Amount'] > 0].copy()
processed_df['Date'] = pd.to_datetime(processed_df['Date'])
# 3. Store processed data
db.insert_dataframe("ProcessedData", processed_df, conn_str)
# 4. Clean up old entries
deleted = db.delete_with_filter("RawData", conn_str, "Status", "= 'Processed'")
print(f"Cleaned up {deleted} records")
Bulk Data Migration¶
# Read from source table
df = db.read_with_filter(
"SourceTable",
conn_str_source,
"Year",
">= 2023"
)
# Insert to destination table
db.insert_dataframe("DestinationTable", df, conn_str_destination)
print(f"Migrated {len(df)} records")