eftoolkit.sql
DuckDB wrapper with S3 integration.
Module Contents
DuckDB
DuckDB(
database: str = ':memory:',
*,
s3: Optional[S3FileSystem] = None,
s3_region: str | None = None,
s3_access_key_id: str | None = None,
s3_secret_access_key: str | None = None,
s3_endpoint: str | None = None,
s3_url_style: str | None = None,
)
Thin wrapper around duckdb.DuckDBPyConnection with S3 integration.
Inherits all native DuckDB methods (query, execute, sql, fetchone, fetchall, etc.) via delegation to the underlying connection.
S3 operations use eftoolkit.s3.S3FileSystem internally.
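The delegation described above can be sketched with `__getattr__`. This is a minimal illustration, not the library's actual implementation: the class name `ConnectionWrapper` is hypothetical, and the standard-library `sqlite3` connection stands in for `duckdb.DuckDBPyConnection` so the sketch runs without DuckDB installed.

```python
import sqlite3


class ConnectionWrapper:
    """Hypothetical wrapper; sqlite3 stands in for duckdb in this sketch."""

    def __init__(self, database: str = ":memory:") -> None:
        self._conn = sqlite3.connect(database)

    @property
    def connection(self) -> sqlite3.Connection:
        # Direct access to the wrapped connection.
        return self._conn

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails, so methods defined
        # on the wrapper win and everything else falls through to the connection.
        return getattr(self._conn, name)


db = ConnectionWrapper()
# execute() is not defined on the wrapper; it is resolved on the connection.
row = db.execute("SELECT 1 + 1").fetchone()
```

With this pattern the wrapper only needs to define the methods it adds or overrides; native methods stay available without being re-declared.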
Initialize DuckDB with optional S3 integration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `database` | `str` | Path to the database file, or `':memory:'` for an in-memory database | `':memory:'` |
| `s3` | `Optional[S3FileSystem]` | Existing S3FileSystem instance to use for S3 operations | `None` |
| `s3_region` | `str \| None` | AWS region for S3 access (creates S3FileSystem internally) | `None` |
| `s3_access_key_id` | `str \| None` | AWS access key ID for S3 access | `None` |
| `s3_secret_access_key` | `str \| None` | AWS secret access key for S3 access | `None` |
| `s3_endpoint` | `str \| None` | Custom S3 endpoint | `None` |
| `s3_url_style` | `str \| None` | S3 URL style (`'path'` or `'vhost'`) | `None` |
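As one way to supply the `s3_*` arguments listed above, the sketch below collects them from environment variables. The helper names `s3_kwargs_from_env` and `open_db` are hypothetical, and so are the `S3_ENDPOINT` and `S3_URL_STYLE` variable names; the import path `eftoolkit.sql` is taken from this page's module name.

```python
import os
from typing import Dict, Mapping, Optional


def s3_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the documented s3_* keyword arguments from environment variables."""
    env = os.environ if env is None else env
    mapping = {
        "s3_region": "AWS_REGION",
        "s3_access_key_id": "AWS_ACCESS_KEY_ID",
        "s3_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "s3_endpoint": "S3_ENDPOINT",    # hypothetical variable name
        "s3_url_style": "S3_URL_STYLE",  # hypothetical variable name
    }
    # Leave unset or empty values out so the constructor defaults (None) apply.
    return {kw: env[var] for kw, var in mapping.items() if env.get(var)}


def open_db(database: str = ":memory:"):
    # Imported lazily so the helper above can be used without eftoolkit installed.
    from eftoolkit.sql import DuckDB

    return DuckDB(database, **s3_kwargs_from_env())
```

Calling `open_db()` with none of the variables set is equivalent to `DuckDB()` with no S3 arguments.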
Source code in eftoolkit/sql/duckdb.py
connection (property)
Underlying DuckDB connection (for direct access to native API).
s3 (property)
s3: Optional[S3FileSystem]
S3FileSystem instance used for S3 operations, or None if not configured.
query
Execute SQL and return DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sql` | `str` | SQL query to execute. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing the query results. |
Example
    from eftoolkit.sql import DuckDB

    db = DuckDB()
    df = db.query("SELECT 1 as id, 'Alice' as name")
    print(df)
    #    id   name
    # 0   1  Alice
execute
Execute SQL without returning results.
This method can be used for any DuckDB SQL command, including:
- DDL statements (CREATE, DROP, ALTER)
- DML statements (INSERT, UPDATE, DELETE)
- DuckDB COPY commands for S3 writes (e.g., COPY ... TO 's3://...')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sql` | `str` | SQL statement to execute. | required |
| `*args` | `object` | Positional arguments passed to duckdb execute. | `()` |
| `**kwargs` | `object` | Keyword arguments passed to duckdb execute. | `{}` |
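To illustrate the COPY use case mentioned above, the following sketch builds a `COPY ... TO` statement that writes a query result as Parquet. The helper `copy_to_parquet_sql` is hypothetical and only assembles the SQL string; running it is left to `execute` on an instance that has S3 configured.

```python
def copy_to_parquet_sql(select_sql: str, s3_uri: str) -> str:
    """Build a DuckDB COPY statement that writes a query result as Parquet."""
    # Double any single quote so the URI stays a valid SQL string literal.
    quoted_uri = s3_uri.replace("'", "''")
    return f"COPY ({select_sql}) TO '{quoted_uri}' (FORMAT PARQUET)"


stmt = copy_to_parquet_sql("SELECT * FROM users", "s3://bucket/path/users.parquet")
# With a configured instance this would be run as: db.execute(stmt)
```

Only the destination is escaped here; `select_sql` is inserted as written, so it should come from trusted code rather than user input.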
get_table
Run SELECT * FROM the given table, optionally filtered by a WHERE clause.
Inf and NaN values in the result are automatically converted to None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `table_name` | `str` | Name of the table to query. | required |
| `where` | `str \| None` | Optional WHERE clause (without the 'WHERE' keyword). | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with table contents. |
Example
    from eftoolkit.sql import DuckDB

    db = DuckDB()
    db.create_table('users', "SELECT 1 as id, 'Alice' as name")
    df = db.get_table('users')
    filtered = db.get_table('users', where="id = 1")
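The two behaviors documented for get_table, the optional WHERE clause and the inf/NaN cleanup, can be approximated in plain Python. This is a sketch under assumptions: `build_select` and `clean_value` are hypothetical names, and the library's real implementation is not shown on this page.

```python
import math
from typing import Any, Optional


def build_select(table_name: str, where: Optional[str] = None) -> str:
    """Assemble the query, appending the clause only when one is given."""
    sql = f"SELECT * FROM {table_name}"
    if where:
        sql += f" WHERE {where}"
    return sql


def clean_value(value: Any) -> Any:
    # Only floats can be inf or NaN; everything else passes through unchanged.
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return None
    return value


rows = [(1, float("nan")), (2, float("inf")), (3, 0.5)]
cleaned = [tuple(clean_value(v) for v in row) for row in rows]
```

Because the `where` text becomes part of the SQL as written, it is safest to build it from trusted values only.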
create_table
CREATE OR REPLACE TABLE from SQL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `table_name` | `str` | Name for the new table. | required |
| `sql` | `str` | SQL SELECT statement to define table contents. | required |
Example
    from eftoolkit.sql import DuckDB

    db = DuckDB()
    # Assumes a 'users' table with an 'active' column already exists.
    db.create_table('active_users', "SELECT * FROM users WHERE active = true")
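Because create_table uses CREATE OR REPLACE semantics, a set of derived tables can be rebuilt by calling it again with the same names. The sketch below shows one possible pattern; `refresh_tables` is a hypothetical helper, and it relies only on the `create_table(table_name, sql)` call documented above.

```python
from typing import Dict, List


def refresh_tables(db, definitions: Dict[str, str]) -> List[str]:
    """Create or replace each table from its SELECT statement, in dict order."""
    created = []
    for table_name, sql in definitions.items():
        db.create_table(table_name, sql)
        created.append(table_name)
    return created


definitions = {
    # Order matters: later tables may select from earlier ones.
    "users": "SELECT 1 as id, 'Alice' as name, true as active",
    "active_users": "SELECT * FROM users WHERE active = true",
}
# With a DuckDB instance: refresh_tables(db, definitions)
```

Running the helper a second time replaces the existing tables instead of failing on a name clash.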
create_table_from_df
CREATE OR REPLACE TABLE from DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `table_name` | `str` | Name for the new table. | required |
| `df` | `DataFrame` | DataFrame to store as a table. | required |
Example
    import pandas as pd
    from eftoolkit.sql import DuckDB

    db = DuckDB()
    df = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob']})
    db.create_table_from_df('users', df)
read_parquet_from_s3
Read parquet from S3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s3_uri` | `str` | S3 URI (e.g., `'s3://bucket/path/file.parquet'`) | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with parquet contents |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If S3 is not configured |
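Since the `s3` property is documented as None when S3 is not configured, a caller can check it before reading rather than catching the ValueError. The following is a sketch of that pattern; `read_if_configured` is a hypothetical helper and `db` is any object exposing the documented `s3` property and `read_parquet_from_s3` method.

```python
def read_if_configured(db, s3_uri: str):
    """Return the Parquet contents, or None when the instance has no S3 setup."""
    if db.s3 is None:
        # Documented meaning of None: S3 is not configured on this instance.
        return None
    return db.read_parquet_from_s3(s3_uri)
```

Checking the property first keeps a missing configuration distinct from other errors that may surface during the read itself.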
write_df_to_s3_parquet
Write DataFrame to S3 as parquet.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame to write | required |
| `s3_uri` | `str` | S3 URI (e.g., `'s3://bucket/path/file.parquet'`) | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If S3 is not configured |
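A write followed by a read through the same instance might look like the sketch below. `roundtrip_parquet` is a hypothetical helper; it passes the documented parameter names as keywords, and the `s3://` prefix check is an added assumption rather than documented library behavior.

```python
def roundtrip_parquet(db, df, s3_uri: str):
    """Write df to S3 as Parquet, then read it back through the same instance."""
    if not s3_uri.startswith("s3://"):
        # Assumption of this sketch: reject anything that is not an s3:// URI.
        raise ValueError(f"Expected an s3:// URI, got {s3_uri!r}")
    db.write_df_to_s3_parquet(df=df, s3_uri=s3_uri)
    return db.read_parquet_from_s3(s3_uri=s3_uri)
```

On an instance without S3 configured, either call is documented to raise ValueError, so callers can handle both failure cases with a single `except ValueError`.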