eftoolkit.s3
S3 filesystem operations for parquet files.
Classes
S3FileSystem
S3FileSystem(
    *,
    access_key_id: str | None = None,
    secret_access_key: str | None = None,
    region: str | None = None,
    endpoint: str | None = None,
)
S3 filesystem client for reading/writing parquet files.
Falls back to environment variables if credentials are not provided:
- S3_ACCESS_KEY_ID / AWS_ACCESS_KEY_ID
- S3_SECRET_ACCESS_KEY / AWS_SECRET_ACCESS_KEY
- S3_REGION / AWS_REGION
- S3_ENDPOINT
Initialize S3 filesystem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `access_key_id` | `str \| None` | AWS access key ID | `None` |
| `secret_access_key` | `str \| None` | AWS secret access key | `None` |
| `region` | `str \| None` | AWS region | `None` |
| `endpoint` | `str \| None` | Custom S3 endpoint (e.g., 'nyc3.digitaloceanspaces.com') | `None` |
Source code in eftoolkit/s3/filesystem.py
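The environment-variable fallback described for the constructor can be sketched as a small lookup helper. This is an illustration only, not the library's implementation: `resolve_setting` is a hypothetical name, and checking the `S3_*` variable before its `AWS_*` counterpart is an assumption based on the order in which the variables are listed.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_setting(
    explicit: str | None,
    env_names: tuple[str, ...],
    environ: Mapping[str, str] = os.environ,
) -> str | None:
    """Return the explicit value if given, else the first non-empty variable."""
    if explicit is not None:
        return explicit
    for name in env_names:  # assumed precedence: earlier names win
        value = environ.get(name)
        if value:  # skip unset and empty variables
            return value
    return None


# Hypothetical environment used only for this demonstration.
fake_env = {
    "AWS_REGION": "us-east-1",
    "S3_ENDPOINT": "nyc3.digitaloceanspaces.com",
}

region = resolve_setting(None, ("S3_REGION", "AWS_REGION"), fake_env)
endpoint = resolve_setting(None, ("S3_ENDPOINT",), fake_env)
override = resolve_setting("eu-west-1", ("S3_REGION", "AWS_REGION"), fake_env)
```

An explicitly passed argument always wins in this sketch, which matches the statement that the environment is consulted only when credentials are not provided.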
read_df_from_parquet
Read parquet file(s) from S3.
Supports both single files and directories containing parquet files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s3_uri` | `str` | S3 URI. Either a URI ending in `.parquet` (reads that exact file) or a prefix/directory URI (reads all `.parquet` files under it and concatenates them). | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with parquet contents |
Source code in eftoolkit/s3/filesystem.py
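The two URI modes of `read_df_from_parquet` can be illustrated with a plain-Python selection rule. This is a sketch under stated assumptions: `select_parquet_keys` is a hypothetical helper, and the sorted ordering and the treatment of the prefix as a directory boundary are guesses about behavior the page does not spell out.

```python
from __future__ import annotations


def select_parquet_keys(target: str, available: list[str]) -> list[str]:
    """Pick the URIs a read of `target` would cover under the documented rules."""
    if target.endswith(".parquet"):
        # Exact-file mode: only that one URI is read.
        return [target]
    # Prefix mode: every .parquet URI under the prefix (sorted order is assumed).
    prefix = target.rstrip("/") + "/"
    return sorted(
        uri for uri in available
        if uri.startswith(prefix) and uri.endswith(".parquet")
    )


# Hypothetical bucket contents for the demonstration.
uris = [
    "s3://bucket/data/part-1.parquet",
    "s3://bucket/data/part-0.parquet",
    "s3://bucket/data/_SUCCESS",
    "s3://bucket/database/other.parquet",
]

single = select_parquet_keys("s3://bucket/data/part-0.parquet", uris)
whole_dir = select_parquet_keys("s3://bucket/data", uris)
```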
write_df_to_parquet
Write DataFrame as parquet to S3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame to write | required |
| `s3_uri` | `str` | S3 URI (e.g., 's3://bucket/path/file.parquet') | required |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If URI does not end with `.parquet` |
Source code in eftoolkit/s3/filesystem.py
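Because `write_df_to_parquet` rejects URIs without a `.parquet` suffix, a caller may want to validate URIs before building a DataFrame. The check below is a minimal sketch: `check_parquet_uri` is a hypothetical helper, and the error message is invented for illustration rather than taken from the library.

```python
def check_parquet_uri(s3_uri: str) -> str:
    """Return the URI unchanged, or raise ValueError if it lacks the suffix."""
    if not s3_uri.endswith(".parquet"):
        raise ValueError(f"URI must end with .parquet: {s3_uri!r}")
    return s3_uri


ok = check_parquet_uri("s3://bucket/path/file.parquet")

try:
    check_parquet_uri("s3://bucket/path/file.csv")
    rejected = False
except ValueError:
    rejected = True
```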
file_exists
Check if object exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s3_uri` | `str` | S3 URI (e.g., 's3://bucket/key') | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if object exists |
Source code in eftoolkit/s3/filesystem.py
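One possible use of `file_exists` is a write-once guard in front of `put_object`. The sketch below is duck-typed so it runs against a stand-in: `put_if_absent` and `InMemoryFS` are hypothetical names, and only the documented method and parameter names (`file_exists`, `put_object`, `s3_uri`, `body`) are relied on. The check and the write are separate calls, so this pattern is not atomic.

```python
def put_if_absent(fs, s3_uri: str, body: bytes) -> bool:
    """Upload `body` only if nothing exists at `s3_uri`; report whether it wrote."""
    if fs.file_exists(s3_uri):
        return False
    fs.put_object(s3_uri=s3_uri, body=body)
    return True


class InMemoryFS:
    """Hypothetical stand-in exposing only the two methods used above."""

    def __init__(self):
        self.objects = {}

    def file_exists(self, s3_uri):
        return s3_uri in self.objects

    def put_object(self, s3_uri, body, content_type=None):
        self.objects[s3_uri] = body


fs = InMemoryFS()
first = put_if_absent(fs, "s3://bucket/key", b"v1")
second = put_if_absent(fs, "s3://bucket/key", b"v2")  # skipped, key exists
```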
ls
ls(
    s3_uri: str,
    *,
    recursive: bool = True,
    include_prefixes: bool = False,
) -> Iterator[S3Object]
List objects at an S3 URI.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s3_uri` | `str` | S3 URI (e.g., 's3://bucket' or 's3://bucket/prefix') | required |
| `recursive` | `bool` | If True, list all objects under prefix recursively. If False, list only files at the immediate level. | `True` |
| `include_prefixes` | `bool` | If True and `recursive=False`, also yield prefix (directory) entries with `is_prefix=True` in metadata. Ignored when `recursive=True`. | `False` |
Yields:
| Type | Description |
|---|---|
| `S3Object` | S3Object instances with metadata for each file (and prefix if requested) |
Source code in eftoolkit/s3/filesystem.py
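A caller of `ls` with `recursive=False` and `include_prefixes=True` has to separate prefix entries from real objects via `metadata.is_prefix`. The following sketch shows one way to do that. `summarize_level` and `StubFS` are hypothetical, the listing data is invented, and whether prefix keys carry a trailing slash is an assumption.

```python
from types import SimpleNamespace


def summarize_level(fs, s3_uri: str):
    """Return (prefix_keys, file_count, total_bytes) for one directory level."""
    prefixes, count, total = [], 0, 0
    for obj in fs.ls(s3_uri, recursive=False, include_prefixes=True):
        if obj.metadata.is_prefix:
            prefixes.append(obj.key)
        else:
            count += 1
            total += obj.metadata.size or 0  # size may be None
    return prefixes, count, total


class StubFS:
    """Hypothetical stand-in that yields a fixed listing."""

    def ls(self, s3_uri, *, recursive=True, include_prefixes=False):
        entries = [
            ("raw/", True, None),
            ("a.parquet", False, 100),
            ("b.parquet", False, 250),
        ]
        for key, is_prefix, size in entries:
            if is_prefix and not include_prefixes:
                continue
            meta = SimpleNamespace(is_prefix=is_prefix, size=size)
            yield SimpleNamespace(key=key, metadata=meta)


prefixes, file_count, total_bytes = summarize_level(StubFS(), "s3://bucket")
```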
cp
Copy an object within or across buckets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `src_uri` | `str` | Source S3 URI (e.g., 's3://bucket/key') | required |
| `dst_uri` | `str` | Destination S3 URI (e.g., 's3://bucket/key') | required |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the source object does not exist |
Source code in eftoolkit/s3/filesystem.py
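The page documents no move operation, but one can be composed from `cp` followed by `delete_object`. This is a sketch, not part of the library: `move_object` and `InMemoryFS` are hypothetical names, and the stand-in only imitates the documented `FileNotFoundError` and idempotent delete.

```python
def move_object(fs, src_uri: str, dst_uri: str) -> None:
    """Copy then delete; a missing source raises before anything is removed."""
    fs.cp(src_uri=src_uri, dst_uri=dst_uri)
    fs.delete_object(src_uri)


class InMemoryFS:
    """Hypothetical stand-in imitating the documented error behavior."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def cp(self, src_uri, dst_uri):
        if src_uri not in self.objects:
            raise FileNotFoundError(src_uri)
        self.objects[dst_uri] = self.objects[src_uri]

    def delete_object(self, s3_uri):
        self.objects.pop(s3_uri, None)


fs = InMemoryFS({"s3://bucket/tmp/a.bin": b"payload"})
move_object(fs, "s3://bucket/tmp/a.bin", "s3://bucket/final/a.bin")

try:
    move_object(fs, "s3://bucket/tmp/missing.bin", "s3://bucket/final/x.bin")
    missing_raised = False
except FileNotFoundError:
    missing_raised = True
```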
delete_object
Delete an object from S3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s3_uri` | `str` | S3 URI (e.g., 's3://bucket/key') | required |
Note
This operation is idempotent: deleting a nonexistent object does not raise an error.
Source code in eftoolkit/s3/filesystem.py
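Idempotent deletion makes cleanup jobs safe to retry. As a rough sketch, a prefix can be cleared by combining `ls` with `delete_object`. `delete_prefix` and `InMemoryFS` are hypothetical names, and the example relies only on the documented `uri` attribute of listed objects.

```python
from types import SimpleNamespace


def delete_prefix(fs, prefix_uri: str) -> int:
    """Delete every object under `prefix_uri`; return how many were targeted."""
    # Materialize the listing first so deletion never races the iterator.
    targets = [obj.uri for obj in fs.ls(prefix_uri, recursive=True)]
    for uri in targets:
        fs.delete_object(uri)
    return len(targets)


class InMemoryFS:
    """Hypothetical stand-in with an idempotent delete."""

    def __init__(self, uris):
        self.uris = set(uris)

    def ls(self, s3_uri, *, recursive=True, include_prefixes=False):
        prefix = s3_uri.rstrip("/") + "/"
        for uri in sorted(self.uris):
            if uri.startswith(prefix):
                yield SimpleNamespace(uri=uri)

    def delete_object(self, s3_uri):
        self.uris.discard(s3_uri)  # no error if already gone


fs = InMemoryFS({"s3://bucket/tmp/a", "s3://bucket/tmp/b", "s3://bucket/keep/c"})
first_run = delete_prefix(fs, "s3://bucket/tmp")
second_run = delete_prefix(fs, "s3://bucket/tmp")  # retry finds nothing left
fs.delete_object("s3://bucket/tmp/a")  # repeated delete is a no-op
```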
put_object
Upload raw bytes to S3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s3_uri` | `str` | S3 URI (e.g., 's3://bucket/key') | required |
| `body` | `bytes` | Raw bytes to upload | required |
| `content_type` | `str \| None` | Optional content type (e.g., 'application/octet-stream') | `None` |
Source code in eftoolkit/s3/filesystem.py
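Since `put_object` takes raw bytes, structured data has to be serialized by the caller. The sketch below uploads a JSON document with an explicit content type. `put_json` and `RecordingFS` are hypothetical names, and passing the documented parameters by keyword is an assumption, since the method's full signature is not shown here.

```python
import json


def put_json(fs, s3_uri: str, payload: dict) -> int:
    """Serialize `payload` as UTF-8 JSON, upload it, and return the byte count."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    fs.put_object(s3_uri=s3_uri, body=body, content_type="application/json")
    return len(body)


class RecordingFS:
    """Hypothetical stand-in that records each upload."""

    def __init__(self):
        self.calls = []

    def put_object(self, s3_uri, body, content_type=None):
        self.calls.append((s3_uri, body, content_type))


fs = RecordingFS()
written = put_json(fs, "s3://bucket/meta.json", {"rows": 3, "ok": True})
```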
get_object
Download raw bytes from S3.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `s3_uri` | `str` | S3 URI (e.g., 's3://bucket/key') | required |
Returns:
| Type | Description |
|---|---|
| `bytes` | Raw bytes of the object |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the object does not exist |
Source code in eftoolkit/s3/filesystem.py
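Because `get_object` raises `FileNotFoundError` for a missing key, optional files can be read with a plain `try`/`except` instead of a separate existence check. A minimal sketch, with `get_text_or_default` and `InMemoryFS` as hypothetical names and UTF-8 as an assumed encoding:

```python
def get_text_or_default(fs, s3_uri: str, default: str = "") -> str:
    """Return the object decoded as UTF-8, or `default` if it does not exist."""
    try:
        return fs.get_object(s3_uri).decode("utf-8")
    except FileNotFoundError:
        return default


class InMemoryFS:
    """Hypothetical stand-in imitating the documented error behavior."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def get_object(self, s3_uri):
        if s3_uri not in self.objects:
            raise FileNotFoundError(s3_uri)
        return self.objects[s3_uri]


fs = InMemoryFS({"s3://bucket/config.txt": b"mode=fast"})
present = get_text_or_default(fs, "s3://bucket/config.txt")
absent = get_text_or_default(fs, "s3://bucket/nope.txt", default="mode=safe")
```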
S3Object
dataclass
S3Object(key: str, bucket: str, metadata: S3ObjectMetadata)
Represents an S3 object with its location and metadata.
Attributes:
| Name | Type | Description |
|---|---|---|
| `key` | `str` | Object key (path within bucket) |
| `bucket` | `str` | Bucket name |
| `uri` | `str` | Full S3 URI (s3://bucket/key) |
| `metadata` | `S3ObjectMetadata` | Object metadata (size, last_modified, etc.) |
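The relationship between `bucket`, `key`, and `uri` can be sketched with a small stand-in dataclass. `ObjectRef` and `parse_s3_uri` are hypothetical and are not part of `eftoolkit`; the sketch only assumes the documented `s3://bucket/key` form.

```python
from dataclasses import dataclass


@dataclass
class ObjectRef:
    """Hypothetical stand-in holding just the location fields."""

    key: str
    bucket: str

    @property
    def uri(self) -> str:
        # Documented form of the full URI: s3://bucket/key
        return f"s3://{self.bucket}/{self.key}"


def parse_s3_uri(uri: str) -> ObjectRef:
    """Split an s3://bucket/key URI into its bucket and key parts."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    return ObjectRef(key=key, bucket=bucket)


ref = parse_s3_uri("s3://bucket/path/file.parquet")
```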
S3ObjectMetadata
dataclass
S3ObjectMetadata(
    is_prefix: bool = False,
    last_modified_timestamp_utc: datetime | None = None,
    size: int | None = None,
    etag: str | None = None,
    storage_class: str | None = None,
)
Metadata for an S3 object from boto3 response.
Attributes:
| Name | Type | Description |
|---|---|---|
| `is_prefix` | `bool` | True if this represents a prefix/directory, not an actual object |
| `last_modified_timestamp_utc` | `datetime \| None` | When the object was last modified (UTC) |
| `size` | `int \| None` | Object size in bytes |
| `etag` | `str \| None` | Object ETag hash |
| `storage_class` | `str \| None` | S3 storage class (STANDARD, GLACIER, etc.) |
last_modified_timestamp_utc
class-attribute
instance-attribute
items
Yield key-value pairs of metadata fields.
Enables `dict(metadata.items())` and `for k, v in metadata.items()`.
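How an `items` method on a dataclass enables `dict(...)` conversion can be sketched with a stand-in that mirrors the documented fields. `MetadataSketch` is hypothetical, and iterating over `dataclasses.fields` is one plausible way to implement it, not necessarily how the library does.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class MetadataSketch:
    """Hypothetical stand-in mirroring the documented metadata fields."""

    is_prefix: bool = False
    last_modified_timestamp_utc: datetime | None = None
    size: int | None = None
    etag: str | None = None
    storage_class: str | None = None

    def items(self):
        """Yield (field_name, value) pairs in declaration order."""
        for field in fields(self):
            yield field.name, getattr(self, field.name)


meta = MetadataSketch(size=1024, storage_class="STANDARD")
as_dict = dict(meta.items())
set_fields = [name for name, value in meta.items() if value is not None]
```

Yielding pairs rather than returning a dict keeps the method cheap for callers that only want to loop over a few fields.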