Generators
Generator functions for streaming DDACS simulation data.
generators
Simple generator functions for DDACS data streaming.
This module provides lightweight generator functions for iterating over DDACS simulation data without class overhead.
count_available_simulations(data_dir, h5_subdir='h5', metadata_file='metadata.csv')
Count available simulations (with existing H5 files).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_dir` | `str \| Path` | Root directory of the dataset. | required |
| `h5_subdir` | `str` | Subdirectory containing H5 files (default: "h5"). | `'h5'` |
| `metadata_file` | `str` | Name of the metadata CSV file (default: "metadata.csv"). | `'metadata.csv'` |
Returns:
| Type | Description |
|---|---|
| `int` | Number of simulations with existing H5 files. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the H5 directory or the metadata file does not exist. |
Examples:
>>> count = count_available_simulations('/data/ddacs')
>>> print(f"Dataset contains {count} available simulations")
Source code in ddacs/generators.py
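The counting rule described above can be sketched with the standard library alone. This is a minimal illustration, not the code in ddacs/generators.py: the helper name `count_with_h5`, the header row in the CSV, the assumption that the first CSV column holds the simulation ID, and the `<id>.h5` file naming are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def count_with_h5(data_dir, h5_subdir="h5", metadata_file="metadata.csv"):
    """Count metadata rows whose H5 file exists (illustrative stand-in)."""
    root = Path(data_dir)
    h5_dir = root / h5_subdir
    metadata_path = root / metadata_file
    if not h5_dir.is_dir():
        raise FileNotFoundError(f"H5 directory not found: {h5_dir}")
    if not metadata_path.is_file():
        raise FileNotFoundError(f"Metadata file not found: {metadata_path}")

    with metadata_path.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        # Assumption: first column is the simulation ID, file is "<id>.h5".
        return sum(
            1 for row in reader if row and (h5_dir / f"{row[0]}.h5").is_file()
        )


# Toy dataset: three metadata rows, but only two H5 files on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "h5").mkdir()
    (root / "metadata.csv").write_text("id,param\n1,0.5\n2,0.7\n3,0.9\n")
    for sim_id in (1, 3):
        (root / "h5" / f"{sim_id}.h5").touch()
    demo_count = count_with_h5(root)

print(demo_count)
```

In this sketch a metadata row without a matching file is simply not counted, which is the behaviour a partially downloaded dataset would need.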
get_simulation_by_id(sim_id, data_dir, h5_subdir='h5', metadata_file='metadata.csv')
Get a specific simulation by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sim_id` | `int` | The simulation ID to retrieve. | required |
| `data_dir` | `str \| Path` | Root directory of the dataset. | required |
| `h5_subdir` | `str` | Subdirectory containing H5 files (default: "h5"). | `'h5'` |
| `metadata_file` | `str` | Name of the metadata CSV file (default: "metadata.csv"). | `'metadata.csv'` |
Returns:
| Type | Description |
|---|---|
| `tuple[int, np.ndarray, Path] \| None` | Simulation data if found, `None` otherwise. The tuple contains `(simulation_id, metadata_values, h5_file_path)`. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the H5 directory or the metadata file does not exist. |
Examples:
>>> sim_data = get_simulation_by_id(113525, '/data/ddacs')
>>> if sim_data is not None:
...     sim_id, metadata, h5_path = sim_data
...     print(f"Found simulation {sim_id}")
>>> # Check if simulation exists
>>> if get_simulation_by_id(999999, '/data/ddacs') is None:
...     print("Simulation not found")
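Because the function returns `None` for an unknown ID, callers that treat a missing simulation as an error may prefer to convert that `None` into an exception in one place. The following is a minimal sketch of such a wrapper. `require_simulation`, `fake_lookup`, the in-memory index, and the file name `113525.h5` are hypothetical and exist only to keep the example self-contained; a plain list stands in for the NumPy metadata array.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Stand-in for the (simulation_id, metadata_values, h5_file_path) tuple;
# a plain list replaces the NumPy array to keep the sketch dependency-free.
Record = Tuple[int, list, Path]


def require_simulation(
    lookup: Callable[[int], Optional[Record]], sim_id: int
) -> Record:
    """Return the record for sim_id, or raise LookupError if lookup gives None."""
    record = lookup(sim_id)
    if record is None:
        raise LookupError(f"Simulation {sim_id} not found")
    return record


# Hypothetical in-memory lookup used only to exercise the helper.
_fake_index = {113525: (113525, [0.5, 1.2], Path("/data/ddacs/h5/113525.h5"))}


def fake_lookup(sim_id: int) -> Optional[Record]:
    return _fake_index.get(sim_id)


sim_id, metadata, h5_path = require_simulation(fake_lookup, 113525)
print(sim_id, h5_path.name)
```

With the real function, the lookup argument could be a small lambda such as `lambda i: get_simulation_by_id(i, '/data/ddacs')`.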
iter_ddacs(data_dir, h5_subdir='h5', metadata_file='metadata.csv', skip_missing=False)
Generator for streaming DDACS data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_dir` | `str \| Path` | Root directory of the dataset. | required |
| `h5_subdir` | `str` | Subdirectory containing H5 files (default: "h5"). | `'h5'` |
| `metadata_file` | `str` | Name of the metadata CSV file (default: "metadata.csv"). | `'metadata.csv'` |
| `skip_missing` | `bool` | If True, skip missing H5 files with a warning. If False, raise FileNotFoundError (default: False). | `False` |
Yields:
| Type | Description |
|---|---|
| `tuple[int, np.ndarray, Path]` | Simulation ID, metadata values array, and path to the corresponding H5 file. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the H5 directory does not exist, or if `skip_missing=False` and an H5 file is missing. |
Examples:
>>> for sim_id, metadata, h5_path in iter_ddacs('/data/ddacs'):
...     print(f"Simulation {sim_id}: {h5_path}")
>>> # Skip missing files (for partial downloads)
>>> for sim_id, metadata, h5_path in iter_ddacs('/data/ddacs', skip_missing=True):
...     print(f"Processing {sim_id}")
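The two `skip_missing` modes can be illustrated with a small stand-in generator. This is a sketch of the documented rule (warn and continue, or raise), not the library's implementation: `iter_existing` and its `(sim_id, h5_path)` input pairs are hypothetical, and the metadata array is omitted for brevity.

```python
import tempfile
import warnings
from pathlib import Path


def iter_existing(records, skip_missing=False):
    """Yield (sim_id, h5_path) pairs, mirroring the documented skip_missing rule."""
    for sim_id, h5_path in records:
        h5_path = Path(h5_path)
        if not h5_path.is_file():
            if skip_missing:
                warnings.warn(f"Skipping simulation {sim_id}: {h5_path} is missing")
                continue
            raise FileNotFoundError(f"H5 file not found: {h5_path}")
        yield sim_id, h5_path


# Toy data: simulation 1 has a file on disk, simulation 2 does not.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "1.h5"
    present.touch()
    records = [(1, present), (2, Path(tmp) / "2.h5")]

    # Lenient mode: the missing file produces a warning and is skipped.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        kept_ids = [sim_id for sim_id, _ in iter_existing(records, skip_missing=True)]

    # Strict mode: iteration stops with FileNotFoundError at the missing file.
    try:
        list(iter_existing(records))
        strict_failed = False
    except FileNotFoundError:
        strict_failed = True

print(kept_ids, len(caught), strict_failed)
```

In this sketch the error surfaces only when iteration reaches the missing file, so items before it have already been yielded; code that must not process a partial dataset should validate the files before starting.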
iter_h5_files(data_dir, h5_subdir='h5')
Minimal generator for H5 file paths only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_dir` | `str \| Path` | Root directory of the dataset. | required |
| `h5_subdir` | `str` | Subdirectory containing H5 files (default: "h5"). | `'h5'` |
Yields:
| Type | Description |
|---|---|
| `Path` | Absolute path to each H5 file found in the specified directory. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the H5 directory does not exist. |
Examples:
>>> # Count all H5 files
>>> h5_count = sum(1 for _ in iter_h5_files('/data/ddacs'))
>>> print(f"Total H5 files: {h5_count}")
Note
Yields every .h5 file found in the directory, whether or not it has an entry in the metadata file.
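A path-only generator of this kind can be sketched with `pathlib`. The version below is an assumption-laden illustration rather than the library's code: the name `iter_h5_paths`, the non-recursive `*.h5` glob, and the sorted order are choices made for the example.

```python
import tempfile
from pathlib import Path


def iter_h5_paths(data_dir, h5_subdir="h5"):
    """Yield absolute paths of the .h5 files directly inside the H5 directory."""
    h5_dir = Path(data_dir) / h5_subdir
    if not h5_dir.is_dir():
        raise FileNotFoundError(f"H5 directory not found: {h5_dir}")
    # Assumptions: non-recursive search, sorted for a reproducible order.
    for path in sorted(h5_dir.glob("*.h5")):
        yield path.resolve()


# Toy directory with two H5 files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    h5_dir = Path(tmp) / "h5"
    h5_dir.mkdir()
    for name in ("b.h5", "a.h5", "notes.txt"):
        (h5_dir / name).touch()
    found = [p.name for p in iter_h5_paths(tmp)]
    all_absolute = all(p.is_absolute() for p in iter_h5_paths(tmp))

print(found, all_absolute)
```

No metadata file is read here, which is why such a generator can list files that have no metadata entry.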
sample_simulations(n, data_dir, h5_subdir='h5', metadata_file='metadata.csv')
Randomly sample simulations from the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | Number of simulations to sample. | required |
| `data_dir` | `str \| Path` | Root directory of the dataset. | required |
| `h5_subdir` | `str` | Subdirectory containing H5 files (default: "h5"). | `'h5'` |
| `metadata_file` | `str` | Name of the metadata CSV file (default: "metadata.csv"). | `'metadata.csv'` |
Yields:
| Type | Description |
|---|---|
| `tuple[int, np.ndarray, Path]` | Simulation ID, metadata values array, and path to the corresponding H5 file. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the H5 directory or the metadata file does not exist. |
Examples:
>>> # Sample 5 random simulations
>>> for sim_id, metadata, h5_path in sample_simulations(5, '/data/ddacs'):
...     print(f"Sampled simulation {sim_id}")
>>> # Convert to list for further processing
>>> samples = list(sample_simulations(10, '/data/ddacs'))
>>> print(f"Got {len(samples)} samples")
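Random sampling without replacement over a stream of records can be sketched as follows. The documentation above does not say how `n` larger than the dataset is handled or whether a seed can be supplied, so both the clamping and the `seed` parameter in this example are assumptions, and `sample_records` is a hypothetical name.

```python
import random


def sample_records(records, n, seed=None):
    """Yield up to n records drawn without replacement."""
    pool = list(records)       # sampling needs the full population in memory
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    # Assumption: clamp n instead of raising when fewer records are available.
    for record in rng.sample(pool, min(n, len(pool))):
        yield record


# Toy population of (sim_id, file name) pairs.
population = [(sim_id, f"{sim_id}.h5") for sim_id in range(100)]
first = list(sample_records(population, 5, seed=0))
again = list(sample_records(population, 5, seed=0))
clamped = list(sample_records(population[:3], 10, seed=0))

print(len(first), first == again, len(clamped))
```

Fixing the seed makes a sample reproducible, which helps when the same subset has to be revisited across runs.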