Dataset Creation
- snakebids.generate_inputs(bids_dir: Path | str, pybids_inputs: InputsConfig, pybidsdb_dir: Path | str | None = None, pybidsdb_reset: bool = None, derivatives: bool | Path | str = False, pybids_config: str | None = None, limit_to: Iterable[str] | None = None, participant_label: Iterable[str] | str | None = None, exclude_participant_label: Iterable[str] | str | None = None, use_bids_inputs: Literal[True] | None = None, index_metadata: bool = False, validate: bool = False, pybids_database_dir: Path | str | None = None, pybids_reset_database: bool = None) → BidsDataset
- snakebids.generate_inputs(bids_dir: Path | str, pybids_inputs: InputsConfig, pybidsdb_dir: Path | str | None = None, pybidsdb_reset: bool = None, derivatives: bool | Path | str = False, pybids_config: str | None = None, limit_to: Iterable[str] | None = None, participant_label: Iterable[str] | str | None = None, exclude_participant_label: Iterable[str] | str | None = None, use_bids_inputs: Literal[False] = None, index_metadata: bool = False, validate: bool = False, pybids_database_dir: Path | str | None = None, pybids_reset_database: bool = None) → BidsDatasetDict
Dynamically generate snakemake inputs using pybids_inputs.
Pybids is used to parse the bids_dir. Custom paths can also be parsed by including the custom_path entry under the pybids_inputs descriptor.
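To make the idea of a brace-wrapped custom path concrete, here is a minimal sketch of how such a template could be matched against a file path with a regular expression. It is not the snakebids implementation: parse_custom_path is a hypothetical helper, and the assumption that a wildcard never spans a "/" is made only for this illustration.

```python
from __future__ import annotations

import re


def parse_custom_path(template: str, path: str) -> dict[str, str] | None:
    """Hypothetical helper: extract wildcard values from `path` using `template`.

    Wildcards are written as {name}. Each wildcard is assumed (for this
    sketch only) to match one or more characters other than "/".
    """
    pattern = ""
    seen: set[str] = set()
    # Split the template into literal text and {wildcard} tokens.
    for part in re.split(r"(\{[^{}]+\})", template):
        if part.startswith("{") and part.endswith("}"):
            name = part[1:-1]
            if name in seen:
                # A repeated wildcard must match the same value again.
                pattern += f"(?P={name})"
            else:
                seen.add(name)
                pattern += f"(?P<{name}>[^/]+?)"
        else:
            # Literal text is escaped so it is matched verbatim.
            pattern += re.escape(part)
    match = re.fullmatch(pattern, path)
    return match.groupdict() if match else None


values = parse_custom_path(
    "/path/to/sub-{subject}/{wildcard_1}-{wildcard_2}",
    "/path/to/sub-001/foo-bar",
)
print(values)
```

Because the template is matched as a whole, a path that does not fit the template yields None rather than a partial result.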
- Parameters:
bids_dir – Path to bids directory
pybids_inputs – Configuration for bids inputs, with keys as the names (str). Values are nested dicts with the following required keys (for complete info, see InputConfig):
"filters": Dictionary of entity: "values" (dict of str -> str or list of str). The entity keywords should be the bids tags on which to filter. The values should be an acceptable str value for that entity, or a list of acceptable str values.
"wildcards": List of (str) bids tags to include as wildcards in snakemake. At minimum this should usually include ['subject','session'], plus any other wildcards that you may want to make use of in your snakemake workflow, or want to retain in the output paths. Any wildcards in this list that are not in the filename will just be ignored.
"custom_path": Custom path to be parsed with wildcards wrapped in braces, as in /path/to/sub-{subject}/{wildcard_1}-{wildcard_2}. This path will be parsed without pybids, allowing the use of non-bids-compliant paths.
pybidsdb_dir – Path to database directory. If None is provided, database is not used
pybidsdb_reset – A boolean that determines whether to reset / overwrite existing database.
derivatives – Indicates whether pybids should look for derivative datasets under bids_dir. These datasets must be properly formatted according to bids specs to be recognized. Defaults to False.
limit_to – If provided, indicates which input descriptors from pybids_inputs should be parsed. For example, if pybids_inputs describes "bold" and "dwi" inputs, and limit_to = ["bold"], only the "bold" inputs will be parsed; "dwi" will be ignored.
participant_label – Indicate one or more participants to be included in input parsing. This may cause errors if subject filters are also specified in pybids_inputs. It may not be specified if exclude_participant_label is specified
exclude_participant_label – Indicate one or more participants to be excluded from input parsing. This may cause errors if subject filters are also specified in pybids_inputs. It may not be specified if participant_label is specified
use_bids_inputs – If False, returns the classic BidsDatasetDict instead of BidsDataset. Setting to True is deprecated as of v0.8, as this is now the default behaviour.
index_metadata – If True indexes metadata of BIDS directory using pybids, otherwise skips indexing.
validate – If True performs validation of BIDS directory using pybids, otherwise skips validation.
- Returns:
Object containing organized information about the bids inputs for consumption in snakemake. See the documentation of BidsDataset for details and examples.
- Return type:
BidsDataset | BidsDatasetDict
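The rules described for limit_to and the two participant-label parameters can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not code from snakebids: select_inputs and check_participant_args are hypothetical helpers, and the "dwi" filters are made up for the illustration.

```python
from __future__ import annotations

from typing import Iterable

# A pybids_inputs mapping written as a Python dict.
# The "dwi" entry is an assumed example, not taken from a real config.
pybids_inputs = {
    "bold": {
        "filters": {"suffix": "bold", "extension": ".nii.gz", "datatype": "func"},
        "wildcards": ["subject", "session", "task"],
    },
    "dwi": {
        "filters": {"suffix": "dwi", "extension": ".nii.gz"},
        "wildcards": ["subject", "session"],
    },
}


def select_inputs(inputs: dict, limit_to: Iterable[str] | None = None) -> dict:
    """Hypothetical helper: keep only the descriptors named in limit_to."""
    if limit_to is None:
        return dict(inputs)
    keep = set(limit_to)
    return {name: cfg for name, cfg in inputs.items() if name in keep}


def check_participant_args(
    participant_label: Iterable[str] | str | None = None,
    exclude_participant_label: Iterable[str] | str | None = None,
) -> None:
    """Hypothetical helper: reject calls that specify both label arguments."""
    if participant_label is not None and exclude_participant_label is not None:
        raise ValueError(
            "participant_label and exclude_participant_label "
            "may not be specified together"
        )


selected = select_inputs(pybids_inputs, limit_to=["bold"])
print(list(selected))
check_participant_args(participant_label=["001"])
```

With limit_to=["bold"], only the "bold" descriptor remains, matching the behaviour described for the parameter.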
Example
As an example, consider the following BIDS dataset:
example
├── README.md
├── dataset_description.json
├── participant.tsv
├── sub-001
│   ├── ses-01
│   │   ├── anat
│   │   │   ├── sub-001_ses-01_run-01_T1w.json
│   │   │   ├── sub-001_ses-01_run-01_T1w.nii.gz
│   │   │   ├── sub-001_ses-01_run-02_T1w.json
│   │   │   └── sub-001_ses-01_run-02_T1w.nii.gz
│   │   └── func
│   │       ├── sub-001_ses-01_task-nback_bold.json
│   │       ├── sub-001_ses-01_task-nback_bold.nii.gz
│   │       ├── sub-001_ses-01_task-rest_bold.json
│   │       └── sub-001_ses-01_task-rest_bold.nii.gz
│   └── ses-02
│       ├── anat
│       │   ├── sub-001_ses-02_run-01_T1w.json
│       │   └── sub-001_ses-02_run-01_T1w.nii.gz
│       └── func
│           ├── sub-001_ses-02_task-nback_bold.json
│           ├── sub-001_ses-02_task-nback_bold.nii.gz
│           ├── sub-001_ses-02_task-rest_bold.json
│           └── sub-001_ses-02_task-rest_bold.nii.gz
└── sub-002
    ├── ses-01
    │   ├── anat
    │   │   ├── sub-002_ses-01_run-01_T1w.json
    │   │   ├── sub-002_ses-01_run-01_T1w.nii.gz
    │   │   ├── sub-002_ses-01_run-02_T1w.json
    │   │   └── sub-002_ses-01_run-02_T1w.nii.gz
    │   └── func
    │       ├── sub-002_ses-01_task-nback_bold.json
    │       ├── sub-002_ses-01_task-nback_bold.nii.gz
    │       ├── sub-002_ses-01_task-rest_bold.json
    │       └── sub-002_ses-01_task-rest_bold.nii.gz
    └── ses-02
        └── anat
            ├── sub-002_ses-02_run-01_T1w.json
            ├── sub-002_ses-02_run-01_T1w.nii.gz
            ├── sub-002_ses-02_run-02_T1w.json
            └── sub-002_ses-02_run-02_T1w.nii.gz
With the following pybids_inputs defined in the config file:
pybids_inputs:
  bold:
    filters:
      suffix: 'bold'
      extension: '.nii.gz'
      datatype: 'func'
    wildcards:
      - subject
      - session
      - acquisition
      - task
      - run
Then generate_inputs(bids_dir, pybids_inputs) would return the following values:
BidsDataset({
    "bold": BidsComponent(
        name="bold",
        path="example/sub-{subject}/ses-{session}/func/sub-{subject}_ses-{session}_task-{task}_bold.nii.gz",
        zip_lists={
            "subject": ["001",   "001",  "001",   "001",  "002",   "002" ],
            "session": ["01",    "01",   "02",    "02",   "01",    "01"  ],
            "task":    ["nback", "rest", "nback", "rest", "nback", "rest"],
        },
    ),
    "t1w": BidsComponent(
        name="t1w",
        path="example/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_run-{run}_T1w.nii.gz",
        zip_lists={
            "subject": ["001", "001", "001", "002", "002", "002", "002"],
            "session": ["01",  "01",  "02",  "01",  "01",  "02",  "02" ],
            "run":     ["01",  "02",  "01",  "01",  "02",  "01",  "02" ],
        },
    ),
})
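The zip_lists above are parallel lists: position i of every list describes one file. A minimal plain-Python sketch of how they combine with the path template follows, using the "t1w" values from the example. The expand_paths helper is hypothetical and written only for this illustration; it is not a snakebids function.

```python
# Path template and zip_lists copied from the "t1w" component in the example.
path = (
    "example/sub-{subject}/ses-{session}/anat/"
    "sub-{subject}_ses-{session}_run-{run}_T1w.nii.gz"
)
zip_lists = {
    "subject": ["001", "001", "001", "002", "002", "002", "002"],
    "session": ["01", "01", "02", "01", "01", "02", "02"],
    "run": ["01", "02", "01", "01", "02", "01", "02"],
}


def expand_paths(path_template: str, zip_lists: dict) -> list:
    """Hypothetical helper: fill the template once per zipped entity combination."""
    keys = list(zip_lists)
    return [
        # Pair each entity name with the value at the same list position.
        path_template.format(**dict(zip(keys, values)))
        for values in zip(*zip_lists.values())
    ]


paths = expand_paths(path, zip_lists)
for p in paths:
    print(p)
```

Zipping the lists, rather than taking their product, yields only the entity combinations that exist in the dataset, which is why sub-001 has a single run in ses-02.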