Dataset
Alignment Datasets
AlignmentDatasetSample
Container for a single Alignment Dataset Sample.
This representation is faithful to the "TRL Preference Format with explicit prompt". See: https://huggingface.co/docs/trl/en/dataset_formats.
Parameters:
-
prompt(str) –The prompt associated with the sample.
-
chosen(str) –The winning response associated with the sample.
-
rejected(str) –The losing response associated with the sample.
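As a minimal sketch of the record shape described above, the following uses a hypothetical stand-in dataclass, `SampleSketch` (not the library's own class), to show the three documented fields. The example texts are invented for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass
class SampleSketch:
    """Hypothetical stand-in mirroring the documented fields of AlignmentDatasetSample."""

    prompt: str    # the prompt associated with the sample
    chosen: str    # the winning response
    rejected: str  # the losing response


sample = SampleSketch(
    prompt="What colour is the sky?",
    chosen="On a clear day it is blue.",
    rejected="It is green.",
)

# A plain dict with 'prompt', 'chosen' and 'rejected' keys: one preference record.
record = asdict(sample)
```

Keeping the prompt as its own field, rather than repeating it inside both responses, is what the "explicit prompt" variant of the preference format refers to.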
AlignmentDataset
Container object for an Alignment Dataset.
Parameters:
-
task(AlignmentTask) –The AlignmentTask associated with the dataset.
-
samples(List[AlignmentDatasetSample]) –The samples in this AlignmentDataset.
-
train_frac(float) –Fraction of samples that belong to the training split.
Raises:
-
ValueError–If train_frac is not in the interval [0, 1.0]
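The sketch below illustrates how a `train_frac` value could be validated and used to partition samples. The helper `check_train_frac` is hypothetical, and taking the leading portion of the list as the training split is an assumption; the source does not say how the library assigns samples to splits.

```python
def check_train_frac(train_frac: float) -> float:
    """Hypothetical helper: reject fractions outside the interval [0, 1]."""
    if not 0.0 <= train_frac <= 1.0:
        raise ValueError(f"train_frac must be in [0, 1], got {train_frac}")
    return train_frac


samples = ["s0", "s1", "s2", "s3"]  # placeholders for AlignmentDatasetSample objects
train_frac = check_train_frac(0.75)

# Assumption: the leading portion of the sample list forms the training split.
num_train = int(train_frac * len(samples))
train, test = samples[:num_train], samples[num_train:]

# The testing fraction is the complement of the training fraction.
test_frac = 1.0 - train_frac
```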
Methods:
-
from_dict–Construct an AlignmentDataset from dictionary representation.
-
from_json–Load the AlignmentDataset from a json file.
-
to_dict–Convert the AlignmentDataset to dictionary representation.
-
to_hf_compatible–Convert the AlignmentDataset to a dictionary compatible with HuggingFace datasets.
-
to_json–Save the AlignmentDataset to a json file.
Attributes:
-
num_samples(int) –The number of samples associated with the AlignmentDataset.
-
num_test_samples(int) –The number of test samples associated with the AlignmentDataset.
-
num_train_samples(int) –The number of training samples associated with the AlignmentDataset.
-
test(List[AlignmentDatasetSample]) –The list of testing samples associated with the AlignmentDataset.
-
test_frac(float) –Fraction of samples that belong to the testing split.
-
train(List[AlignmentDatasetSample]) –The list of training samples associated with the AlignmentDataset.
num_samples
property
num_samples: int
The number of samples associated with the AlignmentDataset.
num_test_samples
property
num_test_samples: int
The number of test samples associated with the AlignmentDataset.
num_train_samples
property
num_train_samples: int
The number of training samples associated with the AlignmentDataset.
test
property
test: List[AlignmentDatasetSample]
The list of testing samples associated with the AlignmentDataset.
train
property
train: List[AlignmentDatasetSample]
The list of training samples associated with the AlignmentDataset.
from_dict
classmethod
from_dict(dataset_dict: Dict[str, Any]) -> AlignmentDataset
Construct an AlignmentDataset from dictionary representation.
Note
Expects the 'task', 'train', and 'test' keys to be present in the dictionary. The 'task' value should be parsable by AlignmentTask.from_dict(). The 'train' and 'test' values should each be a list of dictionaries, each of which is parsable by AlignmentDatasetSample.
Parameters:
-
dataset_dict(Dict[str, Any]) –The dictionary representation of the AlignmentDataset.
Returns:
-
AlignmentDataset(AlignmentDataset) –The newly constructed AlignmentDataset.
Raises:
-
ValueError–If the input dictionary is missing any required keys.
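To make the expected input concrete, here is a sketch of a dictionary with the three keys named in the note, together with a hypothetical helper, `check_required_keys`, that mimics the documented ValueError. The empty 'task' value is only a placeholder, since the fields expected by AlignmentTask.from_dict() are not covered here.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("task", "train", "test")  # keys named in the note above


def check_required_keys(dataset_dict: Dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if a required key is absent."""
    missing = [key for key in REQUIRED_KEYS if key not in dataset_dict]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")


dataset_dict = {
    "task": {},  # placeholder; the real value must be parsable by AlignmentTask.from_dict()
    "train": [
        {"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"},
    ],
    "test": [
        {"prompt": "What is 3 + 3?", "chosen": "6", "rejected": "7"},
    ],
}

check_required_keys(dataset_dict)  # all three keys are present, so nothing is raised
```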
Source code in aif_gen/dataset/alignment_dataset.py
from_json
classmethod
from_json(file_path: str | Path) -> AlignmentDataset
Load the AlignmentDataset from a json file.
Note: Uses AlignmentDataset.from_dict() under the hood to parse the representation.
Parameters:
-
file_path(str | Path) –The path to the json file to load.
Returns:
-
AlignmentDataset(AlignmentDataset) –The newly constructed AlignmentDataset.
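The JSON round trip can be sketched with the standard library alone. This assumes the file simply holds the dictionary representation; the plain `json` calls below stand in for to_json and from_json and are not the library's implementation.

```python
import json
import tempfile
from pathlib import Path

# Dictionary representation with the keys that from_dict expects.
dataset_dict = {
    "task": {},  # placeholder for the task representation
    "train": [{"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"}],
    "test": [],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "dataset.json"
    # Stand-in for to_json: serialise the dictionary representation to disk.
    file_path.write_text(json.dumps(dataset_dict))
    # Stand-in for from_json: read the file back; the result would go to from_dict.
    loaded = json.loads(file_path.read_text())
```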
Source code in aif_gen/dataset/alignment_dataset.py
to_dict
Convert the AlignmentDataset to dictionary representation.
Returns:
Source code in aif_gen/dataset/alignment_dataset.py
to_hf_compatible
Convert the AlignmentDataset to a dictionary compatible with HuggingFace datasets.
Returns:
Source code in aif_gen/dataset/alignment_dataset.py
to_json
Save the AlignmentDataset to a json file.
Note: Uses to_dict() under the hood to get a dictionary representation.
Parameters:
Source code in aif_gen/dataset/alignment_dataset.py
ContinualAlignmentDataset
Container object for a Continual Alignment Dataset.
Parameters:
-
datasets(List[AlignmentDataset]) –Temporally ordered list of constituent AlignmentDatasets.
Methods:
-
append–Append a single AlignmentDataset to the ContinualAlignmentDataset.
-
extend–Append multiple AlignmentDatasets to the ContinualAlignmentDataset.
-
from_dict–Construct a ContinualAlignmentDataset from dictionary representation.
-
from_json–Load the ContinualAlignmentDataset from a json file.
-
to_dict–Convert the ContinualAlignmentDataset to dictionary representation.
-
to_hf_compatible–Convert the ContinualAlignmentDataset to a list of dictionaries compatible with HuggingFace datasets.
-
to_json–Save the ContinualAlignmentDataset to a json file.
Attributes:
-
num_datasets(int) –The number of AlignmentDataset constituents.
-
num_samples(int) –The total number of samples across all AlignmentDataset constituents.
num_samples
property
num_samples: int
The total number of samples across all AlignmentDataset constituents.
append
append(dataset: AlignmentDataset) -> None
Append a single AlignmentDataset to the ContinualAlignmentDataset.
Parameters:
-
dataset(AlignmentDataset) –The new dataset to add.
Raises:
-
TypeError–If the dataset is not of type AlignmentDataset.
Source code in aif_gen/dataset/continual_alignment_dataset.py
extend
extend(datasets: List[AlignmentDataset]) -> None
Append multiple AlignmentDatasets to the ContinualAlignmentDataset.
Parameters:
-
datasets(List[AlignmentDataset]) –The new datasets to add.
Raises:
-
TypeError–If any dataset is not of type AlignmentDataset.
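The type-checked behaviour of append and extend can be sketched as follows. `DatasetSketch` and `ContinualSketch` are hypothetical stand-ins, and routing extend through append is an assumption made for brevity rather than a description of the library's code.

```python
from typing import List


class DatasetSketch:
    """Hypothetical stand-in for AlignmentDataset."""


class ContinualSketch:
    """Hypothetical stand-in for ContinualAlignmentDataset."""

    def __init__(self) -> None:
        self.datasets: List[DatasetSketch] = []

    def append(self, dataset: DatasetSketch) -> None:
        # Reject anything that is not a dataset before mutating the list.
        if not isinstance(dataset, DatasetSketch):
            raise TypeError(f"Expected DatasetSketch, got {type(dataset).__name__}")
        self.datasets.append(dataset)

    def extend(self, datasets: List[DatasetSketch]) -> None:
        # Reuse append so every element receives the same type check.
        for dataset in datasets:
            self.append(dataset)


continual = ContinualSketch()
continual.append(DatasetSketch())
continual.extend([DatasetSketch(), DatasetSketch()])
```

In this sketch, extend keeps any valid datasets that precede an invalid one. Whether the real method validates the whole list before adding anything is not stated in the source.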
Source code in aif_gen/dataset/continual_alignment_dataset.py
from_dict
classmethod
from_dict(
dataset_dict: Dict[str, Any],
) -> ContinualAlignmentDataset
Construct a ContinualAlignmentDataset from dictionary representation.
Note
Expects 'datasets' key to be present in the dictionary. The value is a list of dictionaries, each parsable by AlignmentDataset.from_dict().
Parameters:
-
dataset_dict(Dict[str, Any]) –The dictionary representation of the ContinualAlignmentDataset.
Returns:
-
ContinualAlignmentDataset(ContinualAlignmentDataset) –The newly constructed ContinualAlignmentDataset.
Raises:
-
ValueError–If the input dictionary is missing any required keys.
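The nested layout implied by the note can be sketched as a plain dictionary: a 'datasets' key holding one dictionary per constituent, in temporal order. The counts computed at the end assume that the total number of samples is the sum of each constituent's train and test lists; the 'task' values are placeholders.

```python
from typing import Any, Dict

continual_dict: Dict[str, Any] = {
    "datasets": [
        {
            "task": {},  # placeholder for the first task
            "train": [{"prompt": "p0", "chosen": "c0", "rejected": "r0"}],
            "test": [{"prompt": "p1", "chosen": "c1", "rejected": "r1"}],
        },
        {
            "task": {},  # placeholder for the second task
            "train": [{"prompt": "p2", "chosen": "c2", "rejected": "r2"}],
            "test": [],
        },
    ]
}

# Mirror the documented failure mode for a missing required key.
if "datasets" not in continual_dict:
    raise ValueError("Missing required key: 'datasets'")

num_datasets = len(continual_dict["datasets"])
# Assumption: the total is the sum of train and test samples over all constituents.
num_samples = sum(
    len(d["train"]) + len(d["test"]) for d in continual_dict["datasets"]
)
```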
Source code in aif_gen/dataset/continual_alignment_dataset.py
from_json
classmethod
from_json(
file_path: str | Path,
) -> ContinualAlignmentDataset
Load the ContinualAlignmentDataset from a json file.
Note: Uses ContinualAlignmentDataset.from_dict() under the hood to parse the representation.
Parameters:
-
file_path(str | Path) –The path to the json file to load.
Returns:
-
ContinualAlignmentDataset(ContinualAlignmentDataset) –The newly constructed ContinualAlignmentDataset.
Source code in aif_gen/dataset/continual_alignment_dataset.py
to_dict
Convert the ContinualAlignmentDataset to dictionary representation.
Returns:
Source code in aif_gen/dataset/continual_alignment_dataset.py
to_hf_compatible
Convert the ContinualAlignmentDataset to a list of dictionaries compatible with HuggingFace datasets.
Returns:
-
List[Dict[str, Dataset]]–The list of dictionaries compatible with HuggingFace datasets.
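One way to picture the returned structure is a list with one dictionary per constituent dataset, where each split is stored column-wise. The sketch below builds that shape from plain Python data. The 'train' and 'test' key names and the helper `to_columns` are assumptions, and the `datasets` package is deliberately not imported so the example runs anywhere.

```python
from typing import Dict, List

Sample = Dict[str, str]


def to_columns(samples: List[Sample]) -> Dict[str, List[str]]:
    """Hypothetical helper: turn row-wise preference records into column-wise lists."""
    return {
        "prompt": [s["prompt"] for s in samples],
        "chosen": [s["chosen"] for s in samples],
        "rejected": [s["rejected"] for s in samples],
    }


# Two constituent datasets in temporal order, each with assumed 'train'/'test' splits.
continual = [
    {"train": [{"prompt": "p0", "chosen": "c0", "rejected": "r0"}], "test": []},
    {"train": [{"prompt": "p1", "chosen": "c1", "rejected": "r1"}], "test": []},
]

# One dictionary per constituent; each split becomes a columnar mapping.
hf_ready = [
    {split: to_columns(rows) for split, rows in dataset.items()}
    for dataset in continual
]
```

If the `datasets` package is installed, each columnar mapping is the kind of input that `datasets.Dataset.from_dict` accepts.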
Source code in aif_gen/dataset/continual_alignment_dataset.py
to_json
Save the ContinualAlignmentDataset to a json file.
Note: Uses to_dict() under the hood to get a dictionary representation.
Parameters:
Source code in aif_gen/dataset/continual_alignment_dataset.py