Closed
Labels
bug: Something isn't working
Description
Error:
openai tools fine_tunes.prepare_data -f training_data_2022-09-14.jsonl
Analyzing...
- Your file contains 2446 prompt-completion pairs
Based on the analysis we will perform the following actions:
- [Recommended] Remove 1155 duplicate rows [Y/n]: y
- [Recommended] Remove 49 long examples [Y/n]: y
Traceback (most recent call last):
  File "/Users/ser/project/project-venv/bin/openai", line 8, in <module>
    sys.exit(main())
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/_openai_scripts.py", line 63, in main
    args.func(args)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/cli.py", line 531, in prepare_data
    apply_validators(
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/validators.py", line 851, in apply_validators
    df, optional_applied = apply_optional_remediation(
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/validators.py", line 578, in apply_optional_remediation
    df = remediation.optional_fn(df)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/validators.py", line 171, in optional_fn
    return x.drop(long_indexes)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/frame.py", line 4957, in drop
    return super().drop(
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/generic.py", line 4267, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/generic.py", line 4311, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6661, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: '[330, 352, 377, 378, 422, 424, 435, 1172, 1194, 1219, 1220, 1264, 1266, 1277, 1468, 1498, 1549, 1641, 1648, 1714, 1741, 1816, 1859, 1984] not found in axis'

I believe that because the duplicate rows are removed first, many of the long examples flagged earlier are no longer in the data when the second step tries to drop them, which throws this error. As a workaround, I have to accept only the first recommendation, and then run the tool again on the resulting file to apply the second one.
It would be great to be able to apply both changes to the same file.
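The suspected cause can be reproduced with pandas alone. The sketch below is only an illustration under assumptions: the toy data, the length threshold of 10, and the variable names are hypothetical, and this is not the code in `openai/validators.py`. It shows that `DataFrame.drop` raises `KeyError` when some of the labels computed before de-duplication are already gone, and two possible ways to avoid that.

```python
import pandas as pd

# Hypothetical toy data (not the reporter's file): rows 1 and 2 are
# exact duplicates of each other and both have a "long" completion.
df = pd.DataFrame({
    "prompt": ["p0", "p1", "p1", "p3"],
    "completion": ["short", "y" * 50, "y" * 50, "short too"],
})

# Labels of the long rows, computed BEFORE any rows are removed.
# The threshold of 10 characters is arbitrary, for illustration only.
long_indexes = df.index[df["completion"].str.len() > 10]

# First remediation: remove duplicates. Row 2 disappears here.
deduped = df.drop_duplicates()

# Second remediation with the stale labels: label 2 no longer exists,
# so pandas raises KeyError ("... not found in axis").
try:
    deduped.drop(long_indexes)
    raised = False
except KeyError:
    raised = True

# Possible fix A: only drop the labels that are still present.
safe = deduped.drop(long_indexes.intersection(deduped.index))

# Possible fix B: let pandas skip the labels that are missing.
also_safe = deduped.drop(long_indexes, errors="ignore")
```

Either variant would let both recommendations be applied in one pass. Recomputing the long rows after de-duplication would be another option, at the cost of a second scan of the data.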