Splitting dataset into files and adding prompts and scripts
Showing 18 changed files with 4,900 additions and 42 deletions.
@@ -0,0 +1,16 @@
# Python script to add an ID to each dataset entry
import json

# Load the JSON data from a file
with open('parsed_data.json', 'r') as file:
    data = json.load(file)

# Add an id tag to each data entry
for index, entry in enumerate(data):
    entry['id'] = index + 1

# Write the modified data back to a JSON file
with open('parsed_data_with_ids.json', 'w') as file:
    json.dump(data, file, indent=2)

print("IDs added successfully.")
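The ID-assignment step can also be written as a small function that is easy to test without touching the filesystem. This is only a sketch and not part of the commit: the name `add_ids` and the `start` parameter are hypothetical, and it assumes the loaded data is a list of dictionaries, as the script implies.

```python
import json


def add_ids(entries, start=1):
    """Return copies of the entries with a sequential 'id' field added.

    Hypothetical helper (not in the commit); assumes each entry is a dict.
    An existing 'id' key, if any, is overwritten, as in the script.
    """
    return [
        {**entry, "id": index}
        for index, entry in enumerate(entries, start=start)
    ]


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [{"prompt": "first"}, {"prompt": "second"}]
    print(json.dumps(add_ids(sample), indent=2))
```

Unlike the script, this version returns new dictionaries instead of mutating the loaded list, which may be convenient if the original data is reused later.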
@@ -0,0 +1,29 @@
# Parses the JSON out of the dataset generated by the LLM
import json
import re

# Read the file content
with open('dataset.json', 'r') as file:
    content = file.read()

# Extract JSON strings using regular expressions
json_strings = re.findall(r'```json\n(.*?)\n```', content, re.DOTALL)

# Initialize an empty list to store parsed data
parsed_data = []

# Parse the JSON strings, skipping empty or whitespace-only matches
for json_str in json_strings:
    if json_str and json_str.strip():
        try:
            parsed_data.append(json.loads(json_str))
        except json.JSONDecodeError as e:
            print(f"Failed to parse JSON string: {json_str}")
            print(f"Error: {e}")

# Save the parsed data to a new file
output_file = 'parsed_data.json'
with open(output_file, 'w') as file:
    json.dump(parsed_data, file, indent=2)

print(f"Parsed data has been saved to {output_file}")
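To see how the extraction behaves on a small input, the same regular expression can be wrapped in a function and run on an in-memory string. The following is a sketch under assumptions: `extract_json_blocks` is a hypothetical name, the sample text is made up, and failures are collected in a list rather than printed. The fence marker is built with string repetition only so that the example does not contain a literal triple backtick.

```python
import json
import re

# Three backticks, built indirectly to keep this example fence-safe.
FENCE = "`" * 3

# Same pattern as the script: a non-greedy match between an opening
# "json" fence and the next closing fence; DOTALL lets "." span newlines.
FENCED_JSON = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)


def extract_json_blocks(text):
    """Return (parsed, failures) for every fenced JSON block in text.

    Hypothetical helper; failures holds (raw_block, error_message) pairs.
    """
    parsed, failures = [], []
    for block in FENCED_JSON.findall(text):
        if not block.strip():
            continue  # skip empty or whitespace-only matches
        try:
            parsed.append(json.loads(block))
        except json.JSONDecodeError as exc:
            failures.append((block, str(exc)))
    return parsed, failures


if __name__ == "__main__":
    # Made-up model output containing one valid and one invalid block.
    sample = (
        "Here is an entry:\n"
        + FENCE + 'json\n{"question": "q1"}\n' + FENCE + "\n"
        + "And a broken one:\n"
        + FENCE + "json\n{oops}\n" + FENCE + "\n"
    )
    good, bad = extract_json_blocks(sample)
    print(len(good), "parsed;", len(bad), "failed")
```

Returning the failures instead of printing them makes it possible to count or log malformed blocks separately, which may help when the model output is large.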
Large diffs are not rendered by default.
@@ -0,0 +1,18 @@
import json
import os

# Load the JSON data from a file
with open('parsed_data_with_ids.json', 'r') as file:
    data = json.load(file)

# Create a directory to store the split files
output_dir = 'split_data'
os.makedirs(output_dir, exist_ok=True)

# Split each data entry into a new file
for index, entry in enumerate(data):
    output_file = os.path.join(output_dir, f'data_{index + 1}.json')
    with open(output_file, 'w') as file:
        json.dump(entry, file, indent=2)

print("Data split into separate files successfully.")
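The split script names each file by list position, which matches the assigned IDs only while the list order is unchanged. A possible variant, sketched below, names each file after the entry's own `id` and falls back to the position when the key is missing. The function `split_entries` is hypothetical and not part of the commit, and the demo writes to a temporary directory instead of `split_data`.

```python
import json
import tempfile
from pathlib import Path


def split_entries(entries, output_dir):
    """Write each entry to its own JSON file and return the paths written.

    Hypothetical helper; the filename is data_<n>.json, where <n> is the
    entry's 'id' if present, otherwise its 1-based position in the list.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for position, entry in enumerate(entries, start=1):
        path = out / f"data_{entry.get('id', position)}.json"
        path.write_text(json.dumps(entry, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up entries written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for written in split_entries([{"id": 1}, {"id": 2}], tmp):
            print(written.name)
```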