Python - Split a large JSON file into multiple smaller files

To split a large JSON file into smaller files in Python, you can use the following approach. This example assumes you have a JSON file with an array at the top level.

import json
import os

def split_json(input_file, output_dir, chunk_size):
    with open(input_file, 'r') as f:
        data = json.load(f)

    # Ensure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Split the data into chunks
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Write each chunk to a separate file
    for i, chunk in enumerate(chunks):
        output_file = os.path.join(output_dir, f'output_{i + 1}.json')
        with open(output_file, 'w') as f:
            json.dump(chunk, f, indent=2)

# Example usage
input_file_path = 'large_file.json'
output_directory = 'output_files'
chunk_size = 1000  # Adjust this to your desired chunk size
split_json(input_file_path, output_directory, chunk_size)

In this example:

  • The split_json function takes the path of the input JSON file, the output directory for the smaller files, and the desired chunk size as parameters.
  • It reads the input JSON file, parses it, and splits the data into chunks of the specified size.
  • Each chunk is then written to a separate JSON file in the output directory.

You can adjust the chunk_size based on your requirements. The resulting files will be named output_1.json, output_2.json, and so on.

Make sure to replace large_file.json with the path to your actual input JSON file.
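Note that this approach loads the entire file into memory with json.load, which can fail for files genuinely too large to fit in RAM. Below is a minimal streaming sketch using the third-party ijson library (an assumption: it is not in the standard library and must be installed, e.g. with pip install ijson). It handles the same top-level-array layout but reads one element at a time, so memory use stays bounded by the chunk size. The function names split_json_streaming and write_chunk are hypothetical; the output naming matches the split_json example above.

import json
import os
import ijson  # third-party streaming JSON parser: pip install ijson

def write_chunk(chunk, output_dir, file_index):
    output_file = os.path.join(output_dir, f'output_{file_index + 1}.json')
    with open(output_file, 'w') as out:
        json.dump(chunk, out, indent=2)

def split_json_streaming(input_file, output_dir, chunk_size):
    os.makedirs(output_dir, exist_ok=True)
    chunk = []
    file_index = 0
    with open(input_file, 'rb') as f:
        # 'item' matches each element of the top-level array, yielded one at a time;
        # use_float=True returns plain floats instead of Decimal (assumes ijson >= 3.1)
        for item in ijson.items(f, 'item', use_float=True):
            chunk.append(item)
            if len(chunk) == chunk_size:
                write_chunk(chunk, output_dir, file_index)
                file_index += 1
                chunk = []
    if chunk:  # write the final partial chunk
        write_chunk(chunk, output_dir, file_index)

# Example usage
split_json_streaming('large_file.json', 'output_files', 1000)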

Examples

  1. "Python split large JSON file into smaller files"

    • Code Implementation:
      import json

      with open('large_file.json') as f:
          data = json.load(f)

      chunk_size = 1000  # specify the number of records per file

      for i in range(0, len(data), chunk_size):
          chunk = data[i:i + chunk_size]
          with open(f'output_file_{i // chunk_size}.json', 'w') as outfile:
              json.dump(chunk, outfile, indent=2)
    • Description: This query provides a basic Python script that reads a large JSON file, splits it into smaller chunks (specified by chunk_size), and writes each chunk to a separate output file.
  2. "Python split large JSON array into files"

    • Code Implementation:
      import json

      with open('large_file.json') as f:
          data = json.load(f)

      chunk_size = 1000  # specify the number of elements per file

      for i in range(0, len(data), chunk_size):
          chunk = data[i:i + chunk_size]
          with open(f'output_file_{i // chunk_size}.json', 'w') as outfile:
              json.dump(chunk, outfile, indent=2)
    • Description: This query focuses on splitting a large JSON array (list) into smaller files, using a similar approach to the previous example.
  3. "Python split JSON file by key into smaller files"

    • Code Implementation:
      import json

      with open('large_file.json') as f:
          data = json.load(f)

      key_to_split_on = 'category'  # specify the key to split by

      grouped_data = {}
      for item in data:
          key_value = item.get(key_to_split_on, 'unknown')
          grouped_data.setdefault(key_value, []).append(item)

      for key, group in grouped_data.items():
          with open(f'output_file_{key}.json', 'w') as outfile:
              json.dump(group, outfile, indent=2)
    • Description: This query provides code to split a large JSON file into smaller files based on a specific key (e.g., 'category') in the JSON objects.
  4. "Python split JSON file evenly into files"

    • Code Implementation:
      import json
      import math

      with open('large_file.json') as f:
          data = json.load(f)

      num_files = 5  # specify the number of output files
      chunk_size = math.ceil(len(data) / num_files)  # round up so the data fits in num_files files

      for i in range(0, len(data), chunk_size):
          chunk = data[i:i + chunk_size]
          with open(f'output_file_{i // chunk_size}.json', 'w') as outfile:
              json.dump(chunk, outfile, indent=2)
    • Description: This query demonstrates splitting a large JSON file into a specified number of smaller files, distributing the data evenly across the output files.
  5. "Python split JSON file by date into files"

    • Code Implementation:
      import json

      with open('large_file.json') as f:
          data = json.load(f)

      date_key = 'timestamp'  # specify the date key to split by

      grouped_data = {}
      for item in data:
          date_value = item.get(date_key, 'unknown')
          grouped_data.setdefault(date_value, []).append(item)

      for key, group in grouped_data.items():
          # Replace characters that are invalid in filenames (e.g. ':' or '/' in timestamps)
          safe_key = str(key).replace(':', '-').replace('/', '-')
          with open(f'output_file_{safe_key}.json', 'w') as outfile:
              json.dump(group, outfile, indent=2)
    • Description: This query provides code to split a large JSON file into smaller files based on a date key (e.g., 'timestamp') in the JSON objects.
  6. "Python split JSON file into equal size files"

    • Code Implementation:
      import json
      import math

      with open('large_file.json') as f:
          data = json.load(f)

      num_files = 5  # specify the number of output files
      chunk_size = math.ceil(len(data) / num_files)  # round up so the data fits in num_files files

      for i in range(0, len(data), chunk_size):
          chunk = data[i:i + chunk_size]
          with open(f'output_file_{i // chunk_size}.json', 'w') as outfile:
              json.dump(chunk, outfile, indent=2)
    • Description: This query provides code to split a large JSON file into a specified number of smaller files, ensuring that each output file has an approximately equal number of JSON objects.
  7. "Python split JSON file into nested folders"

    • Code Implementation:
      import json
      import math
      import os

      with open('large_file.json') as f:
          data = json.load(f)

      num_folders = 5  # specify the number of nested folders
      chunk_size = math.ceil(len(data) / num_folders)  # round up so the data fits in num_folders folders

      for i in range(0, len(data), chunk_size):
          chunk = data[i:i + chunk_size]
          folder_path = f'output_folder_{i // chunk_size}'
          os.makedirs(folder_path, exist_ok=True)
          with open(os.path.join(folder_path, f'output_file_{i // chunk_size}.json'), 'w') as outfile:
              json.dump(chunk, outfile, indent=2)
    • Description: This query extends the splitting process by organizing the output files into nested folders based on the data chunk.
  8. "Python split JSON file by size into smaller files"

    • Code Implementation:
      import json

      with open('large_file.json') as f:
          data = json.load(f)

      max_file_size = 1e6  # specify the approximate maximum file size in bytes
      current_size = 0
      current_chunk = []
      file_index = 0  # explicit counter instead of the fragile len(os.listdir("."))

      for item in data:
          item_size = len(json.dumps(item))
          if current_chunk and current_size + item_size > max_file_size:
              with open(f'output_file_{file_index}.json', 'w') as outfile:
                  json.dump(current_chunk, outfile, indent=2)
              file_index += 1
              current_size = 0
              current_chunk = []
          current_chunk.append(item)
          current_size += item_size

      # Write the last chunk if any
      if current_chunk:
          with open(f'output_file_{file_index}.json', 'w') as outfile:
              json.dump(current_chunk, outfile, indent=2)
    • Description: This query provides code to split a large JSON file into smaller files based on a specified maximum size in bytes.
  9. "Python split JSON file by object count into files"

    • Code Implementation:
      import json

      with open('large_file.json') as f:
          data = json.load(f)

      max_objects_per_file = 500  # specify the maximum number of objects per file
      current_chunk = []

      for i, item in enumerate(data, start=1):
          current_chunk.append(item)
          if i % max_objects_per_file == 0:
              with open(f'output_file_{i // max_objects_per_file}.json', 'w') as outfile:
                  json.dump(current_chunk, outfile, indent=2)
              current_chunk = []

      # Write the last chunk if any
      if current_chunk:
          with open(f'output_file_{(i // max_objects_per_file) + 1}.json', 'w') as outfile:
              json.dump(current_chunk, outfile, indent=2)
    • Description: This query provides code to split a large JSON file into smaller files based on a specified maximum number of JSON objects per file.
  10. "Python split JSON file with progress indicator"

    • Code Implementation:
      import json
      import math
      from tqdm import tqdm  # tqdm is a library for adding progress bars

      with open('large_file.json') as f:
          data = json.load(f)

      chunk_size = 1000  # specify the number of records per file
      num_chunks = math.ceil(len(data) / chunk_size)  # round up so the final partial chunk is not dropped

      for i in tqdm(range(num_chunks), desc="Splitting JSON"):
          chunk = data[i * chunk_size: (i + 1) * chunk_size]
          with open(f'output_file_{i}.json', 'w') as outfile:
              json.dump(chunk, outfile, indent=2)
    • Description: This query enhances the splitting process by incorporating a progress bar using the tqdm library, providing visual feedback on the progress of the operation.
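After running any of the order-preserving, count-based splits above, a quick sanity check is to recombine the output files and compare them against the original. A minimal sketch, assuming the output_file_N.json naming used in the examples (it does not apply to the key- or date-based splits, which regroup records):

import glob
import json

# Collect the chunk files in numeric order (assumes the output_file_N.json naming above)
paths = sorted(glob.glob('output_file_*.json'),
               key=lambda p: int(p.split('_')[-1].split('.')[0]))

recombined = []
for path in paths:
    with open(path) as f:
        recombined.extend(json.load(f))

with open('large_file.json') as f:
    original = json.load(f)

# For order-preserving splits, the recombined data should match the original exactly
assert recombined == original, 'mismatch: records were lost, duplicated, or reordered'
print(f'OK: {len(recombined)} records across {len(paths)} files')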
