chat_analysis.py is a command-line tool that distills large chat logs into a concise persona summary. It understands several JSON formats, including ChatGPT conversation exports and OpenAI API logs.

    pip install -r requirements.txt

An OpenAI API key is required for summarisation. Set OPENAI_API_KEY in your environment.
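
A missing key usually surfaces only once the first API call fails, so a pre-flight check can save a wasted run. The following is a minimal sketch; `require_api_key` is a hypothetical helper, not a function known to exist in chat_analysis.py.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI key, or raise a clear error if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    This helper is illustrative and not part of chat_analysis.py.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment")
    return key
```

Calling `require_api_key()` before starting a long run fails fast with a readable message rather than partway through summarisation.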

    python chat_analysis.py <log_path> <output_path>

The script reads the chat log at log_path and writes a JSON file containing a narrative biography and categorized facts about the user.
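
Downstream code will typically want to load that output file and inspect it. The sketch below assumes the file has the two top-level keys, `narrative` and `facts`, shown in the example output later in this README, and that each fact category holds a list. `load_persona` and `count_facts` are hypothetical helper names.

```python
import json


def load_persona(path):
    """Load a persona file and check for the top-level keys this README shows."""
    with open(path, encoding="utf-8") as f:
        persona = json.load(f)
    missing = {"narrative", "facts"} - persona.keys()
    if missing:
        raise ValueError(f"persona file is missing keys: {sorted(missing)}")
    return persona


def count_facts(persona):
    """Map each fact category to the number of facts it holds (assumes lists)."""
    return {category: len(items) for category, items in persona["facts"].items()}
```

For example, `count_facts(load_persona("persona.json"))` gives a quick view of which categories the run actually populated.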
- Log parsing – messages are loaded from different JSON structures and converted to ChatTurn objects.
- User text extraction – only messages from the user speaker are concatenated.
- Segmentation – text is split into overlapping chunks and embeddings help locate topic boundaries.
- Chunk summarisation – GPT‑4 is used to extract background, style, goals, lifestyle and interests from each chunk.
- Aggregation – similar facts are deduplicated via embedding similarity.
- Narrative generation – a short biography is produced from the aggregated facts.
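
The non-API stages above can be sketched in plain Python. This is an illustration under stated assumptions, not the script's actual code: the `ChatTurn` fields, the helper names, the word-based window and the 0.9 similarity threshold are all hypothetical. The sketch uses fixed overlapping windows only and does not reproduce the embedding-based topic-boundary detection, and `embed` stands in for whatever embedding call the script makes.

```python
import math
from dataclasses import dataclass


@dataclass
class ChatTurn:
    # Hypothetical fields; the real ChatTurn in chat_analysis.py may differ.
    speaker: str
    text: str


def parse_api_messages(messages):
    """Convert {"role": ..., "content": ...} dicts into ChatTurn objects."""
    return [ChatTurn(m["role"], m["content"]) for m in messages
            if isinstance(m.get("content"), str)]


def user_text(turns):
    """Concatenate only the turns whose speaker is the user."""
    return "\n".join(t.text for t in turns if t.speaker == "user")


def overlapping_chunks(words, size, overlap):
    """Split a list of words into windows of `size` that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe(facts, embed, threshold=0.9):
    """Keep a fact only if no already-kept fact is at least `threshold` similar."""
    kept, kept_vecs = [], []
    for fact in facts:
        vec = embed(fact)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(fact)
            kept_vecs.append(vec)
    return kept
```

The greedy pass in `dedupe` keeps the first of any group of near-duplicates, so the order of facts affects which wording survives. A clustering step would remove that dependence at the cost of more code.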
chat_analysis.ipynb contains an earlier exploratory notebook, but the Python script is the recommended entry point.

    python chat_analysis.py sample_log.json persona.json

The resulting persona.json will look like:

    {
      "narrative": "...",
      "facts": {
        "background": [...],
        "style": [...],
        "goals": [...],
        "lifestyle": [...],
        "interests": [...]
      }
    }

This project is licensed under the MIT License. See LICENSE for details.