Handling large files (20GB+) can be tough. I'm studying ways to improve performance in our file manager system when uploading large files. The initial idea is to create a PoC to compare performance between Node.js and Python workers.
With this feature, users paste a Google Drive link, and in the background the system downloads the file and uploads it to AWS S3, all without blocking the user.
What the System Does
- User pastes a Google Drive link
- The system adds a job to a queue
- A background worker downloads the file
- The file is uploaded to AWS S3
- The user gets notified when it’s done
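Here's a minimal sketch of the queue-and-worker part of that flow. I'm assuming BullMQ with Redis as the queue layer purely for illustration; the post doesn't name a specific library, and the actual download/upload step is left as comments (it's sketched later in the post).

```javascript
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 }; // Redis connection (placeholder)
const transferQueue = new Queue("drive-to-s3", { connection });

// API side: when the user pastes a Drive link, enqueue a job and return right away.
export async function enqueueTransfer(driveUrl, userId) {
  await transferQueue.add("transfer", { driveUrl, userId });
}

// Worker side: picks up jobs in the background, so the user is never blocked.
new Worker(
  "drive-to-s3",
  async (job) => {
    const { driveUrl, userId } = job.data;
    // 1. download the file from Google Drive as a stream
    // 2. pipe it straight to S3 (see the streaming sketch further down)
    // 3. notify the user when the upload finishes
  },
  { connection }
);
```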
Main Goals
- Handle big files (20GB+)
- Use low memory (no saving to disk)
- Be fast and reliable
Python vs Node.js
I tested both. Python (with FastAPI and boto3) works, but from what I've seen in my research, streaming big files needs extra care.
Node.js is built for streaming. Using `stream`, `fs`, and `@aws-sdk/client-s3`, I could pipe the file directly from Google Drive to S3. No buffering, no saving to disk, and memory stays low.
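A rough sketch of that pipe is below. The post only mentions `stream` and `@aws-sdk/client-s3`; this version also pulls in `@aws-sdk/lib-storage` for multipart streaming, and it assumes the Drive link has already been resolved to a direct-download URL (large Drive files usually need the Drive API or a confirm token, which I'm skipping here). Both are my assumptions, not the final PoC.

```javascript
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { Readable } from "node:stream";

const s3 = new S3Client({ region: "us-east-1" }); // region is a placeholder

export async function transferToS3(directDownloadUrl, bucket, key) {
  // Fetch the file as a stream (Node 18+ global fetch) instead of buffering it.
  const response = await fetch(directDownloadUrl);
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);

  // Bridge the web ReadableStream into a Node.js Readable.
  const body = Readable.fromWeb(response.body);

  // Upload does a multipart upload under the hood, so the file is sent in
  // small parts and never held fully in memory, even at 20GB+.
  const upload = new Upload({
    client: s3,
    params: { Bucket: bucket, Key: key, Body: body },
    queueSize: 4,               // parts uploaded in parallel
    partSize: 16 * 1024 * 1024, // 16MB per part
  });

  upload.on("httpUploadProgress", (p) => console.log(`uploaded ${p.loaded ?? 0} bytes`));
  await upload.done();
}
```

With this shape, memory usage stays roughly around queueSize × partSize no matter how big the file is, which is the whole point of piping instead of downloading to disk first.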
Why I Picked Node.js
- Streams are native and simple
- Uses less memory
- It seems to be more stable for this kind of task
For moving large files from cloud to cloud, Node.js seems to be a better and more efficient choice.
I’ll post the solution soon, once it’s done!