I have 200 TB of PDF files (~800K files, ~100 MB each on average) to be stored in MongoDB (GridFS). It is the cloud provider's policy that PDFs must be stored in a MongoDB database only.
I need guidance regarding MongoDB, as I'm new to it and couldn't find satisfactory answers to my questions. I'll describe the application and my storage strategy below and then ask a few questions in this one post, since all of them need the preface below.
- 1 x Ubuntu LTS Server
- 4 x 50 TB NFS share volumes (200 TB total)
The cloud provider can only create volumes of 50 TB max.
The site is a retrieval application: once all ~800K PDFs are uploaded to MongoDB, no further write operations will be performed.
Only one user is going to use the site, and only ONCE A MONTH. There is no traffic at all, so MongoDB sharding is something I should not opt for and won't.
The MongoDB instance is configured to put each database's files in its own folder (directoryPerDB), because I have 4 x 50 TB NFS-mounted volumes and 200 TB of data. Initially I thought I could mount the 4 volumes as database folders in /var/mongodb/data/ and MongoDB would then store files on the volumes depending on the database name.
As far as I understand, MongoDB keeps a database's data in a single file. I'm not sure whether a 50 TB file can be handled by the OS and MongoDB. Backing up, restoring, and copying a file that size is also very time consuming.
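For reference, this is roughly what the per-database-folder setup looks like in mongod.conf (the dbPath is my own layout, adjust to your install; `directoryPerDB` and `directoryForIndexes` are the relevant options):

```yaml
storage:
  dbPath: /var/mongodb/data       # each database gets its own subfolder here
  directoryPerDB: true            # one folder per database
  wiredTiger:
    engineConfig:
      directoryForIndexes: true   # optionally split index files into a subfolder too
```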
So I'm thinking of storing 1000 documents in each MongoDB GridFS database, all in a single MongoDB instance, like below:
MongoDB instance at 192.168.15.12:
- DB1
- DB1001
- DB2001
- ...
- DB799001

Each database DBx lives in /var/mongodb/data/DBx, which is a mounted NFS share folder. fstab has about 800 entries in total: each of the 4 NFS share volumes (50 TB each) is mounted at roughly 200 folders under /var/mongodb/data.
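The bucketing scheme above can be sketched as a small helper that maps a PDF's sequence number to its database name (the DB1, DB1001, ... naming and the 1000-per-database bucket size are taken from the layout above; the function name is mine):

```python
BUCKET_SIZE = 1000  # PDFs per GridFS database

def db_name_for(pdf_index: int) -> str:
    """Map a 1-based PDF sequence number to its GridFS database name.

    Databases are named after the first PDF index they hold:
    PDFs 1..1000 -> DB1, 1001..2000 -> DB1001, ..., 799001..800000 -> DB799001.
    """
    start = ((pdf_index - 1) // BUCKET_SIZE) * BUCKET_SIZE + 1
    return f"DB{start}"
```

With pymongo, the upload loop would then pick the target database per file, e.g. `gridfs.GridFS(client[db_name_for(i)]).put(data, filename=name)`.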
I'm planning this for 3 reasons:
- Once 1000 PDFs are written to DBx, that database can be backed up and restored very quickly.
- If a database DBx fails or is corrupted, only a small amount of data needs to be restored.
- I'm not sure MongoDB can handle 50 TB to 200 TB occupied by a single database, since (as far as I understand) MongoDB stores a database in a SINGLE FILE, unlike SQL Server, which can cap the primary data file (.mdf) and spread secondary data files (.ndf) across multiple drives.
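The first two reasons come down to per-database backup and restore, which would look roughly like this with the standard tools (host, database name, and backup path are illustrative):

```shell
# Dump a single 1000-PDF database (~100 GB) instead of the whole 200 TB deployment
mongodump --host 192.168.15.12 --db DB1 --out /backup/DB1

# Restore just that one database if it fails or is corrupted
mongorestore --host 192.168.15.12 --db DB1 /backup/DB1/DB1
```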
Questions
- Can MongoDB handle a database with 50 TB of data? If MongoDB creates a 50 TB file in the database folder, can the system as a whole handle such a large file with ease?
- What is the recommended database size for MongoDB?
- How many databases can a single MongoDB instance handle? (I might have 800.)
- Is this strategy good enough?
- Can you recommend a better strategy?
- Is mounting the same NFS share over 200 times to different folders a bad idea?