Find duplicate records in MongoDB?



You can use the aggregation framework to find duplicate records in MongoDB. To understand the concept, let us create a collection with some documents, several of which share the same StudentFirstName value. The queries to insert the documents are as follows −

> db.findDuplicateRecordsDemo.insertOne({"StudentFirstName":"John"});
{
   "acknowledged" : true,
   "insertedId" : ObjectId("5c8a330293b406bd3df60e01")
}
> db.findDuplicateRecordsDemo.insertOne({"StudentFirstName":"John"});
{
   "acknowledged" : true,
   "insertedId" : ObjectId("5c8a330493b406bd3df60e02")
}
> db.findDuplicateRecordsDemo.insertOne({"StudentFirstName":"Carol"});
{
   "acknowledged" : true,
   "insertedId" : ObjectId("5c8a330c93b406bd3df60e03")
}
> db.findDuplicateRecordsDemo.insertOne({"StudentFirstName":"Sam"});
{
   "acknowledged" : true,
   "insertedId" : ObjectId("5c8a331093b406bd3df60e04")
}
> db.findDuplicateRecordsDemo.insertOne({"StudentFirstName":"Carol"});
{
   "acknowledged" : true,
   "insertedId" : ObjectId("5c8a331593b406bd3df60e05")
}
> db.findDuplicateRecordsDemo.insertOne({"StudentFirstName":"Mike"});
{
   "acknowledged" : true,
   "insertedId" : ObjectId("5c8a331e93b406bd3df60e06")
}

Display all documents from the collection using the find() method. The query is as follows −

> db.findDuplicateRecordsDemo.find();

The following is the output −

{ "_id" : ObjectId("5c8a330293b406bd3df60e01"), "StudentFirstName" : "John" }
{ "_id" : ObjectId("5c8a330493b406bd3df60e02"), "StudentFirstName" : "John" }
{ "_id" : ObjectId("5c8a330c93b406bd3df60e03"), "StudentFirstName" : "Carol" }
{ "_id" : ObjectId("5c8a331093b406bd3df60e04"), "StudentFirstName" : "Sam" }
{ "_id" : ObjectId("5c8a331593b406bd3df60e05"), "StudentFirstName" : "Carol" }
{ "_id" : ObjectId("5c8a331e93b406bd3df60e06"), "StudentFirstName" : "Mike" }

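Here is the query to find duplicate records in MongoDB. It groups the documents by StudentFirstName and counts each group, keeps only the groups whose key is not null and whose count is greater than 1, and then projects the group key back under the StudentFirstName field −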

> db.findDuplicateRecordsDemo.aggregate([
...    {"$group" : { "_id": "$StudentFirstName", "count": { "$sum": 1 } } },
...    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
...    {"$project": {"StudentFirstName" : "$_id", "_id" : 0} }
... ]);

The following is the output, displaying only the StudentFirstName values that occur more than once −

{ "StudentFirstName" : "Carol" }
{ "StudentFirstName" : "John" }
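
To see what each stage of the pipeline contributes, the following is a minimal sketch that mimics the same logic in plain JavaScript, without connecting to MongoDB. The helper name findDuplicateValues and the in-memory docs array are assumptions made for illustration only; they are not part of MongoDB or of the example above.

```javascript
// Plain-JavaScript emulation of the $group, $match and $project stages.
// findDuplicateValues is a hypothetical helper, not a MongoDB API.
function findDuplicateValues(docs, field) {
  // $group: count how many documents share each value of `field`.
  // A document without the field is grouped under null, as in MongoDB.
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field] ?? null;
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  return [...counts.entries()]
    // $match: drop the null group and any value seen only once.
    .filter(([key, count]) => key !== null && count > 1)
    // $project: expose the group key under the original field name.
    .map(([key]) => ({ [field]: key }));
}

// Same names as the sample collection (the _id values are omitted here).
const docs = [
  { StudentFirstName: "John" },
  { StudentFirstName: "John" },
  { StudentFirstName: "Carol" },
  { StudentFirstName: "Sam" },
  { StudentFirstName: "Carol" },
  { StudentFirstName: "Mike" },
];

console.log(findDuplicateValues(docs, "StudentFirstName"));
```

For the sample data this yields the John and Carol entries. The sketch returns them in first-seen order, whereas MongoDB does not guarantee the order of documents produced by $group, so add a $sort stage if the order matters.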
Updated on: 2019-07-30T22:30:25+05:30
