As part of my work, we have been working with MongoDB. Coming from the SQL world where queries start with WHERE clause or cryptic JOIN, the MongoDB JavaScript file is easy to work with. I will share with you some example queries in this post.
As a background, we implemented a feature where user can endorse terms, similar to the LinkedIn endorsement. We have the concept of
term
, and endorsement
. A term
is just a representation of a piece of text. An endorsement
is an association between a term, user, and listing that a user's currently endorsing. Here's what they look like:testx:PRIMARY> db.terms.findOne(); { "_id" : ObjectId("53aa7ac62ecea3e746beaef2"), "text" : "Locally Owned", "origin" : "TERM_DB", "_created" : NumberLong("1403681478027"), "_modified" : NumberLong("1403681478027") } testx:PRIMARY> db.endorsements.findOne(); { "_id" : ObjectId("53aa7ba8d0adeb95abec766f"), "listingId" : "14938311", "userId" : "1", "headingCode" : "33650", "term" : DBRef("terms", ObjectId("53aa7acc2ecea3e746beaf20")), "_created" : NumberLong("1403681704148"), "_modified" : NumberLong("1403681704148") }
1. Updating using Regular expression
When we went live initially, there were a bunch of terms starting with "Authentic", such as "Authentic Italian", "Authentic Chinese". After a while, we didn't see much uptake for those terms. We decided to drop the word "Authentic". Good news is Mongo supports regex, so updating them is quite easy: searching and replace all terms starting with "Authentic" with an empty string.function updateTermWithAuthentic() { //drop the word Authentic in front of "Authentic Some Cuisine", Authentic is 10 character including the space, hence the substring of 10. db.terms.find({ text: { $regex: '^Authentic' } }).forEach(function(doc) { db.terms.update( { _id: doc._id }, { $set: { text: doc.text.substring(10) } } ) }) }We could update a term text with a new text, with a bit of print logging:
function updateTermText(oldTerm, newTerm) { var cursor = db.terms.find({text: oldTerm}); if (cursor.hasNext()) { db.terms.update( { text: oldTerm }, { $set: { text: newTerm } } ) print("Done - Processing '" + oldTerm + "' - replaced with new term: '" + newTerm + "'."); } else { print(">>> ERROR - Processing '" + oldTerm + "'... Don't have term '" + oldTerm + "' in the database, maybe typo?."); } } updateTermText("Good, High Quality", "Good or High Quality");
2. Count distinct
How many users submitted endorsements:testx:PRIMARY> db.endorsements.distinct('userId').length;
3. Run query on data after a certain date/timestamp
How many endorsements after a certain date, for example after 14-Aug-2014:testx:PRIMARY> db.endorsements.find({ _created: { $gte: new Date(2014,7,15).getTime() } }).count(); //month is 0-11 //Sample output: 441How many unique users submitting endorsements after a certain date, for example, 14-Aug-2014:
testx:PRIMARY> db.runCommand({ distinct: 'endorsements', key: 'userId', query: { _created: { $gte: new Date(2014,7,15).getTime() } } }).values.length; //Sample output: 113
4. Aggregation
Find the popular endorsement terms in each listing:testx:PRIMARY> db.endorsements.aggregate( [ { $group: { _id: { listingId: "$listingId", term: "$term" }, count: { $sum: 1 } }, { $sort: { count: -1 } } } ] ).forEach(function(myDoc) { print(myDoc._id.listingId + "," + myDoc._id.term + "," + myDoc.count); } /* Sample output: 14813910,DBRef("terms", ObjectId("53aa7ae5d0adeb95abec7602")),10 13347379,DBRef("terms", ObjectId("53aa7ace2ecea3e746beaf34")),10 13501322,DBRef("terms", ObjectId("53aa7ae9d0adeb95abec761e")),7 13347379,DBRef("terms", ObjectId("53aa7ae1d0adeb95abec75e2")),6 */The DBRef and ID are a bit ugly. It would be nice to output the Term text there. With DBRef, we can modify the forEach function() above to perform another query to get the term text:
testx:PRIMARY> db.endorsements.aggregate( //similar to the above //... ).forEach(function(myDoc) { var termRef = myDoc._id.term var termText = db[termRef.$ref].findOne( { _id: (termRef.$id) } ) print(myDoc._id.listingId + "," + termText.text + "," + myDoc.count); }); /* Sample output: 15495422,Cosy Venue,10 14524676,A La Carte,10 14813910,Healthy,7 13407783,Elderly Friendly,6 */
5. Aggregation, count and sorting
Which user contributed the most number of endorsements? We need to aggregate based on userId.testx:PRIMARY> db.endorsements.aggregate( [ { $group: { _id: { userId: "$userId" }, count: { $sum: 1 } } }, { $sort: {count: -1} } ] ).forEach(function(myDoc) { print(myDoc._id.userId + "," + myDoc.count); }); /* Sample output: 361259bc32c1bc79b8dffa4130f22a83ea7d1c,80 55b5ae81-000c-4a0e-87ee-b6a7cada5a01,38 4e31ccd3-371a-49a6-bfc7-e1f9c3e99672,24 b82e9b60-5e1b-42c2-8131-7ddc67cfd525,23 */
6. Update the DBRef
Scenario: An existing term becomes obsolete, so we want to update all existing endorsements pointing to that obsolete terms to a different term. For example, the term "Great Service" is obsolete and should be deleted, all existing endorsements should be updated to "Great or Professional Service".First, find the term "Great Service", and the term "Great or Professional Service", note down the ObjectId for both.
// find the ID of "Great Service" testx:PRIMARY> db.terms.find({text: "Great Service"}).pretty(); { "_id" : ObjectId("54a2123744c96256390bee87"), "className" : "models.endorsement.Term", "text" : "Great Service", "origin" : "TERM_DB", "_created" : NumberLong("1419907639466"), "_modified" : NumberLong("1419907639466") } // find the ID of “Great or Professional Service" testx:PRIMARY> db.terms.find({text: "Great or Professional Service"}).pretty(); { "_id" : ObjectId("53d99b0b44c96256390bec17"), "className" : "models.endorsement.Term", "text" : "Great or Professional Service", "origin" : "TERM_DB", "_created" : NumberLong("1406769931603"), "_modified" : NumberLong("1406769931603") }Perform update for all the endorsements having term ID of "Great Service" to the ID of "Great or Professional Service":
testx:PRIMARY> db.endorsements.update( { "term.$id": ObjectId("54a2123744c96256390bee87") }, { $set: { term: { $ref: "terms", $id: "53d99b0b44c96256390bec17" } } } )Finally, delete the term "Great Service":
// remove the term “Great Service" testx:PRIMARY> db.terms.remove({text: "Great Service"})
7. Unique grouping from derived values
Scenario: we want to find out unique users contributing for each day. We can do a distinct query based onuserId
and _created
fields. However, _created
is a timestamp, not a date. From the _created
field, we need to derive a yyyy-MM-dd and perform a distinct query. Luckily, MongoDB group()
function and its keyf
does support that.testx:PRIMARY> db.endorsements.group({ keyf: function(doc) { var createdDate = new Date(doc._created); return { id: doc.userId, year: createdDate.getFullYear(), month: createdDate.getMonth() + 1, date: createdDate.getDate() }; }, reduce: function ( curr, result ) { }, initial: {} }).forEach(function(doc) { print(doc.id + ", " + doc.year + "/" + doc.month + "/" + doc.date); }); /* Sample output: 55b5ae81-000c-4a0e-87ee-b6a7cada5a01, 2014/8/14 f2ded4d1-ff70-4b5e-88e4-e5e51d8a3780, 2014/8/14 a8b8a640-0c2d-443b-8cdd-a541d610cfe2, 2014/8/14 d589f10e-a97c-473d-aecf-ab847f5b42db, 2014/8/14 5d830945-9665-4298-aff1-6ab68b27c8af, 2014/8/14 */
References
- http://www.tutorialspoint.com/mongodb/mongodb_database_references.htm
- http://stackoverflow.com/questions/7613801/mongodb-sort-by-subdocument-count
- http://docs.mongodb.org/manual/reference/command/distinct/#dbcmd.distinct
- http://stackoverflow.com/questions/13361278/mongodb-query-by-dbref
0 comments:
Post a Comment