MongoDB: a Fast and Easy Way to Calculate Aggregated Values

A MongoDB aggregation framework allows you to calculate aggregated values without having to use map-reduce. While map-reduce is a powerful tool, it often proves to be slow when processing big volumes of data. In this article, I would like to compare map-reduce with MongoDB and show the significant benefits of using the latter.

MongoDB vs Map-Reduce

The main differences of Aggregation Framework from Map-Reduce are:

declarative syntax, no need to write code in JavaScript;
describing chains of operations to apply;
expressions evaluation;
higher performance because aggregation framework is implemented in C++ instead of JavaScript;
projections of returned data so a user can add computed fields, sub-objects, etc.

Framework concepts

Aggregation Framework provides the similar logic as the “GROUP BY” SQL operator. There are 2 main concepts in aggregation framework: pipelines and expressions. Pipelines are operators that can process a stream of the documents. Expressions return the output documents after the calculations on input documents. Some pipelines:

$match – uses query predicate like collection.find({});
$project – allows to change the shape of the result, include computed values, sub-objects, etc.;
$unwind – separates elements of an array and add it into an output document;
$sort – sorts documents;
$limit – specifies maximum number of documents to be returned;
$skip – skips a specified number of documents.

Using MongoDB in Node.JS: our hands-on experience

MongoDB has drivers for many programming languages and platforms, including Node.JS. You can install Node.JS driver by typing npm install mongodb.

All MongoDB features are available in the driver. There was a task to aggregate huge data collection by three fields to build some statistical report. The collection contained about 500k records with web pages views statistics. Each document had the following format:

It was necessary to group data by time, IP address and URL. The first version of this logic was implemented using map-reduce:

The processing of 500k records took about 1 minute. It was an annoying issue and we decided to switch to the MongoDB 2.1. aggregation framework. The new version of aggregation logic is presented below:

In this code, we use 2 pipelines: $match and $group. The $match filter required records, and the $group aggregates records by three fields: time, URL and IP. These fields are used as a key because we explicitly specified ‘_id’ field and expression $sum calculates the number of records with the same key. The output data has the following view:

Result

The use of an aggregation framework significantly improved the performance of the processing. Now 500k of records are processed within 3-4 seconds. The MongoDB aggregation framework is a powerful, simple and lightweight tool that really allows you to improve the performance of aggregated values calculations without using map-reduce.

Alexander Bessudnov

Chief Product Officer

With a passion for innovation and a keen understanding of market trends, Alexander plays a pivotal role in shaping Magora's product development strategy and ensuring the delivery of cutting-edge solutions to clients.

MongoDB: a Fast and Easy Way to Calculate Aggregated Values without Map-Reduce

MongoDB vs Map-Reduce

Framework concepts

Using MongoDB in Node.JS: our hands-on experience

Result

Subscribe to our Magora Newsletter:

Grab your e-book: Design to attract more buyers

Subscribe to our Magora
Newsletter: