MongoDB: Mastering the art of aggregation (part 2)

Piotr Sobuś
4 min readApr 14, 2020

In this article I will explore the technique of injecting data from a specific collection into an aggregation pipeline by joining two collections together based on a common field. This article is a continuation of my series about data aggregation in MongoDB. I highly recommend to check part 1 of this series before going any further.

Photo by Clint Adair on Unsplash

Referring to the example from the first article, we can agree on the point that returning simple data is sometimes not enough and in this case we need to broaden the scope of returned data so that developers who are responsible for displaying data on the web application can fully operate on them and fill out individual elements of the application without any problems. Let me extend the example from the first article by adding a new field — mapId. This field will be a reference to a document in another collection, called maps. So, at this moment, our games collection will look like this:

Whereas, our maps collection will look like presented below:

One day your front-end developer colleague approaches you with a request to extend the returned data by additional fields. He wants to display information about the current played map in the list of current games. Moreover, he asks you kindly to do this within the same endpoint which was created for the filtering part (presented in first part of this series). New day, new feature.
So, you wonder how to approach this task and you find that the easiest way is simply extract mapId from each game object, find the map in the maps collection using the extracted ID and if you successfully find it, take the data and inject in the specific game object, otherwise do nothing. Luckily for us, MongoDB offers a $lookup operator which joins a collection from the same database based on a provided field name. This is how we should implement this:

In the first stage we use $lookup operator to pull the map data from maps collection — based on the equality of two properties — and put it into _map field. In the second stage we use $unwind operator to deconstruct the value of _map field (which is an array, because $lookup returns an array of elements) to a single element, because we know that we’ll eventually get only one match (one data object) or no match. We also use preserveNullAndEmptyArrays to include documents whose _map field is null, missing, or an empty array. Our pipeline will pop out this:

In a situation where our mapId has a String type, instead of ObjectId, we need to convert it to ObjectId, before doing anything. We can combine $addFields and $toObjectId together in order to create a temporary field and assign the result of String-to-ObjectId conversion and use that temporary field in each stage of the pipeline. Later, we can remove it with $project operator. Let’s extend our pipeline:

Select particular fields inside $lookup

A single map object can contain many fields and probably some of them will not be needed in the result, therefore we can exclude them while doing the $lookup operation. In this case, we have to create a nested pipeline and execute it within the joined collection and return back to the main pipeline with our modified result. Here we go:

We use let to specify variables from the main pipeline to be used in the pipeline of the joined collection. We use $match to do the equality check between two fields. First argument is the ID of the map object from the maps collection that is being iterated in the nested pipeline, wheras the second argument is a reference of map taken from each game object and placed in the second pipeline. After that, we include the fields we really need. The rest of fields that were not provided in the list are automatically excluded.

Additional operations in the nested pipeline

Let’s say your front-end colleague wants a prepared format of a map location in the result. We can still continue adding new operators in the nested pipeline which will be executed within the joined collection and not in the main collection. Proposed format may look like this:

locationName: "Location: Ankrahmun, Antica" 

We can use $reduce to loop over the array of location names (before this we need to convert location object to an array & map the array to values-only) and concat them by a comma.

Conclusion

As you can see, performing an aggregation that involves multiple collections is easy. We can create a desired functionality by combining specific operators and receive data in a special fashion from the pipeline that is processed for the joined collection.

That’s it. Thanks for reading. If you liked the post, please give me an applause. If you have any questions, feel free to ask them in the comments!

--

--