MongoDB: Mastering the art of aggregation (part 1)
When building an API, developers need to expose application data easily, quickly and safely. Choosing a database to store and read that data is a reasonable step towards better control over the various types of data involved. With systems and tools developing rapidly, there is a wide spectrum of solutions to choose from. A wise choice is one that allows easy and effective data manipulation and helps you gather accurate, high-quality data and generate relevant results. Gathering and summarizing data in this way is called data aggregation, and MongoDB is one of the databases that supports it.
Data aggregation is the process of gathering data and presenting it in a summarized format. The data may be gathered from multiple collections, usually with the intent of combining these sources into a summary for data analysis. To put it more simply, let’s assume you have five products in your database and you want only the products with a price greater than $29.99. What you’re doing is called querying. However, if you want the total price (the sum) of these five products, you need aggregation.
MongoDB offers many operators that help you transform data. You can put a set of documents into a pipeline and transform it through a series of stages, with the result coming out at the end. The problem arises when your data model is complex or built in a non-standard form. In that case a single operator is insufficient, so you need to combine or chain multiple operators together. This is not everyone’s cup of tea, because it requires selecting the right operators in the correct order. It is also worth pointing out that debugging such operations is hard, and we are usually condemned to the familiar “trial and error” method.
In this article I will try to help you get over the fear of using aggregation operators and start using them with pleasure and caution, on a real-life example.
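To make the querying-versus-aggregation distinction from the five-products example concrete, here is a minimal sketch in Python. The product names and prices are invented for illustration, and the filter and pipeline are written as plain dictionaries in the form PyMongo's `find` and `aggregate` accept; the plain-Python lines only mimic what the database would compute.

```python
# Hypothetical product data; names and prices are made up for illustration.
products = [
    {"name": "p1", "price": 9.5},
    {"name": "p2", "price": 19.5},
    {"name": "p3", "price": 29.5},
    {"name": "p4", "price": 39.5},
    {"name": "p5", "price": 49.5},
]

# Querying: a filter document, as it would be passed to collection.find(...).
price_filter = {"price": {"$gt": 29.99}}

# Aggregation: a one-stage pipeline, as it would be passed to
# collection.aggregate(...). Grouping on _id None folds every document
# into a single group and sums the price field.
total_pipeline = [{"$group": {"_id": None, "total": {"$sum": "$price"}}}]

# Plain-Python equivalents of what the two operations compute.
expensive = [p for p in products if p["price"] > 29.99]
total = sum(p["price"] for p in products)
```

The query returns a subset of the documents unchanged, while the aggregation returns a new, computed document that does not exist in the collection.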
The game begins.
Let’s assume you have a collection called games. Your collection contains data that looks like this:
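As an illustration only, a document of the shape this article describes could look like the sketch below. The field names `users`, `usersIds` and `name` come from the text; the `_id`, the title, the second user ID and both player names are assumptions made up for the example.

```python
# Hypothetical document from the `games` collection.
# Only the field names users, usersIds and name come from the article;
# every value here is invented for illustration.
game = {
    "_id": "game-1",               # assumption: any unique identifier
    "title": "Friday chess",       # assumption: not mentioned in the article
    "usersIds": ["abc", "def"],    # IDs of the players in this game
    "users": {                     # keyed by user ID, so the keys are dynamic
        "abc": {"name": "Amy"},
        "def": {"name": "Kurt"},
    },
}
```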
You may already have doubts about the schema of the data presented above, especially about the users property. Your doubt is fully justified: the keys of the users object are dynamic, so we cannot refer directly to a given place in the object without knowing the key name, which makes traversing the object or extracting data from it awkward. Nevertheless, by choosing the right operators we can easily tackle this issue, so let’s move on to the next part.
Let’s assume you and a front-end developer colleague are working on a project. One day your colleague asks you to create an endpoint that retrieves all games associated with the currently authenticated user, in order to build a view that shows the current user’s games in a list.
What you propose is the following:
GET /v1/games/:id
We can now take the user ID passed as a parameter and check whether it appears in the usersIds array of each document.
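A minimal sketch of that check, assuming the ID arrives as a string and the pipeline is run with PyMongo. The function name `games_for_user` is hypothetical. Matching an array field against a single value selects documents whose array contains that value.

```python
def games_for_user(user_id):
    """Build a pipeline that keeps games whose usersIds array contains user_id."""
    return [{"$match": {"usersIds": user_id}}]


pipeline = games_for_user("abc")
# With PyMongo this could be run as: db.games.aggregate(pipeline)
# (db is assumed to be an already connected database handle).
```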
Sending a GET request with the parameter abc will return the result below:
Easy. The next day your colleague comes by and tells you about a new search bar that filters players who are in the same game as the currently logged-in user by name. That sounds difficult given our data model, but let’s think about how to overcome it. We need to convert the users object into an array, keep only the users whose ID is not equal to the ID passed as a parameter (so the current user’s name is dropped from each document), and search for matches using the query string passed in the URL (e.g. /v1/games/abc?q=Kurt). Let’s create a more advanced pipeline:
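One way such a pipeline could be written is sketched below as a Python function returning the four stages this article walks through. The field names usersTempArr and userTempMatched follow the text. The function name, the `$addFields` stage, the `u` variable, the escaping of the query string and the case-insensitive `i` option are assumptions rather than the original code.

```python
import re


def co_player_search_pipeline(user_id, query):
    """Build a four-stage pipeline that searches co-players' names."""
    return [
        # Stage 1: temporary array of {"k": <user id>, "v": <user object>} pairs.
        {"$addFields": {"usersTempArr": {"$objectToArray": "$users"}}},
        # Stage 2: drop the current user, then keep only each remaining name.
        {"$addFields": {
            "userTempMatched": {
                "$map": {
                    "input": {
                        "$filter": {
                            "input": "$usersTempArr",
                            "as": "u",
                            "cond": {"$ne": ["$$u.k", user_id]},
                        }
                    },
                    "as": "u",
                    "in": "$$u.v.name",
                }
            }
        }},
        # Stage 3: keep documents where at least one remaining name matches.
        # Escaping the user input and ignoring case are assumptions.
        {"$match": {
            "userTempMatched": {"$regex": re.escape(query), "$options": "i"}
        }},
        # Stage 4: hide the temporary fields from the response.
        {"$project": {"usersTempArr": 0, "userTempMatched": 0}},
    ]


pipeline = co_player_search_pipeline("abc", "Kurt")
```

To restrict the search to the current user's own games, the earlier usersIds check could be placed in front as an extra `$match` stage; it is left out here to mirror the four stages described in the text.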
The first stage adds a new field called usersTempArr to each document and assigns it the result of converting the users property from an object to an array. This stage does not save the field in the database; it only adds it to the documents flowing through the current pipeline. It is a temporary field, and its value looks like this:
In the second stage we add another new field, userTempMatched, and assign it the result of the $map + $filter operations. You can see that the filter operator is wrapped by the map operator, which means that the result of the filter operation is the input of the map operation. First, we keep only the users whose ID does not match the current user’s ID passed as a parameter, thanks to the k property. The result of the filter operation looks like this:
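The shape of that temporary value can be sketched in plain Python. `$objectToArray` turns each key of an object into a document with a `k` (key) and a `v` (value) field. The helper `object_to_array` below is a stand-in written for this example, not a MongoDB or PyMongo function, and the user data is hypothetical.

```python
def object_to_array(obj):
    """Plain-Python stand-in for $objectToArray: one {"k", "v"} pair per key."""
    return [{"k": key, "v": value} for key, value in obj.items()]


# Hypothetical users object; IDs and names are invented for illustration.
users = {"abc": {"name": "Amy"}, "def": {"name": "Kurt"}}

users_temp_arr = object_to_array(users)
# users_temp_arr is now a list of pairs such as {"k": "abc", "v": {"name": "Amy"}},
# so the formerly dynamic keys can be read through the fixed field name k.
```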
Secondly, we map over this array and return the name property.
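The filter-then-map step can be mimicked with two list comprehensions, which may make the nesting easier to follow: the inner `$filter` runs first, and the outer `$map` consumes its output. The data and the variable names below are hypothetical.

```python
# Hypothetical output of the first stage (user ID / user object pairs).
users_temp_arr = [
    {"k": "abc", "v": {"name": "Amy"}},
    {"k": "def", "v": {"name": "Kurt"}},
]
current_user_id = "abc"  # the ID passed as the route parameter

# $filter: keep only the pairs whose key differs from the current user's ID.
others = [pair for pair in users_temp_arr if pair["k"] != current_user_id]

# $map: reduce each remaining pair to the user's name.
user_temp_matched = [pair["v"]["name"] for pair in others]
```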
In the third stage we use the $regex operator. Thanks to regular expressions, we can take a pattern (in our case Kurt) and find matches in the userTempMatched array of each document.
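When a `$regex` condition is applied to an array of strings, a document matches if at least one element matches the pattern. The sketch below imitates that rule with Python's `re` module. The helper name is hypothetical, and Python's regex dialect is only a rough proxy for the one MongoDB uses.

```python
import re


def any_name_matches(names, pattern):
    """Return True if at least one name in the list matches the pattern."""
    return any(re.search(pattern, name) for name in names)


# Hypothetical userTempMatched values from two different game documents.
first_game_names = ["Kurt", "Amy"]
second_game_names = ["Amy"]

first_game_kept = any_name_matches(first_game_names, "Kurt")
second_game_kept = any_name_matches(second_game_names, "Kurt")
```

Without a case-insensitivity option the match is case-sensitive, so a search for Kurt would not find kurt.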
In the last stage we use $project to exclude the temporary fields from our result. The final result looks as follows:
Conclusion
Aggregation operators offer a variety of possibilities to interpret your data and help you generate data that meets your needs. This article is an introduction to my new series about MongoDB data aggregation. In the next part, I will show you how to inject data from other collections based on two related fields.
That’s it. Thanks for reading. If you liked the post, please give it some applause. If you have any questions, feel free to ask them in the comments!