Aggregating trees with RavenDB

We got an interesting question in the RavenDB discussion group:How to do aggregation on a tree structure?The task is to build a Work Breakdown Structure, where you have:ProjectsMajor deliverablesSub-deliverablesWork packagesThe idea is to be able to track EstimatedHours and CompletedHours across the entire tree. For example, let’s say that I have the following:Project: Bee Keeper Chronicle AppMajor deliverable: App DesignSub-deliverable: Wireframes all screensWork Package: Login page wireframeUsers can add the EstimatedHours and CompletedHours at any level, and we want to be able to aggregate the data upward. So the Project: “Bee Keeper Chronicle App” should have a total estimated time and the number of hours that were worked on.The question is how to model & track that in RavenDB. Here is what I think the document structure should look like:{ "Name": "Login page wire frame", "Parent": { "Type": "Subs", "Id": "subs/0000000000000000009-A" }, "EsimatedHours": 8, "CompletedHours": 3, "@metadata": { "@collection": "WorkPackages" } } { "Name": "Wire frames all screens", "Parent": { "Type": "Majors", "Id": "major/0000000000000000008-A" }, "EsimatedHours": 20, "CompletedHours": 7, "@metadata": { "@collection": "Subs" } } { "Name": "App Design", "Parent": { "Type": "Projects", "Id": "projects/0000000000000000011-A" }, "EsimatedHours": 50, "CompletedHours": 12, "@metadata": { "@collection": "Majors" } } { "Name": "Bee Keeper Chronicle App", "EsimatedHours": 34, "CompletedHours": 21, "@metadata": { "@collection": "Projects" } }I used a Parent relationship, since that is the most flexible (it allows you to move each item to a completely different part of the tree easily). Now, we need to aggregate the values for all of those items using a Map-Reduce index. Because of the similar structure, I created the following JS function:function processWorkBreakdownHours(doc) { let hours = { EsimatedHours: doc.EsimatedHours, CompletedHours: doc.CompletedHours }; let results = [Object.assign({ Scope: id(doc) }, hours)]; let current = doc; while (current.Parent) { current = load(current.Parent.Id, current.Parent.Type); results.push(Object.assign({ Scope: id(current) }, hours)); } return results; }This will scan over the hierarchy and add the number of estimated and completed hours to all the levels. Now we just need to add this file as Additional Sources to an index and call it for all the relevant collections, like this:map("WorkPackages",processWorkBreakdownHours); map("Subs",processWorkBreakdownHours); map("Majors",processWorkBreakdownHours); map("Projects",processWorkBreakdownHours);And the last step is to aggregate across all of them in the reduce function: groupBy(x => x.Scope).aggregate(g => { return { Scope: g.key, EsimatedHours: g.values.reduce((c, val) => val.EsimatedHours + c, 0), CompletedHours: g.values.reduce((c, val) => val.CompletedHours + c, 0) }; })You can see the full index definition here.The end result is automatic aggregation at all levels. Change one item, and it will propagate upward.Considerations: I’m using load() here, which creates a reference from the parent to the child. The idea is that if we move a Work Package from one Sub-deliverable to another (in the same or a different Major & Project), this index will automatically re-index what is required and get you the right result.However, that also means that whenever the Major document changes, we’ll have to re-index everything below it (because it might have changed the Project). There are two ways to handle that. Instead of using load(), we’ll use noTracking.load(), which tells RavenDB that when the referenced document changes, we should not re-index. Provide the relevant scopes at the document level, like this:{ "Name": "Login page wire frame", "Scope": [ "subs/0000000000000000009-A", "major/0000000000000000008-A", "projects/0000000000000000011-A" ], "EsimatedHours": 8, "CompletedHours": 3, "@metadata": { "@collection": "WorkPackages" } }Note that in this case, changing the root will be more complex because you have to scan / touch everything if you move between parts of the tree. In most cases, that is such a rare event that it shouldn’t be a consideration, but it depends largely on your context. And there you have it, a simple Map-Reduce index that can aggregate across an entire hierarchy with ease.

Jan 19, 2025 - 20:06

We got an interesting question in the RavenDB discussion group:How to do aggregation on a tree structure?

The task is to build a Work Breakdown Structure, where you have:

  • Projects
  • Major deliverables
  • Sub-deliverables
  • Work packages

The idea is to be able to track EstimatedHours and CompletedHours across the entire tree. For example, let’s say that I have the following:

  • Project: Bee Keeper Chronicle App
  • Major deliverable: App Design
  • Sub-deliverable: Wireframes all screens
  • Work Package: Login page wireframe

Users can add the EstimatedHours and CompletedHours at any level, and we want to be able to aggregate the data upward. So the Project: “Bee Keeper Chronicle App” should have a total estimated time and the number of hours that were worked on.

The question is how to model & track that in RavenDB. Here is what I think the document structure should look like:


{
    "Name": "Login page wire frame",
    "Parent": {
        "Type": "Subs",
        "Id": "subs/0000000000000000009-A"
    },
    "EsimatedHours": 8,
    "CompletedHours": 3,
    "@metadata": {
        "@collection": "WorkPackages"
    }
}


{
    "Name": "Wire frames all screens",
    "Parent": {
        "Type": "Majors",
        "Id": "major/0000000000000000008-A"
    },
    "EsimatedHours": 20,
    "CompletedHours": 7,
    "@metadata": {
        "@collection": "Subs"
    }
}


{
    "Name": "App Design",
    "Parent": {
        "Type": "Projects",
        "Id": "projects/0000000000000000011-A"
    },
    "EsimatedHours": 50,
    "CompletedHours": 12,
    "@metadata": {
        "@collection": "Majors"
    }
}


{
    "Name": "Bee Keeper Chronicle App",
    "EsimatedHours": 34,
    "CompletedHours": 21,
    "@metadata": {
        "@collection": "Projects"
    }
}

I used a Parent relationship, since that is the most flexible (it allows you to move each item to a completely different part of the tree easily). Now, we need to aggregate the values for all of those items using a Map-Reduce index.

Because of the similar structure, I created the following JS function:


function processWorkBreakdownHours(doc) {
    let hours = {
        EsimatedHours: doc.EsimatedHours,
        CompletedHours: doc.CompletedHours
    };
    let results = [Object.assign({
        Scope: id(doc)
    }, hours)];


    let current = doc;
    while (current.Parent) {
        current = load(current.Parent.Id, current.Parent.Type);
        results.push(Object.assign({
            Scope: id(current)
        }, hours));
    }
    return results;
}

This will scan over the hierarchy and add the number of estimated and completed hours to all the levels. Now we just need to add this file as Additional Sources to an index and call it for all the relevant collections, like this:


map("WorkPackages",processWorkBreakdownHours);
map("Subs",processWorkBreakdownHours);
map("Majors",processWorkBreakdownHours);
map("Projects",processWorkBreakdownHours);

And the last step is to aggregate across all of them in the reduce function:


groupBy(x => x.Scope).aggregate(g => {
    return {
        Scope: g.key,
        EsimatedHours: g.values.reduce((c, val) => val.EsimatedHours + c, 0),
        CompletedHours: g.values.reduce((c, val) => val.CompletedHours + c, 0)
    };
})

You can see the full index definition here.

The end result is automatic aggregation at all levels. Change one item, and it will propagate upward.

Considerations:

I’m using load() here, which creates a reference from the parent to the child. The idea is that if we move a Work Package from one Sub-deliverable to another (in the same or a different Major & Project), this index will automatically re-index what is required and get you the right result.

However, that also means that whenever the Major document changes, we’ll have to re-index everything below it (because it might have changed the Project). There are two ways to handle that.

  • Instead of using load(), we’ll use noTracking.load(), which tells RavenDB that when the referenced document changes, we should not re-index.
  • Provide the relevant scopes at the document level, like this:


{
    "Name": "Login page wire frame",
    "Scope": [
       "subs/0000000000000000009-A",
       "major/0000000000000000008-A",
       "projects/0000000000000000011-A"
    ],
    "EsimatedHours": 8,
    "CompletedHours": 3,
    "@metadata": {
        "@collection": "WorkPackages"
    }
}

Note that in this case, changing the root will be more complex because you have to scan / touch everything if you move between parts of the tree.

In most cases, that is such a rare event that it shouldn’t be a consideration, but it depends largely on your context.

And there you have it, a simple Map-Reduce index that can aggregate across an entire hierarchy with ease.