Measure Advanced Configuration

The guide Managing Measures explains how to use the Measure Editor to create and manage measures. This reference gives more detail on the various options available when creating measures in advanced configuration. See Measure Settings for simple configuration.

Value types

The valueType property tells Watershed what data type the value of a statement property (such as an extension) has. Valid value types are string, number, boolean, time, duration, percentage and array. Setting the value type is especially important for durations. For example:

{ 
  "name": "Start", 
  "aggregation": { 
    "type": "LAST" 
  }, 
  "valueProducer": { 
    "type": "STATEMENT_PROPERTY", 
    "statementProperty": "context.extensions.[http://id.tincanapi.com/extension/starting-point]", 
    "valueType": "duration" 
  }
}

Format

The format property is used to tell Watershed how to format date values. Here's a list of allowed format values.

For example:

{ 
  "name": "Last Timestamp With Time (am/pm)", 
  "aggregation": { 
    "type": "LAST" 
  }, 
  "valueProducer": { 
    "type": "STATEMENT_PROPERTY", 
    "statementProperty": "timestamp", 
    "format": "MMMM Do YYYY, h:mm:ss a" 
  }
},
{ 
  "name": "Last Timestamp With Time and Zone", 
  "aggregation": { 
    "type": "LAST" 
  }, 
  "valueProducer": { 
    "type": "STATEMENT_PROPERTY", 
    "statementProperty": "timestamp", 
    "format": "YYYY/MM/DD kk:mm:ss Z" 
  }
}

Additional Aggregations

Some aggregations are not yet supported by the Measure Editor and can only be added in the advanced configuration of reports.

Any

While the FIRST and LAST aggregations return specifically the first or last value of a statement property, the ANY aggregation returns an arbitrary value. This is useful when you don't care which value is returned. For example, when displaying a property from an activity definition in a report organized by activity, all statements about that activity will have exactly the same value for that property. It therefore does not matter which statement's value is returned.

Formula

The formula aggregation enables you to perform mathematical calculations with the results of other measures. In the formula, measures are identified by the letter m followed by the zero-based index of that measure, so the first measure is m0. For example, if your third measure was a formula measure and you wanted to multiply the first measure by the second measure, the syntax would be:

{
  "name": "Multiplication Example",
  "aggregation": {
    "type": "FORMULA",
    "formula": "m0 * m1"
  },
  "valueProducer": {}
}

More complex mathematical formulas are also possible. For example:

{
  "name": "Pythagoras",
  "aggregation": {
    "type": "FORMULA",
    "formula": "sqrt(pow(m0, 2) + pow(m1, 2))"
  },
  "valueProducer": {}
}

The table below lists some of the possible operations.

Function	Description	Example Formula	Example Result
ceil	Rounds up to the next whole number.	ceil(4.4)	5
floor	Rounds down to the nearest lower whole number.	floor(4.4)	4
round	Rounds to the nearest whole number.	round(4.4)	4
max	Returns the highest value from a list.	max(5,3,7)	7
min	Returns the lowest value from a list.	min(5,3,7)	3
pow	Returns the number to the specified power.	pow(2,10)	1024
sqrt	Returns the square root of the number.	sqrt(9)	3
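
As a cross-check of the example results in the table, the same operations can be reproduced with Python's built-in and math functions. This is only an illustration: it assumes the Watershed formula functions behave like their Python namesakes for these inputs, which the table suggests but does not guarantee for edge cases such as rounding halves.

```python
import math

# Python equivalents of the example formulas in the table above.
table_examples = {
    "ceil(4.4)": math.ceil(4.4),
    "floor(4.4)": math.floor(4.4),
    "round(4.4)": round(4.4),
    "max(5,3,7)": max(5, 3, 7),
    "min(5,3,7)": min(5, 3, 7),
    "pow(2,10)": pow(2, 10),
    "sqrt(9)": math.sqrt(9),
}


def pythagoras(m0, m1):
    """Mirror of the "sqrt(pow(m0, 2) + pow(m1, 2))" formula measure."""
    return math.sqrt(pow(m0, 2) + pow(m1, 2))


for formula, result in table_examples.items():
    print(formula, "=", result)
print("pythagoras(3, 4) =", pythagoras(3, 4))
```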

The Formula measure value producer can be used to set the data type of the resulting value. For example, the following measure will display the value in hours, minutes and seconds rather than a number.

{
  "name": "Time Difference",
  "aggregation": {
    "type": "FORMULA",
    "formula": "m1 - m0"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "valueType": "duration"
  }
}

Formulas with null values

If any of the measures you are using in a formula returns a null value, then the default behavior of the formula is to also return null. If you would rather the formula treated null values as 0, you can use the treatMissingAsZero flag.

The example below shows an overall score calculated based on points scored and points available for two different assessments of which the learner may have completed either one or both.

{
  "name": "Overall (%)",
  "aggregation": {
    "type": "FORMULA",
    "formula": "round((m3 + m4) / (m5 + m6) * 10000) /100",
    "treatMissingAsZero": true
  },
  "valueProducer": {}
}
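
The effect of the flag on the formula above can be worked through in a short Python sketch. This models the behavior described in this section and is not Watershed's implementation; the function name, the parameter names and the handling of a zero denominator are assumptions made for the illustration.

```python
def overall_percent(m3, m4, m5, m6, treat_missing_as_zero=False):
    """Model of "round((m3 + m4) / (m5 + m6) * 10000) / 100".

    None stands in for a measure that returned null.
    """
    values = [m3, m4, m5, m6]
    if treat_missing_as_zero:
        # With the flag set, null inputs are replaced by 0.
        values = [0 if v is None else v for v in values]
    elif any(v is None for v in values):
        # Default behavior: any null input makes the result null.
        return None
    m3, m4, m5, m6 = values
    if m5 + m6 == 0:
        # Assumption: no points available means no meaningful score.
        return None
    return round((m3 + m4) / (m5 + m6) * 10000) / 100


# Learner completed only the first assessment: 17 of 20 points.
print(overall_percent(17, None, 20, None))
print(overall_percent(17, None, 20, None, treat_missing_as_zero=True))
```

Without the flag the learner gets no overall score at all; with it, the score is based on the one assessment they completed.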

Hidden Measures

Where measures used in calculations are not intended to be shown on the report, the hiddenMeasures property can be used to hide those measures. This property is currently only supported on the leaderboard, heatmap, line, bar and range reports. The hiddenMeasures property is a direct property of the report configuration object and contains a list of the zero-based indexes of measures that should not be displayed. For example, the following config would hide the first and third measures:

"hiddenMeasures": [0, 2]

Here is a complete report configuration using formula measures. It shows the number of interactions per person each week.

{
  "filter": {},
  "hiddenMeasures": [0, 1],
  "dimensions": [
    {
      "type": "TIME",
      "timePeriod": "WEEK"
    }
  ],
  "measures": [
    {
      "name": "Interaction Count",
      "aggregation": {
        "type": "COUNT"
      },
      "valueProducer": {
        "type": "STATEMENT_PROPERTY",
        "statementProperty": "id"
      }
    },
    {
      "name": "Person Count",
      "aggregation": {
        "type": "DISTINCT_COUNT"
      },
      "valueProducer": {
        "type": "STATEMENT_PROPERTY",
        "statementProperty": "actor.person.id"
      }
    },
    {
      "name": "Interactions per Person",
      "aggregation": {
        "type": "FORMULA",
        "formula": "m0 / m1"
      },
      "valueProducer": {}
    }
  ]
}

Group Count

The Group Count aggregation counts the number of people in a group, ignoring any filters. This aggregation only makes sense for reports organized by group and uses a statement property value producer of actor.person.id.

{ 
  "name": "Group Population", 
  "aggregation": { 
    "type": "GROUP_COUNT" 
  }, 
  "valueProducer": { 
    "type": "STATEMENT_PROPERTY", 
    "statementProperty": "actor.person.id"
  }
}

Group Percent

The Group Percent aggregation gives the total number of people matching a filter divided by the number of people in the group, displayed as a percentage. This aggregation only makes sense for reports organized by group and uses a statement property value producer of actor.person.id or context.instructor.person.id. For example, this can be used to find the percentage of people in a group who have taken a specific course. See the following example:

{ 
  "name": "Percent of People", 
  "aggregation": { 
    "type": "GROUP_PERCENT" 
  }, 
  "valueProducer": { 
    "type": "STATEMENT_PROPERTY", 
    "statementProperty": "actor.person.id"
  },
  "filter": {
    "activityIds": {
      "ids": [
        "http://example.com/courses/some-course"
      ]
    }
  }
}
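
The calculation described above can be sketched in Python as follows. This is an illustration of the arithmetic only, not Watershed's implementation; the function name, the person IDs and the handling of an empty group are assumptions.

```python
def group_percent(group_members, people_matching_filter):
    """Percentage of a group's members who appear in the filtered statements."""
    members = set(group_members)
    if not members:
        return None  # assumption: an empty group has no percentage
    # Only people who belong to the group count towards the numerator.
    matched = members & set(people_matching_filter)
    return 100 * len(matched) / len(members)


# Hypothetical group of four people, one of whom has taken the course.
group = ["p1", "p2", "p3", "p4"]
took_course = ["p2", "p9"]  # p9 is not in the group and is ignored
print(group_percent(group, took_course))
```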

Last Between

The Last Between aggregation works in exactly the same way as the Last aggregation, except that it only returns a value if the last value is within a particular date range. This is useful for identifying events that have happened most recently within a date range and not more recently, for example people who have completed a particular compliance course in the last year, but have not retaken it in the last 11 months.

{
  "name": "Upcoming Expiration",
  "aggregation": {
    "type": "LAST_BETWEEN",
    "from": "P365D",
    "to": "P334D"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "timestamp"
  },
  "filter": {
    "equals": [
      {
        "fieldName": "result.completion",
        "values": {
          "ids": [
            "true"
          ]
        }
      }
    ]
  }
}

Please note: You can use the LAST aggregation with a trailing date filter to get people who have taken the course in the last 11 months.
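
The date-range check can be modeled with a small Python sketch. It is an illustration of the rule described above rather than Watershed's implementation: the function name is hypothetical, the from and to periods are passed as day counts (365 and 334, matching P365D and P334D), and treating both boundaries as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def last_between(timestamps, now, from_days=365, to_days=334):
    """Return the latest timestamp only if it falls inside the window."""
    if not timestamps:
        return None
    latest = max(timestamps)
    window_start = now - timedelta(days=from_days)
    window_end = now - timedelta(days=to_days)
    # A more recent statement outside the window suppresses the value.
    return latest if window_start <= latest <= window_end else None


now = datetime(2024, 1, 1)
completed_350_days_ago = now - timedelta(days=350)
retaken_100_days_ago = now - timedelta(days=100)

print(last_between([completed_350_days_ago], now))
print(last_between([completed_350_days_ago, retaken_100_days_ago], now))
```

The first call returns the completion date because it is the most recent value and lies in the window; the second returns nothing because the course was retaken more recently.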

Expired

The Expired aggregation is used to identify people who have never completed something, or have completed it prior to a certain date. Again, this aggregation is useful for tracking compliance. The aggregation will return "never" if people have no statements matching the measure's filter. It will return the value of the most recent statement prior to the configured date if they do have matching statements only prior to that date. It will return nothing if they have matching statements after the configured date.

Putting this into the context of compliance, this measure would return nothing for people who are compliant, the date when they last became compliant if they were compliant but aren't any longer, or 'never' if they have never been compliant.

{
  "name": "Not Completed",
  "aggregation": {
    "type": "EXPIRED",
    "date": "P1Y"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "timestamp"
  },
  "filter": {
    "equals": [
      {
        "fieldName": "result.completion",
        "values": {
          "ids": [
            "true"
          ]
        }
      }
    ]
  }
}
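
The three possible outcomes can be summarized in a short Python sketch. This models the behavior described above and is not Watershed's implementation; the function name is hypothetical, and the configured period (P1Y in the example) is represented by an already computed cutoff date.

```python
from datetime import datetime


def expired(matching_timestamps, cutoff):
    """Model of the Expired aggregation for one person.

    matching_timestamps: timestamps of statements matching the measure's filter.
    cutoff: the configured date, e.g. one year before today for "P1Y".
    """
    if not matching_timestamps:
        return "never"  # never compliant
    latest = max(matching_timestamps)
    if latest >= cutoff:
        return None  # still compliant, so nothing is returned
    return latest  # was compliant, but the last completion is too old


cutoff = datetime(2023, 1, 1)
print(expired([], cutoff))
print(expired([datetime(2022, 6, 1)], cutoff))
print(expired([datetime(2022, 6, 1), datetime(2023, 5, 1)], cutoff))
```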

Last Value Count

The Last Value Count aggregation is used to count the number of distinct items for which the most recent value of a statement property matches a configured value. For example, this can be used to count the number of people whose most recently reported status for a course was 'registered' (i.e. only people who registered but have yet to complete the course). See the following example:

{
  "name": "Number of people registered for Some Course",
  "aggregation": {
    "type": "LAST_VALUE_COUNT",
    "lastValueProperty": "verb.id",
    "testValue": "http://adlnet.gov/expapi/verbs/registered",
    "regExp": false
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "actor.person.id"
  },
  "filter":{
    "activityIds": {
      "ids": [
        "http://example.com/courses/some-course"
      ]
    }
  }
}

The aggregation also supports a testValues property instead of testValue so that you can specify an array of values to match instead of a single value.
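
The counting logic, including matching against several values as testValues allows, can be sketched in Python. This is an illustration under simplifying assumptions, not Watershed's implementation: statements are reduced to hypothetical (person, timestamp, verb) tuples and regular expression matching is left out.

```python
def last_value_count(statements, test_values):
    """Count people whose most recent verb is one of test_values.

    statements: iterable of (person_id, timestamp, verb_id) tuples.
    """
    latest_verb = {}
    # Process in time order so later statements overwrite earlier ones.
    for person, _timestamp, verb in sorted(statements, key=lambda s: s[1]):
        latest_verb[person] = verb
    return sum(1 for verb in latest_verb.values() if verb in test_values)


REGISTERED = "http://adlnet.gov/expapi/verbs/registered"
COMPLETED = "http://adlnet.gov/expapi/verbs/completed"

statements = [
    ("alice", 1, REGISTERED),
    ("alice", 2, COMPLETED),   # alice has moved on from 'registered'
    ("bob", 1, REGISTERED),    # bob is still only registered
]
print(last_value_count(statements, [REGISTERED]))
print(last_value_count(statements, [REGISTERED, COMPLETED]))
```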

Accumulation

The Accumulation aggregation is used to look at data for a filtered set of statements before and after an xAPI statement matching a filter. For example, it might be used to get the average or total sales made by a salesperson in the 6 months before or 6 months after completing a training course. This measure works especially well in pairs of measures, one for before and one for after an event, displayed in a Range report. The aggregation works by defining an event filter and a before or after period in which to filter statements, and an aggregation and value producer to aggregate those statements.

Where there are multiple statements matching the event filter, the set of statements before or after each matching statement is aggregated separately based on the aggregation property, then the resulting values are averaged to give the final value returned by the aggregation.

The event filter currently only supports activity id and verb id filters. The aggregation property supports SUM, AVG, COUNT and DISTINCT_COUNT aggregations.

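The following example shows two measures: the total score in the 6 months before completing the course, and the total score in the 6 months after completing it.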

{
  "name": "Score Before Training", 
  "aggregation": { 
    "type": "ACCUMULATION",
    "eventFilter": {
      "activityIds": {
        "ids": [
          "http://example.com/courses/some-course"
        ]
      },
      "verbIds": {
        "ids": [
          "http://adlnet.gov/expapi/verbs/completed"
        ]
      }
    },
    "aggregation": "SUM",
    "before": "P6M"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "result.score.raw"
  }
},{
  "name": "Score After Training", 
  "aggregation": { 
    "type": "ACCUMULATION",
    "eventFilter": {
      "activityIds": {
        "ids": [
          "http://example.com/courses/some-course"
        ]
      },
      "verbIds": {
        "ids": [
          "http://adlnet.gov/expapi/verbs/completed"
        ]
      }
    },
    "aggregation": "SUM",
    "after": "P6M"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "result.score.raw"
  }
}
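
The "aggregate per event, then average" step can be sketched in Python. This models the description above with a SUM aggregation and is not Watershed's implementation; the function name is hypothetical, the 6 month period is approximated as 182 days, and excluding statements at the exact event time is an assumption.

```python
from datetime import datetime, timedelta


def accumulation_sum(event_times, scored_statements, period, before=True):
    """Sum scores in the window around each event, then average the sums.

    event_times: timestamps of statements matching the event filter.
    scored_statements: (timestamp, score) pairs to aggregate.
    """
    per_event = []
    for event in event_times:
        if before:
            in_window = lambda ts: event - period <= ts < event
        else:
            in_window = lambda ts: event < ts <= event + period
        per_event.append(sum(s for ts, s in scored_statements if in_window(ts)))
    # Each event's window is aggregated separately, then averaged.
    return sum(per_event) / len(per_event) if per_event else None


six_months = timedelta(days=182)  # approximation of P6M
events = [datetime(2023, 7, 1), datetime(2024, 7, 1)]
scores = [
    (datetime(2023, 5, 1), 10),  # before the first event
    (datetime(2023, 8, 1), 30),  # after the first event
    (datetime(2024, 6, 1), 20),  # before the second event
]
print(accumulation_sum(events, scores, six_months, before=True))
print(accumulation_sum(events, scores, six_months, before=False))
```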

Distinct Counts

Aggregation precision threshold

Computing exact distinct counts on large data sets can be slow and memory intensive, using an impractically large amount of memory and taking a long time to execute (this is known as the count-distinct problem). Because of this, by default Watershed uses an algorithm that calculates a very accurate estimate when using the distinct count, set and group percentage aggregations. This estimate is accurate to around 1% for data sets of around 10,000 objects. This can be changed to use a true distinct count (see below).

Using advanced configuration, you can set the precision threshold for each supported measure aggregation individually to anywhere between 0 and 10,000, allowing you to trade memory and execution speed for accuracy (and vice versa).

{ 
 "name": "Precision threshold demo", 
 "aggregation": { 
  "type": "DISTINCT_COUNT",
  "precisionThreshold": 10000
 }, 
 "valueProducer": { 
  "type":"STATEMENT_PROPERTY", 
  "statementProperty": "actor.person.id"
 } 
}

By default the precision threshold is set to 1000, and can be set to a maximum of 10,000.

If a measure's precision threshold is below the number of objects being counted, a message will appear below the chart informing you which measures are estimates.

This threshold uses a method based on the HyperLogLog++ algorithm, which applies a hash function to each element in the multiset to obtain a multiset of uniformly distributed random numbers with the same cardinality as the original multiset. The cardinality of this randomly distributed set is then estimated to give the result. This method is used by most systems that deal with large datasets to solve the count-distinct problem.

Exact Distinct Counts

In situations where an exact distinct count is needed, Watershed can calculate it using the exact distinct count setting. When enabled, this setting applies to all measures within the report that use the DISTINCT_COUNT aggregation. To enable it, add "exactDistinctCount": true to the top level of your report configuration.

{
 "filter": {...},
 "dimensions": [...],
 "measures": [...],
 "exactDistinctCount": true,
 "type": "leaderboard"
}

When exactDistinctCount is used, reports take significantly longer to run because of the way exact counts are computed, so for the best experience it should only be used when absolutely necessary.

Match in Array

When data is contained in an array inside an xAPI statement it can be difficult to return the desired data inside a measure. To assist with this, we allow you to search the content of arrays and return a matching field value into the report as a measure.

Let's take the statement below as an example:

"context": {
 "contextActivities": {
 "parent": [
   {
    "objectType": "Activity",
    "id": "https://hoola.edcast.com/channel/1234",
    "definition": {
     "name": {
      "und": "Machine Learning & Artificial Intelligence"
     },
     "description": {
      "und": "Machine Learning & Artificial Intelligence group."
     },
     "type": "https://edcast.com/xapi/activity/channel"
    }
   },
   {
    "objectType": "Activity",
    "id": "https://hoola.edcast.com/insights/5678",
    "definition": {
     "name": {
      "und": "Advanced technical skills"
     },
     "description": {
      "und": "Advanced technical skills pathway."
     },
     "type": "https://edcast.com/xapi/activity/channel"
    }
   }
  ]
 }
}

If you wanted to get the context.contextActivities.parent.id of the array entry whose context.contextActivities.parent.definition.type is https://edcast.com/xapi/activity/channel, you would need to use the measure configuration below:

{
 "name": "Source Array Measure",
 "aggregation": {
   "type": "LAST"
 },
 "valueProducer": {
   "type": "MATCH_IN_ARRAY",
   "arrayField": "context.contextActivities.parent",
   "valueField": "id",
   "matchField": "definition.type",
   "matchPattern": "https://edcast.com/xapi/activity/channel"
  }
}

The table below describes the fields of the value producer.

Field	Description	Example
arrayField	Defines the array to search, in dot notation	context.contextActivities.grouping
valueField	The field (from the array) that is returned in the measure	definition.name.und
matchField	The field that should be searched for in the array	definition.type
matchPattern	The pattern that should be searched for in the matchField. This can be a regex.	http://id.tincanapi.com/activitytype/source
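
How the four fields work together can be sketched in Python. This is an illustration of the lookup described above, not Watershed's implementation: the helper names are hypothetical, and both returning the first matching entry and requiring the pattern to match the whole field value are assumptions.

```python
import re


def get_path(obj, dotted_path):
    """Follow a dot-notation path through nested dicts; None if missing."""
    for key in dotted_path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def match_in_array(statement, array_field, value_field, match_field, match_pattern):
    """Return value_field from the first array entry whose match_field matches."""
    for item in get_path(statement, array_field) or []:
        candidate = get_path(item, match_field)
        if isinstance(candidate, str) and re.fullmatch(match_pattern, candidate):
            return get_path(item, value_field)
    return None


# Hypothetical statement fragment with two parent activities of different types.
statement = {"context": {"contextActivities": {"parent": [
    {"id": "https://example.com/course/1",
     "definition": {"type": "http://adlnet.gov/expapi/activities/course"}},
    {"id": "https://example.com/source/2",
     "definition": {"type": "http://id.tincanapi.com/activitytype/source"}},
]}}}

print(match_in_array(statement, "context.contextActivities.parent", "id",
                     "definition.type", "http://id.tincanapi.com/activitytype/source"))
```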

One very useful way to use the MATCH_IN_ARRAY value producer is to search the actor.person.personas array and return a user ID from a specific learner homepage:

{
 "name": "Learner ID",
 "aggregation": {
  "type": "ANY"
 },
 "valueProducer": {
  "type": "MATCH_IN_ARRAY",
  "arrayField": "actor.person.personas",
  "valueField": "account.name",
  "matchField": "account.homePage",
  "matchPattern": ".*watershedlrs.com.*"
 }
}

Please note: Because of the underlying data structure, Match in Array measures do not work for count-based aggregations (i.e. COUNT, DISTINCT_COUNT, SET, LIST).

Group of Type

The Group of Type value producer displays the Watershed group that a person (or parent group) belongs to, based on a configured group type. This works best with leaderboard reports organized by person to display additional metadata about the person. It can also work with reports organized by group to show the parent group(s) of the group type used to organize the report.

The configuration below shows a measure that will display the name of the 'division' group that a person belongs to. Note that if people belong to multiple groups of the configured type, only one of those groups will be displayed.

{ 
 "name": "Division", 
 "aggregation": { 
   "type": "ANY" 
 }, 
 "valueProducer": { 
   "type": "GROUP_OF_TYPE", 
   "groupType": "Division" 
 } 
}

By default, the GROUP_OF_TYPE measure will report on all groups of that type a learner is in, even if they are not a direct member. You can also configure the measure to only show groups they are a direct member of by using the "directGroupsOnly": true flag.

{ 
 "name": "Division", 
 "aggregation": { 
   "type": "ANY" 
 }, 
 "valueProducer": { 
   "type": "GROUP_OF_TYPE", 
   "groupType": "Division" 
 },
 "directGroupsOnly": true
}

Time Between

The time between value producer looks at the time between two events, for example the time between starting one task and getting to another one. It uses a start filter and an end filter to find pairs of statements, then calculates the time difference between the two. Where multiple pairs of statements matching the start and end filter are found, the measure will return the average time between.

The measure will match the closest pairs and not include overlapping matches. So in the sequence of statements start1 start2 end1 end2 start3 end3, it will average the time between start2 and end1, and between start3 and end3. start1 will be ignored because there is no matching end statement before the next matching start statement. end2 will be ignored because there is no matching start statement after the last matching end statement.

Here's the syntax:

{
  "name": "Most Recent Time Between Activity 1 and Activity 2",
  "aggregation": {
    "type": "TIME_BETWEEN",
    "ignoreRegistrations": true
  },
  "valueProducer": {
    "type": "TIME_BETWEEN",
    "startFilter": {
      "activityIds": {
        "ids": [
          "http://example.com/expapi/activities/example-1"
        ]
      }
    },
    "endFilter": {
      "activityIds": {
        "ids": [
          "http://example.com/expapi/activities/example-2"
        ]
      }
    }
  }
}

A parameter of "ignoreRegistrations": false can be added to the Time Between measure so that start and end statements are only matched when they share the same registration.
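
The closest-pair matching rule described earlier in this section can be sketched in Python. This is a model of that rule for a single person with registrations ignored, not Watershed's implementation; the function name and the use of plain numbers of seconds as timestamps are assumptions.

```python
def average_time_between(events):
    """Average gap between closest, non-overlapping start/end pairs.

    events: list of (kind, time) tuples sorted by time, where kind is
    "start" or "end" and time is a number of seconds.
    """
    gaps = []
    pending_start = None
    for kind, time in events:
        if kind == "start":
            # A newer start replaces an earlier unmatched one (start1 is dropped).
            pending_start = time
        elif pending_start is not None:
            gaps.append(time - pending_start)
            pending_start = None  # an end with no pending start (end2) is skipped
    return sum(gaps) / len(gaps) if gaps else None


sequence = [
    ("start", 0),   # start1: ignored
    ("start", 10),  # start2
    ("end", 25),    # end1: paired with start2, gap 15
    ("end", 30),    # end2: ignored
    ("start", 40),  # start3
    ("end", 45),    # end3: paired with start3, gap 5
]
print(average_time_between(sequence))
```

The two gaps of 15 and 5 seconds average to 10 seconds, matching the pairing described for this sequence.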

Display Activity And Verb Names (instead of DisplayID)

Activities and verbs have both IDs and display names. Since the names are a language map, the actual field to retrieve the name could be different for each ID. For example, one verb's display name could be under the field verb.display.en, while another could be under verb.display.en-us. This can be a problem if you want to display these names interchangeably in a report. To get around this, Watershed has the ability to translate measures using activity, verb, person, or group IDs to their names, using the displayId field under valueProducer. Note: This feature already works for dimensions by default.

To enable this feature, add "measuresUseDisplayId": true to the top level of your report configuration.

{
 "filter": {...},
 "dimensions": [...],
 "measures": [...],
 "measuresUseDisplayId": true,
 "type": "leaderboard"
}

For any measure where you want to translate the ID to a display name, make sure the statementProperty points to an ID, and that displayId is false:

{
  "name": "Activity Name",
  "aggregation": {
    "type": "ANY"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "object.id",
    "displayId": false
  }
}

If you want to display the ID instead, set displayId to true.

Measures Based on cmi.response

If your measure uses data from a cmi.response that follows the standard laid out in Appendix C of the xAPI specification and has a question type of choice or likert, Watershed can translate the response from the option the user selected to the human-readable option from the statement.

[Screenshot: report without translate response]

[Screenshot: report with translate response]

To use translate response you need to add

"translateResponse": true

to the measure configuration.

{
  "name": "Choice",
  "aggregation": {
    "type": "LAST"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "result.response",
    "displayId": false
  },
  "filter": {
    "activityIds": {
      "ids": [
        "http://example.csat.assessment.example.com/questions/6"
      ]
    }
  },
  "translateResponse": true
}

Combine Multiple Measure Values into a Single Measure

Sometimes, the value from a single aggregation in a measure does not get the desired point across. One such example is the desire to show a raw score along with the percentage in a single measure. In situations like this, you can combine multiple displays together in a single measure. To do this, first you will want to create the individual measures (m0, m1) that you want to combine into a single measure. Then, you will create a formula measure that references each. The formula parameter will be a comma-delimited list of the measures in question:

"formula": "m0,m1"

The desired format of the combined measure goes in the outputPattern inside the valueProducer:

 "outputPattern": "%1s (%2s%%)"

The outputPattern formats strings using Java format tags. Here is a quick guide to this formatting. Putting this together into a single measure yields the following:

{
  "name": "Raw Score (Percentage)",
  "aggregation": {
    "type": "FORMULA",
    "formula": "m0,m1"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "none",
    "outputPattern": "%1s (%2s%%)"
  },
  "filter": {},
  "isTimeline": false
}
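
To see what the pattern above produces, the same string can be tried with Python's printf-style formatting, which treats this particular pattern the same way as Java format strings do. This is only an illustration: the helper name is hypothetical, and it assumes the values of m0 and m1 are substituted in order as strings.

```python
def combine(output_pattern, *measure_values):
    """Apply a printf-style pattern to measure values, in order."""
    return output_pattern % tuple(str(v) for v in measure_values)


# %1s and %2s are strings with minimum widths of 1 and 2; %% is a literal percent sign.
print(combine("%1s (%2s%%)", 8, 80))
```

Note that in this syntax the digits are minimum field widths rather than argument positions, so a one-character second value would be padded with a leading space.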

Modify Measure Output

Most report measures in Watershed will either directly show a statement property or aggregate those values in some way (e.g. Count, Sum, etc.). However, there are times where a measure should instead show a modified output of a statement property. In situations like this, the undesired portions of the statement property can be identified with a regular expression and be replaced with a new desired value. For this, a display parameter can be added to the valueProducer portion of the measure where regex and replace values can be specified.

Please note: The regex property matches on substrings of the statement property, so the whole string does not need to match the regex, and the regex can match multiple times within a single statement property.

An example use case for this feature would be if the activity type on statements is showing a full URL, but you want to show only the last portion of the URL after the last forward slash. For this example, the regex needs to match everything up to and including the last slash, which is then replaced with an empty string.

{
  "name": "Activity Type Formatted",
  "aggregation": {
    "type": "LAST"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "object.definition.type",
    "display": {
      "ignoreCase": false,
      "regex": ".*/",
      "replace": ""
    }
  },
  "isTimeline": false
}

Please note: If you want to replace the whole statement property with a string, you can use a regular expression of .+ as shown below:

{
  "name": "Replace Statement ID with View",
  "aggregation": {
    "type": "ANY"
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "id",
    "display": {
      "ignoreCase": false,
      "regex": ".+",
      "replace": "View"
    }
  },
  "isTimeline": false
}
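
Both display examples above can be tried out with Python's re module. This is an illustration of the substitution, not Watershed's implementation: the function name and sample values are hypothetical, and it assumes Watershed's regular expression engine treats these two simple patterns the same way Python does.

```python
import re


def apply_display(value, regex, replace, ignore_case=False):
    """Replace every match of regex in value with the replacement string."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(regex, replace, value, flags=flags)


# ".*/" is greedy, so it consumes everything up to and including the last slash.
print(apply_display("http://adlnet.gov/expapi/activities/course", ".*/", ""))

# ".+" matches the whole value, so the entire string is replaced.
print(apply_display("6690e6c9-3ef0-4ed3-8b37-7f3964730bee", ".+", "View"))
```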

Measure Filters

You can configure simple measure filters in simple measure configuration. See the filters section of the advanced configuration help guide for details of how to customize these simple filters for more complex requirements.
