What are Advanced Data Filters?

Advanced Data Filters are the JSON filters used in Advanced Configuration to choose what data should be included in reports. They're also used in Data Permissions to choose what data a user can see.

Getting Started with Advanced Data Filters

The easiest way to get started with Advanced Data Filters is to create a report using Simple Configuration in the report builder and then switch to Advanced Configuration to see how the JSON has been created and formatted.

In Simple Configuration, use the main filters (Activities, Verbs, Dates, and People) to create a report. The filters you choose populate the filter property of the report configuration object in Advanced Configuration. For example, a filter that includes an activity, a person, and a date range might look like this:

"filter": {
  "activityIds": {
    "ids": [
      "https://twitter.com/6209872/status/65472397"
    ],
    "regExp": false
  },
  "personCustomIds": [
    "alberta.dyonnaire@example.com"
  ],
  "groupCustomIds": null,
  "dateFilter": {
    "dateType": "trailing",
    "trailingAmount": "6",
    "trailingType": "months",
    "customDateFrom": null,
    "customDateTo": null
  }
}

Advanced Data Filter Options

There are some additional filters and properties that are not accessible in Simple Configuration.

Filter by child groups

childGroupsOfCustomIds will filter all groups that are direct children of the selected group. This is useful where a group has a large number of direct children that you want to compare. For example, if an organization contains 200 departments, you can use the childGroupsOfCustomIds filter with the name of the organization group that contains those departments. The report will then display all departments within the organization.

"childGroupsOfCustomIds": [
  "Company"
]

Filter by group type

The group type filter enables you to filter for groups of a particular type, such as departments or teams. It filters out any people who do not belong to a group of the configured types and, when the report is organized by group, ensures that only groups of the configured types are shown.

"groupTypeNames": [
  "team"
]

Please note: group type names displayed in Settings / Your Organization are all plural (e.g. "teams", "departments"). If you are copying the group type name from that page, you need to remove the 's' before using it in this filter.

Filter by group without adding that group to the list of items

When a report is organized by group, the default behavior is to add all groups included in the group filter to the list of items. But what if you want to filter a report by a group without adding that group to the list of items? For example, what if you want to report on data by job role, but only show data for a particular region? In this case, you can use the excludeFromOutput flag. For example:

"groupTypeNames": [
  "Job Role"
],
"and": [
  {
    "groupCustomIds": [
      "Sales"
    ],
    "excludeFromOutput": true
  },
  {
    "groupCustomIds": [
      "Operations"
    ],
    "excludeFromOutput": false
  }
]

This will show every job role, plus the Operations group, in the list of items. It will only show data for people who belong to one of those groups and to the Sales group. The Sales group will not appear in the list of items.

Filter people by persona

The actorIds property enables you to filter by the email address or account identifier used in xAPI statements.

Please note: This filter only looks at the identifiers used directly in the xAPI statements and will not match statements using other personas connected to the same person.

For email identifiers, the value is mbox[,]mailto: followed by the email address to be filtered, for example:

"actorIds": [
  "mbox[,]mailto:test@example.com"
]

For account identifiers, the value is account[,] followed by the account homepage, followed by [:], followed by the account name. For example:

"actorIds": [
  "account[,]https://example.com[:]12345"
]

Lists of personas of the same or different types can be included in the filter together. For example:

"actorIds": [
  "mbox[,]mailto:test@example.com",
  "mbox[,]mailto:example@xapi.com",
  "account[,]https://example.com[:]12345",
  "account[,]https://watershedlrs.com[:]67890"
]
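
If you generate these values from a list of learners rather than typing them by hand, the two formats above can be assembled with a pair of small helpers. This is a minimal Python sketch, not part of Watershed; the function names mbox_actor_id and account_actor_id are made up for illustration, and only the string formats come from the descriptions above.

```python
def mbox_actor_id(email: str) -> str:
    """Build an actorIds entry for an email (mbox) identifier."""
    # Accept addresses with or without the "mailto:" prefix already present.
    address = email if email.startswith("mailto:") else f"mailto:{email}"
    return f"mbox[,]{address}"


def account_actor_id(home_page: str, name: str) -> str:
    """Build an actorIds entry for an account identifier."""
    return f"account[,]{home_page}[:]{name}"


# Hypothetical helper names; the values reuse the examples from this article.
actor_ids = [
    mbox_actor_id("test@example.com"),
    account_actor_id("https://example.com", "12345"),
]
print(actor_ids)
```

The resulting list can be pasted into the actorIds property as shown in the examples above.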

Filter to logged in user

By using a value of -1 in the personIds field you can filter the report to the logged in user. This is helpful if you want to create a report showing the current user's activity.

"personIds": [
  -1
]

More Advanced Dynamic Filters Based on Logged in User

If a dynamic filter is needed based on the logged in user (other than just filtering to the logged in user's data), it can be created by following this guide: Dynamic Filters Based on Logged-in User

Filter by verb

"verbIds": {
  "ids": [
    "http://id.tincanapi.com/verb/viewed"
  ],
  "regExp": false
}

Filter by context activity

It's also possible to filter by context activities using the 'parentActivityIds', 'groupingActivityIds', and 'contextActivityIds' properties. For example:

"contextActivityIds": {
  "ids": [
    "https://twitter.com/"
  ],
  "regExp": false
}

Please note: the contextActivityIds filter property will match activity ids whichever collection of context activities they are found in (parent, grouping, category or other).

Filter by data source

The equals filter can also be used to filter by data source. In this case the fieldName is authority.account.name and the id is the key used to connect the data source.

"equals": [{
  "fieldName": "authority.account.name",
  "values": {
    "ids": [
      "09baf07cc7d98f"
    ],
    "regExp": false
  }
}]

Filter to include only dates older than a certain time period

The olderThan date type is the opposite of the trailing date type. You can use this in the Advanced Configuration to retrieve statements that are older than a certain number of days, weeks, months, or years. For example, the following filter would retrieve all statements that were stored more than 5 days ago:

"dateFilter": {
    "dateType": "older_than",
    "olderThanAmount": 5,
    "olderThanType": "days",  
    "fieldName": "stored"
}
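
To picture what "older than" means, it can help to think of a cutoff point that sits the configured amount of time before now. The Python sketch below illustrates that idea only; it is not Watershed's implementation. The function name is_older_than is hypothetical, only days and weeks are handled (months and years vary in length and this article does not define them), and treating the cutoff itself as excluded is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Only fixed-length units are handled in this sketch.
UNIT_TO_DELTA = {"days": timedelta(days=1), "weeks": timedelta(weeks=1)}


def is_older_than(value: datetime, amount: int, unit: str, now: datetime) -> bool:
    """Return True when `value` falls before `now` minus amount * unit."""
    cutoff = now - amount * UNIT_TO_DELTA[unit]
    # Assumption: a value exactly at the cutoff is not "older than".
    return value < cutoff


now = datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc)
six_days_ago = datetime(2024, 3, 4, 12, 0, tzinfo=timezone.utc)
four_days_ago = datetime(2024, 3, 6, 12, 0, tzinfo=timezone.utc)

print(is_older_than(six_days_ago, 5, "days", now))
print(is_older_than(four_days_ago, 5, "days", now))
```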

Filter by dates in fields other than timestamp

By default, the date filter is applied to the 'timestamp' property that represents when the interaction happened. You can also filter by dates in other statement properties (such as 'stored' and extensions) using the 'fieldName' property in a date filter.

"dateFilter": {
    "dateType": "trailing",
    "trailingAmount": "6",
    "trailingType": "months",
    "fieldName": "stored"
}

The 'timestamp' and 'stored' properties should always be in the past, but dates in extensions (such as deadlines or due dates) might include future dates. To filter dates since (or until) now or since the start of today, you can use NOW or TODAY with custom date types. This example will filter all statements with a future deadline:

"dateFilter": {
    "dateType": "custom",
    "customDateFrom": "NOW",
    "fieldName": "result.extensions.[https://netdimensions.com/xapi/extensions/result/transcript-detail].TranscriptDetail-Deadline"
}

Filter using durations

The Watershed date filter allows you to use ISO 8601 durations as parameters for custom date filters. These durations can be combined with dates, NOW, and TODAY. For example, this filter includes all dates in the week before the day it is run:

"dateFilter": {
    "dateType": "custom",
    "customDateFrom": "-P1W",
    "customDateTo": "TODAY",
    "fieldName": "timestamp"
}

This example filters all dates in the hour before midnight on 1 January 2020:

"dateFilter": {
    "dateType": "custom",
    "customDateFrom": "-PT1H",
    "customDateTo": "2020-01-01T00:00:00.000",
    "fieldName": "timestamp"
}
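
One way to read the two examples above is that a negative duration in customDateFrom counts back from customDateTo. The Python sketch below works through that reading so you can check which dates a filter would cover. It is an illustration under assumptions, not Watershed's code: the function names are hypothetical, TODAY is taken to mean the start of the current day, time zones are ignored, and only weeks, days, hours, minutes and seconds are parsed. The sketch uses the standard ISO 8601 spelling, in which time components follow a T (for example PT1H for one hour).

```python
import re
from datetime import datetime, timedelta

# Subset of ISO 8601 durations: weeks, days, and time parts after "T".
DURATION = re.compile(
    r"(?P<sign>-)?P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as "-P1W" or "-PT1H" into a timedelta."""
    match = DURATION.fullmatch(text)
    parts = {}
    if match is not None:
        parts = {k: int(v) for k, v in match.groupdict().items()
                 if k != "sign" and v}
    if not parts:
        raise ValueError(f"unsupported duration: {text!r}")
    delta = timedelta(**parts)
    return -delta if match.group("sign") else delta


def resolve_anchor(value: str, now: datetime) -> datetime:
    """Turn NOW, TODAY or an ISO timestamp into a datetime."""
    if value == "NOW":
        return now
    if value == "TODAY":
        # Assumption: TODAY means the start of the current day.
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    return datetime.fromisoformat(value)


def resolve_window(date_from: str, date_to: str, now: datetime):
    """Resolve a window whose start is a negative duration before its end."""
    end = resolve_anchor(date_to, now)
    return end + parse_duration(date_from), end


now = datetime(2024, 3, 10, 15, 30)
print(resolve_window("-P1W", "TODAY", now))
print(resolve_window("-PT1H", "2020-01-01T00:00:00.000", now))
```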

Hint: To filter by multiple date-based properties, use an 'and' filter (see the 'And, or and not' section below).

Filter by any statement property

In fact, it’s possible to filter by any statement property using the required and equals filter properties.

Use required to filter all statements that have a certain property with any value. For example if you are reporting on score data, you might want to filter only statements that contain a score:

"required": "result.score.scaled"

Use equals to specify that a property must have a certain value. For example, you might filter only statements whose actor account has a specific home page:

"equals": [{
  "fieldName": "actor.account.homePage",
  "values": {
    "ids": [
      "http://watershedlrs.com"
    ],
    "regExp": false
  }
}]

When equals is used with a numerical value, you should use the fieldType property to specify that the field type is a number. For example if you wanted to filter all statements with a raw score of 5, you would use the following syntax:

"equals": [{
  "fieldName": "result.score.raw",
  "fieldType": "number",
  "values": {
    "ids": [
      5
    ],
    "regExp": false
  }
}]
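
If you build report configurations with a script, the equals entries above all follow the same shape, so a small helper can keep them consistent. This is a minimal Python sketch; the function name equals_filter is hypothetical and not a Watershed API. It only reproduces the JSON structure shown in the two examples above.

```python
import json


def equals_filter(field_name, values, field_type=None, reg_exp=False):
    """Build one entry for the "equals" array in the shape shown above."""
    entry = {"fieldName": field_name}
    if field_type is not None:
        # fieldType is only included when given, e.g. "number".
        entry["fieldType"] = field_type
    entry["values"] = {"ids": list(values), "regExp": reg_exp}
    return entry


home_page_fragment = {
    "equals": [equals_filter("actor.account.homePage", ["http://watershedlrs.com"])]
}
raw_score_fragment = {
    "equals": [equals_filter("result.score.raw", [5], field_type="number")]
}

print(json.dumps(home_page_fragment, indent=2))
print(json.dumps(raw_score_fragment, indent=2))
```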

When using filters with extensions, you may also need the fieldType property to tell Watershed what type of value you are filtering by. Possible values are string, number, boolean, null or array.

Array-based properties are:

string_array for arrays with string values
number_array for arrays with number values
boolean_array for arrays with boolean values
array_array for arrays of arrays
null_array for arrays with null values

For example, consider the following result extension:

"https://example.com/fruits" : [
  "apple",
  "pear",
  "orange", 
  "bear"
]

To filter just statements containing apples, you could use the following filter:

"equals": [{
  "fieldName": "result.extensions.[https://example.com/fruits]",
  "fieldType": "string_array",
  "values": {
    "ids": [
      "apple"
    ],
    "regExp": false
  }
}]

If your extension includes an array of objects and you want to filter by a property of those objects, you need to tell Watershed about the array using the __arr__*__ syntax, where * changes depending on the values expected in the array:

__arr__str__ - String values
__arr__num__ - Number values
__arr__bool__ - Boolean values
__arr__obj__ - Object values (array of objects)
__arr__arr__ - Array values (array within an array)

For example, let's say you have a context extension with the following value:

"https://example.com/shopping-list" : {
  "fruits": [
    {
      "type": "apple",
      "color": "green"
    },
    {
      "type": "apple",
      "color": "red"
    }
  ]
}

To filter just statements containing green fruit, you'd use the following filter:

"equals": [{
  "fieldName": "context.extensions.[https://example.com/shopping-list].fruits.__arr__obj__.color",
  "fieldType": "string",
  "values": {
    "ids": [
      "green"
    ],
    "regExp": false
  }
}]

Please note: We're using __arr__obj__ because we're expecting fruits to be an array of objects, but our fieldType is string because color is a string value.
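
Long fieldName values like the one above are easy to mistype, so it can help to assemble them from their parts. The Python sketch below is one hedged way to do that; the helper extension_field and the ARRAY_MARKERS lookup are hypothetical names, and the markers themselves are copied from the list above.

```python
# Array markers from the list above, keyed by the kind of value in the array.
ARRAY_MARKERS = {
    "string": "__arr__str__",
    "number": "__arr__num__",
    "boolean": "__arr__bool__",
    "object": "__arr__obj__",
    "array": "__arr__arr__",
}


def extension_field(container, extension_iri, *path):
    """Join an extension IRI and a property path into a fieldName string."""
    parts = [f"{container}.extensions.[{extension_iri}]", *path]
    return ".".join(parts)


field_name = extension_field(
    "context",
    "https://example.com/shopping-list",
    "fruits",
    ARRAY_MARKERS["object"],  # fruits is an array of objects
    "color",
)
print(field_name)
```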

Filter by data ranges

The range filter can be used to filter statements whose data sits inside a specified range of values. The fieldName and fieldType are the same as when filtering by any statement property. The from and to options set the span of the range.

"range": [{
   "fieldName": "context.extensions.[https://w3id.org/xapi/cmi5/result/extensions/progress].__num__",
   "fieldType": "number",
   "from": 75,
   "to": 100,
   "includeLower": true,
   "includeUpper": true
}]

Regex

Advanced Data Filters support Regex (Regular Expressions) to filter matching activities, context activities and verbs. RegexOne is a great reference for learning how to set up Regex, and Elastic provides more specific information about Watershed's regular expression syntax.

In Watershed, a few regular expressions are used frequently and are very useful. For example, the following would filter all statements where the activity id starts with “https://twitter.com/”:

"activityIds": {
  "ids": [
    "https://twitter.com/.*"
  ],
  "regExp": true
}

The .* at the end of "https://twitter.com/" acts as a wildcard and will catch all activity IDs that begin with "https://twitter.com/".

The same .* can be also applied at the beginning of a string to catch IDs that all end with the same value. The following would filter all statements where the activity id ended with “module”.

"activityIds": {
  "ids": [
    ".*module"
  ],
  "regExp": true
}

.* can also be used as a wildcard anywhere in the string, and can be used in the search string as many times as required.

Another more complex example filters all activity ids starting with "http://example.com/assessments", except those that contain the string "question" (a number of common Regex examples are used in this):

"activityIds": {
  "ids": [
    "(http://example.com/assessments.*)&~(.*question.*)"
  ],
  "regExp": true
}

You can also configure a regex filter to be case insensitive:

"activityIds": {
   "ids": ["https://example.com/assessments/v83m4463px"],
   "regExp": true,
   "ignoreCase": true
}

Sometimes, you may want to filter to specific HTML elements in order to clean up activity names or descriptions that may contain them. Unfortunately, left and right angle brackets (< and >) are special characters in our regex syntax, so you can't simply use <p>.* as your regex filter. Instead, you'll need to escape those characters, as in the example below:

"equals": [
{
     "fieldName": "object.definition.name.en-US",
     "values": {
          "regExp": true,
          "ids": [
               "\\<p\\>.*"
          ]
     },
     "exclude": false
}
]

Not all regex syntax elements are supported by Watershed. The table below lists some of the unsupported syntax elements and alternatives you can use.

Regex	What it does	Alternatives
`^`	Anchors the expression at the start of the string.	Watershed uses Regex to match the whole string only (not partial strings) so anchors are not required.
`$`	Anchors the expression at the end of the string.	Watershed uses Regex to match the whole string only (not partial strings) so anchors are not required.
`?:`	Non-capturing group. Groups part of an expression without capturing the text it matches.	Watershed uses Regex to match the whole string only (not partial strings) so non-capturing groups are not required.
`\d`	Matches any numerical character.	Use `[0-9]` instead.
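
The whole-string behavior described in the table can be tried out with Python's re.fullmatch, which also requires a pattern to cover the entire value. This is only an illustration of whole-string matching: Python's regex engine is not the one Watershed uses, and syntax differs between the two (for example, the & and ~ operators shown earlier are not available in Python). The helper name whole_string_matches and the sample ids are made up for the example.

```python
import re

activity_ids = [
    "https://twitter.com/6209872/status/65472397",
    "https://example.com/?ref=https://twitter.com/",
    "https://example.com/course/module",
]


def whole_string_matches(pattern, candidates):
    """Keep candidates that the pattern matches from first to last character."""
    compiled = re.compile(pattern)
    return [c for c in candidates if compiled.fullmatch(c)]


# No ^ or $ anchors are needed because the whole value must match.
starts_with_twitter = whole_string_matches("https://twitter.com/.*", activity_ids)
ends_with_module = whole_string_matches(".*module", activity_ids)
digits_only = whole_string_matches("[0-9]+", ["12345", "12a45"])

print(starts_with_twitter)
print(ends_with_module)
print(digits_only)
```

Note that the second sample id contains "https://twitter.com/" but does not start with it, so it is not matched by the first pattern.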

If you are not familiar with regex, please speak to us for help with your activity filter.

Please note: Regex filters are case sensitive unless specified within the regex otherwise.

People Filters

People filters allow you to filter based on people rather than filtering based on statement properties.

Please note: A people filter has a limit of 15,000 people. If the filter returns more than this, a warning will be displayed on the report.

Filter people by interactions

The people filter enables you to filter a population of people based on their interactions. For example, you might filter all people who have completed a particular course:

"peopleFilter": {
  "activityIds": {
    "ids": ["http://leadership.assessment.example.com"]
  }
}

You can also use multiple activityIds and verbIds in the people filter. Setting the matchAllCombinations property will filter a population of people to only people that match every single activity and verb combination. For example, you can filter all people who have launched and completed 2 specific courses:

"peopleFilter": {
  "matchAllCombinations": true,
  "activityIds": {
    "ids": ["http://leadership.assessment.example.com", "http://example.csat.assessment.example.com"]
  },
  "verbIds": {
    "ids": ["http://adlnet.gov/expapi/verbs/launched", "http://adlnet.gov/expapi/verbs/completed"]
  }
}
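
To make "every single activity and verb combination" concrete: with 2 activities and 2 verbs there are 4 combinations, and a person is included only if they have a statement for each one. The Python sketch below illustrates that logic. It is not Watershed's implementation; the function name and the simplified statement dictionaries (with just "activity" and "verb" keys) are assumptions made for the example.

```python
from itertools import product


def matches_all_combinations(person_statements, activity_ids, verb_ids):
    """True when the person has a statement for every activity/verb pair."""
    seen = {(s["activity"], s["verb"]) for s in person_statements}
    return all(pair in seen for pair in product(activity_ids, verb_ids))


activity_ids = [
    "http://leadership.assessment.example.com",
    "http://example.csat.assessment.example.com",
]
verb_ids = [
    "http://adlnet.gov/expapi/verbs/launched",
    "http://adlnet.gov/expapi/verbs/completed",
]

# One learner with all four combinations, one missing the final completion.
complete_learner = [{"activity": a, "verb": v} for a in activity_ids for v in verb_ids]
partial_learner = complete_learner[:-1]

print(matches_all_combinations(complete_learner, activity_ids, verb_ids))
print(matches_all_combinations(partial_learner, activity_ids, verb_ids))
```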

Filter people by measure values

The measure filter is a special type of people filter that enables you to filter a population of people based on the value of a particular measure. For example, include only people with an average score above a certain value, include only people who logged in less than a certain number of times in the last week, or exclude the highest and/or lowest performers by a given metric to exclude outliers.

A simple measure filter is shown below. In this case, the filter will include only people whose most recent score was 100%.

Please note: In practice, the measure filter is likely to be more complex and filter only people whose most recent score was 100% in a particular assessment. The filter below will look at all scores from all sources.

"peopleFilter": {
  "measureFilter": {
    "measure": {
      "name": "Last Score",
      "aggregation": {
        "type": "LAST"
      },
      "valueProducer": {
        "type": "STATEMENT_PROPERTY",
        "statementProperty": "result.score.scaled"
      }
    },
    "equals": {
      "values": {
        "ids": [1.0]
      }
    }
  }
}

The measure filter contains two properties: measure and one of either equals, range, or percentileRange. The measure property contains configuration for the measure to compare against. The definition of measures is explained in the Measure Editor help guide.

The equals property defines a value or list of values to match against. This uses the same syntax as the equals filter outlined above in 'Filter by any statement property'. The simple measure filter above is an example of an equals measure filter.

The range property defines a range of values to compare against. For example, you might want to filter people who scored on average between 50% and 75%. The includeUpper and includeLower properties define whether the bounds are inclusive, so in this example a score of exactly 75% will be included.

"peopleFilter": {
  "measureFilter": {
    "measure": {
      "name": "Average Score",
      "aggregation": {
        "type": "AVERAGE"
      },
      "valueProducer": {
        "type": "STATEMENT_PROPERTY",
        "statementProperty": "result.score.scaled"
      },
      "id": 103
    },
    "range": [{
      "from": 0.50,
      "to": 0.75,
      "includeUpper": true,
      "includeLower": true
    }]
  }
}

Please note: the range filter is applied before values are rounded for display, so the range filter may filter out some values that you don’t expect it to. In the example above, an average score of 75.01% might be displayed in a report as 75% but would still be excluded by this measure filter.
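
The rounding point in the note above can be checked with a few lines of Python. This is an illustrative sketch of an inclusive or exclusive range test, not Watershed's code; the function name in_range is hypothetical, and its parameters mirror the from, to, includeLower and includeUpper properties.

```python
def in_range(value, lower, upper, include_lower=True, include_upper=True):
    """Apply the from/to bounds with the inclusive flags described above."""
    above = value >= lower if include_lower else value > lower
    below = value <= upper if include_upper else value < upper
    return above and below


average = 0.7501
displayed = f"{average:.0%}"  # rounded for display only

print(displayed, in_range(average, 0.50, 0.75))
print(in_range(0.75, 0.50, 0.75))
print(in_range(0.75, 0.50, 0.75, include_upper=False))
```

The first value is shown as 75% once rounded, yet it falls outside the range because the comparison uses the unrounded 0.7501.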

The percentileRange property is used to exclude people with the top and bottom values for a given metric. This can be used if you have a few exceptionally high and low performers who are skewing the average results. The config below will exclude the top and bottom 1% in terms of points scored, so if people score between 0 and 10,000 points, this config would include only people who scored between 100 and 9,900. If the config was changed to make includeLower and includeUpper false, it would show people who scored between 101 and 9,899.

"peopleFilter": {
  "includeParentFilter": true,
  "measureFilter": {
    "measure": {
      "name": "Points Scored",
      "aggregation": {
        "type": "SUM"
      },
      "valueProducer": {
        "type": "STATEMENT_PROPERTY",
        "statementProperty": "result.score.raw"
      }
    },
    "percentileRange": [
      {
        "from": 1,
        "to": 99,
        "includeUpper": true,
        "includeLower": true
      }
    ]
  }
}

Include people without Statements

Setting this field to true will include all people who match the report filter, regardless of whether they have any statement data. This is useful when you want to see which or how many people in a group have not done something. This property will only work with dimensions of actors.person.id or actors.groups.id.

"includePeopleWithoutStatements": true

And, or and not

By default, filter properties are added together so that if you include a verb filter and an activity id filter, statements must match both that verb and activity id. On the other hand, lists are interpreted as an or filter, so if you specify a list of verb ids, statements matching any verb on the list are included. And, or and not filters give you the power to change that and, for example, match statements that either use a particular verb or a particular activity id.

You can combine multiple filter properties using and, or and not to craft very complex filters. The following contrived example includes all three properties nested together.

"or": [
  {
    "verbIds": {
      "ids": [
        "http://id.tincanapi.com/verb/tweeted"
      ],
      "regExp": false
    }
  },
  {
    "and": [
      {
        "contextActivityIds": {
          "ids": [
            "https://twitter.com/"
          ],
          "regExp": false
        }
      },
      {
        "equals": [{
          "fieldName": "object.definition.type",
          "values": {
            "ids": [
              "http://id.tincanapi.com/activitytype/tweet"
            ],
            "regExp": false
          }
        }]
      }
    ],
    "not": {
      "required": "actor.mbox"
    }
  }
]

In the example above, either the verb must be ‘tweeted’, or Twitter must be a context activity and the activity type must be a tweet, but the actor must not be identified by email.

Please note: and and or contain arrays of filter objects, whereas not contains a single filter object.

In the example below, the last 2 months are filtered, then the last 1 month is removed giving only statements from 'last month':

"filter": {
  "dateFilter": {
    "dateType": "trailing",
    "trailingAmount": "2",
    "trailingType": "months"
  },
  "not" :{
    "dateFilter": {
      "dateType": "trailing",
      "trailingAmount": "1",
      "trailingType": "months"
    }
  }
}

And, or and not filters are applied in addition to other properties in the filter. This means that the filter below would include statements where the actor is 'alberta.dyonnaire@example.com' and either the verb is 'completed' or the activity id is 'http://example.com'.

"filter": {
  "personCustomIds": [
    "alberta.dyonnaire@example.com"
  ],
  "or": [
    {
      "activityIds": {
        "ids": [
          "http://example.com"
        ],
        "regExp": false
      }
    },
    {
      "verbIds": {
        "ids": [
          "http://adlnet.gov/expapi/verbs/completed"
        ],
        "regExp": false
      }
    }
  ]
}
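
The way sibling properties and and, or and not combine can be easier to follow as a small evaluator. The Python sketch below applies the rules described in this section to simplified statements. It is an illustration only, not how Watershed evaluates filters: the matches function, the flat statement dictionaries (with "person", "verb" and "activity" keys) and the restriction to exact id matching are all assumptions made for the example.

```python
def matches(statement, flt):
    """Evaluate a small subset of filter properties against one statement."""
    checks = []
    # A list of ids acts as an "or": any listed value is accepted.
    if "verbIds" in flt:
        checks.append(statement["verb"] in flt["verbIds"]["ids"])
    if "activityIds" in flt:
        checks.append(statement["activity"] in flt["activityIds"]["ids"])
    if "personCustomIds" in flt:
        checks.append(statement["person"] in flt["personCustomIds"])
    # "and" and "or" hold arrays of filters; "not" holds a single filter.
    if "and" in flt:
        checks.append(all(matches(statement, sub) for sub in flt["and"]))
    if "or" in flt:
        checks.append(any(matches(statement, sub) for sub in flt["or"]))
    if "not" in flt:
        checks.append(not matches(statement, flt["not"]))
    # Sibling properties are added together, so every check must pass.
    return all(checks)


COMPLETED = "http://adlnet.gov/expapi/verbs/completed"
LAUNCHED = "http://adlnet.gov/expapi/verbs/launched"

example_filter = {
    "personCustomIds": ["alberta.dyonnaire@example.com"],
    "or": [
        {"activityIds": {"ids": ["http://example.com"], "regExp": False}},
        {"verbIds": {"ids": [COMPLETED], "regExp": False}},
    ],
}

alberta_completed_other = {
    "person": "alberta.dyonnaire@example.com",
    "verb": COMPLETED,
    "activity": "http://example.com/other",
}
alberta_launched_other = dict(alberta_completed_other, verb=LAUNCHED)
someone_else_completed = dict(alberta_completed_other, person="someone.else@example.com")

print(matches(alberta_completed_other, example_filter))
print(matches(alberta_launched_other, example_filter))
print(matches(someone_else_completed, example_filter))
```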

Using Advanced Data Filters in Data Permissions

The same filters that are used in Advanced Configuration can be used when setting up a Watershed user's Data Permissions. We recommend using the report builder to set up the filters first, and then bringing the filters into the user's Data Permissions.
