Subset formulas in Omniscope: tutorial with examples

Modified on Fri, 3 Jul at 11:28 AM

Omniscope allows you to perform calculations in multiple locations, as discussed in this article, however the context and exact application might affect the syntax or the results.

in the workflow, using the Field Organiser block
on the individual report level, in the Report data sources
in the Content and KPI view, in the Report area
on a chart level, allowing the use of multiple Formula measures

Subset is method of dividing the dataset into different groups. You could divide some transactional data according to the week number into Week 1, Week 2 and so on. This is the basis that enables us to perform multiple statistical functions and end up with different results for each of these subsets e.g. if we add up the sales data we might conclude that the sum of sales in week 1 is higher than the sum of sales in week 2.

a) Typical syntax contains in the first part the name of the statistical function and in the second part the subset division criteria.

SUBSET_SUM([Sales], SUBSET([Week]))

In other words - create a different sum of sales for every week in the dataset.

When applied in a tabular environment (in the Field Organiser block or Report data sources), where the data is organised in a table, result will be calculated for each record for the [Week] value on the given row - therefore will return the same result [Weekly sales] for every record that has the same value in the [Week] field - see image.

This is worth remembering when [Weekly sales] is used in subsequent analysis, to avoid double-counting!

b) Subset formulae could contain multiple 'open criteria':

SUBSET_SUM([Sales],

INTERSECTION

( SUBSET([Week]), SUBSET([Store Type])))

This will create a different result variation for each combination of Week/ Store values.

c) Below formula will return sum of all sales, for the whole dataset - no grouping here.

SUBSET_SUM([Sales])

d) We can also create more targeted conditional criteria:

SUBSET_MEAN([Sales],

INTERSECTION(
SUBSET([Traffic], 1000, ">"),

SUBSET(Store Type], "CONTROL", "=")

)

Calculate a mean for the subset of the dataset where the [Store Type] is "CONTROL" and the Traffic is higher than 1000. Note how in the syntax the text field values have to have "" around them, while you do not need to do that with the numeric fields' criteria.

This type of syntax is precisely referring to certain records according to the value in particular fields, regardless of their position in a table. This is why this type of subset calculation could be applied in all scenarios listed above, including the Content or KPI view.

When building formulas for these two views it is good to know that the underlying data query has no perception of the record, therefore relative references to 'current row value', e.g. referring to the current row week and store combination, will not work. Subset references have to be precise and define which data subset the calculus will be applied to (like in the d example).

When using SUBSET formulas to build Formula measures it is worth remembering that the chart will divide the records automatically into 'buckets' e.g. your Bar chart might have [Week] on the X axis. If you wish to use [Profit] or [Units] as your measure, there is no need to add Subset([Week]) as a subset grouping condition - the chart has already done that for you, which is convenient. That nicety comes with a limitation that the data placed in a 'Week 1 bucket' has no visibility of the data in other weeks, so you could not reference them in your formula syntax e.g. divide sales in one week with the sum of sales in all weeks.

#Calculating a difference in [Amount] for 2 consecutive periods (for the consecutive period difference calculations e.g. DoD, WoW, MoM, where [Rank date] could be replaced by the week or month number):

SUBSET_SUM([Amount], subset([Rank date]))

SUBSET_SUM([Amount], subset([Rank date],[Rank date]-1))

Formula builder interface

In all the different scenarios Omniscope will offer a helping hand in form of a 'formula builder' interface. Here you can pick a function, that comes with the syntax, which can be easily edited and items replaced using the 'Insert field' option.

It is obvious that Subset functions are just a subset of ALL the different functions you can perform in Omniscope, although you will probably be familiar with these from Excel or other packages.

Reference 1

Full list of SUBSET functions is below, although you may prefer to search for a function in our interactive Function picker. All these functions will be available in the Field Organiser block, however some of them may not appear in the Content view/ Report Data Sources/ Formula measures.

SUBSET_UNIQUESLIST(stat_field, subset)	A comma-separated list of all unique values] of a field.
SUBSET_EMPTYCOUNT(stat_field, subset)	Number of empty values] of a field.
SUBSET_NONEMPTYCOUNT(stat_field, subset)	Number of non-empty values] of a field.
SUBSET_UNIQUECOUNT(stat_field, subset)	Number of unique non-empty values] of a field.
SUBSET_FIRSTNONNULL(stat_field, subset)	The first non-null value in the original data order] of a field.
SUBSET_FIRST(stat_field, subset)	The first value (or null) in the original data order] of a field.
SUBSET_LASTNONNULL(stat_field, subset)	The last non-null value in the original data order] of a field.
SUBSET_LAST(stat_field, subset)	The last value (or null) in the original data order] of a field.
SUBSET_SINGLETON(stat_field, subset)	The single value, if there is one] of a field.
SUBSET_MAX(stat_field, subset)	Maximum value (highest or last)] of a field.
SUBSET_MEAN(stat_field, subset)	Mean (average) of values] of a field.
SUBSET_MEDIAN(stat_field, subset)	Median (middle) of values] of a field.
SUBSET_MIN(stat_field, subset)	Minimum value (lowest or first)] of a field.
SUBSET_MODE(stat_field, subset)	Most common value (first if multimodal)] of a field.
SUBSET_PRODUCT(stat_field, subset)	Product of values] of a field.
SUBSET_RANGE(stat_field, subset)	Range of values (maximum minus minimum)] of a field.
SUBSET_STDDEV(stat_field, subset)	Standard deviation of values from the mean] of a field.
SUBSET_SUM(stat_field, subset)	Sum (total) of values] of a field.

SUBSET is usually used only as an argument to another function (e.g. SUBSET_SUM), like above, to define the data groupings:
(a) all records (if no arguments are specified); or
(b) a current subset of records (if field is specified); or
(c) another subset of records (if field and field value are specified); or
(d) another subset of records, using a custom operator (if all arguments are specified).

Custom operator should be one of:

"=", "<>", ">", ">=", "<", "<=", "!=" (not equal), "contains", "does-not-contain", "starts-with", "ends-with".

Reference 2

Interactive Omniscope functions guide link

Reference 3

Several examples of formulas used in a Content view to produce some KPIs using date referencing.

Note that because of the absence of a tabular structure, referencing is done by using date functions.

#Sum of values on the last day in our time series:

SUBSET_SUM([Value], subset([Date],

SUBSET_MAX([Date])

))

#Previous day value:

SUBSET_SUM([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day"))

))

#Sum of Values on the same day last month (compared to the latest day in the series):
SUBSET_SUM([Value], subset([Date],
DATEADD(SUBSET_MAX([Date]),(-1),"month")
))

# finding the value sum on the last day of the previous month :

SUBSET_SUM([Value], subset([Date],

LASTDAYOFMONTH(

(DATEADD(SUBSET_MAX([Date]),(-1),"month"))

)))

# you can replace the "month" argument here with "year","week" or "day". If Max date is 22 May 2022 - the formula will retrieve sum of values for the 22nd April 2022, or, in case of -1 Year, values on the 22 May 2021.

#Value sum for the latest Month - our month field is called [Month MMM-yy]:

SUBSET_SUM([Value], subset([Month MMM-yy],

SUBSET_MAX([Month MMM-yy])

))

# MoM difference (month on month) from the latest month in our datset:

SUBSET_SUM([Value], subset([Month MMM-yy],

SUBSET_MAX([Month MMM-yy])

))

SUBSET_SUM([Value], subset([Month MMM-yy],

DATEADD(SUBSET_MAX([Month MMM-yy]),-1,"month")

))

# Year to date Value sum (sum of all values for the latest year in the time series):

SUBSET_SUM([Value], subset([Year],

SUBSET_MAX([Year])

))

# Sum of values for the previous year:

SUBSET_SUM([Value], subset([Year],

DATEADD(SUBSET_MAX([Year]),-1,"year")

))

# YoY (year on year) difference (sum of Values for the Max year - previous year):

SUBSET_SUM([Value], subset([Year],

SUBSET_MAX([Year])

))

SUBSET_SUM([Value], subset([Year],

DATEADD(SUBSET_MAX([Year]),-1,"year")

))

#nDifference from a previous day in a time series where [Value] field is a cumulative

SUBSET_MAX([Value])

SUBSET_MAX([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day"))

))

#Increase rate (difference divided by the previous period):

(SUBSET_MAX([Value])

SUBSET_MAX([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day"))

)))

SUBSET_MAX([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day")

#Calculate the % difference in [Price USD] between the first and the last day in the dataset

(SUBSET_SINGLETON([Price USD],

SUBSET([DATE], SUBSET_MAX([DATE])))

SUBSET_SINGLETON([Price USD],

SUBSET([DATE], SUBSET_MIN([DATE]))))

SUBSET_SINGLETON([Price USD],

SUBSET([DATE], SUBSET_MIN([DATE])))

Moving average formula example

Moving average calculation is relying on order for reference, and for this purpose you can use Currentrow() or Rank another field, such as [Date] - these formulas should be used in the Field Organiser block.

Previous Close =

SUBSET_SUM([Close], Subset([Row],[Row]-1,"="))

# establishing the previous value, with [Row] reference

It is useful to include a safeguard IF function and Intersection function:

IF([Row]<14,

null,

SUBSET_SUM([Value],

INTERSECTION(

SUBSET([Row],[Row],"<="),

SUBSET([Row],[Row]-13,">=")))

/14

)

Rank Revenue values in descending order within each Month, so the highest value will be ranked # 1. This formula relies on the Field Organiser tabular structure.

RANK([Revenue], [Revenue], false, false, Subset([Month]))

Here are formulas that are useful when evaluating and counting units/ clients in a time period, for example in a scenario when we wish to calculate a retention rate (how many clients from the beginning of a period are still present ant the end of the period).

Clients at the end of the Quarter:

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Finish],"<=")

SUBSET([Next Renewal],[Q Finish],">=")

))

New client count in a Quarter:

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Start],">=")

SUBSET([Invoice Date],[Q Finish],"<=")

))

Start of Quarter client count:

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Start],"<=")

SUBSET([Next Renewal],[Q Start],">=")

))

Quarter retention rate:

(

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Finish],"<="),

SUBSET([Next Renewal],[Q Finish],">=")

))

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Start],">="),

SUBSET([Invoice Date],[Q Finish],"<=")

))

)

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Start],"<="),

SUBSET([Next Renewal],[Q Start],">=")

))

Clients in Quarter :

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Start],"<="),

SUBSET([Next Renewal],[Q Start],”>=“)))

Annual retention rate (clients at the end of the year - new clients, divided by the year start clients’ count)

(

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Y Finish],"<="),

SUBSET([Next Renewal],[Y Finish],">=")

))

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Y Start],">="),

SUBSET([Invoice Date],[Y Finish],"<=")

))

)

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Y Start],"<="),

SUBSET([Next Renewal],[Y Start],">=")

))

Quarter end clients:

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Finish],"<="),

SUBSET([Next Renewal],[Q Finish],">="),

SUBSET([Quarter])

))

Quarter end period clients count:

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Q Finish],"<="),

SUBSET([Next Renewal],[Q Finish],">=")

)

Q1 2023 end client count:

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],DATEVALUE("31/03/2023"),"<="),

SUBSET([Next Renewal],DATEVALUE("31/03/2023"),">=")

)

Retention rate calculation with Variables as Date End/ Date Start (works in Content view):

(

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Date End],"<="),

SUBSET([Next Renewal],[Date End],">=")

))

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([First Invoice],[Date Start],">="),

SUBSET([First Invoice],[Date End],"<=")

))

SUBSET_UNIQUECOUNT([Ultimate client],

INTERSECTION(

SUBSET([Invoice Date],[Date Start],"<="),

SUBSET([Next Renewal],[Date Start],">=")

))

Year start client:

IF(

and(

[Invoice Date]<=[Y Start],

[Next Renewal]>=[Y Start]

)

[Ultimate client], null)

Year end client:

IF(

and(

[Invoice Date]<=[Y Finish],

[Next Renewal]>=[Y Finish]

)

[Ultimate client], null)

New client in year evaluation (‘flag field’):

IF(

and(

[Invoice Date]>=[Y Start],

[Invoice Date]<=[Y Finish]

)

[Ultimate client], null)

Is someone new client:

IF(

and(

[Invoice Date]>=[Y Start],

[Invoice Date]<=[Y Finish],

SUBSET_UNIQUECOUNT([Ultimate client],

SUBSET([Next Renewal],[Date End],"<=")

))

[Ultimate client], null)

First invoice date per client per year:

SUBSET_MIN([Invoice Date],

INTERSECTION(

subset([Ultimate client]),

subset([Year])

))

Number of clients where invoice is before Year start:

SUBSET_UNIQUECOUNT([Ultimate client],

subset([Y Start],

SUBSET_MIN([Invoice Date],

INTERSECTION(

subset([Ultimate client]),

subset([Year])

)),">=")

)

Year start client?

IF(

and(

[Invoice Date]<=[Y Start],

[Next Renewal]>=[Y Start]

)

[Ultimate client]

, null)

Year end client?

IF(

and(

[Invoice Date]<=[Y Finish],

[Next Renewal]>=[Y Finish]

)

[Ultimate client], null)

How many clients are there where first invoice date is before the year start (calculated for each year):

SUBSET_UNIQUECOUNT([Ultimate client],

subset([Y Start], SUBSET_MIN([Invoice Date],

INTERSECTION(

subset([Ultimate client]),

subset([Year])

)),">=")

)

Take a date field [Date] and convert it to text in specified format, add time 09:00, then convert back to date:

TEXTTODATE(

DATETOTEXT(

[Date],"dd/MM/yyyy")+" 09:00")

Total value of invoices per client in the last year (rolling year, 365 days period before today)

SUBSET_SUM([Invoice Value],

INTERSECTION(

Subset([Company]),

Subset([Invoice Date],

DATEADD(TODAY(), -1, "year"),">")

)

Create a flag field to mark with "True" the last Saturday in every month of every year in the dataset:

IF([Date]=SUBSET_MAX([Date],

INTERSECTION(

SUBSET([Date Month]),

SUBSET([Date yyyy]),

SUBSET([Day in Week],"Sat")

)),"True",null)

Analysing data which contains a list, where we perform calculation on the records that satisfy a condition:

SUBSET_SUM([Value],SUBSET([Basket items], "apple", "does-not-contain")

)

SUBSET_SUM([Value], subset([Basket items],"lemon","contains")

)

Nested IF statement - return result 1, 2,3 or 4, depending on the sum of Value.

IF(

SUBSET_SUM([Value])<10, 1,

SUBSET_SUM([Value])<50, 2,

SUBSET_SUM([Value])<100, 3,

)

Here is a cheat sheet with instructions on how to rank on two fields.

Rank function Syntax:

RANK(value, field, isAscending, includeNulls, dataSubset)

Ranking Spend=

RANK([Spend])

# simple Rank comparing all values in the Spend field

Rank Spend per Week=

RANK([Spend], [Spend], false, false, SUBSET([Week]))

#Ranking Spend values inside each Week

SubRank Clicks =

RANK([Clicks], [Clicks], false, false,subset([Rank Spend per Week] ))

# secondary rank field, based on the Clicks value. Meaningful when multiple rows share the same [Rank spend per Week] value

Spend per Keyword=

SUBSET_SUM([Spend], SUBSET([Keyword]))

# total Spend for each Keyword

[One]=1

# basis for running total indexing

Running total per Keyword:

RUNNINGTOTAL([One], SUBSET([Keyword]))

Rank Totals per Keyword=

IF([Running total per Keyw]>1, null,

RANK([Spend per Keyword], [Spend per Keyword], false, true, SUBSET([Running total per Keyw],1,"=")))

# ignoring the values greater than 1, rank compares only the subset sum values where [Running total per Keyw] is 1, ensuring that only one value per category is compared.