Subset formulas in Omniscope: tutorial with examples

Modified on Fri, 12 May 2023 at 04:01 PM

Omniscope allows you to perform calculations in multiple locations, as discussed in this article, however the context and exact application might affect the syntax or the results.


  • in the workflow, using the Field Organiser block
  • on the individual report level, in the Report data sources 
  • in the Content and KPI view, in the Report area
  • on a chart level, allowing the use of multiple Formula measures


Subset is method of dividing the dataset into different groups. You could divide some transactional data according to the week number into Week 1, Week 2 and so on. This is the basis that enables us to perform multiple statistical functions and end up with different results for each of these subsets e.g. if we add up the sales data we might conclude that the sum of sales in week 1 is higher than the sum of sales in week 2.


a) Typical syntax contains in the first part the name of the statistical function and in the second part the subset division criteria

SUBSET_SUM([Sales], SUBSET([Week]))

In other words - create a different sum of sales for every week in the dataset.


When applied in a tabular environment (in the Field Organiser block or Report data sources), where the data is organised in a table, result will be calculated for each record for the [Week] value on the given row - therefore will return the same result [Weekly sales] for every record that has the same value in the [Week] field - see image.

This is worth remembering when [Weekly sales] is used in subsequent analysis, to avoid double-counting! 



b) Subset formulae could contain multiple 'open criteria':

SUBSET_SUM([Sales], 

INTERSECTION

( SUBSET([Week]), SUBSET([Store Type])))

This will create a different result variation for each combination of Week/ Store values.


c) Below formula will return sum of all sales, for the whole dataset - no grouping here.

SUBSET_SUM([Sales]) 


d) We can also create more targeted conditional criteria:

SUBSET_MEAN([Sales], 

  INTERSECTION(
    SUBSET([Traffic], 1000, ">"),

    SUBSET(Store Type], "CONTROL", "=")

  )

)

Calculate a mean for the subset of the dataset where the [Store Type] is "CONTROL" and the Traffic is higher than 1000. Note how in the syntax the text field values have to have "" around them, while you do not need to do that with the numeric fields' criteria.

This type of syntax is precisely referring to certain records according to the value in particular fields, regardless of their position in a table. This is why this type of subset calculation could be applied in all scenarios listed above, including the Content or KPI view. 


When building formulas for these two views it is good to know that the underlying data query has no perception of the record, therefore relative references to 'current row value', e.g. referring to the current row week and store combination, will not work. Subset references have to be precise and define which data subset the calculus will be applied to (like in the d example).


When using SUBSET formulas to build Formula measures it is worth remembering that the chart will divide the records automatically into 'buckets' e.g. your Bar chart might have [Week] on the X axis. If you wish to use [Profit] or  [Units] as your measure, there is no need to add Subset([Week]) as a subset grouping condition - the chart has already done that for you, which is convenient. That nicety comes with a limitation that the data placed in a 'Week 1 bucket' has no visibility of the data in other weeks, so you could not reference them in your formula syntax e.g. divide sales in one week with the sum of sales in all weeks.


#Calculating a difference in [Amount] for 2 consecutive periods (for the consecutive period difference calculations e.g. DoD, WoW, MoM, where [Rank date] could be replaced by the week or month number):

SUBSET_SUM([Amount], subset([Rank date]))

-

SUBSET_SUM([Amount], subset([Rank date],[Rank date]-1))


Formula builder interface


In all the different scenarios Omniscope will offer a helping hand in form of a 'formula builder' interface.  Here you can pick a function, that comes with the syntax, which can be easily edited and items replaced using the 'Insert field' option.


It is obvious that Subset functions are just a subset of ALL the different functions you can perform in Omniscope, although you will probably be familiar with these from Excel or other packages.




Reference 1

Full list of SUBSET functions is below, although you may prefer to search for a function in our interactive Function picker. All these functions will be available in the Field Organiser block, however some of them may not appear in the Content view/ Report Data Sources/ Formula measures.

 

SUBSET_UNIQUESLIST(stat_field, subset)A comma-separated list of all unique values] of a field.
SUBSET_EMPTYCOUNT(stat_field, subset)Number of empty values] of a field.
SUBSET_NONEMPTYCOUNT(stat_field, subset)Number of non-empty values] of a field.
SUBSET_UNIQUECOUNT(stat_field, subset)Number of unique non-empty values] of a field.
SUBSET_FIRSTNONNULL(stat_field, subset)The first non-null value in the original data order] of a field.
SUBSET_FIRST(stat_field, subset)The first value (or null) in the original data order] of a field.
SUBSET_LASTNONNULL(stat_field, subset)The last non-null value in the original data order] of a field.
SUBSET_LAST(stat_field, subset)The last value (or null) in the original data order] of a field.
SUBSET_SINGLETON(stat_field, subset)The single value, if there is one] of a field.
SUBSET_MAX(stat_field, subset)Maximum value (highest or last)] of a field.
SUBSET_MEAN(stat_field, subset)Mean (average) of values] of a field.
SUBSET_MEDIAN(stat_field, subset)Median (middle) of values] of a field.
SUBSET_MIN(stat_field, subset)Minimum value (lowest or first)] of a field.
SUBSET_MODE(stat_field, subset)Most common value (first if multimodal)] of a field.
SUBSET_PRODUCT(stat_field, subset)Product of values] of a field.
SUBSET_RANGE(stat_field, subset)Range of values (maximum minus minimum)] of a field.
SUBSET_STDDEV(stat_field, subset)Standard deviation of values from the mean] of a field.
SUBSET_SUM(stat_field, subset)Sum (total) of values] of a field.



SUBSET is usually used only as an argument to another function (e.g. SUBSET_SUM), like above, to define the data groupings:
(a) all records (if no arguments are specified); or
(b) a current subset of records (if field is specified); or
(c) another subset of records (if field and field value are specified); or
(d) another subset of records, using a custom operator (if all arguments are specified).


Custom operator should be one of: 

"=", "<>", ">", ">=", "<", "<=", "!=" (not equal), "contains", "does-not-contain", "starts-with", "ends-with".


Reference 2

Interactive Omniscope functions guide link


Reference 3 

Several examples of formulas used in a Content view to produce some KPIs using date referencing.

Note that because of the absence of a tabular structure, referencing is done by using date functions. 


#Sum of values on the last day in our time series:

SUBSET_SUM([Value], subset([Date],

SUBSET_MAX([Date])

))


#Previous day value:

SUBSET_SUM([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day"))

))


#Sum of Values on the same day last month (compared to the latest day in the series):
SUBSET_SUM([Value], subset([Date],
DATEADD(SUBSET_MAX([Date]),(-1),"month")
))



# finding the value sum on the last day of the previous month :

SUBSET_SUM([Value], subset([Date],

 LASTDAYOFMONTH(

 (DATEADD(SUBSET_MAX([Date]),(-1),"month"))

)))

# you can replace the "month" argument here with "year","week" or "day". If Max date is 22 May 2022 - the formula will retrieve sum of values for the 22nd April 2022, or, in case of -1 Year, values on the 22 May 2021.


#Value sum for the latest Month - our month field is called [Month MMM-yy]:

SUBSET_SUM([Value], subset([Month MMM-yy],

SUBSET_MAX([Month MMM-yy])

))


# MoM difference (month on month) from the latest month in our datset:

SUBSET_SUM([Value], subset([Month MMM-yy],

SUBSET_MAX([Month MMM-yy])

))

-

SUBSET_SUM([Value], subset([Month MMM-yy],

DATEADD(SUBSET_MAX([Month MMM-yy]),-1,"month")

))


# Year to date Value sum (sum of all values for the latest year in the time series):

SUBSET_SUM([Value], subset([Year],

SUBSET_MAX([Year])

))


# Sum of values for the previous year:

SUBSET_SUM([Value], subset([Year],

DATEADD(SUBSET_MAX([Year]),-1,"year")

))


# YoY (year on year) difference (sum of Values for the Max year - previous year):

SUBSET_SUM([Value], subset([Year],

SUBSET_MAX([Year])

))

-

SUBSET_SUM([Value], subset([Year],

DATEADD(SUBSET_MAX([Year]),-1,"year")

))


#nDifference from a previous day in a time series where [Value] field is a cumulative

SUBSET_MAX([Value])

-

SUBSET_MAX([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day"))

))


#Increase rate (difference divided by the previous period):

(SUBSET_MAX([Value])

-

SUBSET_MAX([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day"))

)))

/

SUBSET_MAX([Value], SUBSET([Date],

(DATEADD((SUBSET_MAX([Date])), -1, "day")


#Calculate the % difference in [Price USD] between the first and the last day in the dataset

(SUBSET_SINGLETON([Price USD],

 SUBSET([DATE], SUBSET_MAX([DATE]))) 

-

SUBSET_SINGLETON([Price USD],

 SUBSET([DATE], SUBSET_MIN([DATE])))) 

/

 SUBSET_SINGLETON([Price USD],

 SUBSET([DATE], SUBSET_MIN([DATE])))



Moving average formula example 

Moving average calculation is relying on order for reference, and for this purpose you can use Currentrow() or Rank another field, such as [Date] - these formulas should be used in the Field Organiser block.


Previous Close = 

SUBSET_SUM([Close], Subset([Row],[Row]-1,"="))

# establishing the previous value, with [Row] reference



It is useful to include a safeguard IF function and Intersection function:

IF([Row]<14,

null, 

SUBSET_SUM([Value],

INTERSECTION(

SUBSET([Row],[Row],"<="),

SUBSET([Row],[Row]-13,">=")))

/14

)


Rank Revenue values in descending order within each Month, so the highest value will be ranked # 1. This formula relies on the Field Organiser tabular structure.

RANK([Revenue], [Revenue], false, false, Subset([Month]))







Was this article helpful?

That’s Great!

Thank you for your feedback

Sorry! We couldn't be helpful

Thank you for your feedback

Let us know how can we improve this article!

Select atleast one of the reasons
CAPTCHA verification is required.

Feedback sent

We appreciate your effort and will try to fix the article