Achieving the Gold Standard in Marketing Measurement

Every marketing executive wants the same thing – a single source of truth for how their advertising campaigns are performing.
“We spent $1 million; how much did we make in return?”
It’s the sort of request that sounds reasonable, but when you get into it there’s a surprising amount of complexity in delivering that vision.
Marketing Attribution
Reverse-engineering whether your advertising caused people to buy your product is difficult because most methodologies require data we don’t have:
- We can count how many people saw or clicked on an ad, but if most of those people choose “Ask App Not to Track” when Apple shows them the prompt (96% of people do), we lose that data.
- If they’re using an ad blocker (42% of people do), that’s another big hole in your tracking. If they’re in the EU and they ignore that pesky cookie banner (76% of people do), you can’t track them either.
- Even before these developments in the digital space, advertisers needed to be able to measure marketing campaigns where there are no clicks to track, for example TV ads, radio, or billboards.
- Additionally, all of these methods measure correlation, not causality; just because somebody saw your ad does not mean it was the ad that deserves all of the credit for driving them to purchase.
These issues and many others conspire to make it impossible to develop a full picture of what causes people to buy your products. Unless we manage to fit every consumer with a brain-wave monitor or get everyone living full time in the metaverse where we can track their every movement, the data we need to conclusively measure marketing effectiveness will always remain elusive.
Marketing Mix Modeling
The traditional way to solve this problem was Marketing Mix Modeling (MMM), a statistical technique that became popular in the 1970s, where you use aggregated data to correlate spikes and dips in revenue with what you spent on advertising.
For example, if on Thursday you spent nothing on advertising and made $500k in sales, then on Friday you spent $100k on ads and got $700k in sales, you can estimate that $1 spent on ads returns about $2 in sales. Of course it gets more complicated as you add in more channels and other factors like seasonality, which is why you need statistics. The more granular you get with your model, e.g. going from channel level down to campaign or creative tactic, the harder it is to tease apart the statistical correlations.
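To make that concrete, here’s a minimal sketch in Python (with made-up daily figures extending the Thursday/Friday example above) of the simplest possible mix model: an ordinary least-squares fit of sales against ad spend. Real mix models add more channels, adstock and saturation effects, seasonality, and other controls, but the core idea is the same.

```python
import numpy as np

# Hypothetical daily figures (in $k), loosely extending the example above:
# one column of ad spend, one of sales.
ad_spend = np.array([0, 100, 50, 80, 0, 120, 60])
sales = np.array([500, 700, 610, 655, 510, 745, 615])

# Fit sales ≈ baseline + roas * spend with ordinary least squares.
X = np.column_stack([np.ones_like(ad_spend), ad_spend])
(baseline, roas), *_ = np.linalg.lstsq(X, sales, rcond=None)

print(f"Baseline sales: ~${baseline:.0f}k per day")
print(f"Estimated return: ~${roas:.2f} in sales per $1 of ad spend")
```

On these toy numbers the fit recovers a baseline of roughly $500k a day and a return of about $2 per $1 spent, matching the back-of-the-envelope estimate.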
MMM has seen a resurgence in recent years because it works across every channel, and doesn’t require user-level data, so is privacy-friendly and unaffected by gaps in tracking data. However, this increased flexibility comes at the cost of complexity – getting to an accurate model can be time-consuming and expensive, even for an experienced analyst. Teasing out correlations from the data is complex, and even then you’ve found correlation, not causation – the model can still be way off, which means wasted ad spend.
The Gold Standard for Ad Measurement
As documented in a recent HBR article “A New Gold Standard for Digital Ad Measurement?”, the industry is fast converging on an accepted standard practice for solving the problems associated with MMM – calibrating your model with lift tests. This new technique for “triangulating” the truth is touted by both Google and Meta as the solution to loss of tracking data, and may finally help us deliver on the vision of a single source of truth to prove the value of advertising to executives and the finance team.
Randomized Controlled Trials (RCTs), which we’ll refer to as “experiments”, are known as the “gold standard” for proving one thing causes another. In medicine, there’s a hierarchy of evidence with RCTs near the top, beaten only by systematic reviews and meta-analyses of multiple studies as the highest-quality source of evidence. Marketing Mix Modeling sits lower in the hierarchy, analogous to “cohort studies”, with digital attribution landing further down as “case series and reports”. For those keeping track, your boss’s opinion on what’s working should be at the very bottom.
Calibrating MMM with Experiments
Running an experiment to prove the value of your advertising campaigns works as follows. You divide up your audience at random, for example by location (which is how they do it at Measured), and only show ads to half of them, say spending $1.4m over two weeks. If the half that didn’t see ads generated $3.5 million in sales and the half that did see ads generated $6 million, your ads drove $2.5 million in incremental sales – a return on ad spend of roughly $1.8. If your marketing mix model disagrees, you know the model was wrong and can adjust it.
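The arithmetic behind that number is worth writing down, because it is the “ground truth” everything else gets compared against. The sketch below is a hypothetical helper, assuming equally sized and comparable test and control groups, so that the control group’s sales stand in for what the exposed group would have done without ads.

```python
def incremental_roas(treated_sales: float, control_sales: float, ad_spend: float) -> float:
    """Incremental return on ad spend from a simple test/control split.

    Assumes the two groups are equally sized and comparable, so the control
    group's sales approximate what the treated group would have done anyway.
    """
    incremental_sales = treated_sales - control_sales
    return incremental_sales / ad_spend

# Figures from the example above, in dollars.
roas = incremental_roas(treated_sales=6_000_000,
                        control_sales=3_500_000,
                        ad_spend=1_400_000)
print(f"Incremental ROAS: {roas:.2f}")  # ≈ 1.79, i.e. roughly $1.8 back per $1 spent
```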
How you divide your audience is important. Platform lift studies use an “intent-to-treat” model, where the people who would normally have seen your ads are randomly assigned to see them or not. However, this can still suffer from data loss, because the same issues affecting tracking also affect the platforms’ ability to divide up the audience and track whether people saw ads or not. Measured divides audiences by geographic location instead, because that is a more foolproof way to randomize your test and control groups: it works independently across every channel and is robust to privacy issues.
The truth is that there’s never just one model that fits the data. The graphic below is from Meta’s Robyn, an open-source MMM package, showing 10,000 possible models from one brand’s dataset (each dot is a model). Thanks to a bit of statistical wizardry, the system optimizes towards the models that are more accurate (further left means less error in making predictions) and more plausible (further down means less difference between your current spend allocation and what the model predicts you should spend), but that still leaves you with around 100 models to choose from, some with very different results.
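To illustrate that narrowing-down step – this is just a sketch of the idea, not Robyn’s actual code – here is how you might filter a large set of candidate models down to the ones that no other model beats on both accuracy and plausibility (the shortlist Robyn calls the Pareto front). The scores here are randomly generated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scores for 10,000 candidate models, lower is better on both:
# prediction error, and distance between current spend shares and the
# allocation the model implies.
prediction_error = rng.uniform(0.05, 0.5, size=10_000)
allocation_distance = rng.uniform(0.0, 0.4, size=10_000)

def pareto_front(error: np.ndarray, distance: np.ndarray) -> np.ndarray:
    """Indices of models that no other model beats on both objectives."""
    keep = []
    for i in range(len(error)):
        dominated = np.any(
            (error <= error[i]) & (distance <= distance[i])
            & ((error < error[i]) | (distance < distance[i]))
        )
        if not dominated:
            keep.append(i)
    return np.array(keep)

front = pareto_front(prediction_error, allocation_distance)
print(f"{len(front)} of 10,000 candidate models survive")
```

The real package does this far more efficiently inside its optimization loop; the point is simply that accuracy alone doesn’t pick a single winner, which is exactly where experiments come in.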
You can’t test everything all the time, but if the model agrees with the results from the times you did run experiments, you can trust it to advise on channels or time periods where you have no experimental data. As per a study referenced in the HBR article, calibration on average corrected MMM-based return-on-ad-spend estimates by 15%, with other studies finding an average correction of 25%. For an advertiser spending $30 million a year, that means better allocation of roughly $4.5m to $7.5m of your budget into the right channels.
The results of experiments, like the incremental geo-region tests that Measured offers, can serve as a “ground truth” for the experts building your model. If things are not lining up, they can make changes to the parameters or features of the model, or help identify anomalies or mistakes in the data cleaning and modeling process. There are three main ways to align your model with your experimental results:
- Manually compare the results of the MMM and the ad experiments to make sure they agree on what channels are driving performance.
- Use the experiment results to choose between models after an optimization process has been run, i.e. to select one of the final 100 models you get from Robyn.
- Incorporate experiment results directly into the optimization process, for example as a Bayesian prior or a constraint for the algorithm to optimize to (a minimal sketch of this idea follows below).
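To give a feel for the third option, here is a minimal sketch – my own illustration, not Measured’s or Robyn’s implementation – of calibrating a single channel’s coefficient against an experiment result. A Gaussian prior centred on the experimentally measured ROAS is mathematically equivalent to a quadratic penalty pulling the least-squares fit towards that value, which is what the closed-form solution below computes; all data and parameter names here are hypothetical.

```python
import numpy as np

def calibrated_fit(X: np.ndarray, y: np.ndarray,
                   prior_mean: np.ndarray, prior_sd: np.ndarray,
                   noise_sd: float) -> np.ndarray:
    """MAP estimate of regression coefficients under independent Gaussian priors.

    Equivalent to a penalised least-squares fit where each coefficient is
    pulled towards its prior mean (e.g. an experimentally measured ROAS).
    """
    prior_precision = np.diag(1.0 / prior_sd ** 2)
    A = X.T @ X / noise_sd ** 2 + prior_precision
    b = X.T @ y / noise_sd ** 2 + prior_precision @ prior_mean
    return np.linalg.solve(A, b)

# Hypothetical weekly data: channel spend and sales, both in $k.
rng = np.random.default_rng(1)
spend = rng.uniform(50, 150, size=26)
sales = 500 + 1.8 * spend + rng.normal(0, 40, size=26)
X = np.column_stack([np.ones_like(spend), spend])

# Prior: leave the baseline essentially unconstrained, but pull the channel's
# ROAS coefficient towards the 1.8 measured in the geo experiment.
beta = calibrated_fit(
    X, sales,
    prior_mean=np.array([0.0, 1.8]),
    prior_sd=np.array([1e6, 0.3]),
    noise_sd=40.0,
)
print(f"Calibrated ROAS estimate: {beta[1]:.2f}")
```

In practice you would express this inside whatever Bayesian MMM framework you already use (Robyn, for example, accepts lift-test results as a calibration input); the point is that the experiment enters the model as evidence, rather than being checked against it after the fact.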
The first method is simple and achievable for any MMM setup – even if you’re using a traditional vendor who does not support modern Bayesian methods, they should be able to explain why their model disagrees with the findings of your experiment. The best vendors will be glad to get this experimental data, because they care about making sure their model has a positive real-world impact. The third option is more sophisticated, but is essential if you want a fully automated modeling solution that doesn’t drift into making poor allocation decisions.
How Measured Can Help
The team at Measured has spent 7 years making experiments easier for marketers to run, and automating them end to end. Their most forward-thinking clients often use the results of those experiments to calibrate their Marketing Mix Models, something Measured celebrates as a great use of the technology. Meta and Google’s work in this space is welcome, because it raises awareness of how experimentation can prove the value of marketing. Brands that are ready to achieve this new gold standard should talk to Measured and see how they can help.
One thing most people aren’t aware of is that Measured has also climbed that hierarchy-of-evidence pyramid towards a systematic review and meta-analysis of incrementality across all channels and types of client. Their technology uses the data from over 25,000 lift tests they’ve run to offer clients an evidence-based answer on the likely incrementality of their channels, even when they haven’t run a test. This data can also be plugged into a model as priors or constraints, providing guard rails against building an inaccurate model.
If you are interested in implementing the gold standard of measurement in marketing, talk to the Measured team and I’m sure they’d be happy to help.