UX initiatives are often seen as a disruption rather than a means of solving existing problems in an organization. In this post, we’ll
explore how you can build trust for your UX work, gain support, and make
a noticeable impact.
When I start a UX project, there is typically very little confidence in a successful outcome. In fact, there is quite a lot of reluctance and hesitation, especially from teams that have been burnt by empty promises and poor delivery in the past.
Good
UX has a huge impact on business. But often, we need to build up
confidence in our upcoming UX projects. For me, an effective way to do
that is to address critical bottlenecks and uncover hidden deficiencies — the ones that affect the people I’ll be working with.
Let’s take a closer look at what this can look like.
UX Doesn’t Disrupt, It Solves Problems
Bottlenecks
are usually the most disruptive part of any company. Almost every team,
every unit, and every department has one. It’s often well known by employees, who complain about it, but it rarely finds its way to senior management, who are detached from daily operations.
The Iceberg of Ignorance: Sidney Yoshida discovered that leadership is usually unaware of the organization’s real problems.
The
bottleneck can be the only senior developer on the team, a broken
legacy tool, or a confusing flow that throws errors left and right —
there’s always a bottleneck, and it’s usually the reason for long waiting times, delayed delivery, and cutting corners in all the wrong places.
We might not be able to fix the bottleneck. But for a smooth flow of work, we need to ensure that non-constraint resources don’t produce more
than the constraint can handle. All processes and initiatives must be
aligned to support and maximize the efficiency of the constraint.
So
before doing any UX work, look out for things that slow down the
organization. Show that it isn’t UX work that disrupts the organization; it’s the existing internal disruptions that UX can help fix.
And once you’ve delivered even a tiny bit of value, you might be
surprised how quickly people will want to see more of what you have in
store for them.
The Work Is Never Just “The Work”
Meetings,
reviews, experimentation, pitching, deployment, support, updates, fixes
— unplanned work blocks other work from being completed. Exposing the root causes of unplanned work
and finding critical bottlenecks that slow down delivery is not only
the first step we need to take when we want to improve existing
workflows, but it is also a good starting point for showing the value of
UX.
The work is never just “the work.” In every project — as well as before and after it — there is a lot of invisible, and often unplanned, work going on.
To learn more about the points that create friction in people’s day-to-day work, set up 1:1s with the team
and ask them what slows them down. Find a problem that affects
everyone. Perhaps too much work in progress results in late delivery and
low quality? Or lengthy meetings stealing precious time?
One frequently overlooked detail is that we can’t manage work that is invisible. That’s why it is so important that we visualize the work first. Once we know the bottleneck, we can suggest ways to improve it.
It could be to introduce 20% idle times if the workload is too high,
for example, or to make meetings slightly shorter to make room for other
work.
The Theory Of Constraints
The idea that the work is never just “the work” is deeply connected to the Theory of Constraints introduced by Dr. Eliyahu M. Goldratt. It showed that any improvement made anywhere other than at the bottleneck is an illusion.
Any improvement after the bottleneck is useless
because it will always remain starved, waiting for work from the
bottleneck. And any improvements made before the bottleneck result in
more work piling up at the bottleneck.
Components of UX Strategy: it’s difficult to build confidence in your UX work without preparing a proper UX strategy ahead of time.
Wait Time = Busy ÷ Idle
To
improve flow, sometimes we need to freeze the work and bring focus to
one single project. Just as important as throttling the release of work is managing the handoffs.
The wait time for a given resource is the percentage of time that the
resource is busy divided by the percentage of time it’s idle. If a
resource is 50% utilized, the wait time is 50/50, or 1 unit.
If
the resource is 90% utilized, the wait time is 90/10, or 9 times longer.
And if it’s utilized 99% of the time, it’s 99/1, so 99 times longer than if
that resource is 50% utilized. The critical part is to make wait times visible so you know when your work spends days sitting in someone’s queue.
The exact times don’t matter, but if a resource is busy 99% of the time, the wait time will explode.
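To make the math concrete, here is a tiny, illustrative JavaScript sketch (the utilization figures are just example values, not measurements from any real team):

```js
// Wait time grows non-linearly with utilization: busy ÷ idle.
function waitTime(busyPercent) {
  const idlePercent = 100 - busyPercent;
  return busyPercent / idlePercent;
}

console.log(waitTime(50)); // 1 unit for a 50% utilized resource
console.log(waitTime(90)); // 9, i.e., nine times longer
console.log(waitTime(99)); // 99, the queue explodes
```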
Avoid 100% Occupation
Our goal is to maximize flow: that means exploiting the constraint while creating idle times for non-constraints to optimize system performance.
One surprising finding for me was that any attempt to maximize the utilization of all resources — 100% occupation across all departments — can actually be counterproductive. As Goldratt noted, “An hour lost at a bottleneck is an hour out of the entire system. An hour saved at a non-bottleneck is worthless.”
Recommended Read: “The Phoenix Project”
“The Phoenix Project” by Gene Kim, Kevin Behr, and George Spafford is a wonderful novel about the struggles of shipping.
I can only wholeheartedly recommend The Phoenix Project, an absolutely incredible book that goes into all the fine details of the Theory of Constraints described above.
It’s
not a design book but a great book for designers who want to be more
strategic about their work. It’s a delightful and very real read about
the struggles of shipping (albeit on a more technical side).
Wrapping Up
People
don’t like sudden changes and uncertainty, and UX work often disrupts
their usual ways of working. Unsurprisingly, most people tend to block
it by default. So before we introduce big changes, we need to get their
support for our UX initiatives.
We need to build confidence and show them the value that UX work can have — for their day-to-day work. To achieve that, we can work together with them, listening to the pain points they encounter in their workflows and to the things that slow them down.
Once we’ve uncovered internal disruptions,
we can tackle these critical bottlenecks and suggest steps to make
existing workflows more efficient. That’s the foundation for gaining their trust and showing them that UX work doesn’t disrupt but is here to solve problems.
Struggling with slow Largest Contentful Paint (LCP)? Newly introduced by
Google, LCP subparts help you pinpoint where page load delays come
from. Now, in the Chrome UX Report, this data provides real visitor
insights to speed up your site and boost rankings. This article unpacks
what LCP subparts are, what they mean for your website speed, and how
you can measure them.
The Largest Contentful Paint
(LCP) in Core Web Vitals measures how quickly a website loads from a
visitor’s perspective. It looks at how long after opening a page the
largest content element becomes visible. If your website is loading
slowly, that’s bad for user experience and can also cause your site to rank lower in Google.
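If you want to see this number for your own page, the browser exposes LCP candidates through the Performance API. Here is a minimal sketch you could drop into a page’s JavaScript; it simply logs each LCP candidate as the page loads:

```js
// Observe Largest Contentful Paint candidates reported by the browser.
const lcpObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // startTime is the render time (in milliseconds) of the current candidate;
    // the last entry reported before user interaction is the final LCP.
    console.log('LCP candidate:', Math.round(entry.startTime), 'ms', entry.element);
  }
});
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });
```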
When
trying to fix LCP issues, it’s not always clear what to focus on. Is
the server too slow? Are images too big? Is the content not being
displayed? Google has been working to address that recently by
introducing LCP subparts, which tell you where page load delays are coming from. They’ve also added this data to the Chrome UX Report, allowing you to see what causes delays for real visitors on your website!
Let’s take a look at what the LCP subparts are, what they mean for your website speed, and how you can measure them.
LCP subparts split the Largest Contentful Paint metric into four different components:
Time to First Byte (TTFB): How quickly the server responds to the document request.
Resource Load Delay: Time spent before the LCP image starts to download.
Resource Load Time: Time spent downloading the LCP image.
Element Render Delay: Time from when the LCP resource finishes loading until the LCP element is displayed.
The
resource timings only apply if the largest page element is an image or
background image. For text elements, the Load Delay and Load Time
components are always zero.
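You can also capture these subparts from real visitors yourself with Google’s open-source web-vitals library, which reports them in its attribution build. A minimal sketch (the field names below follow recent versions of the library; older releases name some of them slightly differently):

```js
import { onLCP } from 'web-vitals/attribution';

onLCP(({ value, attribution }) => {
  // Each subpart is reported in milliseconds and roughly sums to the LCP value.
  console.log('LCP:', value);
  console.log('TTFB:', attribution.timeToFirstByte);
  console.log('Resource load delay:', attribution.resourceLoadDelay);
  console.log('Resource load duration:', attribution.resourceLoadDuration);
  console.log('Element render delay:', attribution.elementRenderDelay);
  // In a real setup, send these values to your analytics endpoint instead of logging them.
});
```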
One way to measure how much each component contributes to the LCP score on your website is to use DebugBear’s website speed test. Expand the Largest Contentful Paint metric to see subparts and other details related to your LCP score.
Here,
we can see that TTFB and image Load Duration together account for 78%
of the overall LCP score. That tells us that these two components are
the most impactful places to start optimizing.
What’s happening during each of these stages? A network request waterfall can help us understand what resources are loading through each stage.
The
LCP Image Discovery view filters the waterfall visualization to just
the resources that are relevant to displaying the Largest Contentful
Paint image. In this case, each of the first three stages contains one
request, and the final stage finishes quickly with no new resources
loaded. But that depends on your specific website and won’t always be
the case.
In
this example, we can see that creating the server connection doesn’t
take all that long. Most of the time is spent waiting for the server to
generate the page HTML. So, to improve the TTFB, we need to speed up
that process or cache the HTML so we can skip the HTML generation
entirely.
The “resource” we want to load is the LCP image. Ideally, we just have an <img> tag near the top of the HTML, and the browser finds it right away and starts loading it.
But sometimes, we get a Load Delay, as is the case here. Instead of loading the image directly, the page uses lazysizes, an image lazy loading library that only loads the LCP image once it has detected that it will appear in the viewport.
Part
of the Load Delay is caused by having to download that JavaScript
library. But the browser also needs to complete the page layout and
start rendering content before the library will know that the image is
in the viewport. After finishing the request, there’s a CPU task (in
orange) that leads up to the First Contentful Paint milestone, when the page starts rendering. Only then does the library trigger the LCP image request.
How do we optimize this? First of all, instead of using a lazy loading library, you can use the native loading="lazy" image attribute. That way, loading images no longer depends on first loading JavaScript code.
The fourth and final LCP component, Render Delay, is often the most confusing. The resource has loaded, but for some reason, the browser isn’t ready to show it to the user yet!
Luckily,
in the example we’ve been looking at so far, the LCP image appears
quickly after it’s been loaded. One common reason for render delay is
that the LCP element is not an image. In that case, the render delay is caused by render-blocking scripts and stylesheets. The text can only appear after these have loaded and the browser has completed the rendering process.
Another reason you might see render delay is when the website preloads the LCP image. Preloading is a good idea, as it practically eliminates any load delay and ensures the image is loaded early.
However,
if the image finishes downloading before the page is ready to render,
you’ll see an increase in render delay on the page. And that’s fine!
You’ve improved your website speed overall, but after optimizing your
image, you’ve uncovered a new bottleneck to focus on.
That’s why, in February 2025, Google started including subpart data in the CrUX data report. It’s not (yet?) included in PageSpeed Insights, but you can see those metrics in DebugBear’s “Web Vitals” tab.
One super useful bit of info here is the LCP resource type: it tells you how many visitors saw the LCP element as a text element or an image.
Even
for the same page, different visitors will see slightly different
content. For example, different elements are visible based on the device
size, or some visitors will see a cookie banner while others see the
actual page content.
To make the data easier to interpret, Google only reports subpart data for images.
If
the LCP element is usually text on the page, then the subparts info
won’t be very helpful, as it won’t apply to most of your visitors.
But breaking down text LCP is relatively easy: everything that’s not part of the TTFB is render delay.
Track Subparts On Your Website With Real User Monitoring
That’s why a real-user monitoring tool like DebugBear comes in handy when fixing your LCP scores. You can track scores across all pages on your website over time and get dedicated dashboards for each LCP subpart.
You can also review specific visitor experiences, see what the LCP image was for them, inspect a request waterfall, and check LCP subpart timings. Sign up for a free trial.
Conclusion
Having
more granular metric data available for the Largest Contentful Paint
gives web developers a big leg up when making their website faster.
Including
subparts in CrUX provides new insight into how real visitors experience
your website and can tell you whether the optimizations you’re considering would
really be impactful.
Modern
frameworks are supposed to help speed up development while providing
modern tools and a developer-friendly workflow. In theory, this is great
and makes a lot of sense. In reality, Kevin Leary has found that they
cause far more problems than they solve. This ultimately leads to the
big question: why are modern theme frameworks so popular, and do they
really benefit developers in the long run?
When it comes to custom WordPress development, theme frameworks like Sage and Genesis
have become a go-to solution, particularly for many agencies that rely
on frameworks as an efficient starting point for client projects. They
promise modern standards, streamlined workflows, and maintainable
codebases. At face value, these frameworks seem to be the answer to
building high-end, bespoke WordPress websites. However, my years of
inheriting these builds as a freelance developer tell a different story —
one rooted in the reality of long-term maintenance, scalability, and
developer onboarding.
As someone who specializes in working with
professional websites, I’m frequently handed projects originally built
by agencies using these frameworks. This experience has given me a unique perspective on the real-world implications of these tools
over time. While they may look great in an initial pitch, their
complexities often create friction for future developers, maintenance
teams, and even the businesses they serve.
This is not to
say frameworks like Sage or Genesis are without merit, but they are far
from the universal “best practice” they’re often touted to be.
Below, I’ll share the lessons I’ve learned from inheriting and working with these setups, the challenges I’ve faced, and why I believe a minimal WordPress approach often provides a better path forward.
Why Agencies Use Frameworks
Frameworks
are designed to make WordPress development faster, cleaner, and
optimized for current best practices. Agencies are drawn to these tools
for several reasons:
Current code standards: Frameworks like Sage adopt PSR-2 standards, composer-based dependency management, and MVC-like abstractions.
Reusable components: Sage’s Blade templating encourages modularity, while Genesis relies on hooks for extensive customization.
Streamlined design tools: Integration with Tailwind CSS, SCSS, and Webpack (or newer tools like Bud) allows rapid prototyping.
Optimized performance: Frameworks are typically designed with lightweight, bloat-free themes in mind.
Team productivity: By creating a standardized approach, these frameworks promise efficiency for larger teams with multiple contributors.
On
paper, these benefits make frameworks an enticing choice for agencies.
They simplify the initial build process and cater to developers
accustomed to working with modern PHP practices and JavaScript-driven
tooling. But whenever I inherit these projects years later, the cracks
in the foundation begin to show.
The Reality of Maintaining Framework-Based Builds
While
frameworks have their strengths, my firsthand experience reveals
recurring issues that arise when it’s time to maintain or extend these
builds. These challenges aren’t theoretical — they are issues I’ve
encountered repeatedly when stepping into an existing framework-based
site.
1. Abstraction Creates Friction
One
of the selling points of frameworks is their use of abstractions, such
as Blade templating and controller-to-view separation. While these
patterns make sense in theory, they often lead to unnecessary complexity in practice.
For
instance, Blade templates abstract PHP logic from WordPress’s
traditional theme hierarchy. This means errors like syntax issues don’t
provide clear stack traces pointing to the actual view file — rather,
they reference compiled templates. Debugging becomes a scavenger hunt, especially for developers unfamiliar with Sage’s structure.
One
example is a popular news outlet with millions of monthly visitors.
When I first inherited their Sage-based theme, I had to bypass their
Lando/Docker environment to use my own minimal Nginx localhost setup.
The theme was incompatible with standard WordPress workflows, and I had
to modify build scripts to support a traditional installation. Once I
resolved the environment issues, I realized their build process was
incredibly slow, with hot module replacement only partially functional
(Blade template changes wouldn’t reload). Each save took 4–5 seconds to
compile.
Faced with a decision to either upgrade to Sage 10 or
rebuild the critical aspects, I opted for the latter. We drastically
improved performance by replacing the Sage build with a simple Laravel
Mix process. The new build process was reduced from thousands of lines
to 80, significantly improving developer workflow. Any new developer
could now understand the setup quickly, and future debugging would be
far simpler.
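The client’s actual configuration isn’t reproduced here, but to give a sense of what “simple” means, a minimal Laravel Mix setup for a Blade-based theme can look something like the following sketch (the paths and local URL are placeholders):

```js
// webpack.mix.js: a minimal, hypothetical Laravel Mix build for a WordPress theme
const mix = require('laravel-mix');

mix.js('resources/scripts/app.js', 'dist/scripts')   // bundle theme JavaScript
   .sass('resources/styles/app.scss', 'dist/styles') // compile SCSS to CSS
   .browserSync({
     proxy: 'example.test',                          // local WordPress URL (placeholder)
     files: ['dist/**/*', 'app/**/*.php', 'resources/views/**/*.blade.php'],
   })
   .version();                                       // cache-bust compiled assets
```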
2. Inflexible Patterns
While
Sage encourages “best practices,” these patterns can feel rigid and
over-engineered for simple tasks. Customizing basic WordPress features —
like adding a navigation menu or tweaking a post query — requires
following the framework’s prescribed patterns. This introduces a learning curve for developers who aren’t deeply familiar with Sage, and slows down progress for minor adjustments.
Traditional
WordPress theme structures, by contrast, are intuitive and widely
understood. Any WordPress developer, regardless of background, can jump
into a classic theme and immediately know where to look for templates,
logic, and customizations. Sage’s abstraction layers, while
well-meaning, limit accessibility to a smaller, more niche group of
developers.
3. Hosting Compatibility Issues
When
working with Sage, issues with hosting environments are inevitable. For
example, Sage’s use of Laravel Blade compiles templates into cached PHP
files, often stored in directories like /wp-content/cache. Strict file system rules on managed hosting platforms, like WP Engine, can block these writes, leading to white screens or broken templates after deployment.
This was precisely the issue I faced with a custom agency-built Sage theme on WP Engine. Every Git deployment resulted in a white
screen of death due to PHP errors caused by Blade templates failing to
save in the intended cache directory. The solution, recommended by WP
Engine support, was to use the system’s /tmp directory.
While this workaround prevented deployment errors, it undermined the
purpose of cached templates, as temporary files are cleared by PHP’s
garbage collection. Debugging and implementing this solution consumed
significant time — time that could have been avoided had the theme been
designed with hosting compatibility in mind.
4. Breaking Changes And Upgrade Woes
Upgrading
from Sage 9 to Sage 10 — or even from older versions of Roots — often
feels like a complete rebuild. These breaking changes create friction
for businesses that want long-term stability. Clients, understandably,
are unwilling to pay for what amounts to refactoring without a visible
return on investment. As a result, these sites stagnate, locked into
outdated versions of the framework, creating problems with dependency management (e.g., Composer packages, Node.js versions) and documentation mismatches.
One
agency subcontract I worked on recently gave me insight into Sage 10’s
latest approach. Even on small microsites with minimal custom logic, I
found the Bud-based build system sluggish, with watch processes taking
over three seconds to reload.
For developers accustomed to faster
workflows, this is unacceptable. Additionally, Sage 10 introduced new
patterns and directives that departed significantly from Sage 9, adding a
fresh learning curve. While I understand the appeal of mirroring
Laravel’s structure, I couldn’t shake the feeling that this complexity
was unnecessary for WordPress. By sticking to simpler approaches, the
footprint could be smaller, the performance faster, and the maintenance
much easier.
The Cost Of Over-Engineering
The issues above boil down to one central theme: over-engineering.
Frameworks
like Sage introduce complexity that, while beneficial in theory, often
outweighs the practical benefits for most WordPress projects.
When
you factor in real-world constraints — like tight budgets, frequent
developer turnover, and the need for intuitive codebases — the case for a minimal approach becomes clear.
Minimal WordPress setups embrace simplicity:
No abstraction for abstraction’s sake: Traditional WordPress theme hierarchy is straightforward, predictable, and accessible to a broad developer audience.
Reduced tooling overhead: Avoiding reliance on tools like Webpack or Blade removes potential points of failure and speeds up workflows.
Future-proofing: A standard theme structure remains compatible with WordPress core updates and developer expectations, even a decade later.
Like many things, this all sounds great and makes sense in theory,
but what does it look like in practice? Seeing is believing, so I’ve
created a minimal theme that exemplifies some of the concepts I’ve
described here. This theme is a work in progress, and there are plenty
of areas where it needs work. It provides the top features that custom
WordPress developers seem to want most in a theme framework.
Before we dive in, I’ll list out some of the key benefits of what’s going on in this theme. Above all of these, working minimally and keeping things simple and easy to understand is by far the largest benefit, in my opinion.
A watch task that compiles and reloads in under 100ms;
Sass for CSS preprocessing coupled with CSS written in BEM syntax;
Global context variables for common WordPress data: site_url, site_name, site_description, theme_dir, theme_url, primary_nav, ACF custom fields, the_title(), the_content().
Templating Language
Twig
is included with this theme, and it is used to load a small set of
commonly used global context variables such as theme URL, theme
directory, site name, site URL, and so on. It also includes some core
functions, like the_content(), the_title(), and others you’d routinely use when creating a custom theme. These global context variables and functions are available
for all URLs.
While it could be argued that Twig is an
unnecessary additional abstraction layer when we’re trying to establish a
minimal WordPress setup, I chose to include it because this type of
abstraction is included in Sage. But it’s also for a few other important
reasons:
Old,
Dependable, and
Stable.
You won’t need to worry about breaking changes in future versions, and it’s widely in use today. All the features I commonly see used in Sage Blade templates can be handled just as easily with Twig.
There really isn’t anything you can do with Blade that isn’t possible
with Twig.
Blade is a great templating language, but it’s best
suited for Laravel, in my opinion. BladeOne does provide a good way to
use it as a standalone templating engine, but even then, it’s still not
as performant under pressure as Twig. Twig’s added performance, when
used with small, efficient contexts, allows us to avoid the complexity
that comes with caching view output. Compile-on-the-fly Twig is very
close to the same speed as raw PHP in this use case.
Most importantly, Twig was built to be portable. It can be installed with composer and used within the theme with just 55 lines of code.
Now,
in a real project, this would probably be more than 55 lines, but
either way, it is, without a doubt, much easier to understand and work
with than Blade. Blade was built for use in Laravel, and it’s just not
nearly as portable. It will be significantly easier to identify issues,
track them down with a direct stack trace, and fix them with Twig.
The view context in this theme is deliberately kept sparse; during a site build, you’ll add what you specifically need for a particular site. A lean context for your views helps with performance and workflow.
Models & Controllers
The
template hierarchy follows the patterns of good ol’ WordPress, and
while some developers don’t like this, it is undoubtedly the most widely
accepted and commonly understood standard. Each standard theme file
uses a model where you define your data structures with PHP and hands off the data as the context to a .twig view file. Developers like the structure of separating server-side logic from a template, and in a classic MVC/MVVM pattern, we have our model, view, and controller.
Here, I’m using the standard WordPress theme templates as models.
Currently,
template files include some useful basics. You’re likely familiar with
these standard templates, but I’ll list them here for posterity:
404.php: Displays a custom “Page Not Found” message when a visitor tries to access a page that doesn’t exist.
archive.php: Displays a list of posts from a particular archive, such as a category, date, or tag archive.
author.php: Displays a list of posts by a specific author, along with the author’s information.
category.php: Displays a list of posts from a specific category.
footer.php: Contains the footer section of the theme, typically including closing HTML tags and widgets or navigation in the footer area.
front-page.php: The template used for the site’s front page, either static or a blog, depending on the site settings.
functions.php:
Adds custom functionality to the theme, such as registering menus and
widgets or adding theme support for features like custom logos or post
thumbnails.
header.php: Contains the header section of the theme, typically including the site’s title, meta tags, and navigation menu.
index.php: The fallback template for all WordPress pages, used if no other, more specific template (like category.php or single.php) is available.
page.php: Displays individual static pages, such as “About” or “Contact” pages.
screenshot.png: An image of the theme’s design, shown in the WordPress theme selector to give users a preview of the theme’s appearance.
search.php: Displays the results of a search query, showing posts or pages that match the search terms entered by the user.
single.php: Displays individual posts, often used for blog posts or custom post types.
tag.php: Displays a list of posts associated with a specific tag.
Extremely Fast Build Process For SCSS And JavaScript
The
build is curiously different in this theme, but out of the box, you can
compile SCSS to CSS, work with native JavaScript modules, and have a
live reload watch process with a tiny footprint. Look inside the bin/*.js files, and you’ll see everything that’s happening.
There are just two commands here, and all web developers should be familiar with them:
Watch: While developing, it will reload or inject JavaScript and CSS changes into the browser automatically using Browsersync.
Build: This task compiles all top-level *.scss files efficiently. There’s room for improvement, but keep in mind this theme serves as a concept.
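The real scripts live in the theme’s bin/*.js files and aren’t reproduced here, but a watch task along these lines can be written with sass and browser-sync as the only dependencies. This is an illustrative sketch; paths and the local URL are placeholders:

```js
// bin/watch.js: illustrative sketch of a minimal watch task
const fs = require('fs');
const sass = require('sass');
const browserSync = require('browser-sync').create();

function buildCss() {
  // Compile the top-level stylesheet with Dart Sass.
  const result = sass.compile('scss/main.scss', { style: 'compressed' });
  fs.mkdirSync('css', { recursive: true });
  fs.writeFileSync('css/main.css', result.css);
}

buildCss();

// Proxy the local WordPress install and inject changes into the browser.
browserSync.init({ proxy: 'http://localhost:8888' });

browserSync.watch('scss/**/*.scss').on('change', () => {
  buildCss();
  browserSync.reload('main.css'); // inject CSS without a full page reload
});

browserSync.watch('js/**/*.js').on('change', browserSync.reload);
browserSync.watch('views/**/*.twig').on('change', browserSync.reload);
```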
Now for a curveball: there is no compile process for JavaScript.
File changes will still be injected into the browser with hot module
replacement during watch mode, but we don’t need to compile anything.
WordPress will load theme JavaScript as native ES modules, using WordPress 6.5’s support for ES modules.
My reasoning is that many sites now pass through Cloudflare, so modern
compression is handled for JavaScript automatically. Many specialized
WordPress hosts do this as well. When comparing minification to GZIP,
it’s clear that minification provides trivial gains in file reduction.
The vast majority of file reduction is provided by CDN and server
compression. Based on this, I believe the benefits of a fast workflow
far outweigh the additional overhead of pulling in build steps for
webpack, Rollup, or other similar packaging tools.
We’re fortunate
that the web fully supports ES modules today, so there is really no
reason why we should need to compile JavaScript at all if we’re not
using a JavaScript framework like Vue, React, or Svelte.
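To make this concrete, here is a hedged sketch of what dependency-free theme JavaScript can look like when the entry file is loaded as a native ES module; the file names, selectors, and class names are made up for illustration:

```js
// js/modules/navigation.js (hypothetical helper module)
export function initNavigation(nav) {
  if (!nav) return;
  const toggle = nav.querySelector('.menu-toggle');
  toggle?.addEventListener('click', () => nav.classList.toggle('is-open'));
}

// js/main.js (theme entry point, loaded by the browser as a native ES module)
import { initNavigation } from './modules/navigation.js';

document.addEventListener('DOMContentLoaded', () => {
  initNavigation(document.querySelector('.site-nav'));
});
```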
A Contrarian Approach
My
perspective and the ideas I’ve shared here are undoubtedly contrarian.
Like anything alternative, this is bound to ruffle some feathers.
Frameworks like Sage are celebrated in developer circles, with strong
communities behind them. For certain use cases — like large-scale,
enterprise-level projects with dedicated development teams — they may
indeed be the right fit.
Simplicity,
in my view, is underrated in modern web development. A minimal
WordPress setup, tailored to the specific needs of the project without
unnecessary abstraction, is often the leaner, more sustainable choice.
Conclusion
Inheriting
framework-based projects has taught me invaluable lessons about the
real-world impact of theme frameworks. While they may impress in an
initial pitch or during development, the long-term consequences of added
complexity often outweigh the benefits. By adopting a minimal WordPress
approach, we can build sites that are easier to maintain, faster to onboard new developers, and more resilient to change.
Modern
tools have their place, but minimalism never goes out of style. When
you choose simplicity, you choose a codebase that works today, tomorrow,
and years down the line. Isn’t that what great web development is all
about?
Effective
data storytelling isn’t a black box. By integrating UX research &
psychology, you can craft more impactful and persuasive narratives.
Victor Yocco and Angelica Lo Duca outline a five-step framework that
provides a roadmap for creating data stories that resonate with
audiences on both a cognitive and emotional level.
Data
storytelling is a powerful communication tool that combines data
analysis with narrative techniques to create impactful stories. It goes
beyond presenting raw numbers by transforming complex data into
meaningful insights that can drive decisions, influence behavior, and
spark action.
When done right, data storytelling simplifies
complex information, engages the audience, and compels them to act.
Effective data storytelling allows UX professionals to communicate the “why” behind their design choices, advocate for user-centered improvements, and ultimately create more impactful and persuasive presentations.
This translates to stronger buy-in for research initiatives, increased
alignment across teams, and, ultimately, products and experiences that
truly meet user needs.
For instance, The New York Times’ Snow Fall
data story (Figure 1) used data to immerse readers in the tale of a
deadly avalanche through interactive visuals and text, while The Guardian’s The Counted
(Figure 2) powerfully illustrated police violence in the U.S. by
humanizing data through storytelling. These examples show that effective
data storytelling can leave lasting impressions, prompting readers to
think differently, act, or make informed decisions.
Figure 1: The NYT Snow Fall displays data visualizations alongside a narrative of the events preceding and during a deadly avalanche.
Figure 2: The Guardian’s The Counted tells a compelling data story of the facts behind people killed by the police in the US.
The importance of data storytelling lies in its ability to:
Simplify complexity: It makes data understandable and actionable.
Engage and persuade: Emotional and cognitive engagement ensures audiences not only understand but also feel compelled to act.
Bridge gaps: Data storytelling connects the dots between information and human experience, making the data relevant and relatable.
While
there are numerous models of data storytelling, here are a few
high-level areas of focus UX practitioners should have a grasp on:
Narrative Structures: Traditional storytelling models like the hero’s journey (Vogler, 1992) or the Freytag pyramid
(Figure 3) provide a backbone for structuring data stories. These
models help create a beginning, rising action, climax, falling action,
and resolution, keeping the audience engaged.
Figure 3: Freytag’s Pyramid provides a narrative structure for storytellers.
Data Visualization:
Broadly speaking, these are the tools and techniques for visualizing
data in our stories. Interactive charts, maps, and infographics (Cairo, 2016) transform raw data into digestible visuals, making complex information easier to understand and remember.
Narrative Structures For Data
Moving
beyond these basic structures, let’s explore how more sophisticated
narrative techniques can enhance the impact of data stories:
The Three-Act Structure: This approach divides the data story into setup, confrontation, and resolution. It helps build context, present the problem or insight, and offer a solution or conclusion (Few, 2005).
The Hero’s Journey (Data Edition): We can frame a data set as a problem that needs a hero to overcome. In this case, the hero is often the audience or the decision-maker who needs to use the data to solve a problem. The data itself becomes the journey, revealing challenges, insights, and, ultimately, a path to resolution.
Example: Presenting data on
declining user engagement could follow the hero’s journey. The “call to
adventure” is the declining engagement. The “challenges” are revealed
through data points showing where users are dropping off. The “insights”
are uncovered through further analysis, revealing the root causes. The
“resolution” is the proposed solution, supported by data, that the
audience (the hero) can implement.
Problems With Widely Used Data Storytelling Models
Many
data storytelling models follow a traditional, linear structure: data
selection, audience tailoring, storyboarding with visuals, and a call to
action. While these models aim to make data more accessible, they often
fail to engage the audience on a deeper level, leading to missed
opportunities. This happens because they prioritize the presentation of data over the experience of the audience, neglecting how different individuals perceive and process information.
Figure 4: The traditional flow for creating a data-driven story.
While
existing data storytelling models adhere to a structured and
technically correct approach to data creation, they often fall short of
fully analyzing and understanding their audience. This gap weakens their
overall effectiveness and impact.
Cognitive Overload: Presenting too much data without context or a clear narrative overwhelms the audience. Instead of enlightenment, they experience confusion and disengagement. It’s like trying to drink from a firehose; the sheer volume becomes counterproductive. This overload can be particularly challenging for individuals with cognitive differences who may require information to be presented in smaller, more digestible chunks.
Emotional Disconnect: Data-heavy presentations often fail to establish an emotional connection, which is crucial for driving audience engagement and action. People are more likely to remember and act upon information that resonates with their feelings and values.
Lack of Personalization: Many data stories adopt a one-size-fits-all approach. Without tailoring the narrative to specific audience segments, the impact is diluted. A message that resonates with a CEO might not land with frontline employees.
Over-Reliance on Visuals: While visuals are essential for simplifying data, they are insufficient without a cohesive narrative to provide context and meaning, and they may not be accessible to all audience members.
These
shortcomings reveal a critical flaw: while current models successfully
follow a structured data creation process, they often neglect the
deeper, audience-centered analysis required for actual storytelling
effectiveness. To bridge this gap, traditional models can be improved by focusing more on the following two critical components:
Audience understanding:
A greater focus can be placed on who the audience is, what they need, and how they perceive information. Traditional models should
consider the unique characteristics and needs of specific audiences.
This lack of audience understanding can lead to data stories that are
irrelevant, confusing, or even misleading.
Effective data
storytelling requires a deep understanding of the audience’s
demographics, psychographics, and information needs. This includes
understanding their level of knowledge about the topic, their prior
beliefs and attitudes, and their motivations for seeking information. By
tailoring the data story to a specific audience, storytellers can
increase engagement, comprehension, and persuasion.
Psychological principles:
These models could be improved with insights from psychology that
explain how people process information and make decisions. Without these
elements, even the most beautifully designed data story may fall flat.
By incorporating audience understanding and
psychological principles into their storytelling process, data
storytellers can create more effective and engaging narratives that
resonate with their audience and drive desired outcomes.
Persuasion In Data Storytelling
All storytelling involves persuasion.
Even if it’s a poorly told story and your audience chooses to ignore
your message, you’ve persuaded them to do that. When your audience feels
that you understand them, they are more likely to be persuaded by your
message. Data-driven stories that speak to their hearts and minds are
more likely to drive action. You can frame your message effectively when
you have a deeper understanding of your audience.
Applying Psychological Principles To Data Storytelling
Humans
process information based on psychological cues such as cognitive ease,
social proof, and emotional appeal. By incorporating these principles,
data storytellers can make their narratives more engaging, memorable,
and persuasive.
Psychological principles help data storytellers tap into how people perceive, interpret, and remember information.
The Theory of Planned Behavior
While
there is no single truth when it comes to how human behavior is created
or changed, it is important for a data storyteller to use a theoretical
framework to ensure they address the appropriate psychological factors
of their audience. The Theory of Planned Behavior (TPB)
is a commonly cited theory of behavior change in academic psychology
research and courses. It’s useful for creating a reasonably effective
framework to collect audience data and build a data story around it.
The TPB (Ajzen, 1991) (Figure 5) aims to predict and explain human behavior. It consists of three key components:
Attitude: This refers to the degree to which a person has a favorable or unfavorable evaluation of the behavior in question. An example of attitudes in the TPB is a person’s belief about the importance of regular exercise for good health. If an individual strongly believes that exercise is beneficial, they are likely to have a favorable attitude toward engaging in regular physical activity.
Subjective Norms: These are the perceived social pressures to perform or not perform the behavior. Keeping with the exercise example, this would be how a person thinks their family, peers, community, social media, and others perceive the importance of regular exercise for good health.
Perceived Behavioral Control: This component reflects the perceived ease or difficulty of performing the behavior. For our physical activity example, does the individual believe they have access to exercise in terms of time, equipment, physical capability, and other potential aspects that make them feel more or less capable of engaging in the behavior?
As shown in Figure 5,
these three components interact to create behavioral intentions, which
are a proxy for actual behaviors that we often don’t have the resources
to measure in real-time with research participants (Ajzen, 1991).
Figure
5: The factors of the TPB interact with each other, collectively
shaping an individual's behavioral intentions, which, in turn, are the
most proximal determinant of human social behavior.
UX
researchers and data storytellers should develop a working knowledge of
the TPB or another suitable psychological theory before moving on to
measure the audience’s attitudes, norms, and perceived behavioral
control. We have included additional resources to support your learning
about the TPB in the references section of this article.
How To Understand Your Audience And Apply Psychological Principles
OK,
we’ve covered the importance of audience understanding and psychology.
These two principles serve as the foundation of the proposed model of
storytelling we’re putting forth. Let’s explore how to integrate them into your storytelling process.
Introducing The Audience Research Informed Data Storytelling Model (ARIDSM)
At
the core of successful data storytelling lies a deep understanding of
your audience’s psychology. Here’s a five-step process to integrate UX
research and psychological principles effectively into your data
stories:
Figure 6: The 5 steps of the Audience Research Informed Data Storytelling Model (ARIDSM).
Step 1: Define Clear Objectives
Before
diving into data, it’s crucial to establish precisely what you aim to
achieve with your story. Do you want to inform, persuade, or inspire
action? What specific message do you want your audience to take away?
Why it matters:
Defining clear objectives provides a roadmap for your storytelling
journey. It ensures that your data, narrative, and visuals are all
aligned toward a common goal. Without this clarity, your story risks
becoming unfocused and losing its impact.
How to execute Step 1: Start by asking yourself:
What is the core message I want to convey?
What do I want my audience to think, feel, or do after experiencing this story?
How will I measure the success of my data story?
Frame
your objectives using action verbs and quantifiable outcomes. For
example, instead of “raise awareness about climate change,” aim to
“persuade 20% of the audience to adopt one sustainable practice.”
Example: Imagine
you’re creating a data story about employee burnout. Your objective
might be to convince management to implement new policies that promote
work-life balance, with the goal of reducing reported burnout cases by
15% within six months.
Step 2: Conduct UX Research To Understand Your Audience
This
step involves gathering insights about your audience: their
demographics, needs, motivations, pain points, and how they prefer to
consume information.
Why it matters:
Understanding your audience is fundamental to crafting a story that
resonates. By knowing their preferences and potential biases, you can
tailor your narrative and data presentation to capture their attention
and ensure the message is clearly understood.
How to execute Step 2:
Employ UX research methods like surveys, interviews, persona
development, and testing the message with potential audience members.
Example: If your data story aims to encourage healthy eating habits among college students, your research might include a survey of students to determine what attitudes exist towards specific types of healthy foods, so you can apply that knowledge in your data story.
Step 3: Analyze and Select Relevant Audience Data
This
step bridges the gap between raw data and meaningful insights. It
involves exploring your data to identify patterns, trends, and key
takeaways that support your objectives and resonate with your audience.
Why it matters:
Careful data analysis ensures that your story is grounded in evidence
and that you’re using the most impactful data points to support your
narrative. This step adds credibility and weight to your story, making
it more convincing and persuasive.
How to execute Step 3:
Clean and organize your data. Ensure accuracy and consistency before analysis.
Identify key variables and metrics. This
will be determined by the psychological principle you used to inform
your research. Using the TPB, we might look closely at how we measured
social norms to understand directionally how the audience perceives
social norms around the topic of the data story you are sharing,
allowing you to frame your call to action in ways that resonate with
these norms. You might run a variety of statistics at this point,
including factor analysis to create groups based on similar traits,
t-tests to determine if averages on your measurements are significantly
different between groups, and correlations to see whether there is a relationship between scores on various items.
Example: If your objective is to demonstrate the effectiveness of a new teaching method, you might analyze how open your audience perceives their peers to be to adopting new methods, how much control they believe they have over the decision to use a new teaching method, and their attitude towards the effectiveness of their current teaching methods. This lets you create groups with various levels of receptivity to trying new methods, allowing you to later tailor your data story to each group.
Step 4: Apply The Theory of Planned Behavior Or Your Psychological Principle Of Choice [Done Simultaneously With Step 3]
In
this step, you will see that The Theory of Planned Behavior (TPB)
provides a robust framework for understanding the factors that drive
human behavior. It posits that our intentions, which are the strongest
predictors of our actions, are shaped by three core components:
attitudes, subjective norms, and perceived behavioral control. By
consciously incorporating these elements into your data story, you can
significantly enhance its persuasive power.
Why it matters:
The TPB offers valuable insights into how people make decisions. By
aligning your narrative with these psychological drivers, you increase
the likelihood of influencing your audience’s intentions and,
ultimately, their behavior. This step adds a layer of strategic
persuasion to your data storytelling, making it more impactful and
effective.
How to execute Step 4:
Here’s how to leverage the TPB in your data story:
Influence Attitudes:
Present data and evidence that highlight the positive consequences of
adopting the desired behavior. Frame the behavior as beneficial,
valuable, and aligned with the audience’s values and aspirations.
This
is where having a deep knowledge of the audience is helpful. Let’s
imagine you are creating a data story on exercise, and your call to action promotes daily exercise. If you know your audience has a highly
positive attitude towards exercise, you can capitalize on that and frame
your language around the benefits of exercising, increasing exercise,
or specific exercises that might be best suited for the audience. It’s
about framing exercise not just as a physical benefit but as a holistic
improvement to their life. You can also tie it to their identity,
positioning exercise as an integral part of living the kind of life they
aspire to.
Shape Subjective Norms: Demonstrate
that the desired behavior is widely accepted and practiced by others,
especially those the audience admires or identifies with. Knowing ahead
of time if your audience thinks daily exercise is something their peers
approve of or engage in will allow you to shape your messaging
accordingly. Highlight testimonials, success stories, or case studies
from individuals who mirror the audience’s values.
If you were to
find that the audience does not consider exercise to be normative
amongst peers, you would look for examples of similar groups of people
who do exercise. For example, if your audience is in a certain age
group, you might focus on what data you have that supports a large
percentage of those in their age group engaging in exercise.
Enhance Perceived Behavioral Control:
Address any perceived barriers to adopting the desired behavior and
provide practical solutions. For instance, when promoting daily
exercise, it’s important to acknowledge the common obstacles people face
— lack of time, resources, or physical capability — and demonstrate how
these can be overcome.
Step 5: Craft A Balanced And Persuasive Narrative
This
is where you synthesize your data, audience insights, psychological
principles (including the TPB), and storytelling techniques into a
compelling and persuasive narrative. It’s about weaving together the
logical and emotional elements of your story to create an experience
that resonates with your audience and motivates them to act.
Why it matters:
A well-crafted narrative transforms data from dry statistics into a
meaningful and memorable experience. It ensures that your audience not
only understands the information but also feels connected to it on an
emotional level, increasing the likelihood of them internalizing the
message and acting upon it.
How to execute Step 5:
Structure your story strategically:
Use a clear narrative arc that guides your audience through the
information. Begin by establishing the context and introducing the
problem, then present your data-driven insights in a way that supports
your objectives and addresses the TPB components. Conclude with a
compelling call to action that aligns with the attitudes, norms, and
perceived control you’ve cultivated throughout the narrative.
Example: In a data story about promoting exercise, you could:
Determine what stories might be available using the data you have collected or obtained.
In this example, let’s say you work for a city planning office and have
data suggesting people aren’t currently biking as frequently as they
could, even if they are bike owners.
Begin with a relatable
story about lack of exercise and its impact on people’s lives. Then,
present data on the benefits of cycling, highlighting its positive
impact on health, socializing, and personal feelings of well-being
(attitudes).
Integrate TPB elements: Showcase
stories of people who have successfully incorporated cycling into their
daily commute (subjective norms). Provide practical tips on bike safety,
route planning, and finding affordable bikes (perceived behavioral
control).
Use infographics to compare commute
times and costs between driving and cycling. Show maps of bike-friendly
routes and visually appealing images of people enjoying cycling.
Call to action:
Encourage the audience to try cycling for a week and provide links to
resources like bike share programs, cycling maps, and local cycling
communities.
Evaluating The Method
Our
next step is to test our hypothesis that incorporating audience
research and psychology into creating a data story will lead to more
powerful results. We have conducted preliminary research using messages
focused on climate change, and our results suggest some support for our
assertion.
We purposely chose a controversial topic because we
believe data storytelling can be a powerful tool. If we want to truly
realize the benefits of effective data storytelling, we need to focus on
topics that matter. We also know that academic research suggests it is
more difficult to shift opinions or generate behavior around topics that
are polarizing (at least in the US), such as climate change.
We
are not ready to share the full results of our study. We will share
those in an academic journal and in conference proceedings. Here is a
look at how we set up the study and how you might do something similar
when either creating a data story using our method or doing your own
research to test our model. You will see that it closely aligns with the
model itself, with the added steps of testing the message against a
control message and taking measurements of the actions the message(s)
are likely to generate.
Step 1: We chose our
topic and the data set we wanted to explore. As I mentioned, we
purposely went with a polarizing topic. My academic background was in
messaging around conservation issues, so we explored that. We used data
from a publicly available data set that states July 2023 was the hottest month ever recorded.
Step 2:
We identified our audience and took basic measurements. We decided our
audience would be members of the general public who do not have jobs
working directly with climate data or other relevant fields for climate
change scientists.
We wanted a diverse range of ages and
backgrounds, so we screened for this in our questions on the survey to
measure the TPB components as well. We created a survey to measure the
elements of the TPB as it relates to climate change and administered the
survey via a Google Forms link that we shared directly, on social media
posts, and in online message boards related to topics of climate change
and survey research.
Step 3: We analyzed our
data and broke our audience into groups based on key differences. This
part required a bit of statistical know-how. Essentially, we entered all
of the responses into a spreadsheet and ran a factor analysis
to define groups based on shared attributes. In our case, we found two
distinct groups for our respondents. We then looked deeper into the
individual differences between the groups, e.g., group 1 had a notably
higher level of positive attitude towards taking action to remediate
climate change.
Step 4 [remember this happens
simultaneously with step 3]: We incorporated aspects of the TPB in how
we framed our data analysis. As we created our groups and looked at the
responses to the survey, we made sure to note how this might impact the
story for our various groups. Using our previous example, a group with a
higher positive attitude toward taking action might need less
convincing to do something about climate change and more information on
what exactly they can do.
Table 1 contains examples of the
questions we asked related to the TPB. We used the guidance provided
here to generate the survey items to measure the TPB related to climate
change activism. Note that even the academic who created the TPB states
there are no standardized questions (PDF) validated to measure the concepts for each individual topic.
Item: How beneficial do you believe individual actions are compared to systemic changes (e.g., government policies) in tackling climate change?
Measures: Attitude
Scale: 1 to 5, with 1 being “not beneficial” and 5 being “extremely beneficial”

Item: How much do you think the people you care about (family, friends, community) expect you to take action against climate change?
Measures: Subjective Norms
Scale: 1 to 5, with 1 being “they do not expect me to take action” and 5 being “they expect me to take action”

Item: How confident are you in your ability to overcome personal barriers when trying to reduce your environmental impact?
Measures: Perceived Behavioral Control
Scale: 1 to 5, with 1 being “not at all confident” and 5 being “extremely confident”

Table 1: Examples of questions we used to measure the TPB factors. We asked multiple questions for each factor and then generated a combined mean score for each component.
Step 5: We created data
stories aligned with the groups and a control story. We created multiple
stories to align with the groups we identified in our audience. We also
created a control message that lacked substantial framing in any
direction. See below for an example of the control data story (Figure 7)
and one of the customized data stories (Figure 8) we created.
Figure 7: Control data story. For the control story, we displayed the data around daily surface air temperature with some additional information explaining the chart. We did not attempt to influence behavior or tap into psychology to suggest there was urgency or persuade the participant to want to learn more. The color used in the chart comes from the initial chart generated in the source. We acknowledge that color is likely to present some psychological influence, given the use of red to represent extreme heat and cooler colors like blue to represent cooler time periods.
Figure 8: Group 1 data story. Our measurements suggested that the participants in Group 1 had a higher level of awareness of climate change and the related negative impacts of more extreme temperatures. Therefore, we didn’t call out the potential negatives of climate change and instead focused on a more positive message of how we might make a positive impact. Group 1 had higher levels of subjective norms, suggesting that language promoting how others engage in certain behaviors might align with what they believe to be true. We focused on the community aspect of the message, encouraging them to act.
Step 6: We released the stories and measured the likelihood of acting. Specific to our study, we asked the participants how likely they were to “Click here to LEARN MORE.” Our hypothesis was that individuals would express a notably higher likelihood of clicking to learn more on the data story aligned with their group than on the competing group’s story or the control story.
Step 7: We analyzed the differences between the preexisting groups and their stated likelihood of acting. As I mentioned, our findings are still preliminary, and we are looking at ways to increase our response rate so we can present statistically substantiated findings. Our initial finding is that we do see small differences between the responses to the tailored data stories and the control data story, which is directionally what we would expect to see. If you conduct a similar study or test your own messages, you would likewise look for results suggesting that your ARIDS-derived message is more likely to generate the expected outcome than a control message or a non-tailored message. A minimal sketch of such a comparison follows.
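Assuming each participant’s stated likelihood of acting is recorded alongside the story they saw, the comparison can be as simple as the sketch below. The file and column names are hypothetical, and this is an illustration rather than our exact analysis.

```python
# Minimal sketch: compare stated likelihood-to-act for a tailored data story
# versus the control story. File and column names are hypothetical.
import pandas as pd
from scipy import stats

results = pd.read_csv("story_responses.csv")  # one row per participant

tailored = results.loc[results["story"] == "tailored_group1", "likelihood"]
control = results.loc[results["story"] == "control", "likelihood"]

t, p = stats.ttest_ind(tailored, control, equal_var=False)  # Welch's t-test
print(f"tailored mean={tailored.mean():.2f}, control mean={control.mean():.2f}, "
      f"t={t:.2f}, p={p:.3f}")
```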
Overall, we feel this is an exciting possibility and that future research will help us refine exactly what is critical about generating a message that has a positive impact on your audience. We also expect there are better psychological models to use to frame your measurements and message, depending on the audience and topic.
For example, you might feel Maslow’s hierarchy of needs
is more relevant to your data storytelling. You would want to take
measurements related to these needs from your audience and then frame
the data story using how a decision might help meet their needs.
Elevate Your Data Storytelling
Traditional
models of data storytelling, while valuable, often fall short of
effectively engaging and persuading audiences. This is primarily due to
their neglect of crucial aspects such as audience understanding and the application of psychological principles. By incorporating these elements into the data storytelling process, we can create more impactful and persuasive narratives.
The
five-step framework proposed in this article — defining clear
objectives, conducting UX research, analyzing data, applying
psychological principles, and crafting a balanced narrative — provides a roadmap for creating data stories that resonate with audiences on both a cognitive and emotional level. This approach ensures that data is not merely presented but is transformed into a meaningful experience
that drives action and fosters change. As data storytellers, embracing
this human-centric approach allows us to unlock the full potential of
data and create narratives that truly inspire and inform.
Effective data storytelling isn’t a black box. You can test your data stories for effectiveness using the same research process we are using to test our hypothesis. While this requires additional time, you will make it back in the form of a stronger impact on your audience when they encounter your data story, provided it is shown to have a significantly greater impact than a control message or other candidate messages that don’t incorporate the psychological traits of your audience.
Please feel free to use our method and provide any feedback on your experience to the author.
Could
AI assist UX researchers by dynamically asking follow-up questions
based on participant responses? Eduard Kuric discusses the significance
of context in the creation of relevant follow-up questions for
unmoderated usability testing, how an AI tasked with interactive
follow-up should be validated, and the potential — along with the risks —
of AI interaction in usability testing.
Unmoderated usability testing has been steadily growing more popular
with the assistance of online UX research tools. Allowing participants
to complete usability testing without a moderator, at their own pace and
convenience, can have a number of advantages.
The first is liberation from a strict schedule and from moderator availability, meaning that far more participants can be recruited more quickly and cost-effectively. It also lets your team see how users interact with your solution in their natural environment, on their own devices and setups. Overcoming the challenges of distance and time zone differences to obtain data from all around the globe also becomes much easier.
However, forgoing the use of moderators also has its drawbacks. The moderator brings flexibility, as well as a human touch, into usability testing. Since they are in the same (virtual) space as the participants, the moderator usually has a good idea of what’s going on and can react in real time to what they see the participant do and say. A moderator can gently remind the participants to vocalize their thoughts. To the participant, thinking aloud in front of a moderator can also feel more natural than just talking to themselves. When the participant does something interesting, the moderator can prompt them for further comment.
Meanwhile, a
traditional unmoderated study lacks such flexibility. In order to
complete tasks, participants receive a fixed set of instructions. Once
they are done, they can be asked to complete a static questionnaire, and
that’s it.
The feedback that the research & design team receives will be completely dependent on what information the participants provide on their own. Because of this, the phrasing of instructions and questions in unmoderated testing is crucial. However, even if everything is planned out perfectly, the lack of adaptive questioning means that a lot of information will remain unsaid, especially with regular people who are not trained in providing user feedback.
If the usability test participant
misunderstands a question or doesn’t answer completely, the moderator
can always ask for a follow-up to get more information. A question then
arises: Could something like that be handled by AI to upgrade
unmoderated testing?
Generative AI could present a new, potentially powerful tool for addressing this dilemma once we consider its current capabilities. Large language models (LLMs), in particular,
can lead conversations that can appear almost humanlike. If LLMs could
be incorporated into usability testing to interactively enhance the
collection of data by conversing with the participant, they might
significantly augment the ability of researchers to obtain detailed
personal feedback from great numbers of people. With human participants
as the source of the actual feedback, this is an excellent example of human-centered AI as it keeps humans in the loop.
There are quite a number of gaps in research on AI in UX. To help address this, we at UXtweak research have conducted a case study aimed at investigating whether AI can generate follow-up questions that are meaningful and that result in valuable answers from participants.
Asking participants follow-up questions to extract more in-depth information is just one portion of the moderator’s responsibilities. However, it is a reasonably scoped subproblem for our evaluation, since it encapsulates the moderator’s ability to react to the context of the conversation in real time and to encourage participants to share salient information.
Experiment Spotlight: Testing GPT-4 In Real-Time Feedback
The focus of our study was on the underlying principles rather than any specific commercial AI solution for unmoderated usability testing. After all, AI models and prompts are being tuned constantly, so findings that are too narrow may become irrelevant within a week or two of a new version being released. However, since AI models are also black boxes based on artificial neural networks, the method by which they generate their specific output is not transparent.
Our
results can show what you should be wary of to verify that an AI
solution that you use can actually deliver value rather than harm. For
our study, we used GPT-4, which at the time of the experiment was the
most up-to-date model by OpenAI, also capable of fulfilling complex
prompts (and, in our experience, dealing with some prompts better than
the more recent GPT-4o).
In our experiment, we conducted a
usability test with a prototype of an e-commerce website. The tasks
involved the common user flow of purchasing a product.
In this setting, we compared results across three conditions:
A
regular static questionnaire made up of three pre-defined questions
(Q1, Q2, Q3), serving as an AI-free baseline. Q1 was open-ended, asking
the participants to narrate their experiences during the task. Q2 and Q3
can be considered non-adaptive follow-ups to Q1 since they asked
participants more directly about usability issues and to identify things
that they did not like.
The question Q1, serving as a seed for up to three GPT-4-generated follow-up questions as an alternative to Q2 and Q3.
All three pre-defined questions, Q1, Q2, and Q3, each used as a seed for its own GPT-4 follow-up questions.
The following prompt was used to generate the follow-up questions:
The prompt employed in our experiment to create AI-generated follow-up questions in an unmoderated usability test
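To make the mechanics concrete, here is a minimal sketch of generating a follow-up question from a seed question and a participant’s answer with the OpenAI Python SDK. The prompt wording below is illustrative only, not the exact prompt from our study.

```python
# Minimal sketch: generate one follow-up question from a seed question and a
# participant's answer. The prompt text is illustrative, not the study's prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_follow_up(task: str, seed_question: str, answer: str) -> str:
    prompt = (
        "You are assisting with an unmoderated usability test of an e-commerce prototype.\n"
        f"The participant's task was: {task}\n"
        f"They were asked: {seed_question}\n"
        f"They answered: {answer}\n"
        "Ask one short, neutral follow-up question that clarifies their answer "
        "without leading them or repeating what they already said."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```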
To
assess the impact of the AI follow-up questions, we then compared the
results on both a quantitative and a qualitative basis. One of the
measures that we analyzed is informativeness — ratings of the responses based on how useful they are at elucidating new usability issues encountered by the user.
As seen in the figure below, informativeness dropped significantly between the seed questions and their AI follow-ups. The follow-ups rarely helped identify a new issue, although they did help elaborate further details.
Compared to the pre-defined seed questions, AI follow-up questions lacked informativeness about new usability issues.
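If you want to run a similar check on your own data, a simple non-parametric comparison of the informativeness ratings is one option. This is a hedged sketch rather than our exact analysis; the file and column names are hypothetical.

```python
# Minimal sketch: test whether seed questions were rated as more informative
# than their AI follow-ups. File and column names are hypothetical.
import pandas as pd
from scipy.stats import mannwhitneyu

ratings = pd.read_csv("informativeness_ratings.csv")  # one row per rated answer

seed = ratings.loc[ratings["question_type"] == "seed", "informativeness"]
follow_up = ratings.loc[ratings["question_type"] == "ai_follow_up", "informativeness"]

stat, p = mannwhitneyu(seed, follow_up, alternative="greater")
print(f"seed median={seed.median()}, follow-up median={follow_up.median()}, p={p:.3f}")
```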
The emotional reactions of the participants
offer another perspective on AI-generated follow-up questions. Our
analysis of the prevailing emotional valence based on the phrasing of
answers revealed that, at first, the answers started with a neutral
sentiment. Afterward, the sentiment shifted toward the negative.
In the case of the pre-defined questions Q2 and Q3, this could be seen as natural. While the seed question (Q1) was open-ended, asking the participants to explain what they did during the task, Q2 and Q3 focused more on the negative — usability issues and other disliked aspects. Curiously, the follow-up chains generally received even more negative receptions than their seed questions, and not for the same reason.
Sentiment
analysis reveals a drop in participant sentiment in questions involving
AI follow-up questions compared to the seed questions in the GPT
variant.
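For illustration, a lightweight way to score the sentiment of participant answers is shown below. It uses NLTK’s VADER analyzer on example answers and is not the exact sentiment pipeline from our study.

```python
# Minimal sketch: score the sentiment of participant answers with VADER.
# The example answers are illustrative.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

answers = [
    "The checkout was straightforward, and I found the product quickly.",
    "I already said that. The question ignored my previous response.",
]

for answer in answers:
    score = analyzer.polarity_scores(answer)["compound"]  # -1 (negative) to +1 (positive)
    print(f"{score:+.2f}  {answer}")
```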
Frustration
was common as participants interacted with the GPT-4-driven follow-up
questions. This is rather critical, considering that frustration with
the testing process can sidetrack participants from taking usability
testing seriously, hinder meaningful feedback, and introduce a negative
bias.
A major aspect that participants were frustrated with was redundancy. Repetitiveness,
such as re-explaining the same usability issue, was quite common. While
pre-defined follow-up questions yielded 27-28% of repeated answers
(it’s likely that participants already mentioned aspects they disliked
during the open-ended Q1), AI-generated questions yielded 21%.
That’s
not that much of an improvement, given that the comparison is made to
questions that literally could not adapt to prevent repetition at all.
Furthermore, when AI follow-up questions were added to obtain more
elaborate answers for every pre-defined question, the repetition ratio
rose further to 35%. In the variant with AI, participants also rated the
questions as significantly less reasonable.
Answers
to AI-generated questions contained a lot of statements like “I already
said that” and “The obvious AI questions ignored my previous
responses.”
Repetition
of answers in follow-up questions in the unmoderated usability test.
Seed questions and their GPT-4 follow-up form a group. This allows us to
distinguish the repetitions of AI follow-up answers depending on
whether the information they repeat originates from the same group
(intra-group) or from other groups (inter-group).
The
prevalence of repetition within the same group of questions (the seed
question, its follow-up questions, and all of their answers) can be seen
as particularly problematic since the GPT-4 prompt had been provided
with all the information available in this context. This demonstrates
that a number of the follow-up questions were not sufficiently distinct and lacked the direction that would warrant them being asked.
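One simple way to screen transcripts for this kind of repetition is to compare answers with a text-similarity measure. The sketch below uses TF-IDF cosine similarity with a hypothetical threshold and example answers; it is an illustration, not the coding scheme we used.

```python
# Minimal sketch: flag likely repeated answers via TF-IDF cosine similarity.
# The threshold and example answers are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

answers = [
    "The delivery options were hidden and hard to find.",
    "As I said, I couldn't find the delivery options anywhere.",
    "The product photos looked like placeholders.",
]

similarity = cosine_similarity(TfidfVectorizer().fit_transform(answers))

THRESHOLD = 0.4
for i in range(len(answers)):
    for j in range(i):
        if similarity[i, j] > THRESHOLD:
            print(f"Answer {i} likely repeats answer {j} (similarity {similarity[i, j]:.2f})")
```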
Insights From The Study: Successes And Pitfalls
To summarize the usefulness of AI-generated follow-up questions in usability testing, there are both good and bad points.
Successes:
Generative AI (GPT-4) excels at refining participant answers with contextual follow-ups.
Depth of qualitative insights can be enhanced.
Challenges:
Limited capacity to uncover new issues beyond pre-defined questions.
Participants can easily grow frustrated with repetitive or generic follow-ups.
While extracting somewhat more elaborate answers is a benefit, it can easily be overshadowed if poor question quality and relevance become too distracting. This can inhibit participants’ natural behavior and the relevance of their feedback if they’re focusing on the AI.
Therefore,
in the following section, we discuss what to be careful of, whether you
are picking an existing AI tool to assist you with unmoderated
usability testing or implementing your own AI prompts or even models for
a similar purpose.
Recommendations For Practitioners
Context is the be-all and end-all when it comes to the usefulness of follow-up questions. Most of the issues that we identified with the AI follow-up questions in our study can be tied to ignoring proper context in one form or another.
Based
on real blunders that GPT-4 made while generating questions in our
study, we have meticulously collected and organized a list of the types of context
that these questions were missing. Whether you’re looking to use an
existing AI tool or are implementing your own system to interact with
participants in unmoderated studies, you are strongly encouraged to use
this list as a high-level checklist. With it as the
guideline, you can assess whether the AI models and prompts at your
disposal can ask reasonable, context-sensitive follow-up questions
before you entrust them with interacting with real participants.
Without further ado, these are the relevant types of context:
General Usability Testing Context. The
AI should incorporate standard principles of usability testing in its
questions. This may appear obvious, and it actually is. But it needs to
be said, given that we have encountered issues related to this context
in our study. For example, the questions should not be leading, ask
participants for design suggestions, or ask them to predict their future
behavior in completely hypothetical scenarios (behavioral research is
much more accurate for that).
Usability Testing Goal Context. Different
usability tests have different goals depending on the stage of the
design, business goals, or features being tested. Each follow-up
question and the participant’s time used in answering it are valuable
resources. They should not be wasted on going off-topic. For example, in
our study, we were evaluating a prototype of a website with placeholder
photos of a product. When the AI starts asking participants about their
opinion of the displayed fake products, such information is useless to
us.
User Task Context. Whether the tasks in
your usability testing are goal-driven or open and exploratory, their
nature should be properly reflected in follow-up questions. When the
participants have freedom, follow-up questions could be useful for
understanding their motivations. By contrast, if your AI tool foolishly
asks the participants why they did something closely related to the task
(e.g., placing the specific item they were supposed to buy into the
cart), you will seem just as foolish by association for using it.
Design Context. Detailed
information about the tested design (e.g., prototype, mockup, website,
app) can be indispensable for making sure that follow-up questions are
reasonable. Follow-up questions should require input from the
participant. They should not be answerable just by looking at the
design. Interesting aspects of the design could also be reflected in the
topics to focus on. For example, in our study, the AI would
occasionally ask participants why they believed a piece of information
that was very prominently displayed in the user interface, making the
question irrelevant in context.
Interaction Context. If
Design Context tells you what the participant could potentially see and
do during the usability test, Interaction Context comprises all their
actual actions, including their consequences. This could incorporate the
video recording of the usability test, as well as the audio recording
of the participant thinking aloud. The inclusion of interaction context
would allow follow-up questions to build on the information that the
participant already provided and to further clarify their decisions. For
example, if a participant does not successfully complete a task,
follow-up questions could be directed at investigating the cause, even
as the participant continues to believe they have fulfilled their goal.
Previous Question Context. Even
when the questions you ask them are mutually distinct, participants can
find logical associations between various aspects of their experience,
especially since they don’t know what you will ask them next. A skilled
moderator may decide to skip a question that a participant already
answered as part of another question, instead focusing on further
clarifying the details. AI follow-up questions should be capable of
doing the same to keep the testing from becoming a repetitive slog.
Question Intent Context. Participants routinely answer questions in a way that misses their original intent, especially if the question is more open-ended. A follow-up can spin the question from another angle to retrieve the intended information. However, if the participant’s answer is technically valid but responds to the letter rather than the spirit of the question, the AI can miss this fact. Clarifying the intent could help address this.
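To make this checklist actionable, here is a minimal sketch of how the context types above might be bundled into a single follow-up prompt. The structure, field names, and rules are illustrative assumptions, not the exact prompt we used in our study.

```python
# Minimal sketch: bundle the relevant context types into one follow-up prompt.
# The structure and wording are illustrative, not the study's actual prompt.
from dataclasses import dataclass

@dataclass
class StudyContext:
    goal: str                             # usability testing goal context
    task: str                             # user task context
    design_notes: str                     # design context (what the UI already shows)
    interaction_log: str                  # interaction context (what the participant did)
    previous_qas: list[tuple[str, str]]   # previous question context

def build_follow_up_prompt(ctx: StudyContext, seed_question: str, answer: str) -> str:
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in ctx.previous_qas)
    return (
        "You are generating one follow-up question for an unmoderated usability test.\n"
        f"Study goal: {ctx.goal}\n"
        f"Participant task: {ctx.task}\n"
        f"What the interface already shows: {ctx.design_notes}\n"
        f"What the participant did: {ctx.interaction_log}\n"
        f"Earlier questions and answers:\n{history}\n"
        f"Seed question: {seed_question}\n"
        f"Participant answer: {answer}\n"
        "Rules: do not lead the participant, do not ask for design suggestions or "
        "predictions of hypothetical future behavior, do not repeat anything already "
        "answered, and stay on topic for the study goal."
    )
```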
When assessing a third-party AI tool, a question to ask is whether the tool allows you to provide all of this contextual information explicitly.
If the AI does not have an implicit or explicit source of context, the best it can do is make biased, opaque guesses that can result in irrelevant, repetitive, and frustrating questions.
Even
if you can provide the AI tool with the context (or if you are crafting
the AI prompt yourself), that does not necessarily mean that the AI will
do as you expect, apply the context in practice, and approach its
implications correctly. For example, as demonstrated in our study, when a
history of the conversation was provided within the scope of a question
group, there was still a considerable amount of repetition.
The
most straightforward way to test the contextual responsiveness of a
specific AI model is simply by conversing with it in a way that relies
on context. Fortunately, most natural human conversation already depends
on context heavily (saying everything would take too long otherwise),
so that should not be too difficult. What is key is focusing on the
varied types of context to identify what the AI model can and cannot do.
The
seemingly overwhelming number of potential combinations of varied types
of context could pose the greatest challenge for AI follow-up
questions.
For example, human moderators may decide to go against
the general rules by asking less open-ended questions to obtain
information that is essential for the goals of their research while also
understanding the tradeoffs.
In our study, we observed that if the AI asked questions that were too generically open-ended as follow-ups to seed questions that were open-ended themselves, without a significant enough shift in perspective, the result was repetition, irrelevance, and — therefore — frustration.
The ability of a fine-tuned AI model to resolve various types of contextual conflict appropriately could serve as a reliable metric by which the quality of an AI follow-up question generator is measured.
Researcher control is
also key since tougher decisions that are reliant on the researcher’s
vision and understanding should remain firmly in the researcher’s hands.
Because of this, a combination of static and AI-driven questions with complementary strengths and weaknesses could be the way to unlock richer insights.
A focus on contextual sensitivity validation can be seen as even more important when considering the broader social aspects. Among some people, trend-chasing and the general overhype of AI by the industry have led to a backlash against AI. AI
skeptics have a number of valid concerns, including usefulness, ethics,
data privacy, and the environment. Some usability testing participants
may be unaccepting or even outwardly hostile toward encounters with AI.
Therefore, for the successful incorporation of AI into research, it will be essential to present it to users as something that is both reasonable and helpful. Principles of ethical research remain as relevant as ever. Data needs to be collected and processed with the participant’s consent and without breaching the participant’s privacy (e.g., ensuring that sensitive data is not used to train AI models without permission).
Conclusion: What’s Next For AI In UX?
So,
is AI a game-changer that could break down the barrier between
moderated and unmoderated usability research? Maybe one day. The
potential is certainly there. When AI follow-up questions work as
intended, the results are exciting. Participants can become more
talkative and clarify potentially essential details.
To any UX
researcher who’s familiar with the feeling of analyzing vaguely phrased
feedback and wishing that they could have been there to ask one more
question to drive the point home, an automated solution that could do
this for them may seem like a dream. However, we should also exercise
caution since the blind addition of AI without testing and oversight can
introduce a slew of biases. This is because the relevance of follow-up questions is dependent on all sorts of contexts.
Humans need to keep holding the reins in order to ensure that the research is based on solid conclusions and intents. The opportunity lies in the synergy between AI and usability researchers and designers, whose ability to conduct unmoderated usability testing could be significantly augmented.
Humans + AI = Better Insights
The best approach to advocate for is likely a balanced one. As UX researchers and designers, humans should continue to learn how
to use AI as a partner in uncovering insights. This article can serve
as a jumping-off point, providing a list of the AI-driven technique’s
potential weak points to be aware of, to monitor, and to improve on.