
Monday, March 17, 2025

How To Build Confidence In Your UX Work

 UX initiatives are often seen as a disruption rather than a means to solving existing problems in an organization. In this post, we’ll explore how you can build trust for your UX work, gain support, and make a noticeable impact.

When I start any UX project, typically, there is very little confidence in the successful outcome of my UX initiatives. In fact, there is quite a lot of reluctance and hesitation, especially from teams that have been burnt by empty promises and poor delivery in the past.

Good UX has a huge impact on business. But often, we need to build up confidence in our upcoming UX projects. For me, an effective way to do that is to address critical bottlenecks and uncover hidden deficiencies — the ones that affect the people I’ll be working with.

Let’s take a closer look at what this can look like.

UX Doesn’t Disrupt, It Solves Problems

Bottlenecks are usually the most disruptive part of any company. Almost every team, every unit, and every department has one. Employees know it well and complain about it, yet it rarely reaches senior management, who are detached from daily operations.

The Iceberg of Ignorance: Sidney Yoshida discovered that leadership is usually unaware of the organization’s real problems.

The bottleneck can be the only senior developer on the team, a broken legacy tool, or a confusing flow that throws errors left and right — there’s always a bottleneck, and it’s usually the reason for long waiting times, delayed delivery, and cutting corners in all the wrong places.

We might not be able to fix the bottleneck. But for a smooth flow of work, we need to ensure that non-constraint resources don’t produce more than the constraint can handle. All processes and initiatives must be aligned to support and maximize the efficiency of the constraint.

So before doing any UX work, look out for things that slow down the organization. Show that it’s not UX work that disrupts work, but it’s internal disruptions that UX can help with. And once you’ve delivered even a tiny bit of value, you might be surprised how quickly people will want to see more of what you have in store for them.

The Work Is Never Just “The Work”

Meetings, reviews, experimentation, pitching, deployment, support, updates, fixes — unplanned work blocks other work from being completed. Exposing the root causes of unplanned work and finding critical bottlenecks that slow down delivery is not only the first step we need to take when we want to improve existing workflows, but it is also a good starting point for showing the value of UX.

The work is never just “the work.” In every project — as well as before and after it — there is a lot of invisible, and often unplanned, work going on.

To learn more about the points that create friction in people’s day-to-day work, set up 1:1s with the team and ask them what slows them down. Find a problem that affects everyone. Perhaps too much work in progress results in late delivery and low quality? Or are lengthy meetings stealing precious time?

One frequently overlooked detail is that we can’t manage work that is invisible. That’s why it is so important that we visualize the work first. Once we know the bottleneck, we can suggest ways to improve it. It could be to introduce 20% idle times if the workload is too high, for example, or to make meetings slightly shorter to make room for other work.

The Theory Of Constraints

The idea that the work is never just “the work” is deeply connected to the Theory of Constraints developed by Dr. Eliyahu M. Goldratt. It shows that any improvement made anywhere other than at the bottleneck is an illusion.

Any improvement after the bottleneck is useless because it will always remain starved, waiting for work from the bottleneck. And any improvements made before the bottleneck result in more work piling up at the bottleneck.
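To make this concrete, here is a toy JavaScript sketch (the stage capacities are invented for illustration): a pipeline’s throughput is simply the capacity of its slowest stage, so improving any other stage does nothing for overall delivery.

```js
// Toy model: stage capacities of a three-stage pipeline, in items per day.
// The system's throughput is whatever the slowest stage (the bottleneck) allows.
function throughput(stageCapacities) {
  return Math.min(...stageCapacities);
}

console.log(throughput([10, 3, 8]));  // 3 — stage two is the bottleneck
console.log(throughput([20, 3, 16])); // still 3 — speeding up the other stages changed nothing
```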

Components of UX Strategy: it’s difficult to build confidence in your UX work without preparing a proper UX strategy ahead of time.

Wait Time = Busy ÷ Idle

To improve flow, sometimes we need to freeze the work and bring focus to one single project. Just as important as throttling the release of work is managing the handoffs. The wait time for a given resource is the percentage of time that the resource is busy divided by the percentage of time it’s idle. If a resource is 50% utilized, the wait time is 50/50, or 1 unit.

If the resource is 90% utilized, the wait time is 90/10, or 9 times longer. And if it’s utilized 99% of the time, it’s 99/1, so 99 times longer than if that resource were 50% utilized. The critical part is to make wait times visible so you know when your work spends days sitting in someone’s queue.

The exact times don’t matter, but if a resource is busy 99% of the time, the wait time will explode.
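As a quick illustration of the rule of thumb above, here is a small JavaScript sketch of the wait-time ratio (percent busy divided by percent idle):

```js
// Wait time ratio: % busy / % idle. Utilization is a fraction, e.g. 0.9 = 90% busy.
function waitTime(utilization) {
  return utilization / (1 - utilization);
}

[0.5, 0.9, 0.99].forEach((u) => {
  console.log(`${u * 100}% utilized -> wait ${waitTime(u).toFixed(0)}x`);
});
// 50% utilized -> wait 1x
// 90% utilized -> wait 9x
// 99% utilized -> wait 99x
```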

Avoid 100% Occupation

Our goal is to maximize flow: that means exploiting the constraint while creating idle time for non-constraints to optimize system performance.

One surprising finding for me was that any attempt to maximize the utilization of all resources — 100% occupation across all departments — can actually be counterproductive. As Goldratt noted, “An hour lost at a bottleneck is an hour out of the entire system. An hour saved at a non-bottleneck is worthless.”

Recommended Read: “The Phoenix Project”

“The Phoenix Project” by Gene Kim, Kevin Behr, and George Spafford is a wonderful novel about the struggles of shipping.

I can only wholeheartedly recommend The Phoenix Project, an absolutely incredible book that goes into all the fine details of the Theory of Constraints described above.

It’s not a design book but a great book for designers who want to be more strategic about their work. It’s a delightful and very real read about the struggles of shipping (albeit on a more technical side).

Wrapping Up

People don’t like sudden changes and uncertainty, and UX work often disrupts their usual ways of working. Unsurprisingly, most people tend to block it by default. So before we introduce big changes, we need to get their support for our UX initiatives.

We need to build confidence and show them the value that UX work can have — for their day-to-day work. To achieve that, we can work together with them, listening to the pain points they encounter in their workflows and to the things that slow them down.

Once we’ve uncovered internal disruptions, we can tackle these critical bottlenecks and suggest steps to make existing workflows more efficient. That’s the foundation to gaining their trust and showing them that UX work doesn’t disrupt but that it’s here to solve problems.

How To Fix Largest Contentful Paint Issues With Subpart Analysis

Struggling with slow Largest Contentful Paint (LCP)? Newly introduced by Google, LCP subparts help you pinpoint where page load delays come from. This data is now available in the Chrome UX Report, providing real visitor insights to speed up your site and boost rankings. This article unpacks what LCP subparts are, what they mean for your website speed, and how you can measure them.

The Largest Contentful Paint (LCP) in Core Web Vitals measures how quickly a website loads from a visitor’s perspective. It looks at how long after opening a page the largest content element becomes visible. If your website is loading slowly, that’s bad for user experience and can also cause your site to rank lower in Google.
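For context, LCP candidates can be observed directly in the browser with the standard PerformanceObserver API. A minimal, logging-only sketch (a real setup would report the final value to analytics):

```js
// Log LCP candidates as the browser reports them. The last entry emitted
// before the user interacts with the page is the final LCP element.
const observer = new PerformanceObserver((entryList) => {
  const entries = entryList.getEntries();
  const latest = entries[entries.length - 1];
  console.log('LCP candidate at', latest.startTime, 'ms:', latest.element);
});
observer.observe({ type: 'largest-contentful-paint', buffered: true });
```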

When trying to fix LCP issues, it’s not always clear what to focus on. Is the server too slow? Are images too big? Is the content not being displayed? Google has been working to address that recently by introducing LCP subparts, which tell you where page load delays are coming from. They’ve also added this data to the Chrome UX Report, allowing you to see what causes delays for real visitors on your website!

Let’s take a look at what the LCP subparts are, what they mean for your website speed, and how you can measure them.

The Four LCP Subparts

LCP subparts split the Largest Contentful Paint metric into four different components:

  1. Time to First Byte (TTFB): How quickly the server responds to the document request.
  2. Resource Load Delay: Time spent before the LCP image starts to download.
  3. Resource Load Time: Time spent downloading the LCP image.
  4. Element Render Delay: Time after the resource has loaded before the LCP element is displayed.

The resource timings only apply if the largest page element is an image or background image. For text elements, the Load Delay and Load Time components are always zero.
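If you collect your own field data, the open-source web-vitals library exposes these four subparts in its attribution build. A minimal sketch (property names follow recent versions of the library and may differ in older releases):

```js
import { onLCP } from 'web-vitals/attribution';

onLCP(({ value, attribution }) => {
  // The four subparts roughly add up to the overall LCP value.
  console.log('LCP:', value);
  console.log('TTFB:', attribution.timeToFirstByte);
  console.log('Resource load delay:', attribution.resourceLoadDelay);
  console.log('Resource load duration:', attribution.resourceLoadDuration);
  console.log('Element render delay:', attribution.elementRenderDelay);
});
```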

How To Measure LCP Subparts

One way to measure how much each component contributes to the LCP score on your website is to use DebugBear’s website speed test. Expand the Largest Contentful Paint metric to see subparts and other details related to your LCP score.

Here, we can see that TTFB and image Load Duration together account for 78% of the overall LCP score. That tells us that these two components are the most impactful places to start optimizing.

LCP Subparts

What’s happening during each of these stages? A network request waterfall can help us understand what resources are loading through each stage.

The LCP Image Discovery view filters the waterfall visualization to just the resources that are relevant to displaying the Largest Contentful Paint image. In this case, each of the first three stages contains one request, and the final stage finishes quickly with no new resources loaded. But that depends on your specific website and won’t always be the case.

LCP image discovery

Time To First Byte

The first step to display the largest page element is fetching the document HTML. We recently published an article about how to improve the TTFB metric.

In this example, we can see that creating the server connection doesn’t take all that long. Most of the time is spent waiting for the server to generate the page HTML. So, to improve the TTFB, we need to speed up that process or cache the HTML so we can skip the HTML generation entirely.

Resource Load Delay

The “resource” we want to load is the LCP image. Ideally, we just have an <img> tag near the top of the HTML, and the browser finds it right away and starts loading it.

But sometimes, we get a Load Delay, as is the case here. Instead of loading the image directly, the page uses lazysizes.js, an image lazy loading library that only loads the LCP image once it has detected that it will appear in the viewport.

Part of the Load Delay is caused by having to download that JavaScript library. But the browser also needs to complete the page layout and start rendering content before the library will know that the image is in the viewport. After finishing the request, there’s a CPU task (in orange) that leads up to the First Contentful Paint milestone, when the page starts rendering. Only then does the library trigger the LCP image request.

Load Delay

How do we optimize this? First of all, instead of using a lazy loading library, you can use the native loading="lazy" image attribute. That way, loading images no longer depends on first loading JavaScript code.

But more specifically, the LCP image should not be lazily loaded. That way, the browser can start loading it as soon as the HTML code is ready. According to Google, you should aim to eliminate resource load delay entirely.
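As a rough sketch of both fixes (file names are placeholders): load the LCP image eagerly with high priority, and reserve native lazy loading for below-the-fold images. If the image is discovered late (for example, a CSS background image), a preload hint lets the browser start fetching it early.

```html
<!-- The LCP (hero) image: eager and high priority, never lazy. -->
<img src="hero.jpg" alt="Hero" loading="eager" fetchpriority="high">

<!-- Below-the-fold images: native lazy loading, no JS library needed. -->
<img src="gallery-1.jpg" alt="Gallery" loading="lazy">

<!-- Optional, for late-discovered LCP images: preload in the <head>. -->
<link rel="preload" as="image" href="hero.jpg" fetchpriority="high">
```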

Resource Load Duration

The Load Duration subpart is probably the most straightforward: you need to download the LCP image before you can display it!

In this example, the image is loaded from the same domain as the HTML. That’s good because the browser doesn’t have to connect to a new server.

Other techniques, such as optimizing the image’s size, format, and compression, can further reduce the load duration.

Element Render Delay

The fourth and final LCP component, Render Delay, is often the most confusing. The resource has loaded, but for some reason, the browser isn’t ready to show it to the user yet!

Luckily, in the example we’ve been looking at so far, the LCP image appears quickly after it’s been loaded. One common reason for render delay is that the LCP element is not an image. In that case, the render delay is caused by render-blocking scripts and stylesheets. The text can only appear after these have loaded and the browser has completed the rendering process.

Render Delay

Another reason you might see render delay is when the website preloads the LCP image. Preloading is a good idea, as it practically eliminates any load delay and ensures the image is loaded early.

However, if the image finishes downloading before the page is ready to render, you’ll see an increase in render delay on the page. And that’s fine! You’ve improved your website speed overall, but after optimizing your image, you’ve uncovered a new bottleneck to focus on.

Render Delay with preloaded LCP image

LCP Subparts In Real User CrUX Data

Looking at the Largest Contentful Paint subparts in lab-based tests can provide a lot of insight into where you can optimize. But all too often, the LCP in the lab doesn’t match what’s happening for real users!

That’s why, in February 2025, Google started including subpart data in the CrUX data report. It’s not (yet?) included in PageSpeed Insights, but you can see those metrics in DebugBear’s “Web Vitals” tab.

Subpart data in the CrUX data report

One super useful bit of info here is the LCP resource type: it tells you how many visitors saw the LCP element as a text element or an image.

Even for the same page, different visitors will see slightly different content. For example, different elements are visible based on the device size, or some visitors will see a cookie banner while others see the actual page content.

To make the data easier to interpret, Google only reports subpart data for images.

If the LCP element is usually text on the page, then the subparts info won’t be very helpful, as it won’t apply to most of your visitors.

But breaking down a text LCP is relatively easy: everything that’s not part of the TTFB is render delay.

Track Subparts On Your Website With Real User Monitoring

Lab data doesn’t always match what real users experience. CrUX data is coarse: it’s only reported for high-traffic pages and takes at least four weeks to fully update after a change has been rolled out.

That’s why a real-user monitoring tool like DebugBear comes in handy when fixing your LCP scores. You can track scores across all pages on your website over time and get dedicated dashboards for each LCP subpart.

Dashboards for each LCP subpart

You can also review specific visitor experiences, see what the LCP image was for them, inspect a request waterfall, and check LCP subpart timings. Sign up for a free trial.

DebugBear tool where you can review visitor experiences and check LCP subpart timings

Conclusion

Having more granular metric data available for the Largest Contentful Paint gives web developers a big leg up when making their website faster.

Including subparts in CrUX provides new insight into how real visitors experience your website and can tell you whether the optimizations you’re considering would really be impactful.

The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

 

Modern frameworks are supposed to help speed up development while providing modern tools and a developer-friendly workflow. In theory, this is great and makes a lot of sense. In reality, Kevin Leary has found that they cause far more problems than they solve. This ultimately leads to the big question: why are modern theme frameworks so popular, and do they really benefit developers in the long run?

When it comes to custom WordPress development, theme frameworks like Sage and Genesis have become a go-to solution, particularly for many agencies that rely on frameworks as an efficient starting point for client projects. They promise modern standards, streamlined workflows, and maintainable codebases. At face value, these frameworks seem to be the answer to building high-end, bespoke WordPress websites. However, my years of inheriting these builds as a freelance developer tell a different story — one rooted in the reality of long-term maintenance, scalability, and developer onboarding.

As someone who specializes in working with professional websites, I’m frequently handed projects originally built by agencies using these frameworks. This experience has given me a unique perspective on the real-world implications of these tools over time. While they may look great in an initial pitch, their complexities often create friction for future developers, maintenance teams, and even the businesses they serve.

This is not to say frameworks like Sage or Genesis are without merit, but they are far from the universal “best practice” they’re often touted to be.

Below, I’ll share the lessons I’ve learned from inheriting and working with these setups, the challenges I’ve faced, and why I believe a minimal WordPress approach often provides a better path forward.

Why Agencies Use Frameworks

Frameworks are designed to make WordPress development faster, cleaner, and optimized for current best practices. Agencies are drawn to these tools for several reasons:

  • Current code standards
    Frameworks like Sage adopt PSR-2 standards, composer-based dependency management, and MVC-like abstractions.
  • Reusable components
    Sage’s Blade templating encourages modularity, while Genesis relies on hooks for extensive customization.
  • Streamlined design tools
    Integration with Tailwind CSS, SCSS, and Webpack (or newer tools like Bud) allows rapid prototyping.
  • Optimized performance
    Frameworks are typically designed with lightweight, bloat-free themes in mind.
  • Team productivity
    By creating a standardized approach, these frameworks promise efficiency for larger teams with multiple contributors.

On paper, these benefits make frameworks an enticing choice for agencies. They simplify the initial build process and cater to developers accustomed to working with modern PHP practices and JavaScript-driven tooling. But whenever I inherit these projects years later, the cracks in the foundation begin to show.

The Reality of Maintaining Framework-Based Builds

While frameworks have their strengths, my firsthand experience reveals recurring issues that arise when it’s time to maintain or extend these builds. These challenges aren’t theoretical — they are issues I’ve encountered repeatedly when stepping into an existing framework-based site.

1. Abstraction Creates Friction

One of the selling points of frameworks is their use of abstractions, such as Blade templating and controller-to-view separation. While these patterns make sense in theory, they often lead to unnecessary complexity in practice.

For instance, Blade templates abstract PHP logic from WordPress’s traditional theme hierarchy. This means errors like syntax issues don’t provide clear stack traces pointing to the actual view file — rather, they reference compiled templates. Debugging becomes a scavenger hunt, especially for developers unfamiliar with Sage’s structure.

One example is a popular news outlet with millions of monthly visitors. When I first inherited their Sage-based theme, I had to bypass their Lando/Docker environment to use my own minimal Nginx localhost setup. The theme was incompatible with standard WordPress workflows, and I had to modify build scripts to support a traditional installation. Once I resolved the environment issues, I realized their build process was incredibly slow, with hot module replacement only partially functional (Blade template changes wouldn’t reload). Each save took 4–5 seconds to compile.

Faced with a decision to either upgrade to Sage 10 or rebuild the critical aspects, I opted for the latter. We drastically improved performance by replacing the Sage build with a simple Laravel Mix process. The new build process was reduced from thousands of lines to 80, significantly improving developer workflow. Any new developer could now understand the setup quickly, and future debugging would be far simpler.

2. Inflexible Patterns

While Sage encourages “best practices,” these patterns can feel rigid and over-engineered for simple tasks. Customizing basic WordPress features — like adding a navigation menu or tweaking a post query — requires following the framework’s prescribed patterns. This introduces a learning curve for developers who aren’t deeply familiar with Sage, and slows down progress for minor adjustments.

Traditional WordPress theme structures, by contrast, are intuitive and widely understood. Any WordPress developer, regardless of background, can jump into a classic theme and immediately know where to look for templates, logic, and customizations. Sage’s abstraction layers, while well-meaning, limit accessibility to a smaller, more niche group of developers.

3. Hosting Compatibility Issues

When working with Sage, issues with hosting environments are inevitable. For example, Sage’s use of Laravel Blade compiles templates into cached PHP files, often stored in directories like /wp-content/cache. Strict file system rules on managed hosting platforms, like WP Engine, can block these writes, leading to white screens or broken templates after deployment.

This was precisely the issue I faced with a custom agency-built theme using Sage on WP Engine. Every Git deployment resulted in a white screen of death due to PHP errors caused by Blade templates failing to save in the intended cache directory. The solution, recommended by WP Engine support, was to use the system’s /tmp directory. While this workaround prevented deployment errors, it undermined the purpose of cached templates, as temporary files are cleared by PHP’s garbage collection. Debugging and implementing this solution consumed significant time — time that could have been avoided had the theme been designed with hosting compatibility in mind.

4. Breaking Changes And Upgrade Woes

Upgrading from Sage 9 to Sage 10 — or even from older versions of Roots — often feels like a complete rebuild. These breaking changes create friction for businesses that want long-term stability. Clients, understandably, are unwilling to pay for what amounts to refactoring without a visible return on investment. As a result, these sites stagnate, locked into outdated versions of the framework, creating problems with dependency management (e.g., Composer packages, Node.js versions) and documentation mismatches.

One agency subcontract I worked on recently gave me insight into Sage 10’s latest approach. Even on small microsites with minimal custom logic, I found the Bud-based build system sluggish, with watch processes taking over three seconds to reload.

For developers accustomed to faster workflows, this is unacceptable. Additionally, Sage 10 introduced new patterns and directives that departed significantly from Sage 9, adding a fresh learning curve. While I understand the appeal of mirroring Laravel’s structure, I couldn’t shake the feeling that this complexity was unnecessary for WordPress. By sticking to simpler approaches, the footprint could be smaller, the performance faster, and the maintenance much easier.

The Cost Of Over-Engineering

The issues above boil down to one central theme: over-engineering.

Frameworks like Sage introduce complexity that, while beneficial in theory, often outweighs the practical benefits for most WordPress projects.

When you factor in real-world constraints — like tight budgets, frequent developer turnover, and the need for intuitive codebases — the case for a minimal approach becomes clear.

Minimal WordPress setups embrace simplicity:

  • No abstraction for abstraction’s sake
    Traditional WordPress theme hierarchy is straightforward, predictable, and accessible to a broad developer audience.
  • Reduced tooling overhead
    Avoiding reliance on tools like Webpack or Blade removes potential points of failure and speeds up workflows.
  • Future-proofing
    A standard theme structure remains compatible with WordPress core updates and developer expectations, even a decade later.

In my experience, minimal setups foster easier collaboration and faster problem-solving. They focus on solving the problem rather than adhering to overly opinionated patterns.

Real World Example

Like many things, this all sounds great and makes sense in theory, but what does it look like in practice? Seeing is believing, so I’ve created a minimal theme that exemplifies some of the concepts I’ve described here. The theme is still a work in progress with plenty of areas that could be refined, but it provides the top features that custom WordPress developers seem to want most in a theme framework.

Modern Features

Before we dive in, I’ll list out some of the key benefits of what’s going on in this theme. Above all of these, working minimally and keeping things simple and easy to understand is by far the largest benefit, in my opinion.

  • A watch task that compiles and reloads in under 100ms;
  • Sass for CSS preprocessing coupled with CSS written in BEM syntax;
  • Native ES modules;
  • Composer package management;
  • Twig view templating;
  • View-controller pattern;
  • Namespaced PHP for isolation;
  • Built-in support for the Advanced Custom Fields plugin;
  • Global context variables for common WordPress data: site_name, site_description, site_url, theme_dir, theme_url, primary_nav, ACF custom fields, the_title(), the_content().

Templating Language

Twig is included with this theme, and it is used to load a small set of commonly used global context variables such as the theme URL, theme directory, site name, site URL, and so on. It also includes some core functions, like the_content(), the_title(), and others you’d routinely use during the process of creating a custom theme. These global context variables and functions are available for all URLs.

While it could be argued that Twig is an unnecessary additional abstraction layer when we’re trying to establish a minimal WordPress setup, I chose to include it because this type of abstraction is included in Sage. But it’s also for a few other important reasons:

  • Old,
  • Dependable, and
  • Stable.

You won’t need to worry about breaking changes in future versions, and it’s widely in use today. All the features I commonly see used in Sage Blade templates can be handled just as easily with Twig. There really isn’t anything you can do with Blade that isn’t possible with Twig.

Blade is a great templating language, but it’s best suited for Laravel, in my opinion. BladeOne does provide a good way to use it as a standalone templating engine, but even then, it’s still not as performant under pressure as Twig. Twig’s added performance, when used with small, efficient contexts, allows us to avoid the complexity that comes with caching view output. Compile-on-the-fly Twig is very close to the same speed as raw PHP in this use case.

Most importantly, Twig was built to be portable. It can be installed with Composer and used within the theme with just 55 lines of code.

Now, in a real project, this would probably be more than 55 lines, but either way, it is, without a doubt, much easier to understand and work with than Blade. Blade was built for use in Laravel, and it’s just not nearly as portable. It will be significantly easier to identify issues, track them down with a direct stack trace, and fix them with Twig.

The view context in this theme is deliberately kept sparse; during a site build, you’ll add what you specifically need for a particular site. A lean context for your views helps with performance and workflow.

Models & Controllers

The template hierarchy follows the patterns of good ol’ WordPress, and while some developers don’t like this, it is undoubtedly the most widely accepted and commonly understood standard. Each standard theme file acts as a model where you define your data structures with PHP and hand them off as the context to a .twig view file.

Developers like the structure of separating server-side logic from a template, and in a classic MVC/MVVM pattern, we have our model, view, and controller. Here, I’m using the standard WordPress theme templates as models.

Currently, template files include some useful basics. You’re likely familiar with these standard templates, but I’ll list them here for posterity:

  • 404.php: Displays a custom “Page Not Found” message when a visitor tries to access a page that doesn’t exist.
  • archive.php: Displays a list of posts from a particular archive, such as a category, date, or tag archive.
  • author.php: Displays a list of posts by a specific author, along with the author’s information.
  • category.php: Displays a list of posts from a specific category.
  • footer.php: Contains the footer section of the theme, typically including closing HTML tags and widgets or navigation in the footer area.
  • front-page.php: The template used for the site’s front page, either static or a blog, depending on the site settings.
  • functions.php: Adds custom functionality to the theme, such as registering menus and widgets or adding theme support for features like custom logos or post thumbnails.
  • header.php: Contains the header section of the theme, typically including the site’s title, meta tags, and navigation menu.
  • index.php: The fallback template for all WordPress pages, used if no other, more specific template (like category.php or single.php) is available.
  • page.php: Displays individual static pages, such as “About” or “Contact” pages.
  • screenshot.png: An image of the theme’s design, shown in the WordPress theme selector to give users a preview of the theme’s appearance.
  • search.php: Displays the results of a search query, showing posts or pages that match the search terms entered by the user.
  • single.php: Displays individual posts, often used for blog posts or custom post types.
  • tag.php: Displays a list of posts associated with a specific tag.

Extremely Fast Build Process For SCSS And JavaScript

The build is curiously different in this theme, but out of the box, you can compile SCSS to CSS, work with native JavaScript modules, and have a live reload watch process with a tiny footprint. Look inside the bin/*.js files, and you’ll see everything that’s happening.

There are just two commands here, and all web developers should be familiar with them:

  1. Watch
    While developing, it will reload or inject JavaScript and CSS changes into the browser automatically using Browsersync (a minimal sketch of these scripts follows this list).
  2. Build
    This task compiles all top-level *.scss files efficiently. There’s room for improvement, but keep in mind this theme serves as a concept.
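As a rough sketch of what the build half can look like (directory names and code here are illustrative, not the theme’s actual bin/*.js files), assuming the sass npm package:

```js
// bin/build.js — compile every top-level *.scss file to compressed CSS.
const { compile } = require('sass');
const { readdirSync, writeFileSync } = require('fs');
const path = require('path');

const srcDir = 'assets/scss'; // hypothetical source directory
const outDir = 'assets/css';  // hypothetical output directory

for (const file of readdirSync(srcDir)) {
  // Skip partials (_*.scss) and anything that isn't SCSS.
  if (!file.endsWith('.scss') || file.startsWith('_')) continue;
  const result = compile(path.join(srcDir, file), { style: 'compressed' });
  writeFileSync(path.join(outDir, file.replace(/\.scss$/, '.css')), result.css);
  console.log(`Compiled ${file}`);
}
```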

Now for a curveball: there is no compile process for JavaScript. File changes will still be injected into the browser with hot module replacement during watch mode, but we don’t need to compile anything.

WordPress will load theme JavaScript as native ES modules, using WordPress 6.5’s support for ES modules. My reasoning is that many sites now pass through Cloudflare, so modern compression is handled for JavaScript automatically. Many specialized WordPress hosts do this as well. When comparing minification to GZIP, it’s clear that minification provides trivial gains in file reduction. The vast majority of file reduction is provided by CDN and server compression. Based on this, I believe the benefits of a fast workflow far outweigh the additional overhead of pulling in build steps for webpack, Rollup, or other similar packaging tools.

We’re fortunate that the web fully supports ES modules today, so there is really no reason why we should need to compile JavaScript at all if we’re not using a JavaScript framework like Vue, React, or Svelte.
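As a rough illustration (file and function names are hypothetical), a theme script shipped as a native ES module needs no build step at all; the browser resolves the imports itself. On WordPress 6.5+, such a file can be enqueued through the script modules API so it’s loaded with type="module".

```js
// assets/js/main.js — served to the browser exactly as written; no bundle
// or minify step. The browser fetches ./nav.js itself when this module loads.
import { initNav } from './nav.js';

document.addEventListener('DOMContentLoaded', () => {
  initNav(document.querySelector('.primary-nav'));
});
```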

A Contrarian Approach

My perspective and the ideas I’ve shared here are undoubtedly contrarian. Like anything alternative, this is bound to ruffle some feathers. Frameworks like Sage are celebrated in developer circles, with strong communities behind them. For certain use cases — like large-scale, enterprise-level projects with dedicated development teams — they may indeed be the right fit.

For the vast majority of WordPress projects I encounter, the added complexity creates more problems than it solves. As developers, our goal should be to build solutions that are not only functional and performant but also maintainable and approachable for the next person who inherits them.

Simplicity, in my view, is underrated in modern web development. A minimal WordPress setup, tailored to the specific needs of the project without unnecessary abstraction, is often the leaner, more sustainable choice.

Conclusion

Inheriting framework-based projects has taught me invaluable lessons about the real-world impact of theme frameworks. While they may impress in an initial pitch or during development, the long-term consequences of added complexity often outweigh the benefits. By adopting a minimal WordPress approach, we can build sites that are easier to maintain, faster to onboard new developers, and more resilient to change.

Modern tools have their place, but minimalism never goes out of style. When you choose simplicity, you choose a codebase that works today, tomorrow, and years down the line. Isn’t that what great web development is all about?

The Human Element: Using Research And Psychology To Elevate Data Storytelling

 

Effective data storytelling isn’t a black box. By integrating UX research & psychology, you can craft more impactful and persuasive narratives. Victor Yocco and Angelica Lo Duca outline a five-step framework that provides a roadmap for creating data stories that resonate with audiences on both a cognitive and emotional level.

Data storytelling is a powerful communication tool that combines data analysis with narrative techniques to create impactful stories. It goes beyond presenting raw numbers by transforming complex data into meaningful insights that can drive decisions, influence behavior, and spark action.

When done right, data storytelling simplifies complex information, engages the audience, and compels them to act. Effective data storytelling allows UX professionals to effectively communicate the “why” behind their design choices, advocate for user-centered improvements, and ultimately create more impactful and persuasive presentations. This translates to stronger buy-in for research initiatives, increased alignment across teams, and, ultimately, products and experiences that truly meet user needs.

For instance, The New York Times’ Snow Fall data story (Figure 1) used data to immerse readers in the tale of a deadly avalanche through interactive visuals and text, while The Guardian’s The Counted (Figure 2) powerfully illustrated police violence in the U.S. by humanizing data through storytelling. These examples show that effective data storytelling can leave lasting impressions, prompting readers to think differently, act, or make informed decisions.

Figure 1: The NYT Snow Fall displays data visualizations alongside a narrative of the events preceding and during a deadly avalanche.
Figure 2: The Guardian The Counted tells a compelling data story of the facts behind people killed by the police in the US.

The importance of data storytelling lies in its ability to:

  • Simplify complexity
    It makes data understandable and actionable.
  • Engage and persuade
    Emotional and cognitive engagement ensures audiences not only understand but also feel compelled to act.
  • Bridge gaps
    Data storytelling connects the dots between information and human experience, making the data relevant and relatable.

While there are numerous models of data storytelling, here are a few high-level areas of focus UX practitioners should have a grasp on:

Narrative Structures: Traditional storytelling models like the hero’s journey (Vogler, 1992) or the Freytag pyramid (Figure 3) provide a backbone for structuring data stories. These models help create a beginning, rising action, climax, falling action, and resolution, keeping the audience engaged.

Figure 3: Freytag’s Pyramid provides a narrative structure for storytellers.

Data Visualization: Broadly speaking, these are the tools and techniques for visualizing data in our stories. Interactive charts, maps, and infographics (Cairo, 2016) transform raw data into digestible visuals, making complex information easier to understand and remember.

Narrative Structures For Data

Moving beyond these basic structures, let’s explore how more sophisticated narrative techniques can enhance the impact of data stories:

  • The Three-Act Structure
    This approach divides the data story into setup, confrontation, and resolution. It helps build context, present the problem or insight, and offer a solution or conclusion (Few, 2005).
  • The Hero’s Journey (Data Edition)
    We can frame a data set as a problem that needs a hero to overcome. In this case, the hero is often the audience or the decision-maker who needs to use the data to solve a problem. The data itself becomes the journey, revealing challenges, insights, and, ultimately, a path to resolution.
Example:
Presenting data on declining user engagement could follow the hero’s journey. The “call to adventure” is the declining engagement. The “challenges” are revealed through data points showing where users are dropping off. The “insights” are uncovered through further analysis, revealing the root causes. The “resolution” is the proposed solution, supported by data, that the audience (the hero) can implement.

Problems With Widely Used Data Storytelling Models

Many data storytelling models follow a traditional, linear structure: data selection, audience tailoring, storyboarding with visuals, and a call to action. While these models aim to make data more accessible, they often fail to engage the audience on a deeper level, leading to missed opportunities. This happens because they prioritize the presentation of data over the experience of the audience, neglecting how different individuals perceive and process information.

Figure 4: The traditional flow for creating a data-driven story.

While existing data storytelling models adhere to a structured and technically correct approach to data creation, they often fall short of fully analyzing and understanding their audience. This gap weakens their overall effectiveness and impact.

  • Cognitive Overload
    Presenting too much data without context or a clear narrative overwhelms the audience. Instead of enlightenment, they experience confusion and disengagement. It’s like trying to drink from a firehose; the sheer volume becomes counterproductive. This overload can be particularly challenging for individuals with cognitive differences who may require information to be presented in smaller, more digestible chunks.
  • Emotional Disconnect
    Data-heavy presentations often fail to establish an emotional connection, which is crucial for driving audience engagement and action. People are more likely to remember and act upon information that resonates with their feelings and values.
  • Lack of Personalization
    Many data stories adopt a one-size-fits-all approach. Without tailoring the narrative to specific audience segments, the impact is diluted. A message that resonates with a CEO might not land with frontline employees.
  • Over-Reliance on Visuals
    While visuals are essential for simplifying data, they are insufficient without a cohesive narrative to provide context and meaning, and they may not be accessible to all audience members.

These shortcomings reveal a critical flaw: while current models successfully follow a structured data creation process, they often neglect the deeper, audience-centered analysis required for actual storytelling effectiveness. To bridge this gap,

Data storytelling must evolve beyond simply presenting information — it should prioritize audience understanding, engagement, and accessibility at every stage.

Improving On Traditional Models

Traditional models can be improved by focusing more on the following two critical components:

Audience understanding: A greater focus can be placed on who the audience is, what they need, and how they perceive information. Traditional models should consider the unique characteristics and needs of specific audiences; without that understanding, data stories can end up irrelevant, confusing, or even misleading.

Effective data storytelling requires a deep understanding of the audience’s demographics, psychographics, and information needs. This includes understanding their level of knowledge about the topic, their prior beliefs and attitudes, and their motivations for seeking information. By tailoring the data story to a specific audience, storytellers can increase engagement, comprehension, and persuasion.

Psychological principles: These models could be improved with insights from psychology that explain how people process information and make decisions. Without these elements, even the most beautifully designed data story may fall flat.

By incorporating audience understanding and psychological principles into their storytelling process, data storytellers can create more effective and engaging narratives that resonate with their audience and drive desired outcomes.

Persuasion In Data Storytelling

All storytelling involves persuasion. Even if it’s a poorly told story and your audience chooses to ignore your message, you’ve persuaded them to do that. When your audience feels that you understand them, they are more likely to be persuaded by your message. Data-driven stories that speak to their hearts and minds are more likely to drive action. You can frame your message effectively when you have a deeper understanding of your audience.

Applying Psychological Principles To Data Storytelling

Humans process information based on psychological cues such as cognitive ease, social proof, and emotional appeal. By incorporating these principles, data storytellers can make their narratives more engaging, memorable, and persuasive.

Psychological principles help data storytellers tap into how people perceive, interpret, and remember information.

The Theory of Planned Behavior

While there is no single truth when it comes to how human behavior is created or changed, it is important for a data storyteller to use a theoretical framework to ensure they address the appropriate psychological factors of their audience. The Theory of Planned Behavior (TPB) is a commonly cited theory of behavior change in academic psychology research and courses. It’s useful for creating a reasonably effective framework to collect audience data and build a data story around it.

The TPB (Ajzen 1991) (Figure 5) aims to predict and explain human behavior. It consists of three key components:

  1. Attitude
    This refers to the degree to which a person has a favorable or unfavorable evaluation of the behavior in question. An example of attitudes in the TPB is a person’s belief about the importance of regular exercise for good health. If an individual strongly believes that exercise is beneficial, they are likely to have a favorable attitude toward engaging in regular physical activity.
  2. Subjective Norms
    These are the perceived social pressures to perform or not perform the behavior. Keeping with the exercise example, this would be how a person thinks their family, peers, community, social media, and others perceive the importance of regular exercise for good health.
  3. Perceived Behavioral Control
    This component reflects the perceived ease or difficulty of performing the behavior. For our physical activity example, does the individual believe they have access to exercise in terms of time, equipment, physical capability, and other potential aspects that make them feel more or less capable of engaging in the behavior?

As shown in Figure 5, these three components interact to create behavioral intentions, which are a proxy for actual behaviors that we often don’t have the resources to measure in real-time with research participants (Ajzen, 1991).

Figure 5: The factors of the TPB interact with each other, collectively shaping an individual's behavioral intentions, which, in turn, are the most proximal determinant of human social behavior.

UX researchers and data storytellers should develop a working knowledge of the TPB or another suitable psychological theory before moving on to measure the audience’s attitudes, norms, and perceived behavioral control. We have included additional resources to support your learning about the TPB in the references section of this article.

How To Understand Your Audience And Apply Psychological Principles

OK, we’ve covered the importance of audience understanding and psychology. These two principles serve as the foundation of the proposed model of storytelling we’re putting forth. Let’s explore how to integrate them into your storytelling process.

Introducing The Audience Research Informed Data Storytelling Model (ARIDSM)

At the core of successful data storytelling lies a deep understanding of your audience’s psychology. Here’s a five-step process to integrate UX research and psychological principles effectively into your data stories:

Figure 6: The 5 steps of the Audience Research Informed Data Storytelling Model (ARIDSM).

Step 1: Define Clear Objectives

Before diving into data, it’s crucial to establish precisely what you aim to achieve with your story. Do you want to inform, persuade, or inspire action? What specific message do you want your audience to take away?

Why it matters: Defining clear objectives provides a roadmap for your storytelling journey. It ensures that your data, narrative, and visuals are all aligned toward a common goal. Without this clarity, your story risks becoming unfocused and losing its impact.

How to execute Step 1: Start by asking yourself:

  • What is the core message I want to convey?
  • What do I want my audience to think, feel, or do after experiencing this story?
  • How will I measure the success of my data story?

Frame your objectives using action verbs and quantifiable outcomes. For example, instead of “raise awareness about climate change,” aim to “persuade 20% of the audience to adopt one sustainable practice.”

Example:
Imagine you’re creating a data story about employee burnout. Your objective might be to convince management to implement new policies that promote work-life balance, with the goal of reducing reported burnout cases by 15% within six months.

Step 2: Conduct UX Research To Understand Your Audience

This step involves gathering insights about your audience: their demographics, needs, motivations, pain points, and how they prefer to consume information.

Why it matters: Understanding your audience is fundamental to crafting a story that resonates. By knowing their preferences and potential biases, you can tailor your narrative and data presentation to capture their attention and ensure the message is clearly understood.

How to execute Step 2: Employ UX research methods like surveys, interviews, persona development, and testing the message with potential audience members.

Example:
If your data story aims to encourage healthy eating habits among college students, you might survey students to determine what attitudes exist toward specific types of healthy foods, and then apply that knowledge in your data story.

Step 3: Analyze and Select Relevant Audience Data

This step bridges the gap between raw data and meaningful insights. It involves exploring your data to identify patterns, trends, and key takeaways that support your objectives and resonate with your audience.

Why it matters: Careful data analysis ensures that your story is grounded in evidence and that you’re using the most impactful data points to support your narrative. This step adds credibility and weight to your story, making it more convincing and persuasive.

How to execute Step 3:

  • Clean and organize your data.
    Ensure accuracy and consistency before analysis.
  • Identify key variables and metrics.
    This will be determined by the psychological principle you used to inform your research. Using the TPB, we might look closely at how we measured social norms to understand directionally how the audience perceives social norms around the topic of the data story you are sharing, allowing you to frame your call to action in ways that resonate with these norms. You might run a variety of statistics at this point, including factor analysis to create groups based on similar traits, t-tests to determine if averages on your measurements are significantly different between groups, and correlations to see if there might be an assumed direction between scores on various items.
Example:
If your objective is to demonstrate the effectiveness of a new teaching method, you might analyze how your audience perceives their peers’ openness to adopting new methods, their belief that the decision to use a new method is within their control, and their attitudes toward the effectiveness of their current teaching methods. From this, you can create groups with varying levels of receptivity to trying new methods, allowing you to later tailor your data story to each group.

Step 4: Apply The Theory of Planned Behavior Or Your Psychological Principle Of Choice [Done Simultaneously With Step 3]

In this step, The Theory of Planned Behavior (TPB) provides a robust framework for understanding the factors that drive human behavior. It posits that our intentions, which are the strongest predictors of our actions, are shaped by three core components: attitudes, subjective norms, and perceived behavioral control. By consciously incorporating these elements into your data story, you can significantly enhance its persuasive power.

Why it matters: The TPB offers valuable insights into how people make decisions. By aligning your narrative with these psychological drivers, you increase the likelihood of influencing your audience’s intentions and, ultimately, their behavior. This step adds a layer of strategic persuasion to your data storytelling, making it more impactful and effective.

How to execute Step 4:

Here’s how to leverage the TPB in your data story:

Influence Attitudes: Present data and evidence that highlight the positive consequences of adopting the desired behavior. Frame the behavior as beneficial, valuable, and aligned with the audience’s values and aspirations.

This is where having a deep knowledge of the audience is helpful. Let’s imagine you are creating a data story on exercise, and your call to action promotes exercising daily. If you know your audience has a highly positive attitude towards exercise, you can capitalize on that and frame your language around the benefits of exercising, increasing exercise, or specific exercises that might be best suited for the audience. It’s about framing exercise not just as a physical benefit but as a holistic improvement to their life. You can also tie it to their identity, positioning exercise as an integral part of living the kind of life they aspire to.

Shape Subjective Norms: Demonstrate that the desired behavior is widely accepted and practiced by others, especially those the audience admires or identifies with. Knowing ahead of time if your audience thinks daily exercise is something their peers approve of or engage in will allow you to shape your messaging accordingly. Highlight testimonials, success stories, or case studies from individuals who mirror the audience’s values.

If you were to find that the audience does not consider exercise to be normative amongst peers, you would look for examples of similar groups of people who do exercise. For example, if your audience is in a certain age group, you might focus on what data you have that supports a large percentage of those in their age group engaging in exercise.

Enhance Perceived Behavioral Control: Address any perceived barriers to adopting the desired behavior and provide practical solutions. For instance, when promoting daily exercise, it’s important to acknowledge the common obstacles people face — lack of time, resources, or physical capability — and demonstrate how these can be overcome.

Step 5: Craft A Balanced And Persuasive Narrative

This is where you synthesize your data, audience insights, psychological principles (including the TPB), and storytelling techniques into a compelling and persuasive narrative. It’s about weaving together the logical and emotional elements of your story to create an experience that resonates with your audience and motivates them to act.

Why it matters: A well-crafted narrative transforms data from dry statistics into a meaningful and memorable experience. It ensures that your audience not only understands the information but also feels connected to it on an emotional level, increasing the likelihood of them internalizing the message and acting upon it.

How to execute Step 5:

Structure your story strategically: Use a clear narrative arc that guides your audience through the information. Begin by establishing the context and introducing the problem, then present your data-driven insights in a way that supports your objectives and addresses the TPB components. Conclude with a compelling call to action that aligns with the attitudes, norms, and perceived control you’ve cultivated throughout the narrative.

Example:
In a data story about promoting exercise, you could:
  • Determine what stories might be available using the data you have collected or obtained. In this example, let’s say you work for a city planning office and have data suggesting people aren’t currently biking as frequently as they could, even if they are bike owners.
  • Begin with a relatable story about lack of exercise and its impact on people’s lives. Then, present data on the benefits of cycling, highlighting its positive impact on health, socializing, and personal feelings of well-being (attitudes).
  • Integrate TPB elements: Showcase stories of people who have successfully incorporated cycling into their daily commute (subjective norms). Provide practical tips on bike safety, route planning, and finding affordable bikes (perceived behavioral control).
  • Use infographics to compare commute times and costs between driving and cycling. Show maps of bike-friendly routes and visually appealing images of people enjoying cycling.
  • Call to action: Encourage the audience to try cycling for a week and provide links to resources like bike share programs, cycling maps, and local cycling communities.

Evaluating The Method

Our next step is to test our hypothesis that incorporating audience research and psychology into creating a data story will lead to more powerful results. We have conducted preliminary research using messages focused on climate change, and our results suggest some support for our assertion.

We purposely chose a controversial topic because we believe data storytelling can be a powerful tool. If we want to truly realize the benefits of effective data storytelling, we need to focus on topics that matter. We also know that academic research suggests it is more difficult to shift opinions or generate behavior around topics that are polarizing (at least in the US), such as climate change.

We are not ready to share the full results of our study. We will share those in an academic journal and in conference proceedings. Here is a look at how we set up the study and how you might do something similar when either creating a data story using our method or doing your own research to test our model. You will see that it closely aligns with the model itself, with the added steps of testing the message against a control message and taking measurements of the actions the message(s) are likely to generate.

Step 1: We chose our topic and the data set we wanted to explore. As I mentioned, we purposely went with a polarizing topic. My academic background was in messaging around conservation issues, so we explored that. We used data from a publicly available data set that states July 2023 was the hottest month ever recorded.

Step 2: We identified our audience and took basic measurements. We decided our audience would be members of the general public who do not work directly with climate data or in other fields relevant to climate change science.

We wanted a diverse range of ages and backgrounds, so we included screening questions in the same survey we used to measure the TPB components. We created a survey to measure the elements of the TPB as they relate to climate change and administered it via a Google Forms link that we shared directly, in social media posts, and on online message boards related to climate change and survey research.

Step 3: We analyzed our data and broke our audience into groups based on key differences. This part required a bit of statistical know-how. Essentially, we entered all of the responses into a spreadsheet and ran a factor analysis to define groups based on shared attributes. In our case, we found two distinct groups for our respondents. We then looked deeper into the individual differences between the groups, e.g., group 1 had a notably higher level of positive attitude towards taking action to remediate climate change.
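We won't share our full analysis scripts here, but to make the idea concrete, below is a minimal sketch in Python of this kind of grouping. The file and column names are hypothetical: it assumes one CSV row per respondent and one column per 1-to-5 survey item. It reduces the items to latent factors and then clusters respondents into groups.

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

# Hypothetical file and column names; each row is one respondent,
# each column one 1-to-5 survey item.
responses = pd.read_csv("tpb_survey.csv")
items = responses[["attitude_1", "attitude_2",
                   "norms_1", "norms_2",
                   "control_1", "control_2"]]

# Reduce the survey items to a small number of latent factors.
fa = FactorAnalysis(n_components=2, random_state=0)
factor_scores = fa.fit_transform(items)

# Cluster respondents on their factor scores to define audience groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
responses["group"] = kmeans.fit_predict(factor_scores)

# Compare the groups item by item, e.g., mean attitude per group.
print(responses.groupby("group")[items.columns].mean())
```

The last line surfaces exactly the kind of difference described above, such as one group scoring notably higher on the attitude items.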

Step 4 [remember this happens simultaneously with step 3]: We incorporated aspects of the TPB in how we framed our data analysis. As we created our groups and looked at the responses to the survey, we made sure to note how this might impact the story for our various groups. Using our previous example, a group with a higher positive attitude toward taking action might need less convincing to do something about climate change and more information on what exactly they can do.

Table 1 contains examples of the questions we asked related to the TPB. We used the guidance provided here to generate survey items measuring the TPB as it relates to climate change activism. Note that even the academic who created the TPB states there are no standardized questions (PDF) validated to measure the concepts for each individual topic.

Item: How beneficial do you believe individual actions are compared to systemic changes (e.g., government policies) in tackling climate change?
Measures: Attitude
Scale: 1 to 5, with 1 being “not beneficial” and 5 being “extremely beneficial”

Item: How much do you think the people you care about (family, friends, community) expect you to take action against climate change?
Measures: Subjective Norms
Scale: 1 to 5, with 1 being “they do not expect me to take action” and 5 being “they expect me to take action”

Item: How confident are you in your ability to overcome personal barriers when trying to reduce your environmental impact?
Measures: Perceived Behavioral Control
Scale: 1 to 5, with 1 being “not at all confident” and 5 being “extremely confident”

Table 1: Examples of questions we used to measure the TPB factors. We asked multiple questions for each factor and then generated a combined mean score for each component.

Step 5: We created data stories aligned with the groups and a control story. We created multiple stories to align with the groups we identified in our audience. We also created a control message that lacked substantial framing in any direction. See below for an example of the control data story (Figure 7) and one of the customized data stories (Figure 8) we created.

Figure 7: Control data story. For the control story, we displayed the data around daily surface air temperature with some additional information explaining the chart. We did not attempt to influence behavior or tap into psychology to suggest there was urgency or persuade the participant to want to learn more. The color used in the chart comes from the initial chart generated in the source. We acknowledge that color is likely to present some psychological influence, given the use of red to represent extreme heat and cooler colors like blue to represent cooler time periods.
Figure 8: Group 1 data story. Our measurements suggested that the participants in Group 1 had a higher level of awareness of climate change and the related negative impacts of more extreme temperatures. Therefore, we didn't call out the potential negatives of climate change and instead focused on the positive impact we might make together. Group 1 also had higher levels of subjective norms, suggesting that language highlighting how others engage in these behaviors would align with what they believe to be true. We therefore focused the message on its community aspect, encouraging them to act.

Step 6: We released the stories and measured the likelihood of acting. Specific to our study, we asked the participants how likely they were to “Click here to LEARN MORE.” Our hypothesis was that individuals would express a notably higher likelihood of clicking to learn more on the data story aligned with their group, compared to the competing group's story and the control story.

Step 7: We analyzed the differences between the preexisting groups and their stated likelihood of acting. As I mentioned, our findings are still preliminary, and we are looking at ways to increase our response rate so we can present statistically substantiated findings. Our initial findings show small differences between the responses to the tailored data stories and the control data story, which is directionally what we expected to see. If you conduct a similar study or test your own messages, you would likewise look for results suggesting that your ARIDS-derived message is more likely to generate the expected outcome than a control message or a non-tailored message.
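The statistics for such a comparison can stay simple. As a hedged illustration with made-up numbers (not our study data), comparing “likelihood to click” ratings between a tailored story and the control story could look like this:

```python
from scipy import stats

# Hypothetical 1-to-5 "likelihood to click to learn more" ratings.
tailored = [5, 4, 4, 5, 3, 4, 5, 4]  # participants shown their group's story
control = [3, 4, 2, 3, 3, 4, 3, 2]   # participants shown the control story

# Welch's t-test: compares the two means without assuming equal variances.
result = stats.ttest_ind(tailored, control, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

With real data, you would of course need adequate sample sizes in each condition before reading anything into the p-value.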

Overall, we see an exciting possibility here, and future research will help us refine exactly what is critical about generating a message that has a positive impact on your audience. We also expect that, depending on the audience and topic, other psychological models may be better suited to framing your measurements and your message.

For example, you might feel Maslow’s hierarchy of needs is more relevant to your data storytelling. You would want to take measurements related to these needs from your audience and then frame the data story using how a decision might help meet their needs.

Elevate Your Data Storytelling

Traditional models of data storytelling, while valuable, often fall short of effectively engaging and persuading audiences. This is primarily due to their neglect of crucial aspects such as audience understanding and the application of psychological principles. By incorporating these elements into the data storytelling process, we can create more impactful and persuasive narratives.

The five-step framework proposed in this article — defining clear objectives, conducting UX research, analyzing data, applying psychological principles, and crafting a balanced narrative — provides a roadmap for creating data stories that resonate with audiences on both a cognitive and emotional level. This approach ensures that data is not merely presented but is transformed into a meaningful experience that drives action and fosters change. As data storytellers, embracing this human-centric approach allows us to unlock the full potential of data and create narratives that truly inspire and inform.

Effective data storytelling isn’t a black box. You can test your own data stories for effectiveness using the same research process we are using to test our hypothesis. This requires additional time, but you will make it back in the form of a stronger impact on your audience, provided your data story proves significantly more effective than a control message or other candidate messages that don’t incorporate your audience’s psychological traits.

Please feel free to use our method and provide any feedback on your experience to the author.

Human-Centered Design Through AI-Assisted Usability Testing: Reality Or Fiction?

 

Could AI assist UX researchers by dynamically asking follow-up questions based on participant responses? Eduard Kuric discusses the significance of context in the creation of relevant follow-up questions for unmoderated usability testing, how an AI tasked with interactive follow-up should be validated, and the potential — along with the risks — of AI interaction in usability testing.

Unmoderated usability testing has been steadily growing more popular with the assistance of online UX research tools. Allowing participants to complete usability testing without a moderator, at their own pace and convenience, can have a number of advantages.

The first is freedom from strict schedules and moderator availability, meaning that many more participants can be recruited quickly and cost-effectively. It also lets your team see how users interact with your solution in their natural environment, with the setup of their own devices. Overcoming the challenges of distance and differences in time zones in order to obtain data from all around the globe also becomes much easier.

However, forgoing the use of moderators also has its drawbacks. The moderator brings flexibility, as well as a human touch into usability testing. Since they are in the same (virtual) space as the participants, the moderator usually has a good idea of what’s going on. They can react in real-time depending on what they witness the participant do and say. A moderator can carefully remind the participants to vocalize their thoughts. To the participant, thinking aloud in front of a moderator can also feel more natural than just talking to themselves. When the participant does something interesting, the moderator can prompt them for further comment.

Meanwhile, a traditional unmoderated study lacks such flexibility. In order to complete tasks, participants receive a fixed set of instructions. Once they are done, they can be asked to complete a static questionnaire, and that’s it.

The feedback that the research & design team receives will be completely dependent on what information the participants provide on their own. Because of this, the phrasing of instructions and questions in unmoderated testing is critical. Yet even if everything is planned out perfectly, the lack of adaptive questioning means that a lot of information will remain unsaid, especially with regular people who are not trained in providing user feedback.

If the usability test participant misunderstands a question or doesn’t answer completely, the moderator can always ask for a follow-up to get more information. A question then arises: Could something like that be handled by AI to upgrade unmoderated testing?

Generative AI could present a new, potentially powerful tool for addressing this dilemma once we consider its current capabilities. Large language models (LLMs), in particular, can lead conversations that can appear almost humanlike. If LLMs could be incorporated into usability testing to interactively enhance the collection of data by conversing with the participant, they might significantly augment the ability of researchers to obtain detailed personal feedback from great numbers of people. With human participants as the source of the actual feedback, this is an excellent example of human-centered AI, as it keeps humans in the loop.

There are quite a number of gaps in the research on AI in UX. To help fill them, we at UXtweak Research conducted a case study investigating whether AI can generate follow-up questions that are meaningful and elicit valuable answers from participants.

Asking participants follow-up questions to extract more in-depth information is just one portion of the moderator’s responsibilities. However, it is a reasonably scoped subproblem for our evaluation, since it encapsulates the moderator’s ability to react to the context of the conversation in real time and to encourage participants to share salient information.

Experiment Spotlight: Testing GPT-4 In Real-Time Feedback

The focus of our study was on the underlying principles rather than any specific commercial AI solution for unmoderated usability testing. After all, AI models and prompts are being tuned constantly, so findings that are too narrow may become irrelevant in a week or two after a new version gets updated. However, since AI models are also a black box based on artificial neural networks, the method by which they generate their specific output is not transparent.

Our results can show what you should be wary of to verify that an AI solution that you use can actually deliver value rather than harm. For our study, we used GPT-4, which at the time of the experiment was the most up-to-date model by OpenAI, also capable of fulfilling complex prompts (and, in our experience, dealing with some prompts better than the more recent GPT-4o).

In our experiment, we conducted a usability test with a prototype of an e-commerce website. The tasks involved the common user flow of purchasing a product.

Note: See our article published in the International Journal of Human-Computer Interaction for more detailed information about the prototype, tasks, questions, and so on.

In this setting, we compared results across three conditions:

  1. A regular static questionnaire made up of three pre-defined questions (Q1, Q2, Q3), serving as an AI-free baseline. Q1 was open-ended, asking the participants to narrate their experiences during the task. Q2 and Q3 can be considered non-adaptive follow-ups to Q1 since they asked participants more directly about usability issues and to identify things that they did not like.
  2. The question Q1, serving as a seed for up to three GPT-4-generated follow-up questions as the alternative to Q2 and Q3.
  3. All three pre-defined questions, Q1, Q2, and Q3, each used as a seed for its own GPT-4 follow-up.

The following prompt was used to generate the follow-up questions:

The prompt employed in our experiment to create AI-generated follow-up questions in an unmoderated usability test
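As a general illustration of the mechanism, and not of our exact setup, a follow-up generator built on the OpenAI Python client could look like the sketch below. The prompt wording is a simplified stand-in for the prompt shown above, and the function name is ours for illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_follow_up(seed_question: str, answer: str) -> str:
    """Ask the model for one context-sensitive follow-up question."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("You are a usability test moderator. Given a "
                         "question and a participant's answer, ask one "
                         "short follow-up question that elicits new "
                         "detail. Never repeat what was already said.")},
            {"role": "user",
             "content": f"Question: {seed_question}\nAnswer: {answer}"},
        ],
    )
    return response.choices[0].message.content

print(generate_follow_up(
    "Please describe your experience completing the task.",
    "It was okay, but I got a bit lost at checkout.",
))
```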

To assess the impact of the AI follow-up questions, we then compared the results on both a quantitative and a qualitative basis. One of the measures that we analyzed is informativeness — ratings of the responses based on how useful they are at elucidating new usability issues encountered by the user.

As seen in the figure below, the informativeness dropped significantly between the seed questions and their AI follow-up. The follow-ups rarely helped identify a new issue, although they did help elaborate further details.

Compared to the pre-defined seed questions, AI follow-up questions lacked informativeness about new usability issues.

The emotional reactions of the participants offer another perspective on AI-generated follow-up questions. Our analysis of the prevailing emotional valence based on the phrasing of answers revealed that, at first, the answers started with a neutral sentiment. Afterward, the sentiment shifted toward the negative.

In the case of the pre-defined questions Q2 and Q3, this could be seen as natural. While the seed question Q1 was open-ended, asking the participants to explain what they did during the task, Q2 and Q3 focused more on the negative: usability issues and other disliked aspects. Curiously, the follow-up chains generally received even more negative receptions than their seed questions, and not for the same reason.

Sentiment analysis reveals a drop in participant sentiment in questions involving AI follow-up questions compared to the seed questions in the GPT variant.
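The exact sentiment pipeline is a secondary detail here; as a minimal sketch of this kind of valence scoring, assuming NLTK’s VADER lexicon, each answer can be scored on a scale from negative to positive:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

# Illustrative participant answers, not quotes from our study.
answers = [
    "The checkout flow was clear and quick.",
    "I already said that. The question ignored my previous answer.",
]

for answer in answers:
    # "compound" ranges from -1 (most negative) to +1 (most positive).
    score = sia.polarity_scores(answer)["compound"]
    print(f"{score:+.2f}  {answer}")
```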

Frustration was common as participants interacted with the GPT-4-driven follow-up questions. This is rather critical, considering that frustration with the testing process can sidetrack participants from taking usability testing seriously, hinder meaningful feedback, and introduce a negative bias.

A major aspect that participants were frustrated with was redundancy. Repetitiveness, such as re-explaining the same usability issue, was quite common. While pre-defined follow-up questions yielded 27-28% of repeated answers (it’s likely that participants already mentioned aspects they disliked during the open-ended Q1), AI-generated questions yielded 21%.

That’s not that much of an improvement, given that the comparison is made to questions that literally could not adapt to prevent repetition at all. Furthermore, when AI follow-up questions were added to obtain more elaborate answers for every pre-defined question, the repetition ratio rose further to 35%. In the variant with AI, participants also rated the questions as significantly less reasonable.

Answers to AI-generated questions contained a lot of statements like “I already said that” and “The obvious AI questions ignored my previous responses.”
Repetition of answers to follow-up questions in the unmoderated usability test. Seed questions and their GPT-4 follow-ups form a group. This allows us to distinguish repetitions in AI follow-up answers depending on whether the repeated information originates from the same group (intra-group) or from other groups (inter-group).

The prevalence of repetition within the same group of questions (the seed question, its follow-up questions, and all of their answers) can be seen as particularly problematic since the GPT-4 prompt had been provided with all the information available in this context. This demonstrates that a number of the follow-up questions were not sufficiently distinct and lacked the direction that would warrant them being asked.
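How exactly you operationalize “repetition” is a judgment call. As one hedged approach, and not the coding scheme we used in the study, you could flag pairs of answers whose TF-IDF cosine similarity exceeds a threshold:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative answers, not data from our study.
answers = [
    "The filter menu was hard to find.",
    "As I said, the filter menu was hard to find.",
    "Checkout worked fine for me.",
]

# Vectorize the answers and compare every pair.
tfidf = TfidfVectorizer().fit_transform(answers)
similarity = cosine_similarity(tfidf)

THRESHOLD = 0.6  # hypothetical cutoff; tune it against hand-coded examples
for i in range(len(answers)):
    for j in range(i + 1, len(answers)):
        if similarity[i, j] > THRESHOLD:
            print(f"Answers {i} and {j} look repetitive "
                  f"(similarity {similarity[i, j]:.2f})")
```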

Insights From The Study: Successes And Pitfalls

To summarize the usefulness of AI-generated follow-up questions in usability testing, there are both good and bad points.

Successes:

  • Generative AI (GPT-4) excels at refining participant answers with contextual follow-ups.
  • Depth of qualitative insights can be enhanced.

Challenges:

  • Limited capacity to uncover new issues beyond pre-defined questions.
  • Participants can easily grow frustrated with repetitive or generic follow-ups.

While extracting answers that are a bit more elaborate is a benefit, it can easily be overshadowed if poor question quality and relevance become too distracting. Participants who are focused on the AI may behave less naturally and give less relevant feedback.

Therefore, in the following section, we discuss what to be careful of, whether you are picking an existing AI tool to assist you with unmoderated usability testing or implementing your own AI prompts or even models for a similar purpose.

Recommendations For Practitioners

Context is the end-all and be-all when it comes to the usefulness of follow-up questions. Most of the issues that we identified with the AI follow-up questions in our study can be tied to the ignorance of proper context in one shape or another.

Based on real blunders that GPT-4 made while generating questions in our study, we have meticulously collected and organized a list of the types of context that these questions were missing. Whether you’re looking to use an existing AI tool or are implementing your own system to interact with participants in unmoderated studies, you are strongly encouraged to use this list as a high-level checklist. With it as the guideline, you can assess whether the AI models and prompts at your disposal can ask reasonable, context-sensitive follow-up questions before you entrust them with interacting with real participants.

Without further ado, these are the relevant types of context:

  • General Usability Testing Context.
    The AI should incorporate standard principles of usability testing in its questions. This may appear obvious, and it actually is. But it needs to be said, given that we have encountered issues related to this context in our study. For example, the questions should not be leading, ask participants for design suggestions, or ask them to predict their future behavior in completely hypothetical scenarios (behavioral research is much more accurate for that).
  • Usability Testing Goal Context.
    Different usability tests have different goals depending on the stage of the design, business goals, or features being tested. Each follow-up question and the participant’s time used in answering it are valuable resources. They should not be wasted on going off-topic. For example, in our study, we were evaluating a prototype of a website with placeholder photos of a product. When the AI starts asking participants about their opinion of the displayed fake products, such information is useless to us.
  • User Task Context.
    Whether the tasks in your usability testing are goal-driven or open and exploratory, their nature should be properly reflected in follow-up questions. When the participants have freedom, follow-up questions could be useful for understanding their motivations. By contrast, if your AI tool foolishly asks the participants why they did something closely related to the task (e.g., placing the specific item they were supposed to buy into the cart), you will seem just as foolish by association for using it.
  • Design Context.
    Detailed information about the tested design (e.g., prototype, mockup, website, app) can be indispensable for making sure that follow-up questions are reasonable. Follow-up questions should require input from the participant. They should not be answerable just by looking at the design. Interesting aspects of the design could also be reflected in the topics to focus on. For example, in our study, the AI would occasionally ask participants why they believed a piece of information that was very prominently displayed in the user interface, making the question irrelevant in context.
  • Interaction Context.
    If Design Context tells you what the participant could potentially see and do during the usability test, Interaction Context comprises all their actual actions, including their consequences. This could incorporate the video recording of the usability test, as well as the audio recording of the participant thinking aloud. The inclusion of interaction context would allow follow-up questions to build on the information that the participant already provided and to further clarify their decisions. For example, if a participant does not successfully complete a task, follow-up questions could be directed at investigating the cause, even as the participant continues to believe they have fulfilled their goal.
  • Previous Question Context.
    Even when the questions you ask them are mutually distinct, participants can find logical associations between various aspects of their experience, especially since they don’t know what you will ask them next. A skilled moderator may decide to skip a question that a participant already answered as part of another question, instead focusing on further clarifying the details. AI follow-up questions should be capable of doing the same to keep the testing from becoming a repetitive slog.
  • Question Intent Context.
    Participants routinely answer questions in a way that misses their original intent, especially if the question is more open-ended. A follow-up can spin the question from another angle to retrieve the intended information. However, if the participant’s answer is technically a valid reply but only to the word rather than the spirit of the question, the AI can miss this fact. Clarifying the intent could help address this.

When assessing a third-party AI tool, a question to ask is whether the tool allows you to provide all of the contextual information explicitly.

If the AI does not have an implicit or explicit source of context, the best it can do is make biased, opaque guesses, which can result in irrelevant, repetitive, and frustrating questions.

Even if you can provide the AI tool with the context (or if you are crafting the AI prompt yourself), that does not necessarily mean that the AI will do as you expect, apply the context in practice, and approach its implications correctly. For example, as demonstrated in our study, when a history of the conversation was provided within the scope of a question group, there was still a considerable amount of repetition.
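To make the checklist above concrete, here is a minimal sketch, with entirely hypothetical field names, of how the types of context could be bundled and passed to the model explicitly rather than left to guesswork:

```python
from dataclasses import dataclass

@dataclass
class UsabilityTestContext:
    """Bundles the context types discussed above; all fields hypothetical."""
    study_goal: str        # Usability Testing Goal Context
    task: str              # User Task Context
    design_notes: str      # Design Context
    interaction_log: str   # Interaction Context
    previous_qa: str       # Previous Question Context
    question_intent: str   # Question Intent Context

def build_prompt(ctx: UsabilityTestContext, answer: str) -> str:
    """Assemble an explicit, inspectable prompt for the follow-up model."""
    return (
        # General Usability Testing Context baked into the instructions.
        "You are moderating a usability test. Do not ask leading "
        "questions, do not request design suggestions, and do not ask "
        "about hypothetical future behavior.\n"
        f"Study goal: {ctx.study_goal}\n"
        f"Current task: {ctx.task}\n"
        f"Design notes: {ctx.design_notes}\n"
        f"What the participant did: {ctx.interaction_log}\n"
        f"Questions already asked and answered: {ctx.previous_qa}\n"
        f"Intent of the last question: {ctx.question_intent}\n"
        f"Participant's last answer: {answer}\n"
        "Ask one follow-up question that adds new information and does "
        "not repeat anything above."
    )
```

If a tool cannot accept this kind of information, that in itself is a useful signal during your assessment.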

The most straightforward way to test the contextual responsiveness of a specific AI model is simply by conversing with it in a way that relies on context. Fortunately, most natural human conversation already depends on context heavily (saying everything would take too long otherwise), so that should not be too difficult. What is key is focusing on the varied types of context to identify what the AI model can and cannot do.

The seemingly overwhelming number of potential combinations of varied types of context could pose the greatest challenge for AI follow-up questions.

For example, human moderators may decide to go against the general rules by asking less open-ended questions to obtain information that is essential for the goals of their research while also understanding the tradeoffs.

In our study, we observed that when the AI asked overly generic, open-ended questions as follow-ups to seed questions that were themselves open-ended, without a significant enough shift in perspective, the result was repetition, irrelevance, and, therefore, frustration.

How well an AI model resolves these various types of contextual conflict could serve as a reliable metric for the quality of an AI follow-up question generator.

Researcher control is also key since tougher decisions that are reliant on the researcher’s vision and understanding should remain firmly in the researcher’s hands. Because of this, a combination of static and AI-driven questions with complementary strengths and weaknesses could be the way to unlock richer insights.

The various types of context on which follow-up question generation depends.

A focus on contextual sensitivity validation can be seen as even more important when considering the broader social aspects. Among certain people, trend-chasing and the general overhype of AI by the industry have led to a backlash against AI. AI skeptics have a number of valid concerns, including usefulness, ethics, data privacy, and the environment. Some usability testing participants may be unaccepting of, or even openly hostile toward, encounters with AI.

Therefore, for the successful incorporation of AI into research, it will be essential to demonstrate to users that it is both reasonable and helpful. Principles of ethical research remain as relevant as ever: data needs to be collected and processed with the participant’s consent and must not breach the participant’s privacy (e.g., sensitive data must not be used to train AI models without permission).

Conclusion: What’s Next For AI In UX?

So, is AI a game-changer that could break down the barrier between moderated and unmoderated usability research? Maybe one day. The potential is certainly there. When AI follow-up questions work as intended, the results are exciting. Participants can become more talkative and clarify potentially essential details.

To any UX researcher who’s familiar with the feeling of analyzing vaguely phrased feedback and wishing that they could have been there to ask one more question to drive the point home, an automated solution that could do this for them may seem like a dream. However, we should also exercise caution since the blind addition of AI without testing and oversight can introduce a slew of biases. This is because the relevance of follow-up questions is dependent on all sorts of contexts.

Humans need to keep holding the reins in order to ensure that the research is based on solid conclusions and intents. The opportunity lies in the synergy between AI and the usability researchers and designers whose ability to conduct unmoderated usability testing it could significantly augment.

Humans + AI = Better Insights

The best approach to advocate for is likely a balanced one. As UX researchers and designers, humans should continue to learn how to use AI as a partner in uncovering insights. This article can serve as a jumping-off point, providing a list of the AI-driven technique’s potential weak points to be aware of, to monitor, and to improve on.