In February 2026, Fireflies (#NotSponsored) asked if I wanted to create a Voice Agent. They said it could be used for user research.
Sure, I’d love to try that.
I used its default questions, of which there were only a few:
- Can you describe how you currently do this task or workflow?
- What parts of this experience are most frustrating?
- What parts work well for you today?
- How often do you run into issues here?
These are not my favorite questions, but let’s see how the defaults perform. One might imagine they would produce excellent results, especially if the bot was trained to use these exact questions.
Screenshot from the setup page. I’m not sure if this would have gone significantly better had I uploaded a knowledge base. Either the AI can ask questions that meet the research goals, or it can’t.
Here is how that spoken conversation went, transcribed by Fireflies. I spoke into my computer headset. “Fred,” the Fireflies AI, spoke with a male American voice, though I think there was a choice of voices in the settings. My commentary comes afterward.
User Research Agent <> Debbie Levitt Meeting Date: 3rd Feb, 2026–2:14 PM
Link to session (may or may not work without a Fireflies account): https://app.fireflies.ai/view/User-Research-Agent-Debbie-Levitt::01KGHT8NV1EB5E4ED9S2GXH1XN
Agent: Hi, Debbie Levitt. Thanks for joining. How are you doing today?
Debbie Levitt: Okay. How about you?
Agent: I’m doing well, thanks for asking. I’ll be asking a few questions to understand your experience and perspective. There are no right or wrong answers. Feel free to take your time. Sound okay?
Debbie Levitt: Sure.
Agent: Great. Thanks. Can you describe how you currently do this task or workflow?
Debbie Levitt: Yeah, it started when I was trying to buy a rack case for my DJ mixer and I had a lot of problems trying to buy the right thing that would fit my mixer.
Agent: Could you walk me through a bit more about what that process looked like?
Debbie Levitt: Yeah. I was using the website to try to find what would fit my mixer, but there were a lot of confusing terms like things that were measured in U, the letter U, and ra. These were terms I wasn’t familiar with. I could measure my mixer, but I didn’t know what it was in terms of U or ra, and it was hard to find the right thing on the website.
Agent: Got it. What parts of this experience are most frustrating?
Debbie Levitt: The parts I didn’t even tell you about yet. There’s an entire task in which I was trying to buy the right thing and ended up having to deal with customer support and they were trying to guess what would fit. It was really quite a long task and process that we haven’t really covered.
Agent: Sure, take your time. Can you walk me through what happened with that?
Debbie Levitt: Yeah. After I did some searching on the site, I ended up contacting customer support and I tried to ask them what would fit my mixer and even they weren’t particularly sure. They ended up writing me back with some suggestions and went back and forth a number of times until I selected an item which they confirmed would fit. And so I bought it. But when I got it, let’s just say it was not as described or expected and I ended up having to return it. Even customer support admitted that it didn’t end up working out for me and they gave me a refund.
Agent: Thanks. So, just to make sure I understand, what were the most frustrating parts of that whole experience for you?
Debbie Levitt: Really? Everything. I’m not sure any part of that went particularly well. There were problems with all of it.
Agent: Okay. What parts work well for you today?
Debbie Levitt: I guess the only thing that might have worked well if I had to pick something was that this website did have a good variety of equipment. I had many choices, but of course I couldn’t tell which choices were right for me.
Agent: Got it. How often do you run into issues here?
Debbie Levitt: Well, how often do you shop for a rack case for your DJ mixer, so, you know, this is the first time I tried it and it didn’t go well. I’m not sure frequency matters here.
Agent: I had a brief connectivity issue. I’m back. Let me continue.
Debbie Levitt: Sure. Where were you?
Agent: We just finished going through all the questions I had asked. If there was anything else you’d like to share before we wrap up?
Debbie Levitt: I don’t think so. I guess I’m wondering what your key takeaway is from this.
Agent: I appreciate you asking, but my role here is really just to listen and gather your perspective. I’m not here to interpret or draw conclusions. That happens later in the process. Is there anything else about your experience you’d like to add before we finish up?
Debbie Levitt: No, thanks.
Agent: Thanks for sharing your thoughts today. I really appreciate your time. Take care.
Debbie Levitt: Bye. Bye.
Here is its summary of that conversation.
- Product Fit Confusion: Customers struggle with unclear sizing terms, leading to poor product fit and multiple support queries.
- Support Inefficiency: Customer support lacks tools for accurate product recommendations, causing delays and increasing operational costs.
- Frustrating Experience: A complicated purchase process risks negative reviews and churn, especially for infrequent buyers like Debbie.
- Need for Clarity: Technical terms must be simplified to enhance user experience and reduce return rates.
- Improved Guidance: Better fit tools and clearer specs could empower customers and minimize support loads, enhancing profitability.
- Trust Building: Smooth first experiences are crucial for infrequent purchases to foster customer trust and loyalty.
Would this be news to your company?
Let’s imagine that you want to research customers who have had issues that led to item returns. Which of the above do you not already know? Is any of this news to you?
More importantly, after reading that conversation, what do you understand about the customer’s experiences or journey? Do you understand enough hows and whys to detail the root causes, create a strategy, and consider solutions?
Does the user really feel listened to? Understood?
Did we meet or exceed research goals?
What would a human Researcher do differently?
A well-qualified human Researcher would have understood that there are multiple tasks here. “Tell me about a time…” or “Walk me through your task…” wouldn’t be granular enough to understand what went well or badly for me during shopping, selection/decision-making, communication with Customer Support, receiving the item, realizing it wasn’t right, dealing with Customer Support again, and returning it.
My journey had multiple tasks or sub-tasks, and multiple experiences. The devil is in the details. The problem started with me not understanding how rack cases are measured, but my problems got worse when Customer Support recommended the wrong items. There was a whole panic-filled task when the rack case came, and I thought there was something wrong with me because the case didn’t fit as I expected.
This AI conversation is one centimeter better than nothing, and does not replace or augment humans.
I would say it shines a spotlight on how much we still need human Researchers.
As a professional Researcher, I wouldn’t add a tool like this to my research studies. That was a very limited conversation. Perhaps it’s just Fireflies and it being a new feature. But I’ve tried a bunch of these now, and these are still not tools that I would add to my process.
If you are not speaking to customers at all, this might seem amazing. This might feel like a customer connection that your company has never made before. That’s a great first step. Though that connection was made between your customer and a robot that didn’t really care about them… so there’s plenty of room for improvement.
Now, allow qualified Researchers to have decent conversations with real humans. They’ll appreciate it! They’ll appreciate someone really listening and responding to what they say. Hearing what they say and what they don’t say. Digging for the hows and whys so that our company can understand what’s going well or badly.
AI can ask questions, but it’s not really a moderator or Researcher. We should understand the difference and set our expectations accurately.
You can try these AI tools, but you’ll need to understand ahead of time that the technology in early 2026 is still very much like a badly spoken survey. We’re asking people fixed questions. We won’t be able to improvise too many follow-up questions. We won’t be able to understand a complex task or set of tasks. We might not learn whys and hows. But we can get “some feedback.”
Is there a scenario in which I would use this? Not at this time. To me, this is still fundamentally flawed, and the tech isn’t there yet.
I feel like if I wanted to ask someone a few short, fixed questions, I could just run a survey. I wouldn’t use an AI voice agent to really learn tasks, behaviors, journeys, or anything with complexity.
It could be good if you have a survey or something close to it, and you want to open this up to people who would rather speak answers due to time, context, or accessibility needs.
Hey, Deb, did you prefer talking to the AI? Like some AI companies claim?
I felt like it wasn’t really listening or understanding. I felt really awkward when it asked how I am… um… I guess I’m OK. Do you ask an AI how it’s doing today? That seems polite, but it’s a robot! It’s not going to say, “Aww, it’s humid, and my knee hurts!”
I felt like it was dying to barrel through the four questions it knew it had to ask. It asked me a broad “tell me a story” question, but then didn’t let me tell that story. As soon as I took a breath, it went to the next question. So it got pieces of the story. It only let me tell more of the story when I pointed out that I hadn’t finished.
And still, it didn’t get enough of the story. It got bits that I rushed through. I felt rushed. Thinking about research sessions with human Researchers, I didn’t smile. I wasn’t friendly. I didn’t feel like someone really got my story. I didn’t get follow-up questions that dug into my experience.
We tested Claude on the same story.
In January 2026, we tested Claude as a chat-based UX Researcher, and had it ask me about the same story (buying and returning my mixer case). I scored Claude a 3 out of 10 for moderating the session. You can watch that video here: https://youtube.com/live/w8fqtgDOTzE
Claude got a 3 for writing its own questions and improvising follow-ups that were OK, but not the best questions. You could tell it was trained on material that probably came more from non-Researchers excited about research than from professional, experienced qualitative Researchers.
If Claude got a 3, Fireflies (in Feb 2026) got a score of 1. This was barely a conversation. This was a survey disguised as a customer conversation. Fireflies didn’t have context, and didn’t ask about it. It had four questions, and it couldn’t seem to improvise other questions that would help it gather more data.