
Saturday, April 30, 2022

Testing The CLI The Way People Use It

Have you ever wondered why people write CLI tools? When is a good time to think about yours? Today we’ll touch on these questions, along with some tips to remember when creating one. However, all of this serves as a prelude to the real topic: end-to-end testing of CLI tools.

Thousands of tools for the command-line interface (CLI) are out there, without exaggeration. They serve all kinds of purposes. Yarn is one of the most used CLIs in the world, bringing ease to the package management of millions of projects. Others are narrower in scope, serving as a way to communicate with a particular tool such as Webpack (webpack-cli) or TypeScript (tsc).

Every CLI serves its purpose, but they all have one thing in common: the interface part of the name. While it might seem odd or mystifying to the less technical people out there, it is one of the most common ways in which people communicate with and control programs. It’s especially odd when we remember that it’s the oldest way that people have interacted with a computer that didn’t involve plastic punch cards or uploading a program into the computer through some other means.

While people have come up with all kinds of ways to test web and other applications, CLI tools have been overlooked in this area for the most part. Today, we’ll touch on end-to-end testing of these tools, go through patterns to follow, and introduce a library to solve some of the issues we encounter along the way.

CLI Tools And Why You Need One

Before digging in, we should talk about creating a CLI tool in the first place. After all, to test a CLI, we would usually need to create one first. Forget for a moment about all of the tools used by millions of people, and focus instead on the use case of creating your own CLI. A couple of questions would need to be answered: Why would you do that, and how would you do that?

I spend most of my work-time at Pipedrive, a healthy-paced growing company with a little under a thousand employees as of the time of writing. Putting this into perspective is important.

Even a single person or a small team can suffer tremendously from a suboptimal repetitive process. Losing an hour a day is hugely wasteful and often leads people to hate the task and everything connected to it. However, what makes the problem worse is scale. The more a task is repeated, or the higher the number of people repeating it, the bigger the problem becomes.

With a thousand people and several hundreds of engineers involved, any repetitive task could grow into ridiculous proportions. It would leak development resources, always a scarce commodity, no matter the position of your company.

That’s one of the main problems Pipedrive has been focusing on a lot lately, having grown to the decent size it is. We’ve been optimizing reusable things, cutting out repetitive work, and ideally getting rid of the need to reinvent the wheel between teams altogether.

That’s where CLIs come in. A CLI is a powerful tool for optimizing repetitive work: it’s cheap to create, and it can access or run pretty much anything you need, from reading and writing to the file system to directly accessing remote databases. Your imagination truly is the limit. I’ve been involved in creating several CLI tools, and not only in the past year or two at Pipedrive.

I’ve worked on my own open-source CLI tool in the past. The tool might not be the most used, but it does save a lot of time by testing the implementation of the Swup library on a website, which is an incredibly time-consuming task when done manually. The point is that you don’t have to be a thousand-person company to benefit from your own CLI tool.

Designing The CLI For Testing

Now that we’ve established why one would need their own CLI tool, let’s get into the how. Plenty of guides exist around the internet on how to build your own CLI. That’s mainly the reason why we’ll skip this topic altogether. Instead, let’s focus on something more specific: designing the CLI so that it can be easily tested.

Ideally, each part of the CLI would be a standalone task that you can run and evaluate without running the CLI program. Most libraries, or templates of the sort, that are meant for building CLIs are designed that way already. That might be a conscious design of the creators, but it might just be an accidental byproduct of the purpose of the CLI in running specific, often small tasks.

Enough talking. Let’s throw some code examples into the mix. One of the most popular libraries, if not the most, for building CLI tools is Commander. It’s also one of the simplest tools, meaning that it doesn’t add much abstraction on top, and it mainly simplifies the definition and reading of possible options. A contrived example of a CLI would look something like the following:

const { Command } = require('commander');
const program = new Command();

const print = (string) => {
  console.log(string);
}
    
program
  .name('my-cli')
  .description('CLI to show off some cool stuff')
  .version('1.0.0');
    
program.command('print')
  .description('Print a string')
  .argument('<string>', 'string to print')
  .action(print);
    
program.parse();

The example nicely shows how Commander simplifies the management of what the CLI should run. For us, the important thing to notice is the line containing the definition of the action called for the print command — in other words, the function called when the CLI is executed by running my-cli print hello!.

This is important because the actual handler of this command is a standalone function — that is, a function that can be imported, executed, mocked, or anything in between, without touching any other part of the program. In fact, this particular handler is about as simple as it gets: apart from writing to the console, it has no side effects, and it always behaves the same way for the same input. And even for more complex functions, any side effects that could touch the file system, an external API, or whatnot can be mocked, making the testing powerful and replicable there, too.
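
To make that concrete, here is a hypothetical unit test of the print handler. It assumes the handler has been moved into its own module (say, ./print.js) and exported from there, and that Jest is used as the test runner; neither is part of the example above.

// print.test.js: a minimal sketch, assuming ./print.js exports the handler
const print = require('./print');

test('print logs the given string', () => {
  // spy on console.log so the handler's output can be asserted
  const spy = jest.spyOn(console, 'log').mockImplementation(() => {});

  print('hello');

  expect(spy).toHaveBeenCalledWith('hello');
  spy.mockRestore();
});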

Make It About The User

We went through how separate parts of a CLI can be tested. Now let’s consider a different approach. Just as unit tests are usually accompanied by other more complex tests for your typical apps, the same approach and arguments can be used for the CLI. While testing separate parts of a program is always advisable, bringing automated tests much closer to the user’s use cases does have its charm.

Not so coincidentally, the same philosophy is followed by Testing Library. It brings the flows that we use for testing a step closer to the way the code is used by its users. Thanks to this popular library, many people are so used to the idea that it might even seem obvious, although the same pattern is not that common with CLIs.

The more your tests resemble the way your software is used, the more confidence they can give you.

That’s a short but representative line from Testing Library’s documentation. Let’s unpack this idea. There are simply so many moving pieces in even the smallest programs. Every piece has the potential to break. These moving pieces could be anything. Let’s consider external dependencies as an example. They are usually trusted to follow semantic versioning, but there is always room for mistakes and accidental breaking by maintainers. This probability is significantly higher these days considering that a typical JavaScript project has hundreds of dependencies. These dependencies are not tested in our tests, and they are often mocked up altogether when unit testing.

The same applies to code that might be partially untested or to a scenario that just wasn’t considered.

That’s a technical perspective, but an even better description would be more fundamental. I, as the developer of the program, don’t care whether some subpart gets called once or twice, or whether the values are different. My main concern is the business logic — that the program does for the user what I intend it to do.

If the program is supposed to create a file on disk, then that’s what we need to make sure works as expected. If the program does some other operation based on user input, then — you guessed it — that is what we need to make sure works.

In a sense, it’s a shift away from programming logic altogether, moving closer to the business use cases, because those are what matter at the end of the day.

From Idea To Implementation

As is usual with projects, now that we’ve established our reasoning, we can move to the practical part of the implementation. The goal is clear by now: allow testing of a CLI tool, resembling the way users use it as closely as possible.

In this section, let’s focus on the most interesting part of the task, which is what we would like the tests to look like. We’ll skip the implementation of the API for the most part, because that consists of technical problems that we can solve anytime.

Conveniently, all of the below-mentioned functionality is provided by a library that we’ve put together for today’s purpose. It’s named CLI Testing Library, which is not exactly creative, but given the similarities in philosophy to Testing Library, it aptly describes what it provides. The library is certainly in the early stages, but it’s been used to test code in production so far without issues. As mentioned, the implementation is not something we will dig into, but it can be reviewed on GitHub, and it is a fairly small code base.

Basic Execution

Thanks to the nature of the program we are testing, it’s quite simple to assume what we would like to do in the test: run a shell command that would run the program in a separate process, just as the CLI would be executed by the user in the terminal. For a moment, we can assume that the program is simple and does not require any further input from the user other than the initial options. Considering all of that, we can imagine that the ideal API might look something like this:

await execute('node my-cli.js my-first-command');

This looks sufficient for the basic use of executing a program and waiting for it to finish. The most obvious thing we would like to know is whether the program has finished successfully. CLI tools use exit codes for that, whereby the general convention is that anything above 0 represents an unsuccessful run of some sort. It’s similar to HTTP codes, with 200 being the equivalent of ultimate success. Such a code could definitely be captured and returned from our API execution, to later be compared:

const { exitCode } = await execute('node my-cli.js my-first-command');

That will surely do for a convenient basic API to run an end-to-end CLI test. Before moving on to the other points, let’s spice it up a little with some additional useful information, like the stdout and stderr of the program. In case you’re not familiar with these terms, they’re outputs you’d see in the console as a user, and these differ only in the purpose of the outputted text: default or error.

It would certainly be helpful to check whether the program printed what it was meant to, having finished successfully. Perhaps that’s what our program was meant to do after all, just print something. A simple extension of our existing API would suffice for that.

const { exitCode, stdout, stderr } = await execute('node my-cli.js my-first-command');

console.log(exitCode); // 0
console.log(stdout); // ["Hello worlds!"]
console.log(stderr); // []

With that, let’s call this our first iteration of the CLI Testing Library. We can execute a program, give it parameters, wait for it to finish, and evaluate some basic outcomes.
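
For illustration, here is a rough sketch of how such an execute helper could be built on top of Node’s child_process.exec. It is not the actual implementation of CLI Testing Library, just one way the pieces could fit together.

// A simplified sketch of execute(), built on Node's child_process.exec
const { exec } = require('child_process');

const execute = (command) =>
  new Promise((resolve) => {
    exec(command, (error, stdout, stderr) => {
      resolve({
        // exec passes an Error with a `code` property for non-zero exits
        exitCode: error ? error.code : 0,
        // split the raw output into lines, dropping empty ones
        stdout: stdout.split('\n').filter(Boolean),
        stderr: stderr.split('\n').filter(Boolean),
      });
    });
  });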

User Input

While executing a program and waiting for it to finish is sufficient for a basic program, a little more thought needs to be put into the implementation of a program with which the user can interact. A classic scenario would be asking the user to input text or select an option.

We can certainly get inspired by the Node.js API, which provides an exec function, fairly comparable to our own execute function described above. It also provides a spawn method, which likewise creates a process but, in this case, allows for further interaction with the process after its creation. Not only will we have to use much of the same logic under the hood for our testing library, but because CLIs always run as a process, we can also get inspired by Node.js for our own library’s API.

Let’s consider a basic scenario of the CLI asking only for text input from the user. For this trivial program alone, we’ll require several utilities. First, we need to wait for the actual input trigger (in other words, an instruction from the CLI to write some text). This instruction will be printed out in the stdout, mentioned earlier, so we would likely want to wait for a specific text question. Let’s call that waitForText, which would accept a string, and we’ll search for it any time the CLI program outputs a new line.

Next, it’s time to input the text that the program is asking for. In this case, we’ll have to interact with stdin under the hood, which is the input equivalent of stdout. Let’s call this utility writeText. Just like the previous utility function, it will accept a string.

Once the text is inputted into the process “console”, we would usually have to confirm it by pressing a key, such as “Enter”. That’s yet another interaction utility we can introduce, for pressing specific keys. Under the hood, we would also use stdin, of course, but let’s not concern ourselves with that. Let’s call it pressKey, which will accept the name of the key.

Now that the task is done, there is just one thing left to do: wait for the program to finish before we can evaluate whether it was executed successfully, and so on. waitForFinish is the obvious name for this. With all that in mind, we can imagine something like the following:

const { waitForText, writeText, pressKey, waitForFinish } = await spawn(
    'node my-cli ask-for-name'
);

await waitForText('What is your name?');
await writeText('Georgy');
await pressKey('Enter');
await waitForFinish();

With the code above, we can simulate the whole interaction of the user with the program. We can also accompany the spawn helper with some more information such as exit code, stdout, or stderr, just like we did for execute. For spawn, the ideal format might be a bit different — perhaps getters would be best, because the values will be dynamic throughout the program’s execution, and these getters could technically be called at any time. getStdout, getStderr, or getExitCode will do the trick.

const { getExitCode, getStdout, getStderr, waitForText, writeText, pressKey, waitForFinish } = await spawn(
    'node my-cli ask-for-name'
);

await waitForText('What is your name?');
await writeText('Georgy');
await pressKey('Enter');
await waitForFinish();

console.log(getExitCode());  // 0
console.log(getStdout());  // ["What is your name?", "Georgy", "Your name is Georgy"]
console.log(getStderr());  // []

With that, we’ve covered the main idea of testing more complex interactive CLI programs.
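
To make the idea a bit more tangible, below is a heavily simplified sketch of how such interaction helpers could be wired up on top of Node’s child_process.spawn. Again, this is not the library’s actual code; error handling, timeouts, and most key codes are left out.

const { spawn: spawnProcess } = require('child_process');

const spawn = (command) => {
  const [bin, ...args] = command.split(' ');
  const child = spawnProcess(bin, args);
  const stdout = [];
  let exitCode = null;

  // collect every chunk of output and remember the exit code
  child.stdout.on('data', (chunk) => stdout.push(chunk.toString()));
  child.on('exit', (code) => { exitCode = code; });

  return {
    // resolve once a new chunk of output contains the expected text
    waitForText: (text) =>
      new Promise((resolve) => {
        const listener = (chunk) => {
          if (chunk.toString().includes(text)) {
            child.stdout.off('data', listener);
            resolve();
          }
        };
        child.stdout.on('data', listener);
      }),
    // write the given text to the child process's stdin
    writeText: (text) => Promise.resolve(child.stdin.write(text)),
    // only "Enter" is handled in this sketch
    pressKey: (key) => Promise.resolve(child.stdin.write(key === 'Enter' ? '\n' : '')),
    waitForFinish: () => new Promise((resolve) => child.on('exit', resolve)),
    getStdout: () => stdout,
    getExitCode: () => exitCode,
  };
};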

Enclosed And Independent Environment

Now that we’re deep into testing the CLI program by actually running the CLI, we should cover an important part of any test: its independence of other tests and test runs. Test frameworks usually support tests being run in parallel, in band, or the like, but that doesn’t mean we should limit ourselves to one of those. Each and every run, unless built otherwise, should be completely independent of everything else.

With your usual code, this is simple in most cases, but with the CLI and end-to-end tests, things can get tricky really quickly. The main problem is that the CLI often works with its surroundings. It might read some configuration files, generate other files, or manipulate the file system in some way. This means that each test needs to have its own temporary space on disk where the test use case can be prepared, with any files that might be needed for the test run. The same file-system space also needs to be cleaned up later so that nothing is left behind after each test run.

Creating folders on disk is basic functionality in any runtime, and Node.js, being one of the more mature ones, is no exception. It even provides functionality for creating temporary folders somewhere on disk, wherever appropriate for the given operating system. The same functionality also gives us the path of the folder on disk so that it can be cleaned up when needed. Fortunately, we can easily use this cross-platform temporary-folder functionality for our test runs.
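
As a quick illustration, the snippet below uses Node’s built-in os.tmpdir and fs.mkdtemp to create such a folder and remove it afterwards; the cli-test- prefix is just an arbitrary name chosen for this sketch.

const { mkdtemp, rm } = require('fs/promises');
const { tmpdir } = require('os');
const { join } = require('path');

const createTestFolder = async () => {
  // e.g. /tmp/cli-test-XXXXXX on Linux, somewhere under %TEMP% on Windows
  const base = await mkdtemp(join(tmpdir(), 'cli-test-'));
  // remove the folder and everything inside it once the test run is done
  const cleanup = () => rm(base, { recursive: true, force: true });
  return { base, cleanup };
};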

Let’s get back to our library API. It’s clear that each test should have some sort of prepare and cleanup stages. Something like the following would cover that, allowing for completely independent test runs:

const { execute, cleanup } = await prepareEnvironment();

const { exitCode } = await execute('node my-cli.js my-first-command');

await cleanup();

Now that we have a dedicated test-run root folder, we can create all kinds of helpers to manipulate this enclosed disk environment. Reading files, creating files and folders, making sure a file exists, listing a folder’s contents — all of these and many more helpers related to disk manipulation are already provided by the Node.js process itself. All we have to do is wrap them so that the root directory used is the temporary one we have created.

const {
  makeDir,
  writeFile,
  readFile,
  removeFile,
  removeDir,
  exists,
  ls,
} = await prepareEnvironment();

await makeDir('./subfolder');
await writeFile('./subfolder/file.txt', 'this will be file content');

const folderContent = await ls('./');
console.log(folderContent); // ["subfolder"]

const doesFileExists = await exists('./subfolder/file.txt');
console.log(doesFileExists); // true

const content = await readFile('./subfolder/file.txt');
console.log(content); // this will be file content

await removeFile('./subfolder/file.txt');
await removeDir('./subfolder'); // removes folder with any content
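
To illustrate the wrapping itself, here is a small sketch of how a couple of these helpers could resolve every path against the temporary root folder of the test run; the names are only illustrative.

const { writeFile: fsWriteFile, readFile: fsReadFile } = require('fs/promises');
const { join } = require('path');

const wrapFileSystem = (root) => ({
  // every relative path is resolved against the test run's temporary root
  writeFile: (path, content) => fsWriteFile(join(root, path), content),
  readFile: (path) => fsReadFile(join(root, path), 'utf-8'),
  // makeDir, exists, ls, and the rest would wrap fs in the same way
});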

We should consider another thing related to cleanup. From the perspective of test execution, it’s completely unclear what the contents of the CLI program are. With end-to-end tests, we’re only concerned with what it does, not with the implementation. We cannot be sure what the subprocess contains or does, which means it can leave things hanging when executed. When the cleanup function is called in our test runs, we know that we’re done with testing. This means that part of the cleanup function could be a forceful teardown of anything that remains open or running.

Systems Differences

More caveats arise when it comes to differences in systems and shells. The library already makes several normalization steps, described below.

It might be surprising, but even on a single system, different runs can produce a different stdout array of outputted lines. In some cases, lines might be missing, and in others, they might be there. Combined with the fact that the array will likely be used in combination with snapshots, this is unacceptable and needs to be normalized. In our case, always getting rid of empty lines would be a sufficient solution.

A similar thing needs to be done with all of those special symbols used by the shell — for example, the ones used to make the output in the correct color. These symbols can differ across shell engines with the same program; so, removing them from the output will simplify things.

There is another small but real use case with a system’s special symbols. For whatever reason, in one type of system, they might be clearly visible as a special character in the output, and in another, they might be completely invisible. Two identical strings not being considered the same would lead to an insanely annoying debugging situation. Again, deleting these is the way to go.

Last but not least, in the previous section we talked about creating a separate file-system space. The full path of the current execution folder will often be used in the CLI output. The same goes for the home directory of the system’s current user. Both paths could be part of the output and would cause a test failure in different environments. These need to be normalized so that they are not different in the CLI output in different systems and runs. We can replace these with something more generic, such as {base} and {home}, making it easily identifiable that a path is one of those special folders on disk.
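
Putting these normalization rules together, a sketch of such a step might look like the following; the exact rules and placeholders used by the library may differ.

const { homedir } = require('os');

const normalizeOutput = (lines, base) =>
  lines
    .map((line) =>
      line
        .replace(/\u001b\[\d+(;\d+)*m/g, '') // strip ANSI color/style codes
        .split(base).join('{base}')          // normalize the execution folder
        .split(homedir()).join('{home}')     // normalize the home directory
        .trim()
    )
    .filter((line) => line.length > 0);      // drop empty lines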

Mocking

Let’s be honest: Whatever we want to test and however close we want to get to the use case that the user sees, from time to time there will simply be a use case where we’ll need to make some compromise.

A good example is the CLI running with some dependency on an external web API. Each run would be affected by an external force. Moreover, it would depend not only on the API itself, but also on the internet connection and possibly some other factors, such as a VPN connection. That would compromise the requirement of reproducible runs, which is crucial for testing. So, we would need to sacrifice the integrity of the CLI program in such a case.

There is no library-integrated way to solve that. Remember that the library concerns itself with executing a process and the things around it. It doesn’t touch or understand the underlying CLI in any way. That’s why the following is more of a technique that can be used to mock parts of the executed CLI.

For the mocks to take effect on any part of the CLI program’s code, they need to be a part of the program itself — that is, be a part of one process. There is only one reasonable solution to this: make the mock a part of the CLI. Any other solution would have to be specific and invasive with regard to how the child processes are being executed in the system. That’s not something we could simply implement in a library and cover all possible use cases. It would also make the test run inconsistent with the production run in a way that is not easily controlled by the CLI’s author.

Instead, let’s focus on the program extension that was mentioned. After all, the CLI being tested will usually also be the CLI being developed, so making another entry point with the mocks included should be fairly doable. The example below mocks a response received from Axios, assuming that the CLI uses Axios for this request.

// mock-and-run.js
const axios = require('axios');
const MockAdapter = require('axios-mock-adapter');

const mock = new MockAdapter(axios);

mock.onGet('http://example.com/').reply(200, 'mocked response');

require('./index');  // include the CLI entry

Once we have that, we can just run the CLI the same way we would without the mocks, except that the CLI won’t be making any actual external requests, and it will always get a reproducible mocked response for that request.

const { exitCode } = await execute('node mock-and-run.js my-first-command');

Conclusion

Many would argue that testing is the core of quality software and long-term sustainability. Different kinds of testing bring different kinds of benefits. Remember that, as with any other testing, end-to-end testing is an additional kind of testing at our disposal, not a replacement. The same surely applies to CLI testing.

With end-to-end tests, we can be even more confident that the program we’re testing will do exactly what we want it to do, and won’t be broken by some mistakes only affecting runtime, like the ones that can pop up after updating dependencies.

The power of CLI testing lies in its flexible nature. While the testing library itself is written in JavaScript, we are certainly not restricted to testing Node.js programs. After all, we are executing a shell command; so, as long as the environment is able to execute the program as a process, any language will do.


Preventing Bad UX Through Integrated Design Workflows

Let’s take a moment to think about the time you’ve spent navigating intranets, password resets, project management software, or government websites. How many moments of technological frustration can you add up in the last few days when you think about them? Some of these websites and platforms are too important to avoid — they enable us to fulfill fundamental human transactions and operations. In today’s world, it’s become common to feel our energy is depleted by this steady stream of digital experiences.

Given our increasing dependency on digital interactions, advocating for good UX will only become more necessary. A new canvassing of experts in technology, communications, and social change by Pew Research Center presents a universal view that “people’s relationship with technology will deepen as larger segments of the population come to rely more on digital connections for work, education, health care, daily commercial transactions and essential social interactions.” As this shift toward what is dubbed a tele-everything world continues to unfold, the people who work in tech hold an incredible responsibility to ensure that their creations make life simpler, not more stressful or more time-consuming.

As a designer, I feel a sense of responsibility to dig deeper into why it’s so uncommon to encounter digital tools that are straight-up simple, empathetic, and helpful. In this article, we will explore the causes, as I’ve seen in my practice, look at the effects this can have on the team, and finally propose some actionable solutions that don’t just say: convince people to increase the budget.

Common Sources Of Bad UX In Your Product

If good UX has been a hot topic in the industry for years, then why is bad UX still so common? The easy answer points toward product designers and developers as individuals who create the UX itself. However, if you believe that, then your bad UX problem will persist despite hiring the most competitive talent on the job market.

Based on my experiences as a UX Designer and Design Manager, here are the top four underlying reasons why your tech product might be experiencing Bad UX:

1. Under-Resourced Dev Teams For The Size Of A Company’s Goals

These conditions place the team in a ‘starvation mode’ where delivering anything on time is already difficult enough; the steps required for quality UX are extremely difficult to prioritize. The issue here is that company leadership views Good UX as luxurious (which is quite hilarious, because UX is often a key differentiator in the most competitive products out there), even as a hindrance to velocity (which is equally hilarious, because of the disastrous impact Bad UX can have on velocity in the long run, but whatever, leadership).

“I encounter under-resourced dev teams constantly, and it’s disheartening every time. Usually, quality is the first thing to go, even though most professionals know it should be scope. Decision-makers in these contexts have a very hard time imagining scoping down, so they consistently push the team to move faster instead.”

— Aidan Gordon, Technology Lead

2. Under-Resourced Design Teams For The Number Of Developers

A recent survey of 377 professionals by Nielsen Norman Group revealed that about a third of designers are outnumbered by at least 10 developers. Imagine the pressure on designers with such a skewed ratio. They need to pump out screens and logic for devs (short for developers) to work on every week. The team’s production velocity is wrongly measured by its dev power, and because the design bottleneck is so strong, devs have to wing it and just kind of ‘figure out’ UX independently. Thoughtful user testing falls by the wayside, as designers’ workloads are unmanageable.

3. Misunderstanding “Agile” As “As Fast As Possible”

Agile workflow tactics gained popularity without paying enough attention to the underlying rituals that enable them to be successful. According to Atlassian (the creators of Jira and Confluence), Agile calls for “collaborative cross-functional teams, open communication, collaboration, adaptation, and trust amongst team members.” Each and every one of those key aspects is easily deprioritized when a team’s strategic goals force it to operate in starvation mode in the first place. Agile, as it was designed, recognizes that good UX is the result of navigating continuous dependencies between all branches of the product team. In other words, Good UX requires a lot of back and forth, which is a kind of collaborative and communicative mode that immediately falls by the wayside when we are in a rush.

“I’ve observed companies that aren’t committed to an iterative mindset and process, but use “Agile” as a bandaid for quicker releases. Sometimes, there’s a fear coming from leadership that we might never return to fix something, or a fear that we won’t be able to sell version 1 without a fully functional feature X. Unless the whole company embraces iterations, the product team will either struggle to release quickly, or to release quality… the concern is if we release a v1 with less than perfect scope we will never go back to fixing it.”

— Jill Hesse, Director at Genomics Data Management

4. Misunderstanding The Meaning And Purpose Of UX

Often, when I’ve been hired to work for teams, I have observed that the main issue was simply a mild and widespread confusion about what User Experience really is, both in the tech crew and the business crew. Misunderstanding the purpose of UX is akin to misunderstanding its value. If this happens on the business side of the company, then the product team will likely be under-resourced in the design department. If this misunderstanding happens on the product team level (perhaps due to a lack of designers in strategically influential positions, or lack of designers altogether), UX winds up being disregarded or thought of as just UI, which is to say: “something that can be added later.”

This summary is meant to offer a view of the operational and cultural forces that bring about UX failures. If you’re a leader in tech, I hope you draw the essential link between the happiness of your product team, the quality of the User Experience, and your business’ revenues. Your product team knows what conditions they need in place for them to produce a high-quality UX. They have some of the answers to your Bad UX problem, and they might be a heck of a lot simpler than you think.

The Impact Of Bad UX On Your Team And Company

In organizational psychology and modern ways of viewing work, like Officevibe’s Employee Engagement Guide, one theme comes up often: happier employees produce more and better work. I’d add users into this cycle somewhere, because creating excellent experiences feeds a virtuous cycle that drives revenue and solidifies the meaning we find in contributing positively to others.

On the flip side, when bad UX has lingered in a product for so long, it can feel like a mountain to overcome, and it grinds down the talented and passionate humans on your team.

The effect can play out on teams in a few ways I’ve seen in real life:

  • Long-term ‘UX bugs’ harm team morale.
    Over time, the glaring UX issues in a product can leave the team caught between a rock and a hard place, where a revamp is increasingly needed but would require more and more resources. In this kind of scenario, you might see designers regularly churning out band-aid features instead of creating elegant solutions. The team can still produce new, innovative features, but more slowly and with more mental (dare I say emotional) labor than is necessary. It basically just gets harder and harder to create stuff that you could be proud of. It can get demoralizing over time.
  • Lack of opportunities to create Good UX wears down confidence.
    As a designer (or other people on a product team), your job, your portfolio, your sense of credibility in the space, sense of confidence — and I’ll even go as far as saying your self-worth — are directly affected by the impact you feel from your work. You know you are talented, interested and capable enough to produce great things, but anyone caught in a Bad UX situation for a long time will see those joyful and creative feelings start to dim.
“As a designer, working in a user-driven product culture is so important for your own satisfaction. If you’re working within a company or team with a weak UX culture, you can get stuck meeting one or a few people’s biased preferences instead of hundreds or thousands of users’ real needs. You know you’re letting users down. In some cases, you’re even adding *more* friction and frustration into someone’s life… Over time, your confidence in the quality of your designs diminishes, and, eventually, so does your overall engagement at work.”

— Erica Gregor, Head of Design & Product at Penrose Partners

  • Bad UX hinders a team’s growth and strategic value.
    When Bad UX pervades in such a way that it causes your team to lose time or motivation, under-delivery becomes the norm. From an outside lens, it starts to seem that your team isn’t relevant or competent. When it’s hard for a team to demonstrate its strategic/business value, investment in the team’s growth can slow down, and it doesn’t get to benefit from the innovation power of a more diverse range of skills and talents.

Solutions For Preventing Bad UX

The discussions I’ve had in the industry about the causes of Bad UX always revolve around too little time or resources to achieve the elegant and empathetic design-dev workflow proposed by experts in Agile, Design Thinking, etc. But notice the irony of some of the biggest, most funded teams still producing Bad UX. Have you ever tried joining a Microsoft Teams call as a ‘free’ user? Not only are more time and resources a false solution to Bad UX, but the very focus on “more resources as the solution to Bad UX” makes Good UX seem like a privilege that only the most funded projects can access.

“I am not sure if I have got stuck in Groundhog Day, or have become the center of the universe. I try to log in… It says I am not on Teams yet, and asks me to “Sign up.” I am taken to the Teams home page, where I click on “Sign up for free.” It says, I already have an account setup... So I click on “Sign in.” Now it asks me to open the app... And then it says that I am not on Teams…”

— Sumit Anantwar, on being stuck in a login loop on Microsoft Teams

I’ve seen this false perception of “Good UX as privilege” lead small companies to accept sub-par UX as the norm. They assume they can’t afford the price tag, and this, in turn, invites a culture of complacency around user experience and a lowering of the design standard overall. This is a huge missed opportunity, because smaller teams have a huge advantage in how nimble they can be; if they manage to exploit this advantage to the highest possible degree, they can blow larger competitors out of the water. The key is mastering and experimenting with your culture and workflow surrounding user experience.

If you’re in or running a product team, whatever its size, maturity, or degree of suffering from Bad UX problems, I’d like to recommend a few working practices that I’ve developed over the years in an effort to invoke joy and ensure quality in the work we do on product teams.

Flip The Script Around Who Is Responsible For UX Quality

In my experience, there is a false sense across the industry that designers should be the only ones responsible for advocating for the user, ensuring UI bugs are fixed, validating use cases, designing interactions, and finding potential gotchas or edge cases. Products and websites are becoming more complex, both in terms of technical possibilities and in terms of the interactions we need to make simple and easy. Does it really make sense for designers alone to hold the knowledge on how to make a nice interaction? I think we’re underestimating how capable ‘non-designers’ are, and misjudging what our roles should be and how we work together.

I’m not arguing that devs should be turned into designers; that’s the kind of passive expectation already getting foisted on devs as it is. Shared responsibility does not mean equal effort and priority; instead, it means that we are all on the UX ship and care about steering it in the right direction. Most of this can be achieved through intentional changes to a team’s workflow with little impact on the actual time spent working.

Rewire Your Team’s Workflow, So That UX Is Touched By Many

If I liked bumper stickers, I would have one that says, “We should be just as creative about our workflows as we are about our work itself.” Certain product workflows have simply become traditional ideals in the industry because of the pressing need to bring operational structure to the natural chaos of the creative and innovative process. Just because big companies write polished articles and online courses about how they work doesn’t mean that you should adopt their way of doing things. If a multi-million-dollar company changed their font to Comic Sans, would you do it too? Some great examples of clever workflow practices I’ve seen work are:

  • Early and creative collaboration between designers and developers.
    This practice works well to prevent future gotchas like discovering a feature-defining limitation to backend or frontend logic. It’s basically just having early open conversations (and being friends).

  • Map out complexity in a collaborative way.
    ‘Complexity mapping’ is the only phrase I can think of that encapsulates the moment when you document and draw things like: edge cases, user flows, sitemaps, and system logic. Drawing things out is a very efficient way to think through and communicate if/then logic. I don’t see teams drawing things out together as a group often enough. Don’t be afraid to draw together as a way of aligning the vision and important specs of a project. You save a lot of time and effort speaking about complicated things out loud, rather than just pointing.

  • Make research findings accessible to all team members.
    After collaborating with lots of dev teams, I see consistent enthusiasm for user feedback and research data, yet it’s not as widely shared as it could be. Invite devs to take notes during interviews once in a while and casually ask for their thoughts on snippets of user insights regularly. These kinds of active practices allow devs to care about the user experience in a real way, which goes deeper than reading reports or attending presentations.

  • Together with your team, set a standard for what “good quality” means.
    It’s simple: code doesn’t get shipped until it’s high enough quality from everyone’s perspective. Your QA specialists, developers, designers, product owners, and product managers each have functional and non-functional requirements that are important to them. Find out what they are and get serious about meeting them before sending something out the door. Having shared standards ensures we agree on what unacceptable/good/great/amazing UX is.

Let Go Of Your Assumptions About What “Developer” And “Designer” Even Mean

Almost every team I’ve worked with has a similar approach to identities on the team: management, research, design, development, and QA. For example, a designer on one team will have fairly identical responsibilities as a designer on another. These identities were designed over years of industry growth and workflow standardization — workflows that haven’t proven to be successful in producing Good UX in a widespread manner. So, I invite you to give your talented people permission to work on a project without so much attachment to their job descriptions; push them to contribute to aspects of the design-dev-QA flow that they feel drawn to, and observe what emerges.

When a product team is given an opportunity to play with their very identities, UX immediately reveals itself as not just a special thing ‘creatives’ can do. Some developers gravitate toward gaining a basic understanding of UX that would enable them to contribute to the ultimate quality of the project. Conversely, designers might discover they have a genuine need for a basic understanding of technical concepts. There are many more overlapping skills between designers and developers which can be harnessed!

With all of these practices, I’m urging you to think together better, which is absolutely needed in our intricate work scenarios as knowledge workers. In my experience, collaborative pairing across disciplines has led to a lot of significant breakthroughs, and it creates and solidifies our connection to one another. Also, it’s 1000% more fun.

Real Life Examples Of Good UX And Great Design-Dev Collaboration

Here are some concrete examples of where I (or my crew) have used non-traditional collaboration techniques to build better UX.

Fintech Case: What Happens When Devs And Designers Map Out Complexity Together

As we were collaboratively mapping out the main flows for a fintech app with an intense application process, we discovered a big problem with the data model we were about to build everything on. As we were working on our flows, our tech lead dug into the data and found that more than 50% of applications included more than one person. Originally, we had assumed we could skip supporting that, but this process prompted us to revisit the decision. The crew immediately started the foundational backend work to enable an application to have more than one person associated with it, and now the majority of users have a smooth experience in the app.

Enterprise Case: How Design-Dev Collaboration Identified A Dealbreaking Project Barrier

We were tasked with redesigning a command-line tool for an enterprise product that required the download and upload of an XML file. The weakest point in the tool was the lack of guidance and feedback people received as they used the tool. When we showed the wireframes with new error messages and guidance to the developers, they revealed to us that the tool already had problems parsing errors in the right order because of the nature of the XML file and the underlying database.

Once everyone realized that the main purpose of the redesign would be impossible to fulfill within the scope of work, we decided to scrap the project until we could fix things properly.

Biotech Case: How Design-Dev Collaboration Maintains A Healthy Project Scope

We were designing a custom field configuration interaction and had designed a lot of cases as we went. Users of this biotech platform could create their own fields: number fields, text fields, multiline text fields, radio buttons, dropdowns, toggles, and so on.

Thankfully, we had two developers review the design team’s wireframes and logic early on. One dev pointed out much more logic that needed to be defined, because this scientific software had requirements buried deep in the code, like the number of decimals to show, the maximum possible value, and so on. This prevented major scope creep during implementation and prevented users from being blocked at migration time.

Hopefully, these examples allow you to imagine just a small glimpse into the potential that these workflow improvements and collaborative culture can generate.

Conclusion

Our work in the tech industry can feel like a grind at times: fast deadlines, rushing, redoing work you’ve already done, and pushing sub-par final implementation out the door. This, in part, is because a company’s strategy or a team’s workflows don’t help catch complexity early enough, so devs and designers have to respond by patching in weird UX solutions just to get a thing out the door.

Here’s a recap of some actionable steps that are sure to improve the UX culture in your team:

As a product leader:

  • Open a discussion with your team: What needs to be true in order to deliver higher quality user experiences? This question should help you notice frictions in your team’s structure and workflows.
  • Offer the whole team (including developers, researchers, managers, and quality analysts) a learning experience about UX like this introductory course. Celebrate the end of the course (and put it into practice) together by designing and implementing a new feature.
  • The same goes for working in Agile: get everyone on the same page through a common learning experience, like this book, and put it into practice together through a shared project.
  • Implement new, quick rituals that gather your whole team during the design process, especially in its early and messier stages. Your developers, managers, and QA people might feel out of place at first, but that’s only because the world has taught them to feel that way. This simple exposure to design will grow to influence what they care about in their work and eventually shift everyone’s sense of responsibility for good user experiences.

As a product team member, do what’s in your power about the recommendations above:

  • As a designer, invite a developer to a 30-minute meeting showing them a new feature you’re conceptualizing or some fresh research insights you are working with.
  • As any team member, host a discussion with your team about how you might deliver higher quality design without needing new resources. Test it out and share your learnings with your Manager.
  • As a developer, get a sense of some tactical UX/UI basics with this course, try using some of the principles next time you work on a feature, and note the most seamless pieces you implemented and share them with your team.

All in all, we want all levels of a company and all members of a product team working with the same definition and values around UX. In companies where there’s shared responsibility for the quality of a product, collaboration flows organically and frequently. This constant meeting of perspectives and skills is our way forward if we want to honor the idea that tech should help people save time and effort in as many ways as possible.


Designing Better Navigation With Navigation Queries

When designing interfaces, we often focus on the usual suspects. How do we design better mega-menus and carousels? How do we support users with better breadcrumbs? How do we better display our sidebar navigation? And how do we provide a better search experience, along with decent filtering and sorting?

While all these features for navigation are absolutely important and useful, there are also a few other navigation patterns that are often forgotten or dismissed. We can think of them as navigation shortcuts, helping users get where they want to go, faster — without having to use traditional navigation at all.

As it turns out, sometimes they are much more effective, especially on large sites with thousands of pages, many of which have been gathering dust over the years.

Designing For Exploration

Our typical experience on the web is somewhat unusual. We visit websites with all kinds of various intents, yet to address that intent, we usually have to translate it into a meaningful combination of keywords, clicks, taps and selections. We rarely get the answers we need immediately; instead, we discover the answers in a long-winded journey between pages and sub-navigation items.

Gerry McGovern once rightfully suggested that more people have been on top of Mount Everest than have been to the 10th page of Google’s search results. This is probably true, yet usually, our interfaces provide long lists of options and are rarely designed for exploration. We surface a multitude of options rather than taking advantage of context and association, as Marcin Ignac has put it recently.

In fact, we leave users on their own with a few signposts along the way. They need to survey the landscape, jump between menu items, iterate on search queries, and scout tags and footer links.

It works, but it’s slow. To minimize the distance between intent and action, we can query users about their intent and then assist users in their journey. That’s when navigation queries come into play.

The idea behind navigation queries is not new. We’ve seen all kinds of variations of the Madlib pattern, natural-language forms, and chatbots, all of which present a human-friendly way to specify intent without having to use input fields or navigation menus. Usually, we’d see the entire form presented as a sentence in front of us, with a few drop-downs allowing us to specify what exactly it is we are looking for. However, we can also apply this concept more dynamically.

The idea, then, is to create a “query constructor” for the user’s intent. In our interface, we could show options to choose from and based on one answer, provide further options, all the way to the point where we guide a user to the page of interest. And that’s what we would call a navigation query.

On AO.de, the front page is dedicated to querying the type of device that customers are interested in. Once it’s selected, another selection appears, allowing users to specify one of the filters that could be applied to their query. And depending on that input, the third filter selection appears. Finally, a slider helps users pick the right price range for the product.

In that example, customers don’t need to use the navigation or search at all to get relevant results. Obviously, it wouldn’t hurt to replace a drop-down with a smart autocomplete to avoid dead ends, but this works here, too.

On Commonbond.co, you could define your intent using a navigation query pattern. In a dedicated area on the page, in addition to the primary navigation at the top, users are presented with a drop-down. They can specify what exactly they’d like to do on the website, or what they are looking for. Once one option is selected, another drop-down appears, allowing them to specify their intent even further.

This experience mimics the drill-down navigation with multiple levels. Yet the difference is that users are making small decisions, one after another, without being confronted with the entire navigation at every step of the way.

Similar to mega-menus, there is no need to load a new page, and users can easily go between options without having to recalibrate their mouse pointer or finger in the menu. In fact, you could potentially also select multiple options at once and get only a selection of pages that are relevant to you.

Cork Chamber uses a navigation query in addition to the rest of the navigation. “I want to” takes a primary spot in the navigation, driving users directly to the page of interest. Essentially, it’s just a drop-down that provides users with a few options. But it could be extended with second- and third-level selections. Notice how user-centric the navigation is, though: “I want to” is focused on what the visitors of the page plan to do.

Sbahn.berlin, a public transportation service in Berlin, allows users to choose the view that fits them best and brings them directly to a page that they might not be able to easily spot otherwise. By choosing one of the options, they jump directly to the 4th-level navigation, without having to interact with a hover or click the menu at all.

Figuring out just the right page to book an appointment on the City of Düsseldorf’s website might take quite a while when going through the global navigation or external search. However, two drop-downs in the central area allow citizens to specify their intent and choose a location. The result, then, is a link to the page where they can complete their task. No need to use the navigation or search at all.

Monday.com is using a similar pattern for their onboarding flow. On the homepage, prospective customers can first select what they’d like to manage using the Monday.com product. Based on that input, one of the onboarding flows is triggered, guiding users to relevant boards. It’s a great way to bring people to relevant views, minimizing the distance between intent and value.

Feature Comparison

Imagine that you’ve added a few headphones for comparison on an eCommerce retailer site. You probably don’t plan on purchasing all of them, but rather want to find the option that works best for you. What kind of experience are you expecting when comparing these items?

Most of the time, it will be a good ol’ feature comparison table, with multiple columns, one for each product, and hundreds of attributes to browse through. To navigate it, we’ll probably be using a lawn-mower pattern, going through the tables row by row, from right to left and back again. Admittedly, that’s quite a tiring and time-consuming undertaking.

In fact, nobody wakes up in the morning hoping to finally compare products by features in a comparison table matrix. As customers, we actually want to find out which option is better, yet we need to do quite a bit of work to get there, even though we might have very specific attributes in mind that we care about most. To improve the experience, we can just ask our users what their intent is.

 Not a typical feature comparison on Productchart.com. Instead of using a feature comparison table, Productchart maps all products in a two-dimensional space. Customers can choose the attribute on each axis, and they can also use filters to reduce the overall number of options to a more manageable selection. They can also highlight products of interest and compare them side by side.

On Mediamarkt, feature comparison is happening without tables altogether. Instead, when users choose to compare products, they are asked to choose relevant attributes first. Potentially it could even be an autocomplete multi-combo-box, complemented with all available features grouped into accordions.

Each selection becomes a single step in the evaluation journey, where customers can vote up and vote down products based on the features that they have. Once it’s done, they are presented with the winning option — based on their interests and preferences. Additionally, there is an option to see the entire feature comparison matrix and even download it as a PDF for convenience.

A-Z Index Pattern

As a website keeps growing, it falls into navigation decay. New navigation items are added all over the place just because there don’t seem to be good existing categories under which they could live. So new categories get added, while older categories and partially outdated content never get deleted or archived. And because there are many different content managers involved, with many different content management systems, tags become inconsistent, categories are mislabeled, and content is often duplicated — just in case.

The right way to address this issue is to redesign the information architecture and establish guidelines for publishing, categorizing, archiving, and deleting. That’s the role of governance, of course, and as such, it might be years until any significant changes get implemented. During that time, visitors can hardly find any information on the site and have to rely on Google or Bing instead, often landing on competing websites altogether.

One way to address that is by using an A-Z Index pattern. We identify the top tasks that users perform on the site. For each task, we define a set of keywords that they associate the task with. We run tree testing to ensure that they can find the pages that they are looking for. And then we surface the A-Z catalog of keywords on a single page.

In fact, that’s a very typical approach that many large websites, especially public service websites, will use — alongside search and global navigation. Every keyword is of course a link, driving users to the page of interest.

Sometimes each letter is represented on a separate page, and sometimes vertical accordions are used. In usability tests, the best way to show an A-Z index appears to be by listing all keywords on a single page — mostly because users can use in-browser search to look something up quickly without having to go and explore multiple pages.

To take it one step further, we could also expose relevant information right in the A-Z index. Rather than driving users to a dedicated page, they could choose what information they want to learn — opening hours, location, booking appointment links, etc. — and study that information without ever heading to individual pages.

A good example of a similar idea is the University of Antwerp, which surfaces useful information directly on the A-Z index page. Of course, this information could also be accessible within an accordion, but then we’d also need a button to open and collapse all accordions at once.

Aarhus University highlights the A-Z index as part of global navigation. Visitors can choose their role first, then choose a letter, and then explore the overview of all options available, jumping to a specific department or faculty.

Most importantly, visitors can quickly jump from any page to any other page. In this case, the A-Z index is permanently accessible in the header of each page. That’s not something other navigation patterns provide out of the box.

The only caveat here is that keywords appearing in the A-Z index have to be thoroughly tested to ensure that users actually find what they need in the index. And sometimes the index is complemented with an in-index search, which is very similar to autocomplete.

Tap-Ahead Autocomplete Pattern

We tend to use autocomplete to highlight relevant keyword suggestions. However, we could also drive users directly to relevant categories, specific products, brands, or even collections of items or records that we’ve prepared ahead of time.

On Prisma.fi, Hema.nl, and Ikea.com, autocomplete prompts category suggestions, frequent searches, and specific products, along with information about each product, from its length to its price. Rather than focusing on a list of keywords, the autocomplete actually provides an overview of the items that users might be looking for.

Statistics Estonia 100 highlights an overview of articles but also the actual query results that a visitor might be looking for. Each type of data is marked, along with the recent statistics provided right in the autocomplete.

However, we could also take it to the next level entirely. We can provide users with helpful feedback on their query and guide them towards a better keyword query that would also bring them better results. And that’s exactly what the tap-ahead autocomplete pattern provides.

With tap-ahead autocomplete, we allow users to construct a query based on autocomplete suggestions. As users hit the autocomplete field or start typing a keyword, suggestions appear. Users can either jump directly to the keyword, or append frequently used keyword combinations to their query, hence “constructing” their query based on the suggestions.

Some large websites use the tap-ahead pattern extensively. On Mediamarkt.de, users can click through to the keyword that matches their interest, or click on the arrow on the right-hand side. The query in the search box is then replaced with the selected suggestion, while the focus stays in the search input. Users can keep iterating on their query until they feel they have specified their intent well enough.
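
To make the interaction more concrete, here is a minimal, illustrative sketch of the tap-ahead behavior in plain JavaScript. The selectors, the data-query attribute, and the markup structure are assumptions made up for this example, not Mediamarkt’s actual implementation.

const input = document.querySelector('#search-input');

document.querySelectorAll('.suggestion .tap-ahead-arrow').forEach((arrow) => {
  arrow.addEventListener('click', (event) => {
    // Don't navigate to the suggestion; refine the query instead.
    event.preventDefault();
    input.value = arrow.closest('.suggestion').dataset.query + ' ';
    input.focus();
    // Trigger the suggestion lookup again for the updated query.
    input.dispatchEvent(new Event('input', { bubbles: true }));
  });
});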

Tap-ahead minimizes the amount of effort needed for typing, but also drives customers to relevant results and gives them the confidence that they are actually on the right track.

If you are designing an interface for expert users, slightly more advanced ways to use search might be reasonable. Stack Overflow allows its users to specify a filter right in the search box, without having to rely on filters, tags, or any other modes of navigation. On focus, users receive hints about how to use search in a more advanced way, should they wish to do so.

Stripe also allows customers to specify filters right in the search box. Users can focus on typing their query in the search input, and as they do, they also see the results immediately.

Wrapping Up

When designing navigation, we often rely on predictable patterns. That’s a good thing as our outcome is usually predictable, familiar, and hence obvious to our customers.

However, sometimes navigation might be just a bit too tiring and time-consuming, and in such cases, we can use navigation queries to pick up our users wherever they are and gently guide them toward the page that interests them. They are unlikely to help you resolve all IA issues on the site, but they could help users get where they want to be faster.

All the techniques mentioned above can help us get there. By no means do they replace established navigation patterns; they complement and add to the experience, especially on large and slightly outdated websites.

Next time you are working on navigation, consider designing more explorative interfaces for navigation and search: explore navigation queries, evaluation journeys, the A-Z index, and tap-ahead autocomplete. They are unlikely to help you resolve all IA issues on the site, but they could help users get where they want to be, faster. And sometimes that’s just what is needed at the current stage of the project.

Meet Smart Interface Design Patterns

If you are interested in similar insights around UX, take a look at Smart Interface Design Patterns, our shiny new 6-hour video course with hundreds of practical examples from real-life projects. It covers plenty of design patterns and guidelines on everything from accordions and dropdowns to complex tables and intricate web forms, with five new segments added every year. Just sayin’! Check out a free preview.


Testing The CLI The Way People Use It

 Have you ever wondered, why do people write CLI tools? When is a good time to think about yours? Today we’ll touch on these questions, along with some tips to remember when creating one. However, all of this serves as a prelude to the real topic: end-to-end testing of CLI tools.

Thousands of tools for the command-line interface (CLI) are out there, without exaggeration. They serve all kinds of purposes. Yarn is one of the most used CLIs in the world, bringing ease to the package management of millions of projects. Others are narrower in scope, serving as a way to communicate with a particular tool such as Webpack (webpack-cli) or TypeScript (tsc).

Every CLI serves its purpose, but they all have one thing in common: the interface part of the name. While it might seem odd or mystifying to the less technical people out there, it is one of the most common ways in which people communicate with and control programs. It’s especially odd when we remember that it’s the oldest way that people have interacted with a computer that didn’t involve plastic punch cards or uploading a program into the computer through some other means.

While people have come up with all kinds of ways to test web and other applications, CLI tools have been overlooked in this area for the most part. Today, we’ll touch on end-to-end testing of these tools, go through patterns to follow, and introduce a library to solve some of the issues we encounter along the way.

CLI Tools And Why You Need One #

Before digging in, we should talk about creating a CLI tool in the first place. After all, to test a CLI, we would usually need to create one first. Forget for a moment about all of the tools used by millions of people, and focus instead on the use case of creating your own CLI. A couple of questions would need to be answered: Why would you do that, and how would you do that?

I spend most of my work-time at Pipedrive, a healthy-paced growing company with a little under a thousand employees as of the time of writing. Putting this into perspective is important.

Even a single person or a small team can suffer tremendously from a suboptimal repetitive process. Losing an hour a day is tremendously wasteful and often leads people to hating the task and everything connected to it. However, what makes the problem worse is scale. The more a task is repeated or the higher the number of people repeating the task, the bigger the problem becomes.

With a thousand people and several hundreds of engineers involved, any repetitive task could grow into ridiculous proportions. It would leak development resources, always a scarce commodity, no matter the position of your company.

That’s one of the main problems Pipedrive has been focusing on a lot lately, having grown to the decent size it is. We’ve been optimizing reusable things, cutting out repetitive work, and ideally getting rid of the need to reinvent the wheel between teams altogether.

That’s where the CLIs come in. It’s a powerful tool for optimizing repetitive work: It’s cheap to create, and it can access or run pretty much anything you need, from reading and writing to the file system to directly accessing remote databases. Your imagination truly is the limit. I’ve been involved in creating several CLI tools in the past year or two, but not only in this time period, nor the biggest of them at Pipedrive.

I’ve worked on my own open-source CLI tool in the past. The tool might not be the most used, but it does save a lot of time by testing the implementation of the Swup library on a website, which is an incredibly time-consuming task when done manually. The point is that you don’t have to be a thousand-person company to benefit from your own CLI tool.

Designing The CLI For Testing

Now that we’ve established why one would need a CLI tool of their own, let’s get into the how. Plenty of guides on how to build your own CLI already exist around the internet, which is the main reason we’ll skip that topic altogether. Instead, let’s focus on something more specific: designing the CLI so that it can be easily tested.

Ideally, each part of the CLI would be a standalone task that you can run and evaluate without running the whole CLI program. Most libraries and templates meant for building CLIs are already designed that way. That might be a conscious decision by their creators, or it might just be an accidental byproduct of CLIs typically running specific, often small, tasks.

Enough talking. Let’s throw some code examples into the mix. One of the most popular libraries, if not the most popular, for building CLI tools is Commander. It’s also one of the simplest, meaning that it doesn’t add much abstraction on top; it mainly simplifies the definition and reading of the available options. A contrived example of a CLI would look something like the following:

const { Command } = require('commander');

const program = new Command();

// The handler for the `print` command: a standalone function that can be
// imported and tested on its own.
const print = (string) => {
  console.log(string);
};

program
  .name('my-cli')
  .description('CLI to show off some cool stuff')
  .version('1.0.0');

// Register the command, its expected argument, and the handler to run.
program
  .command('print')
  .description('Print a string')
  .argument('<string>', 'string to print')
  .action(print);

// Parse process.argv and run the matching command.
program.parse();

The example nicely shows how Commander simplifies the management of what the CLI should run. For us, the important thing to notice is the line containing the definition of the action for the print command, in other words, the function called when the CLI is executed by running my-cli print hello!.

This is important because the actual handler of this command is a standalone function, that is, a function that can be imported, executed, mocked, or anything in between, without touching any other part of the program. In fact, this particular handler is close to a pure function: aside from writing to the console, it has no side effects, and it always produces the same output for the same input. And even for less pure functions, any side effects that touch the file system, an external API, or whatnot can be mocked, making the testing powerful and replicable for more complex functions, too.
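
To illustrate, a handler like this could be unit-tested in complete isolation. The sketch below assumes that print is exported from its own module (for example, print.js) and that a Jest-style test runner is available; neither detail comes from the example above.

// print.test.js: a minimal, hypothetical unit test for the standalone handler.
const { print } = require('./print');

it('prints the given string', () => {
  // Spy on console.log so the test can observe the only effect of the handler.
  const spy = jest.spyOn(console, 'log').mockImplementation(() => {});

  print('hello!');

  expect(spy).toHaveBeenCalledWith('hello!');
  spy.mockRestore();
});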

Make It About The User

We went through how separate parts of a CLI can be tested. Now let’s consider a different approach. Just as unit tests are usually accompanied by other more complex tests for your typical apps, the same approach and arguments can be used for the CLI. While testing separate parts of a program is always advisable, bringing automated tests much closer to the user’s use cases does have its charm.

Not so coincidentally, the same philosophy is followed by Testing Library. It brings the flows that we use for testing a step closer to the way the code is used by its users. Thanks to this popular library, many people are so used to the idea that it might even seem obvious, although the same pattern is far less common with CLIs.

The more your tests resemble the way your software is used, the more confidence they can give you.

That’s a short but representative line from Testing Library’s documentation. Let’s unpack this idea. There are so many moving pieces in even the smallest programs, and every piece has the potential to break. These moving pieces could be anything. Take external dependencies as an example: they are usually trusted to follow semantic versioning, but there is always room for mistakes and accidental breakage by maintainers. The probability is significantly higher these days, considering that a typical JavaScript project has hundreds of dependencies. These dependencies are not exercised by our tests, and they are often mocked out altogether when unit testing.

The same applies to code that might be partially untested or to a scenario that just wasn’t considered.

That’s a technical perspective, but an even better description would be more fundamental. I, as the developer of the program, don’t care whether some subpart gets called once or twice, or whether intermediate values differ along the way. My main concern is the business logic: that the program does for the user what I intend it to do.

If the program is supposed to create a file on disk, then that’s what we need to make sure works as expected. If the program does some other operation based on user input, then — you guessed it — that is what we need to make sure works.

In a sense, it’s a shift away from programming logic altogether, moving closer to the business use cases, because those are what matter at the end of the day.

From Idea To Implementation

As is usual with projects, now that we’ve established our reasoning, we can move to the practical part of the implementation. The goal is clear by now: allow testing of a CLI tool, resembling the way users use it as closely as possible.

In this section, let’s focus on the most interesting part of the task: what we would like the tests to look like. We’ll skip the implementation of the API for the most part, because that consists of technical problems that we can solve anytime.

Conveniently, all of the below-mentioned functionality is provided by a library that we’ve put together for today’s purpose. It’s named CLI Testing Library, which is not exactly creative, but given the similarities in philosophy to Testing Library, it aptly describes what it provides. The library is certainly in the early stages, but it’s been used to test code in production so far without issues. As mentioned, the implementation is not something we will dig into, but it can be reviewed on GitHub, and it is a fairly small code base.

Basic Execution

Thanks to the nature of the program we are testing, it’s quite simple to assume what we would like to do in the test: run a shell command that would run the program in a separate process, just as the CLI would be executed by the user in the terminal. For a moment, we can assume that the program is simple and does not require any further input from the user other than the initial options. Considering all of that, we can imagine that the ideal API might look something like this:

await execute('node my-cli.js my-first-command');

This looks sufficient for the basic use of executing a program and waiting for it to finish. The most obvious thing we would like to know is whether the program has finished successfully. CLI tools use exit codes for that, whereby the general convention is that anything above 0 represents an unsuccessful run of some sort. It’s similar to HTTP codes, with 200 being the equivalent of ultimate success. Such a code could definitely be captured and returned from our API execution, to later be compared:

const { exitCode } = await execute('node my-cli.js my-first-command');

That will surely do for a convenient basic API to run an end-to-end CLI test. Before moving on to the other points, let’s spice it up a little with some additional useful information, like the stdout and stderr of the program. In case you’re not familiar with these terms, they are the output streams you’d see in the terminal as a user; they differ only in the purpose of the text written to them: regular output versus errors.
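
As a quick illustration (this is a standalone snippet, not the my-first-command handler from the example below), this is how a Node.js program ends up producing each of the three values we inspect:

console.log('regular output');   // written to stdout
console.error('error output');   // written to stderr
process.exit(1);                 // a non-zero exit code signals failure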

It would certainly be helpful to check whether the program printed what it was meant to, having finished successfully. Perhaps that’s what our program was meant to do after all, just print something. A simple extension of our existing API would suffice for that.

const { exitCode, stdout, stderr } = await execute('node my-cli.js my-first-command');

console.log(exitCode); // 0
console.log(stdout); // ["Hello worlds!"]
console.log(stderr); // []

With that, let’s call this our first iteration of the CLI Testing Library. We can execute a program, give it parameters, wait for it to finish, and evaluate some basic outcomes.
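
Put into an actual test, this first iteration could look roughly like the sketch below, assuming a Jest-style runner and the execute helper described above:

it('prints the greeting and exits cleanly', async () => {
  const { exitCode, stdout, stderr } = await execute('node my-cli.js my-first-command');

  expect(exitCode).toBe(0);
  expect(stdout).toContain('Hello worlds!');
  expect(stderr).toHaveLength(0);
});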

User Input

While executing a program and waiting for it to finish is sufficient for a basic program, a little more thought needs to be put into the implementation of a program with which the user can interact. A classic scenario would be asking the user to input text or select an option.

We can certainly get inspired by Node.js’ own child_process API. It provides an exec function, fairly comparable to our execute function described above, but it also provides a spawn method, which creates a process and allows for further interaction with it after its creation. Not only will we use much of the same logic under the hood for our testing library, but because a CLI always runs as a process, we can also borrow from Node.js for our own library’s API.
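
For reference, this is roughly the kind of plumbing such a library would wrap. A bare-bones sketch using nothing but the built-in child_process module could look like this:

const { spawn } = require('child_process');

// Start the CLI as a separate process, just like a user running it in a terminal.
const child = spawn('node', ['my-cli', 'ask-for-name']);

child.stdout.on('data', (chunk) => {
  // Watch the output for the question we are waiting for.
  if (chunk.toString().includes('What is your name?')) {
    child.stdin.write('Georgy\n'); // type the answer and "press Enter"
  }
});

child.on('close', (code) => {
  console.log(`finished with exit code ${code}`);
});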

Let’s consider a basic scenario of the CLI asking only for text input from the user. For this trivial program alone, we’ll require several utilities. First, we need to wait for the actual input trigger (in other words, an instruction from the CLI to write some text). This instruction will be printed out in the stdout, mentioned earlier, so we would likely want to wait for a specific text question. Let’s call that waitForText, which would accept a string, and we’ll search for it any time the CLI program outputs a new line.

Next, it’s time to input the text that the program is asking for. In this case, we’ll have to interact with stdin under the hood, which is the input equivalent of stdout. Let’s call this utility writeText. Just like the previous utility function, it will accept a string.

Once the text is inputted into the process “console”, we would usually have to confirm it by pressing a key, such as “Enter”. That’s yet another interaction utility we can introduce, for pressing specific keys. Under the hood, we would also use stdin, of course, but let’s not concern ourselves with that. Let’s call it pressKey, which will accept the name of the key.

Now that the task is done, there is just one thing left to do: wait for the program to finish before we can evaluate whether it was executed successfully, and so on. waitForFinish is the obvious name for this. With all that in mind, we can imagine something like the following:

const { waitForText, writeText, pressKey, waitForFinish } = await spawn(
    'node my-cli ask-for-name'
);

await waitForText('What is your name?');
await writeText('Georgy');
await pressKey('Enter');
await waitForFinish();

With the code above, we can simulate the whole interaction of the user with the program. We can also accompany the spawn helper with some more information, such as the exit code, stdout, or stderr, just as we did for execute. For spawn, the ideal format might be a bit different: getters would probably work best, because the values change throughout the program’s execution, and getters can be called at any time. getStdout, getStderr, and getExitCode will do the trick.

const { getExitCode, getStdout, getStderr, waitForText, writeText, pressKey, waitForFinish } = await spawn(
    'node my-cli ask-for-name'
);

await waitForText('What is your name?');
await writeText('Georgy');
await pressKey('Enter');
await waitForFinish();

console.log(getExitCode());  // 0
console.log(getStdout());  // ["What is your name?", "Georgy", "Your name is Georgy"]
console.log(getStderr());  // []

With that, we’ve covered the main idea of testing more complex interactive CLI programs.

Enclosed And Independent Environment

Now that we’re deep into testing the CLI program by actually running the CLI, we should cover an important part of any test: its independence of other tests and test runs. Test frameworks usually support tests being run in parallel, in band, or the like, but that doesn’t mean we should limit ourselves to one of those. Each and every run, unless built otherwise, should be completely independent of everything else.

With your usual code, this is simple in most cases, but with the CLI and end-to-end tests, things can get tricky really quickly. The main problem is that the CLI often works with its surroundings. It might read some configuration files, generate other files, or manipulate the file system in some way. This means that each test needs to have its own temporary space on disk where the test use case can be prepared, with any files that might be needed for the test run. The same file-system space also needs to be cleaned up later so that nothing is left behind after each test run.

Creating a folder on disk is basic functionality in any mature runtime, and Node.js is no exception. It even provides functionality for creating temporary folders wherever is appropriate for the given operating system (fs.mkdtemp, combined with os.tmpdir). The same functionality also gives us the path of the folder on disk so that it can be cleaned up when needed. Fortunately, we can easily use this cross-platform temporary-folder functionality for our test runs.
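
A minimal sketch of that idea, using only built-in Node.js APIs (the helper name is made up for this example):

const { mkdtemp, rm } = require('fs/promises');
const { tmpdir } = require('os');
const { join } = require('path');

// Create a unique temporary folder for a single test run and clean it up afterwards.
// mkdtemp appends random characters to the prefix, so parallel runs never collide.
async function withTempDir(run) {
  const dir = await mkdtemp(join(tmpdir(), 'cli-test-'));
  try {
    await run(dir); // e.g. execute the CLI with its working directory set to dir
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}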

Let’s get back to our library API. It’s clear that each test should have some sort of prepare and cleanup stages. Something like the following would cover that, allowing for completely independent test runs:

const { execute, cleanup } = await prepareEnvironment();

const { exitCode } = await execute('node my-cli.js my-first-command');

await cleanup();

Now that we have a dedicated test-run root folder, we can create all kinds of helpers to manipulate this enclosed disk environment. Reading files, creating files and folders, making sure a file exists, listing a folder’s contents — all of these and many more helpers related to disk manipulation are already provided by the Node.js process itself. All we have to do is wrap them so that the root directory used is the temporary one we have created.

const {
  makeDir,
  writeFile,
  readFile,
  removeFile,
  removeDir,
  exists,
  ls,
} = await prepareEnvironment();

await makeDir('./subfolder');
await writeFile('./subfolder/file.txt', 'this will be file content');

const folderContent = await ls('./');
console.log(folderContent); // ["subfolder"]

const doesFileExists = await exists('./subfolder/file.txt');
console.log(doesFileExists); // true

const content = await readFile('./subfolder/file.txt');
console.log(content); // this will be file content

await removeFile('./subfolder/file.txt');
await removeDir('./subfolder'); // removes folder with any content

We should consider one more thing related to cleanup. From the perspective of the test execution, it’s completely unclear what the CLI program itself contains. With end-to-end tests, we’re only concerned with what it does, not with how it is implemented. That means we cannot be sure what the subprocess contains or does, and it might leave things hanging once executed. When the cleanup function is called in our test runs, we know that we’re done with testing, so part of the cleanup function can be a forceful teardown of anything that remains open or running.
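
A hypothetical fragment of such a cleanup step, using the standard Node.js child-process API, might look like this:

// If the spawned CLI process has not exited by the time the test is done,
// terminate it so nothing is left hanging after the run.
function teardownProcess(child) {
  if (child.exitCode === null && !child.killed) {
    child.kill('SIGKILL');
  }
}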

Systems Differences

More caveats arise when it comes to differences between systems and shells. The library already performs several normalization steps, described below.

It might be surprising, but even on a single system, different runs can produce a different stdout array of output lines. In some cases, lines might be missing; in others, extra ones might appear. Given that the array will likely be compared against snapshots, this is unacceptable and needs to be normalized. In our case, always getting rid of empty lines is a sufficient solution.

A similar thing needs to be done with the special symbols used by the shell, for example, the ANSI escape codes used to color the output. These symbols can differ across shell engines running the same program, so removing them from the output simplifies things.

There is another small but real use case with a system’s special symbols. For whatever reason, in one type of system, they might be clearly visible as a special character in the output, and in another, they might be completely invisible. Two identical strings not being considered the same would lead to an insanely annoying debugging situation. Again, deleting these is the way to go.

Last but not least, in the previous section, we talked about creating a separate file-system space. The full path of the current execution folder will often appear in the CLI output, and the same goes for the home directory of the system’s current user. Both paths could be part of the output and would cause a test failure in different environments, so they need to be normalized to keep the CLI output identical across systems and runs. We can replace them with something more generic, such as {base} and {home}, making it easy to identify that a path points to one of those special folders on disk.
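
A rough sketch of the normalization steps described above (illustrative only, not the library’s actual implementation):

const os = require('os');

// Strip color codes, replace machine-specific paths with stable placeholders,
// and drop empty lines so snapshots match across systems and runs.
function normalizeOutput(lines, baseDir) {
  return lines
    .map((line) => line.replace(/\u001b\[[0-9;]*m/g, ''))    // ANSI color codes
    .map((line) => line.split(baseDir).join('{base}'))        // current test folder
    .map((line) => line.split(os.homedir()).join('{home}'))   // user home directory
    .filter((line) => line.trim().length > 0);                // empty lines
}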

Mocking

Let’s be honest: Whatever we want to test and however close we want to get to the use case that the user sees, from time to time there will simply be a use case where we’ll need to make some compromise.

A good example is the CLI running with some dependency on an external web API. Each run would be affected by an external force. Moreover, it would depend not only on the API itself, but also on the internet connection and possibly some other factors, such as a VPN connection. That would compromise the requirement of reproducible runs, which is crucial for testing. So, we would need to sacrifice the integrity of the CLI program in such a case.

There is no library-integrated way to solve that. Remember that the library concerns itself with executing a process and the things around it. It doesn’t touch or understand the underlying CLI in any way. That’s why the following is more of a technique that can be used to mock parts of the executed CLI.

For the mocks to take effect on any part of the CLI program’s code, they need to be a part of the program itself — that is, be a part of one process. There is only one reasonable solution to this: make the mock a part of the CLI. Any other solution would have to be specific and invasive with regard to how the child processes are being executed in the system. That’s not something we could simply implement in a library and cover all possible use cases. It would also make the test run inconsistent with the production run in a way that is not easily controlled by the CLI’s author.

Instead, let’s focus on extending the program, as mentioned above. After all, the CLI being tested will usually also be the CLI being developed, so adding another entry point with the mocks included should be fairly doable. The example below mocks a response received through Axios, assuming that the CLI uses Axios for this request.

// mock-and-run.js
const axios = require('axios');
const MockAdapter = require('axios-mock-adapter');

// Install the mock before loading the CLI, so every request the CLI makes
// goes through the adapter instead of the network.
const mock = new MockAdapter(axios);

mock.onGet('http://example.com/').reply(200, 'mocked response');

require('./index'); // include the CLI entry point

Once we have that, we can run the CLI the same way we would without the mocks, except that the CLI won’t be making any actual external requests and will always receive a reproducible, mocked response for that request.

const { exitCode } = await execute('node mock-and-run.js my-first-command');

Conclusion

Many would argue that testing is the core of quality software and long-term sustainability. Different kinds of testing bring different kinds of benefits. Remember that, as with any other testing, end-to-end testing is an additional kind of testing at our disposal, not a replacement. The same surely applies to CLI testing.

With end-to-end tests, we can be even more confident that the program we’re testing will do exactly what we want it to do, and won’t be broken by some mistakes only affecting runtime, like the ones that can pop up after updating dependencies.

The power of CLI testing lies in its flexible nature. Even though the testing library itself is written in JavaScript, we are certainly not restricted to testing Node.js programs. After all, we are executing a shell command; so, as long as the environment is able to run the program as a process, any language will do.