
Wednesday, July 17, 2024

How To Design Effective Conversational AI Experiences: A Comprehensive Guide

 This in-depth guide takes you through the three crucial phases of conversational search, revealing how users express their needs, explore results, and refine their queries. Learn how AI agents can overcome communication barriers, personalize the search experience, and adapt to evolving user intent. Discover practical strategies and real-world examples to guide your development of intuitive, effective, and user-centric conversational interfaces.

Conversational AI is revolutionizing information access, offering a personalized, intuitive search experience that delights users and empowers businesses. A well-designed conversational agent acts as a knowledgeable guide, understanding user intent and effortlessly navigating vast data, which leads to happier, more engaged users, fostering loyalty and trust. Meanwhile, businesses benefit from increased efficiency, reduced costs, and a stronger bottom line. On the other hand, a poorly designed system can lead to frustration, confusion, and, ultimately, abandonment.

Achieving success with conversational AI requires more than just deploying a chatbot. To truly harness this technology, we must master the intricate dynamics of human-AI interaction. This involves understanding how users articulate needs, explore results, and refine queries, paving the way for a seamless and effective search experience.

This article will decode the three phases of conversational search, the challenges users face at each stage, and the strategies and best practices AI agents can employ to enhance the experience.

To analyze these complex interactions, Trippas et al. (2018) (PDF) proposed a framework that outlines three core phases in the conversational search process:

  1. Query formulation: Users express their information needs, often facing challenges in articulating them clearly.
  2. Search results exploration: Users navigate through presented results, seeking further information and refining their understanding.
  3. Query re-formulation: Users refine their search based on new insights, adapting their queries and exploring different avenues.

Building on this framework, Azzopardi et al. (2018) (PDF) identified five key user actions within these phases (reveal, inquire, navigate, interrupt, and interrogate) and the corresponding agent actions (inquire, reveal, traverse, suggest, and explain).

Table created by the author based on Azzopardi et al.’s paper.

In the following sections, I’ll break down each phase of the conversational search journey, delving into the actions users take and the corresponding strategies AI agents can employ, as identified by Azzopardi et al. (2018) (PDF). I’ll also share actionable tactics and real-world examples to guide the implementation of these strategies.

Phase 1: Query Formulation: The Art Of Articulation

In the initial phase of query formulation, users attempt to translate their needs into prompts. This process involves conscious disclosures — sharing details they believe are relevant — and unconscious non-disclosure — omitting information they may not deem important or struggle to articulate.

This process is fraught with challenges. As Jakob Nielsen aptly pointed out,

“Articulating ideas in written prose is hard. Most likely, half the population can’t do it. This is a usability problem for current prompt-based AI user interfaces.”

— Jakob Nielsen

This can manifest as:

  • Vague language: “I need help with my finances.”
    Budgeting? Investing? Debt management?
  • Missing details: “I need a new pair of shoes.”
    What type of shoes? For what purpose?
  • Limited vocabulary: Not knowing the right technical terms. “I think I have a sprain in my ankle.”
    The user might not know the difference between a sprain and a strain or the correct anatomical terms.

These challenges can lead to frustration for users and less relevant results from the AI agent.

AI Agent Strategies: Nudging Users Towards Better Input

To bridge the articulation gap, AI agents can employ three core strategies:

  1. Elicit: Proactively guide users to provide more information.
  2. Clarify: Seek to resolve ambiguities in the user’s query.
  3. Suggest: Offer alternative phrasing or search terms that better capture the user’s intent.

The key to effective query formulation is balancing elicitation and assumption. Overly aggressive questioning can frustrate users, and making too many assumptions can lead to inaccurate results.

For example,

User: “I need a new phone.”

AI: “What’s your budget? What features are important to you? What size screen do you prefer? What carrier do you use?...”

This rapid-fire questioning can overwhelm the user and make them feel like they’re being interrogated. A more effective approach is to start with a few open-ended questions and gradually elicit more details based on the user’s responses.
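The gradual elicitation described above can be sketched as a simple slot-filling loop that asks at most one clarifying question per turn. The slot names and questions below are illustrative, not from any particular product:

```python
# Slot-filling elicitation: ask one clarifying question per turn instead of
# interrogating the user. Slot names and questions are illustrative.

CLARIFYING_QUESTIONS = {
    "budget": "Do you have a budget in mind?",
    "priority": "What matters most to you: camera, battery life, or screen size?",
}

def next_question(known_slots):
    """Return the single most useful follow-up, or None once we can search."""
    for slot, question in CLARIFYING_QUESTIONS.items():
        if slot not in known_slots:
            return question
    return None

# User: "I need a new phone." Nothing is known yet, so ask about budget first.
next_question({})
# Once both slots are filled, next_question returns None and the agent
# can run the search instead of asking anything further.
next_question({"budget": "under $500", "priority": "battery life"})
```

Filling slots incrementally keeps the exchange conversational rather than interrogative, while still gathering the details the search needs.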

As Azzopardi et al. (2018) (PDF) stated in the paper,

“There may be a trade-off between the efficiency of the conversation and the accuracy of the information needed as the agent has to decide between how important it is to clarify and how risky it is to infer or impute the underspecified or missing details.”

Implementation Tactics And Examples

  • Probing questions: Ask open-ended or clarifying questions to gather more details about the user’s needs. For example, Perplexity Pro uses probing questions to elicit more details about the user’s needs for gift recommendations.
An example of how a conversational search engine employs probing questions to refine a user query (“gifts for my friend”) and better understand the user’s intent.

For example, after clicking one of the initial prompts, “Create a personal webpage,” ChatGPT added another sentence, “Ask me 3 questions first on whatever you need to know,” to elicit more details from the user.

List of questions ChatGPT-4o asked after clicking the initial prompt “Create a personal webpage.”
  • Interactive refinement: Utilize visual aids like sliders, checkboxes, or image carousels to help users specify their preferences without articulating everything in words. For example, Adobe Firefly’s side settings allow users to adjust their preferences.
An example of how interactive refinement through settings panels (here, on the prompt “moon”) can help users explore their preferences.
  • Suggested prompts: Provide examples of more specific or detailed queries to help users refine their search terms. For example, the Nielsen Norman Group provides an interface that offers a suggested prompt to help users refine their initial query.
Example design by the Nielsen Norman Group suggesting more descriptive versions of prompts to users.

For example, after clicking one of the initial prompts in Gemini, “Generate a stunning, playful image,” more details are added in blue in the input.

An example of a refined prompt enriched with details.
  • Offering multiple interpretations: If the query is ambiguous, present several possible interpretations and let the user choose the most accurate one. For example, Gemini offers a list of gift suggestions for the query “gifts for my friend who loves music,” categorized by the recipient’s potential music interests to help the user pick the most relevant one.
An example of gift suggestions tailored to different types of music enthusiasts.

Phase 2: Search Results Exploration: A Multifaceted Journey

Table visualizing the search results exploration phase, created by the author based on Azzopardi et al.’s paper.

Once the query is formed, the focus shifts to exploration. Users embark on a multifaceted journey through search results, seeking to understand their options and make informed decisions.

Two primary user actions mark this phase:

  1. Inquire: Users actively seek more information, asking for details, comparisons, summaries, or related options.
  2. Navigate: Users navigate the presented information, browse through lists, revisit previous options, or request additional results. This involves scrolling, clicking, and using voice commands like “next” or “previous.”

AI Agent Strategies: Facilitating Exploration And Discovery

To guide users through the vast landscape of information, AI agents can employ these strategies:

  1. Reveal: Present information that caters to diverse user needs and preferences.
  2. Traverse: Guide the user through the information landscape, providing intuitive navigation and responding to their evolving interests.

During discovery, it’s vital to avoid information overload, which can overwhelm users and hinder their decision-making. For example,

User: “I’m looking for a place to stay in Tokyo.”

AI: Provides a lengthy list of hotels without any organization or filtering options.

Instead, AI agents should offer the most relevant results and allow users to filter or sort them based on their needs. This might include presenting a few top recommendations based on ratings or popularity, with options to refine the search by price range, location, amenities, and so on.

Additionally, AI agents should understand natural language navigation. For example, if a user asks, “Tell me more about the second hotel,” the AI should provide additional details about that specific option without requiring the user to rephrase their query. This level of understanding is crucial for flexible navigation and a seamless user experience.
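As a rough illustration, resolving a reference like “the second hotel” can be as simple as mapping ordinal words onto the results shown in the previous turn. The ordinal list and hotel names below are illustrative:

```python
# Resolving ordinal references ("the second hotel") against the results that
# were shown in the previous turn. The ordinal map and names are illustrative.

ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}

def resolve_reference(utterance, last_results):
    """Map an ordinal word in the utterance to an item from the last results."""
    lowered = utterance.lower()
    for word, index in ORDINALS.items():
        if word in lowered:
            return last_results[index]
    return None

shown = ["Hotel Sakura", "Hotel Ginza", "Hotel Ueno"]
resolve_reference("Tell me more about the second hotel", shown)  # "Hotel Ginza"
```

A production agent would resolve such references with full coreference handling and conversational memory; the point is that the context of the previous turn, not the current utterance alone, determines the answer.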

Implementation Tactics And Examples

  • Diverse formats: Offer results in various formats (lists, summaries, comparisons, images, videos) and allow users to specify their preferences. For example, Gemini presents a summarized format of hotel information, including a photo, price, rating, star rating, category, and brief description to allow the user to evaluate options quickly for the prompt “I’m looking for a place to stay in Paris.”
An example of a conversational search interface using a summary format to present relevant hotel information concisely for the prompt “I’m looking for a place to stay in Paris.”
  • Context-aware navigation: Maintain conversational context, remember user preferences, and provide relevant navigation options. For example, following the previous example prompt, Gemini reminds users of the potential next steps at the end of the response.
An example of contextual navigation by asking, “I can help you narrow down … What would you like to do next?”
  • Interactive exploration: Use carousels, clickable images, filter options, and other interactive elements to enhance the exploration experience. For example, Perplexity offers a carousel of images related to “a vegetarian diet” and other interactive elements like “Watch Videos” and “Generate Image” buttons to enhance exploration and discovery.
An example of how interactive elements like image carousels, video recommendations (“Watch Videos”), and image generation options (“Generate Image”) can facilitate exploration in a conversational search for a vegetarian diet.
  • Multiple responses: Present several variations of a response. For example, users can see multiple draft responses to the same query by clicking the “Show drafts” button in Gemini.
An example of a conversational search interface providing multiple response options for a UX welcome message.
  • Flexible text length and tone: Enable users to customize the length and tone of AI-generated responses to better suit their preferences. For example, Gemini provides multiple options for welcome messages, offering varying lengths, tones, and degrees of formality.
An example of a conversational search that caters to diverse user preferences by offering multiple tones, text lengths, and language styles for a UX welcome message.

Phase 3: Query Re-formulation: Adapting To Evolving Needs

As users interact with results, their understanding deepens, and their initial query might not fully capture their evolving needs. During query re-formulation, users refine their search based on exploration and new insights, often involving interrupting and interrogating. Query re-formulation empowers users to course-correct and refine their search.

Table visualizing the query re-formulation phase, created by the author based on Azzopardi et al.’s paper.
  • Interrupt: Users might pause the conversation to:
    • Correct: “Actually, I meant a desktop computer, not a laptop.”
    • Add information: “I also need it to be good for video editing.”
    • Change direction: “I’m not interested in those options. Show me something else.”
  • Interrogate: Users challenge the AI to ensure it understands their needs and justify its recommendations:
    • Seek understanding: “What do you mean by ‘good battery life’?”
    • Request explanations: “Why are you recommending this particular model?”

AI Agent Strategies: Adapting And Explaining

To navigate the query re-formulation phase effectively, AI agents need to be responsive, transparent, and proactive. They can rely on two core strategies:

  1. Suggest: Proactively offer alternative directions or options to guide the user towards a more satisfying outcome.
  2. Explain: Provide clear and concise explanations for recommendations and actions to foster transparency and build trust.

AI agents should balance suggestions with relevance, explaining why certain options are suggested, while avoiding overwhelming users with unrelated suggestions that increase conversational effort. A bad example would be the following:

User: “I want to visit Italian restaurants in New York.”

AI: Suggests unrelated options, like Mexican or American restaurants, even though the user is interested in Italian cuisine.

This could frustrate the user and reduce trust in the AI.

A better answer could be, “I found these highly-rated Italian restaurants. Would you like to see more options based on different price ranges?” This ensures users understand the reasons behind recommendations, enhancing their satisfaction and trust in the AI’s guidance.

Implementation Tactics And Examples

  • Transparent system process: Show the steps involved in generating a response. For example, Perplexity Pro outlines the search process step by step to fulfill the user’s request.
An example of a conversational search interface displaying the steps involved in generating a healthy meal plan.
  • Explainable recommendations: Clearly state the reasons behind specific recommendations, referencing user preferences, historical data, or external knowledge. For example, ChatGPT includes recommended reasons for each listed book in response to the question “books for UX designers.”
An example of explainable recommendations for books by ChatGPT.
  • Source reference: Enhance the answer with source references to strengthen the evidence supporting the conclusion. For example, Perplexity presents source references to support the answer.
An example of a conversational agent providing source references for the question, “Is it a trend that more people want remote jobs, not onsite jobs?”
  • Point-to-select: Users should be able to directly select specific elements or locations within the dialogue for further interaction rather than having to describe them verbally. For example, users can select part of an answer and ask a follow-up in Perplexity.
An example of how a conversational search interface can implement a point-to-select feature, enabling users to highlight specific terms or phrases and ask a follow-up question.
  • Proactive recommendations: Suggest related or complementary items based on the user’s current selections. For example, Perplexity offers a list of related questions to guide the user’s exploration of “a vegetarian diet.”
An example of related questions based on the user’s initial query.

Overcoming LLM Shortcomings

While the strategies discussed above can significantly improve the conversational search experience, LLMs still have inherent limitations that can hinder their intuitiveness. These include the following:

  • Hallucinations: Generating false or nonsensical information.
  • Lack of common sense: Difficulty understanding queries that require world knowledge or reasoning.
  • Sensitivity to input phrasing: Producing different responses to slightly rephrased queries.
  • Verbosity: Providing overly lengthy or irrelevant information.
  • Bias: Reflecting biases present in the training data.

To create truly effective and user-centric conversational AI, it’s crucial to address these limitations and make interactions more intuitive. Here are some key strategies:

  • Incorporate structured knowledge
    Integrating external knowledge bases or databases can ground the LLM’s responses in facts, reducing hallucinations and improving accuracy.
  • Fine-tuning
    Training the LLM on domain-specific data enhances its understanding of particular topics and helps mitigate bias.
  • Intuitive feedback mechanisms
    Allow users to easily highlight and correct inaccuracies or provide feedback directly within the conversation. This could involve clickable elements to flag problematic responses or a “this is incorrect” button that prompts the AI to reconsider its output.
  • Natural language error correction
    Develop AI agents capable of understanding and responding to natural language corrections. For example, if a user says, “No, I meant X,” the AI should be able to interpret this as a correction and adjust its response accordingly.
  • Adaptive learning
    Implement machine learning algorithms that allow the AI to learn from user interactions and improve its performance over time. This could involve recognizing patterns in user corrections, identifying common misunderstandings, and adjusting behavior to minimize future errors.
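As a minimal sketch of the natural language error correction idea above, a few correction phrases can be matched against the user’s utterance. A production system would use intent classification rather than this illustrative pattern list:

```python
import re

# Illustrative correction phrases; a real agent would classify intents instead
# of relying on a fixed pattern list.
CORRECTION_PATTERNS = [
    r"no,?\s+i meant\s+(.+)",
    r"actually,?\s+i meant\s+(.+)",
]

def detect_correction(utterance):
    """Return the corrected content if the utterance is a correction, else None."""
    text = utterance.strip().lower()
    for pattern in CORRECTION_PATTERNS:
        match = re.match(pattern, text)
        if match:
            return match.group(1)
    return None

detect_correction("No, I meant a desktop computer")  # "a desktop computer"
detect_correction("Great, thanks!")                  # None
```

When a correction is detected, the agent can replace or amend the earlier slot value instead of treating the utterance as a brand-new query.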

Training AI Agents For Enhanced User Satisfaction

Understanding and evaluating user satisfaction is fundamental to building effective conversational AI agents. However, directly measuring user satisfaction in the open-domain search context can be challenging, as Zhumin Chu et al. (2022) highlighted. Traditionally, metrics like session abandonment rates or task completion were used as proxies, but these don’t fully capture the nuances of user experience.

To address this, Clemencia Siro et al. (2023) offer a comprehensive approach to gathering and leveraging user feedback:

  • Identify key dialogue aspects
    To truly understand user satisfaction, we need to look beyond simple metrics like “thumbs up” or “thumbs down.” Consider evaluating aspects like relevance, interestingness, understanding, task completion, interest arousal, and efficiency. This multi-faceted approach provides a more nuanced picture of the user’s experience.
  • Collect multi-level feedback
    Gather feedback at both the turn level (each question-answer pair) and the dialogue level (the overall conversation). This granular approach pinpoints specific areas for improvement, both in individual responses and the overall flow of the conversation.
  • Recognize individual differences
    Understand that the concept of satisfaction varies per user. Avoid assuming all users perceive satisfaction similarly.
  • Prioritize relevance
    While all aspects are important, relevance (at the turn level) and understanding (at both the turn and session level) have been identified as key drivers of user satisfaction. Focus on improving the AI agent’s ability to provide relevant and accurate responses that demonstrate a clear understanding of the user’s intent.
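One way to record the multi-level feedback described above is a small data structure holding both turn-level and dialogue-level ratings. The field names and rating scales here are illustrative, not prescribed by the cited papers:

```python
from dataclasses import dataclass, field

@dataclass
class TurnFeedback:
    turn_id: int
    relevance: int       # 1-5; a key satisfaction driver at the turn level
    understanding: int   # 1-5; a key driver at both turn and dialogue level

@dataclass
class DialogueFeedback:
    dialogue_id: str
    task_completion: bool
    overall_satisfaction: int                 # 1-5 for the whole conversation
    turns: list = field(default_factory=list)

# One rating per question-answer pair, plus one for the overall conversation.
session = DialogueFeedback("d-001", task_completion=True, overall_satisfaction=4)
session.turns.append(TurnFeedback(turn_id=1, relevance=5, understanding=4))
```

Storing both granularities makes it possible to pinpoint whether dissatisfaction stems from a single bad response or from the flow of the conversation as a whole.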

Additionally, consider these practical tips for incorporating user satisfaction feedback into the AI agent’s training process:

  • Iterate on prompts
    Use user feedback to refine the prompts to elicit information and guide the conversation.
  • Refine response generation
    Leverage feedback to improve the relevance and quality of the AI agent’s responses.
  • Personalize the experience
    Tailor the conversation to individual users based on their preferences and feedback.
  • Continuously monitor and improve
    Regularly collect and analyze user feedback to identify areas for improvement and iterate on the AI agent’s design and functionality.

The Future Of Conversational Search: Beyond The Horizon

The evolution of conversational search is far from over. As AI technologies continue to advance, we can anticipate exciting developments:

  • Multi-modal interactions
    Conversational search will move beyond text, incorporating voice, images, and video to create more immersive and intuitive experiences.
  • Personalized recommendations
    AI agents will become more adept at tailoring search results to individual users, considering their past interactions, preferences, and context. This could involve suggesting restaurants based on dietary restrictions or recommending movies based on previously watched titles.
  • Proactive assistance
    Conversational search systems will anticipate user needs and proactively offer information or suggestions. For instance, an AI travel agent might suggest packing tips or local customs based on a user’s upcoming trip.

Tuesday, August 1, 2023

How To Enable Collaboration In A Multiparty Setting

 As Artificial Intelligence becomes more widespread, so too does the adoption of digital agents and voice interactions. Explore the power of mixed reality systems and multimodal communication to enhance collaboration between consultants and clients, transforming advisory services.

As Artificial Intelligence becomes more widespread and pervasive, the transition to a data-driven age poses a conundrum for many: Will AI replace me at my job? Can it become smarter than humans? Who is making the important decisions, and who is accountable?

AI is becoming more and more complex, and tools like ChatGPT, Siri, and Alexa are already a part of everyday life to an extent where even experts struggle to grasp and explain the functionality in a tangible way. How can we expect the average human to trust such a system? Trust matters not only in decision-making processes but also in order for societies to be successful. Ask yourself this question: Who would you trust with a big personal or financial decision?

Today’s banking counseling sessions are associated with various challenges: Besides preparation and follow-up, the consultant is also busy with many different tasks during the conversation. The cognitive load is high, and tasks are either done on paper or with a personal computer, which is why the consultant can’t engage sufficiently with the client. Clients are mostly novices who are not familiar with the subject matter. The consequent state of passivity or uncertainty often stems from a phenomenon known as information asymmetry, which occurs when the consultant has more or better information than the client.

In this article, we propose a new approach based on co-creation and collaboration in advisory services. An approach that enables the consultant to simply focus on the customers’ needs by leveraging the assistance of a digital agent. We explore the opportunities and limitations of integrating a digital agent into an advisory meeting in order to allow all parties to engage actively in the conversation.

Rethinking Human-Machine Environments In Advisory Services

Starting from the counseling session described above, we tackled the issues of information asymmetry, trust building, and cognitive overload within the context of a research project.

Designed to understand the linguistic landscape of Switzerland with its various Swiss-German dialects, the digital agent “Mo” supports consultants and clients in banking consultations by taking over time-consuming tasks, providing support during the consultation, and extracting information. By means of an interactive table, the consultation becomes a multimodal environment in which the agent acts as a third interaction partner.

The setup enables a collaborative exchange between interlocutors, as information is equally visible and accessible to all parties (shared information). Content can be placed anywhere on the table through natural, haptic interactions. Whether the agent records information in the background, actively participates in the composition of a stock portfolio, or warns against risky transactions, Mo “sits” at the table throughout the entire consultation.

To promote active participation from all parties during the counseling session, we have pinpointed crucial elements that facilitate collaboration in a multi-party setting:

  • Shared Device
    All information is made equally visible and interactable for all parties.
  • Collaborative Digital Agent
    By using human modes of communication, social cues, and the support of local dialects, the agent becomes accessible and accepted.
  • Comprehensible User Interfaces
    Multimodal communication helps to convey information in social interactions. Through the use of different output channels, we can convey information in different complexities.
  • Speech Patterns for Voice User Interfaces
    Direct orders to an AI appear unnatural in a multi-party setting. The use of different speech and turn-taking patterns allows the agent to integrate naturally into the conversation.

In the next sections, we will take a closer look at how collaborative experiences can be designed based on those key factors.

“Hello Mo”: Designing Collaborative Voice User Interfaces

Imagine yourself sitting at the table with your bank advisor in a classic banking advisory meeting. The consultant tries to explain to you a ton of banking-specific stuff, all while using a computer or tablet to display stock price developments or to take notes on your desired transactions. In this setup, it is hard for consultants to keep up a decent conversation while retrieving and entering data into a system. This is where voice-based interactions save the day.

When using voice as an input method during a conversation, users do not have to change context (e.g., take out a tablet or operate a screen with a mouse or keyboard) in order to enter or retrieve data. This helps the consultant perform a task more efficiently while fostering a personal relationship with the client. However, the true strength of voice interactions lies in their ability to handle complex information entry. For example, purchasing stocks requires the input of multiple parameters, such as the title or the number of shares. Whereas in a GUI all of these input variables have to be tediously entered by hand, VUIs offer the option of entering everything in one sentence.
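To illustrate how a single spoken sentence can carry multiple transaction parameters, here is a minimal sketch that extracts an action, a quantity, and a title with one pattern. A real system would use a full NLU model; the regex and slot names are illustrative:

```python
import re

def parse_transaction(utterance):
    """Pull the action, quantity, and stock title out of a single sentence."""
    match = re.search(r"(buy|sell)\s+(\d+)\s+shares of\s+(.+)", utterance.lower())
    if not match:
        return None  # no transaction intent found in this utterance
    return {
        "action": match.group(1),
        "quantity": int(match.group(2)),
        "title": match.group(3).strip(" ."),
    }

parse_transaction("Please buy 20 shares of Smashing Media Stocks")
# {'action': 'buy', 'quantity': 20, 'title': 'smashing media stocks'}
```

In a GUI, each of these three values would need its own form field; the voice channel collapses them into one natural utterance.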

Nonetheless, VUIs are still uncharted territory for many users and are accordingly viewed with a huge amount of skepticism. Thus, it is important to consider how we can create voice interactions that are accessible and intuitive. To achieve this goal, it is essential to grasp the fundamental principles of voice interaction, such as the following speech patterns.

Command and Control

This pattern is widely used by popular voice assistants such as Siri, Alexa, and Google Assistant. As the name implies, the assistants are addressed with a direct command, often preceded by a signal phrase known as a “wake word.” For example,

“Hey, Google” → Command: “Turn on the Bedroom Light”
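A minimal sketch of the Command and Control pattern: the agent only acts on utterances that start with its wake word and ignores everything else. The wake word and parsing below are illustrative:

```python
# Command and Control: act only when the wake word opens the utterance.
# The wake word and the simple string parsing are illustrative.

WAKE_WORD = "hey mo"

def parse_command(utterance):
    """Return the command if the utterance starts with the wake word."""
    text = utterance.strip().lower()
    if text.startswith(WAKE_WORD):
        return text[len(WAKE_WORD):].strip(" ,")
    return None  # no wake word: the agent stays silent

parse_command("Hey Mo, buy 20 shares of Smashing Media")  # "buy 20 shares of smashing media"
parse_command("Should we buy more shares?")               # None
```

This is what makes the pattern predictable: nothing happens unless the user explicitly addresses the agent, which is also why it interrupts the natural conversation flow.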

Conversational

The Conversational Pattern, in which the agent understands intents directly from the context of the conversation, is less common in productive systems. Nevertheless, we can find examples in science fiction, such as HAL (2001: A Space Odyssey) or J.A.R.V.I.S. (Iron Man 3). The agent can directly extract intent from natural speech without the need for a direct command to be uttered. In addition, the agent may speak up on its own initiative.

As the Command and Control approach is widely used in voice applications, users are more familiar with this pattern. However, utilizing the Conversational Pattern can be advantageous, as it enables users to interact with the agent effortlessly, eliminating the requirement for them to be familiar with predefined commands or keywords, which they may formulate incorrectly.

In our case of a multi-party setting, users perceived the Conversational Pattern in the context of transaction detection as surprising and unpredictable. For the most part, this is due to the limitations of the intent recognition system. For example, during portfolio customization, stock titles are discussed actively. Not every utterance of a stock title corresponds to a transaction, as the consultant and client are debating possibilities before execution. It is fairly difficult or nearly impossible for the agent to distinguish between option and intent. In this case, command structures offer more reliability and control at the expense of the naturalness of the conversation since the Command and Control Pattern results in unnatural interruption and pauses in the conversation flow. To get the best of both worlds (natural interactions and predictable behavior), we introduce a completely new speech pattern:

Conversational Confirmation

Typically, transaction intents are formulated according to the following structure:

Interlocutor 1: We then buy 20 shares of Smashing Media Stocks (intent).
Interlocutor 2: Yes, let’s do that (confirmation).
Interlocutor 1: All right then, let’s buy Smashing Media Stocks (reconfirmation).

In the current implementation of the Conversational Pattern, the transaction would be executed after the first utterance, which was often perceived to be irritating. In the Conversational Confirmation pattern, the system waits for both parties to confirm and executes the transaction only after the third utterance. By adhering to the natural rules of human conversation, this approach meets the users’ expectations.
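The Conversational Confirmation pattern can be modeled as a small state machine that executes a transaction only after the intent, confirmation, and reconfirmation utterances have all been observed. Classifying each utterance is assumed to happen upstream; the state names are illustrative:

```python
class ConversationalConfirmation:
    """Execute a transaction only after intent, confirmation, and reconfirmation."""

    TRANSITIONS = {
        ("idle", "intent"): "intent_heard",
        ("intent_heard", "confirmation"): "confirmed",
        ("confirmed", "reconfirmation"): "idle",  # execute, then reset
    }

    def __init__(self):
        self.state = "idle"

    def observe(self, utterance_type):
        """Advance the state; return True only when the transaction may execute."""
        next_state = self.TRANSITIONS.get((self.state, utterance_type))
        if next_state is None:
            self.state = "idle"  # anything unexpected cancels the pending intent
            return False
        self.state = next_state
        return utterance_type == "reconfirmation"

cc = ConversationalConfirmation()
cc.observe("intent")          # False: "We then buy 20 shares of Smashing Media Stocks."
cc.observe("confirmation")    # False: "Yes, let's do that."
cc.observe("reconfirmation")  # True: only now is the transaction executed
```

Recognizing whether an utterance is an intent, a confirmation, or a reconfirmation remains the hard NLU problem; this sketch only captures the turn-taking logic that delays execution until the third utterance.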

Conclusion

  1. Regarding the users’ mental model of digital agents, the Command and Control Pattern provides users with more control and security.
  2. The Command and Control Pattern is suitable as a fallback in case the agent does not understand an intent.
  3. The Conversational Pattern is suitable when information has to be obtained passively from the conversation. (logging)
  4. For collaborative counseling sessions, the Conversational Confirmation Pattern could greatly enhance the counseling experience and lead to a more natural conversation in a multi-party setting.

In a world where personal devices such as PCs, mobile phones, and tablets are prevalent, we have grown accustomed to interacting with technical devices in “single-player mode.” The use of private devices undoubtedly has its advantages in certain situations (as in not having to share the million cute cats we google during work with our boss). But when it comes to collaborative tasks — sharing is caring.

Put yourself back into the previously described scenario. At some point, the consultant tries to show stock price trends on the computer or tablet screen. However, regardless of how the screen is positioned, at least one of the participants has a limited view. Because the computer is the consultant’s personal device, the client is excluded from actively engaging with it, leading to the problem of unequal distribution of information.

By integrating an interactive tabletop projection into the consultation meeting, we aimed to overcome the limitations of “personal devices,” improving trust, transparency, and decision empowerment. It is essential to understand that human communication relies on various channels, i.e., modalities (voice, sight, body language, and so on), which help individuals to express and comprehend complex information more effectively. The interactive table as an output system facilitates this aspect of human communication in the digital-physical realm. In a shared device, we use the physical space as an interaction modality. The content can be intuitively moved and placed in the interaction space using haptic elements and is no longer bound to a screen. These haptic tokens are equally accessible to all users, encouraging especially novice users to interact and collaborate on a regular tabletop surface.

Token interaction.

The interactive tabletop projection also makes information more comprehensible for users. For example, during the consultation, the agent updates the portfolio visualization in real time. The impact of a transaction on the overall portfolio can be directly grasped and pulled closer by the client and advisor and used as a basis for discussion.

Transaction detected.

The result is a transparent approach to information, which increases the understanding of bank-specific and system-specific processes, consequently improving trust in the advisory service and leading to more interaction between customer and advisor.

Apart from the spatial modality, the proposed mixed reality system provides other input and output channels, each with its unique characteristics and strengths. If you are interested in this topic, this article on Smashing provides a great comparison of VUIs and GUIs and when to use which.

Conclusion #

The proposed mixed reality system fosters collaboration since:

  1. Information is equally accessible to all parties (reducing information asymmetry, fostering shared understanding, and building trust).
  2. One user interface can be operated collectively by several interaction partners (engagement).
  3. Multisensory human communication can be transferred to the digital space (ease of use).
  4. Information can be better comprehended due to multimodal output (ease of use).

Next Stop: Collaborative AI (Or How To Make A Robot Likable) #

For consultation services, we need an intelligent agent to reduce the consultant’s cognitive load. Can we design an agent that is trustworthy, even likable, and accepted as a third collaboration partner?

Empathy For Machines

Whether it’s machines or humans, empathy is crucial for interactions, and social cues are the salt and pepper to achieve this. Social cues are verbal or nonverbal signals that guide conversations and other social interactions by influencing our perceptions of and reactions toward others. Examples of social cues include eye contact, facial expressions, tone of voice, and body language. These cues are important communicative tools because they provide social and contextual information and facilitate social understanding. In order for the agent to appear approachable, likable, and trustworthy, we attempted to incorporate social cues into the agent’s design. As described above, social cues in human communication are transported through different channels. Transferring them to the digital context once again requires the use of multimodality.

The visual manifestation of the agent enables character-defining elements, such as facial expressions and body language, to be expressed in digital space, analogous to the human body. It can also highlight important context information, such as the current system status.

Agent warning against risky transactions.

In terms of voice interactions, social cues play an important role in system feedback. For example, a common human communication practice is to confirm an action by stating a short “mhm” or “ok.” Applying this practice to the agent’s behavior, we tried to create a more transparent and natural feeling VUI.

When designing voice interactions, it’s important to note that the agent’s perception is heavily influenced by the speech pattern utilized. Once the agent is addressed with a direct command, it is assigned a subordinate role (servant) and is no longer perceived as an equal interaction partner. When the agent recognizes the intent of the conversation independently, it is perceived as more intelligent and trustworthy.

Mo: Ambassador Of System Transparency 

Despite great progress in Swiss German speech recognition, transaction misrecognition still occurs. While dealing with an imperfect system, we have tried to take advantage of it by leveraging the agent to make system-specific processes more understandable and transparent. We implemented the well-known usability heuristic: the more comprehensible system-specific processes are, the better the understanding of a system and the more likely users feel empowered to interact with it (and the more they trust and accept the agent).

A core activity of every banking consultation meeting is the portfolio elaboration phase, where the consultant, client, and agent try to find the best investment solutions. In the process of adjusting the portfolio, transactions get added and removed with the helping hand of the agent. If “Mo” is not fully confident of a transaction, “Mo” checks in and asks whether the recognized transaction has been understood correctly.

Mo asking whether a transaction was understood correctly.

The agent’s voice output follows the usual conventions of a conversation: as soon as an interlocutor is unsure regarding the content of a conversation, he or she speaks up, politely apologizes, and asks if the understood content corresponds to the intent of the conversation. In case the transaction was misunderstood, the system offers the possibility to correct the error by adjusting the transaction using touch and a scrolling token (Microsoft Dial). We deliberately chose these alternative input methods over repeating the intent with voice input to avoid repetitive errors and minimize frustration. By giving the user the opportunity to take action and be in control of an actual error situation, the overall acceptance of the system and the agent is strengthened, creating fertile ground for collaboration.
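This check-in behavior amounts to a simple confidence threshold on the recognizer's output. The sketch below is a minimal illustration: the 0.8 cutoff, the function name, and the transaction shape are all assumptions for the example, not details of the actual system:

```python
# Illustrative sketch: the agent executes a transaction only when
# recognition confidence is high enough; otherwise it checks in.
# The 0.8 threshold and all names are assumptions for this example.

CONFIDENCE_THRESHOLD = 0.8

def handle_recognized_transaction(transaction, confidence):
    """Decide whether to execute a recognized transaction or ask back."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return ("execute", transaction)
    # Low confidence: politely check in. On a "no", the user corrects the
    # transaction via touch or the scrolling token rather than voice.
    prompt = (f"Sorry, did I understand correctly that you want to "
              f"{transaction['action']} {transaction['amount']} shares "
              f"of {transaction['stock']}?")
    return ("confirm", prompt)
```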

Conclusion #

  • Social cues provide opportunities to design the agent to be more approachable, likable, and trustworthy. They are an important tool for transporting context information and enabling system feedback.
  • Making the agent part of explaining system processes helps improve the overall acceptance and trust in both the agent and the system (Explainable AI).

Towards The Future

Irrespective of the specific consulting field, whether it’s legal, healthcare, insurance, or banking, two key factors significantly impact the quality of counseling. The first factor involves the advisor’s ability to devote undivided attention to the client, ensuring their needs are fully addressed. The second factor pertains to structuring the counseling session in a manner that facilitates equal access to information for all participants, presenting it in a way that even inexperienced individuals can understand. By enhancing customer experience through promoting self-determined and well-informed decision-making, businesses can boost customer retention and foster loyalty.

Introducing a shared device in counseling sessions offers the potential to address the problem of information asymmetry and promote collaboration and a shared understanding among participants. Does this mean that every consultation session depends on the proposed mixed reality setup? For physical consultations, the interactive tabletop projection (or an equivalent interaction space where all participants have equal access to information) does enable a democratic approach to information — personal devices just won’t do the job.

In the context of digital (remote) consultations, collaboration and transparency remain crucial, but the interaction space undergoes significant changes, thereby altering the requirements. Regardless of the specific interaction space, careful consideration must be given to conveying information in an understandable manner. Utilizing different modalities can enhance the comprehensibility of user interfaces, even in traditional mobile or desktop UIs.

To alleviate the cognitive load on consultants, we require a system capable of managing time-consuming tasks in the background. However, it is important to acknowledge that digital agents and voice interactions remain unfamiliar territory for many users, and there are instances where voice processing falls short of users’ high expectations. Nevertheless, speech processing will certainly see great improvements in the next few years, and we need to start thinking today about what tomorrow’s interactions with voice assistants might look like.

Friday, July 28, 2023

Modern Technology And The Future Of Language Translation

 The field of language translation has never been more exciting. The opportunities for translation management systems to generate accurate, real-time translations are numerous, thanks to the growing and evolving development of artificial intelligence, machine learning, and natural language processing. In this article, Adriano Raiano discusses the evolution of language translation platforms, detailing how we got to where we are today and what advancements we can look forward to in the coming years.

Multilingual content development presents its own set of difficulties, necessitating close attention to language translations and the use of the right tools. The exciting part is that translation technology has advanced remarkably over time.

In this article, we’ll explore the growth of translation technology throughout time, as well as its origins, and lead up to whether machine translation and artificial intelligence (AI) actually outperform their conventional counterparts when it comes to managing translations. In the process, we’ll discuss the fascinating opportunities offered by automated approaches to language translation as we examine their advantages and potential drawbacks.

And finally, we will speculate on the future of language translation, specifically the exhilarating showdown between OpenAI and Google in their race to dominate the AI landscape.

The Evolution Of Translation Technology #

Translation technology can be traced back to Al-Kindi’s Manuscript on Deciphering Cryptographic Messages. However, with the arrival of computers in the mid-twentieth century, translation technology began taking shape. Over the years, significant milestones have marked the evolution, shaping how translations are performed and enhancing the capabilities of language professionals.

Black and white photo of a phone operator using a transcription machine
Image source: Reddit. (Large preview)

Georgetown University and IBM conducted the so-called Georgetown-IBM experiment in the 1950s. The experiment was designed primarily to capture governmental and public interest and funding by demonstrating machine translation capabilities, and it was far from a fully featured system. Rule-based and lexicographical, this early system suffered from low reliability and slow translation speeds. Despite its weaknesses, it laid the foundation for future advancements in the field.

The late 1980s and early 1990s marked the rise of statistical machine translation (SMT) pioneered by IBM researchers. By leveraging bilingual corpora, SMT improved translation accuracy and laid the groundwork for more advanced translation techniques.

In the early 1990s, commercial computer-assisted translation (CAT) tools became widely available, empowering translators and boosting productivity. These tools utilized translation memories, glossaries, and other resources to support the translation process and enhance efficiency.

The late 1990s saw IBM release a rule-based statistical translation engine (pdf), which became the industry standard heading into the new century. IBM’s translation engine introduced predictive algorithms and statistical translation, bringing machine translation to the forefront of language translation technology.

In the early 2000s, the first cloud-based translation management systems (TMS) began appearing in the market. While there were some early non-cloud-based versions in the mid-1980s, these modern systems transformed the translation process by allowing teams of people to work more flexibly and collaborate with other company members regardless of their location. The cloud-based approach improved accessibility, scalability, and collaboration capabilities, completely changing how translation projects were managed.

The launch of Google Translate in 2006 marked a significant milestone in translation management. Using predictive algorithms and statistical translation, Google Translate brought machine translation to the masses and has remained the de facto tool for online multilingual translations. Despite its powerful features, it gained a reputation for inaccurate translations. Still, it played a pivotal role in making translation technology more widely known and utilized, paving the way for future advancements.

The Google Translate interface
Image source: Bureau Works. (Large preview)

In 2016, Google Translate made a significant leap by introducing neural machine translation (NMT). NMT surpassed previous translation tools, offering improved quality, fluency, and context preservation.

NMT set a new commercial standard and propelled the field forward. By 2017, DeepL emerged as an AI-powered machine translation system renowned for its high-quality translations and natural-sounding output. DeepL’s capabilities further demonstrated the advancements achieved in the field of translation technology.

From 2018 onward, the focus has remained on enhancing NMT models, which continue to outperform traditional statistical machine translation (SMT) approaches. NMT has proven instrumental in improving translation accuracy and has become the preferred approach in today’s many translation applications.


Translation Technology That Emerged Over The Years #

Translation technology has evolved significantly over the years, offering various tools to enhance the translation process. The main types of translation technology include:

  • Computer-assisted translation (CAT)
    These software applications support translators by providing databases of previous translations, translation memories, glossaries, and advanced search and navigation tools. CAT tools revolutionize translation by improving efficiency and enabling translators to focus more on the translation itself.
  • Machine translation (MT)
    Machine translation is an automated system that produces translated content without human intervention. It can be categorized into rule-based (RBMT), statistical (SMT), or neural (NMT) approaches. MT’s output quality varies based on language pairs, subject matter, pre-editing, available training data, and post-editing resources. Raw machine translation may be used for low-impact content, while post-editing by human translators is advisable for high-impact or sensitive content.
  • Translation management systems (TMS)
    TMS platforms streamline translation project management, offering support for multiple languages and file formats, real-time collaboration, integration with CAT tools and machine translation, reporting features, and customization options. TMS solutions ensure organized workflow and scalability for efficient translation project handling.

Translation technology advancements have transformed the translation process, making it more efficient, cost-effective, and scalable.

Finding The Right Translation Approach: Machine Vs. Human #

Finding the proper translation approach involves weighing the benefits and drawbacks of machine translation (MT) and human translation. Each approach has its own strengths and considerations to take into account.

Human translation, performed by professional linguists and subject-matter experts, offers accuracy, particularly for complex documents like legal and technical content. Humans can grasp linguistic intricacies and apply their own experiences and instincts to deliver high-quality translations. They can break down a language, ensure cultural nuances are correctly understood, and inject creativity to make the content compelling.

Collaborating with human translators allows direct communication, reducing the chances of missing project objectives and minimizing the need for revisions.

An illustration of a robot butting heads with a man in a shirt and tie
Image source: TechTalks. (Large preview)

That said, human translation does have some downsides, namely that it is resource-intensive and time-consuming compared to machine translation. If you have ever worked on a multilingual project, then you understand the costs associated with human translation — not every team has a resident translator, and finding one for a particular project can be extremely difficult. The costs often run high, and the process may not align with tight timelines or projects that prioritize speed over contextual accuracy.

Nevertheless, when it comes to localization and capturing the essence of messaging for a specific target audience, human translators excel in fine-tuning the content to resonate deeply. Machine translation cannot replicate the nuanced touch that human translators bring to the table.

On the other hand, machine translation — powered by artificial intelligence and advanced algorithms — is rapidly improving its understanding of context and cultural nuances. Machine translation offers speed and cost-efficiency compared to manual translation, making it suitable for projects that prioritize quick turnarounds and where contextual accuracy is not the primary concern.

Modern TMSs often integrate machine and human translation capabilities, allowing users to choose the most appropriate approach for their specific requirements. Combining human translators with machine translation tools can create a powerful translation workflow. Machine translation can be used as a starting point and paired with human post-editing to ensure linguistic precision, cultural adaptation, and overall quality.

Translation management systems often provide options for leveraging both approaches, allowing for flexibility and optimization based on the content, time constraints, budget, and desired outcome. Ultimately, finding the proper translation approach depends on the content’s nature, the desired accuracy level, project objectives, budget considerations, and time constraints. Assessing these factors and considering the advantages and disadvantages of human and machine translation will guide you in making informed decisions that align with your or your team’s needs and goals.

AI and Machine Translation #

Thanks to machine learning and AI advancements, translation technology has come a long way in recent years. However, complete translation automation is not yet feasible, as human translators and specialized machine translation tools offer unique advantages that complement each other.

The future of translation lies in the collaboration between human intelligence and AI-powered machine translation. Human translators excel in creative thinking and adapting translations for specific audiences, while AI is ideal for automating repetitive tasks.

This collaborative approach could result in a seamless translation process where human translators and AI tools work together in unison.

Machine-translation post-editing ensures the accuracy and fluency of AI-generated translations, while human translators provide the final touches to cater to specific needs. This shift should lead to a transition from computer-assisted human translation to human-assisted computer translation. Translation technology will continue to evolve, allowing translators to focus on more complex translations while AI-powered tools handle tedious tasks. It is no longer a question of whether to use translation technology but which tools to utilize for optimal results.

The future of translation looks promising as technology empowers translators to deliver high-quality translations efficiently, combining the strengths of human expertise and AI-powered capabilities.

The Rise of Translation Management Systems #

Regarding AI and human interaction, TMSs play a crucial role in facilitating seamless collaboration. Here are five more examples of how TMSs enhance the synergy between human translators and AI.

Terminology Management #

TMSs offer robust terminology management features, allowing users to create and maintain comprehensive term bases or glossaries, ensuring consistent usage of specific terminology across translations, and improving accuracy.

Quality Assurance Tools #

TMSs often incorporate quality assurance tools that help identify potential translation errors and inconsistencies. These tools can flag untranslated segments, incorrect numbers, or inconsistent translations, enabling human translators to review and rectify them efficiently.
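Two of the checks mentioned, untranslated segments and inconsistent numbers, can be sketched in a few lines. The rules and the function name below are illustrative assumptions, not the API of any specific TMS:

```python
# Sketch of typical TMS quality-assurance checks: flag untranslated
# segments and number mismatches between source and target.
# Rules and names are illustrative, not a specific product's API.

import re

def qa_check(source, target):
    """Return a list of issues found in a (source, target) segment pair."""
    issues = []
    if not target.strip():
        issues.append("untranslated segment")
    elif source.strip() == target.strip():
        issues.append("target identical to source (possibly untranslated)")
    # Numbers should usually survive translation unchanged.
    if sorted(re.findall(r"\d+(?:\.\d+)?", source)) != \
       sorted(re.findall(r"\d+(?:\.\d+)?", target)):
        issues.append("number mismatch")
    return issues
```

A real TMS would run dozens of such rules per segment and surface the flagged pairs to a human translator for review rather than fixing them automatically.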

Workflow Automation #

TMSs streamline the translation process by automating repetitive tasks. They can automatically assign translation tasks to appropriate translators, track progress, and manage deadlines. This automation improves efficiency and allows human translators to focus more on the creative aspects of translation, like nuances in the voice and tone of the content.

Collaboration And Communication #

TMSs provide collaborative features that enable real-time communication and collaboration among translation teams. They allow translators to collaborate on projects, discuss specific translation challenges, and share feedback, fostering a cohesive and efficient workflow.

Reporting And Analytics #

TMSs offer comprehensive reporting and analytics capabilities, providing valuable insights into translation projects. Users can track project progress, measure translator productivity, and analyze translation quality, allowing for continuous improvement and informed decision-making.

By leveraging the power of translation management systems, the interaction between AI and human translators becomes more seamless, efficient, and productive, resulting in high-quality translations that meet the specific needs of each project.

Google And OpenAI Competition #

We’re already seeing brewing competition between Google and OpenAI for dominance in AI-powered search and generated content. I expect 2024 to be the year the clash extends to translation technology.

Google and OpenAI logos
Image source: Answer IQ. (Large preview)

That said, when comparing OpenAI’s platform to Google Translate or DeepL, it’s important to consider the respective strengths and areas of specialization of each. Let’s briefly compare their strengths to see precisely how they differ.

Continuously Improved And Robust Translation #

Google Translate and DeepL are dedicated to the field of machine translation and have focused for many years on refining their translation capabilities.

As a result, they have developed robust systems that excel in delivering high-quality translations. These platforms have leveraged extensive data and advanced techniques to improve their translation models, addressing real-world translation challenges continuously. Their systems’ continuous refinement and optimization have allowed them to achieve impressive translation accuracy and fluency.

Generating Text #

OpenAI primarily focuses on generating human-like text and language generation tasks.

While OpenAI’s models, including ChatGPT, can perform machine translation tasks, they may not possess the same level of specialization and domain-specific knowledge as Google Translate and DeepL.

The primary objective of OpenAI’s language models is to generate coherent and contextually appropriate text rather than specifically fine-tuning their models for machine translation.

Compared to ChatGPT, Google Translate and DeepL excel in domain-specific sentences while factoring in obstacles to translation, such as background sounds when receiving audio input. In that sense, Google Translate and DeepL have demonstrated their ability to handle real-world translation challenges effectively, showcasing their continuous improvement and adaptation to different linguistic contexts.

The Future Of Machine Translation #

Overall, when it comes to machine translation, Google Translate and DeepL have established themselves as leaders in the field, with a focus on delivering high-quality translations. Their extensive experience and focus on continual improvement contribute to their reputation for accuracy and fluency. While OpenAI’s ChatGPT models technically offer translation capabilities, they may not possess the same level of specialization or optimization tailored explicitly for machine translation tasks.

It’s important to note that the landscape of machine translation is continuously evolving, and the relative strengths of different platforms may change over time. While Google Translate and DeepL have demonstrated their superiority in translation quality, it’s worth considering that OpenAI’s focus on language generation and natural language processing research could benefit future advancements in their machine translation capabilities. Together, the three systems could make a perfect trifecta of accurate translations, speed and efficiency, and natural language processing.

OpenAI’s commitment to pushing the boundaries of AI technology and its track record of innovation suggests it may invest more resources in improving machine translation performance. As OpenAI continues to refine its models and explore new approaches, there is a possibility that it could bridge that gap and catch up with Google Translate and DeepL in terms of translation quality and specialization.

The machine translation landscape is highly competitive, with multiple research and industry players continuously striving to enhance translation models. As advancements in machine learning and neural networks continue, it’s conceivable that newer platforms or models could emerge and disrupt the current dynamics, introducing even higher-quality translations or specialized solutions in specific domains.

So, even though Google Translate and DeepL hold an advantage in translation quality and domain-specific expertise today, in 2023, it’s essential to acknowledge the potential for future changes in the competitive landscape in the years to come. As technology progresses and new breakthroughs occur, the relative strengths and weaknesses of different platforms may shift, leading to exciting developments in the field of machine translation.

Conclusion #

In summary, the evolution of translation technology has brought advancements to the multilingual space:

  • The choice of translation approach depends on project requirements, considering factors such as accuracy, budget, and desired outcomes.
  • Machine translation offers speed and cost-efficiency, while human translation excels in complex content.
  • Collaboration between human translators and AI-powered machines is best to get accurate translations that consider voice and tone.
  • Translation management systems are crucial in facilitating collaboration between AI and human translators.

While Google Translate and DeepL have demonstrated higher translation quality and specialization, OpenAI’s focus on human-like text generation may lead to improvements in machine translation capabilities. And those are only a few of the providers.

That means the future of translation technology is incredibly bright as platforms, like locize, continue to evolve. As we’ve seen, there are plenty of opportunities to push this field further, and the outcomes will be enjoyable to watch in the coming years.

Friday, October 28, 2022

Sustainable Web Development Strategies Within An Organization

 Climate change and sustainability are increasing concerns for digital organizations, as well as individuals working in tech. In this article, we’ll explore some of the ways we can raise awareness and effect change within an organization to create a more positive environmental impact.

Sustainability is rightly becoming more widely discussed within the web development industry, just as it is an increasing concern in the wider public consciousness. Many countries around the world have committed to ambitious climate goals, although many have some way to go if they are to meet their targets.

All industries have a part to play, and that includes web design and development. The internet accounts for an estimated 3–4% of global emissions — equivalent to some countries. That means we, as tech workers, are in a position to make choices that contribute to reducing the environmental impact of our industry. Not only that, but as a well-connected industry, one that builds digital products often used by thousands or millions of people, we are also relatively well-positioned to influence the behavior of others.

Line drawing of a laptop computer, an aeroplane, and the continent of Africa, with the text “3–4% of global emissions”
The carbon emissions of the internet are roughly equivalent to those generated by the entire aviation industry, or the whole of Africa. (Image credit: BBC.com) (Large preview)

In this article, we’ll explore some of the ways that we, as individuals, can use our skills to have a positive environmental impact within a digital organization.

Presenting The Case For Sustainability #

One of the first hurdles to implementing sustainable practices within an organization (or on a project) is convincing stakeholders that it is worth the investment. Any change of practice, however small, will probably require some time investment by employees. Being able to present a business case, and demonstrate that the benefits outweigh the costs, will help justify focusing resources in the area of sustainability.

Cost-Effectiveness #

It would be great to think that for every company, the idea of building a better world trumps financial concerns. Unfortunately, with some exceptions, that’s generally not the case. But there are plenty of actions we can take that reduce our environmental impact and reduce costs (or increase revenue) at the same time.

For example, changing our database architecture to be more efficient could save on server costs. Making performance improvements to a client’s site could result in happier clients who send more business our way. Identifying where sustainability and cost savings overlap is a good place to start.

Regulation #

Despite financial impact being a fairly obvious incentive, it’s not the only one, and perhaps not even the most significant. In his recent Smashing Conference talk, green software expert Asim Hussain mentioned that the biggest shift he is seeing is as a result of regulation — or the threat of regulation.

With many countries publicly committed to Net Zero goals, it is increasingly likely that companies will need to submit to the regulation of their carbon emissions. The UK’s commitment is enshrined into law, with carbon budgets set over many years. Many companies are already taking the long view and looking to get ahead of the competition by reducing their emissions early.

Being able to demonstrate as a company that you are committed to sustainability can open up a greater number of opportunities. Organizations working with the UK government to build new digital services, for example, are required to meet standards defined in their Greening Government ICT and Digital Services Strategy.

Accreditation #

Companies that can demonstrate their environmental credentials may be eligible for certification, such as ISO14001 standard in the UK. In the case of Ada Mode, the company I work for, this has directly contributed to winning us more work and has enabled us to partner with much larger organizations.

Businesses that achieve BCorp status can benefit (according to the website) from “committed and motivated employees, increased customer loyalty, higher levels of innovation, and market leadership”.

Certainly, organizations positioning themselves as environmentally conscious increase their chances of attracting sustainability-minded candidates for recruitment as more and more people seek meaningful work.

It’s All In The Branding #

Another great bit of advice from Asim’s talk at the Smashing Conference was on branding. The “Eco” movement has long been associated with being somewhat spartan, taking away something, using or consuming less. Rather than giving our users a reduced experience, reducing the environmental impact of our digital products has the opportunity to deliver our users more. Asim talked about Performance Mode in Microsoft Edge: switching on Performance Mode means users get a faster website, while also saving resources. “Performance Mode” sounds a lot more appealing than “Eco Mode”, which sounds like something is being taken away.

The Bigger Picture #

When presenting the case for investing time in sustainability efforts in an organization, it can be helpful to explain the relevance of small actions on a bigger scale. For example, Smashing’s editor, Vitaly Friedman, makes a case for reducing the size and quality of images on a site by explaining the overall cost and CO2 savings when taking into account page views over an entire year.

“On the Fact Sheets page, we can save approx. 85% of images’ file sizes without a noticeable loss of image quality. With approx. 1,300,000 annual page views…this makes for 5.2 Terabyte of wasted traffic. The difference is approx. EUR 1000–1650 in costs (on one single page!). Notably, this makes for 17.28 tons of CO2, which requires 925 trees to be planted, and that’s enough to drive an electric car for 295,000km — annually.”
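The quoted totals are easy to sanity-check. Working backwards from 5.2 TB across 1,300,000 annual views gives roughly 4 MB of image weight saved per view, which is the assumed input in this back-of-envelope sketch (the per-view figure is inferred from the quote, not stated in it):

```javascript
// Back-of-envelope check of the figures quoted above.
// savedBytesPerView is an assumption inferred from the quoted totals.
const annualViews = 1_300_000;
const savedBytesPerView = 4e6; // ~4 MB of image weight saved per view

const annualSavedTB = (annualViews * savedBytesPerView) / 1e12;
console.log(annualSavedTB.toFixed(1), "TB of traffic saved per year");
```

Running the numbers reproduces the 5.2 TB of annual traffic from the quote, which is the kind of concrete, year-scale figure that makes a per-page optimization persuasive to stakeholders.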

Get Organized #

Effecting change at an organizational level is nearly always easier when you build consensus.

Forming A Team #

Forming a green team within your organization enables you to support each other to achieve climate goals and identify new opportunities. ClimateAction.tech has some resources on starting a green team at your place of work.

If your organization is small, or there is a lack of interest, then finding a supportive community outside of work (such as ClimateAction.tech) can help you stay motivated and lend their advice. It’s also a great idea to connect with teams working on sustainability in other businesses.

Planning #

Once you have a team, you’ll be in a good position to plan your actions. It can be hard to know where to focus your efforts first. One way we could do this is by drawing a diagram and sorting potential actions according to their impact and effort.

An effort versus impact diagram, showing how actions could be prioritized from dark blue (high priority) to light blue (low priority).
Using an effort vs. impact diagram to brainstorm sustainability actions could help you decide what to prioritize. (Large preview)

For example, switching to a green hosting provider could be a small-to-medium effort but result in a high impact. Re-writing your web app to use a more lightweight JS framework could be an extremely high effort for a relatively low impact.

The goal is to identify the areas where your efforts would be best focused. Low-effort/high-impact actions are easy wins and definitely worth prioritizing. Achieving a few aims early on is great for morale and helps keep the momentum going. High-effort/high-impact actions are worth considering as part of your long-term strategy, even if you can’t get to them right away. Low-effort/low-impact tasks might also be worth doing, as they won’t take up too much time and effort. High-effort/low-impact actions are generally to be avoided.
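The quadrant logic above is simple enough to sketch in a few lines. The actions and scores below are made-up examples for illustration, not recommendations:

```javascript
// Sorting candidate actions into effort/impact quadrants.
// Scores run 1–10; the threshold of 5 and all entries are assumptions.
const actions = [
  { name: "Switch to a green host", effort: 2, impact: 8 },
  { name: "Rewrite app in a lighter JS framework", effort: 9, impact: 3 },
  { name: "Compress hero images", effort: 1, impact: 4 },
];

function quadrant({ effort, impact }) {
  if (impact >= 5) return effort < 5 ? "easy win" : "long-term strategy";
  return effort < 5 ? "quick task" : "avoid";
}

for (const a of actions) {
  console.log(`${a.name}: ${quadrant(a)}`);
}
```

Even if you never automate this, writing the rule down forces the team to agree on what counts as “high impact” before the arguing starts.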

This isn’t the only way to prioritize, however. Other factors to consider include workload, resources (including financial), and the availability of team members. For example, if your development team are particularly stretched thin, it may be more prudent to focus on goals within the areas of design or project management or prioritize actions that can be easily integrated with the development workflow in a current project.

It’s not always the case that every sustainability effort needs to be meticulously planned and scheduled. Jamie Thompson from intelligent energy platform Kaluza explained in a recent talk how a developer spent just 30 minutes of spare time removing database logs, resulting in a large reduction in CO2 emissions — enough to offset Jamie’s train journey to the event.

Watch the video of Jamie’s talk from Green Tech South West.

Measuring The Impact #

Measuring the impact of your sustainability efforts is a thorny subject and depends on what exactly you want to measure. To get some idea of the impact of changes to our websites, we can use tools such as Website Carbon Calculator, EcoPing, and Beacon. These tools are especially helpful in making the impact more tangible by comparing the amount of CO2 emitted to common activities such as traveling by car, boiling the kettle, or watching a video.

Screenshot of the Website Carbon Calculator, showing that 0.35g of CO2 is produced every time someone visits the Smashing homepage
The Website Carbon Calculator by Wholegrain Digital shows that Smashing Magazine’s homepage is cleaner than 77% of web pages tested. (Image credit: Website Carbon Calculator) (Large preview)

Where sustainability goals align with cost-saving (such as reducing server load), we may be able to measure the impact of the financial savings we’re making. But we should be careful not to conflate the two goals.
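Tools like those above generally estimate emissions from the data transferred per page view. A minimal sketch of that approach, using approximate energy- and carbon-intensity constants in the spirit of the Sustainable Web Design model (the exact figures vary by model version and grid, so treat these as assumptions):

```javascript
// Rough per-pageview CO2 estimate from bytes transferred.
// Both constants are assumed approximations, not authoritative figures.
const KWH_PER_GB = 0.81;        // assumed energy intensity of data transfer
const GRID_G_CO2_PER_KWH = 442; // assumed global average grid intensity

function gramsCo2PerView(pageBytes) {
  const gb = pageBytes / 1e9;
  return gb * KWH_PER_GB * GRID_G_CO2_PER_KWH;
}

function annualKgCo2(pageBytes, annualViews) {
  return (gramsCo2PerView(pageBytes) * annualViews) / 1000;
}

// A 2 MB page viewed 100,000 times a year:
console.log(gramsCo2PerView(2e6).toFixed(2), "g CO2 per view");
console.log(annualKgCo2(2e6, 100_000).toFixed(1), "kg CO2 per year");
```

Multiplying a small per-view figure by annual traffic is what turns an abstract “0.7 g per view” into tens of kilograms per year — the scale at which the comparison tools above become persuasive.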

Some Areas To Consider #

If you’re not sure where to start when it comes to making your digital organization more sustainable, here are a few areas to think about.

Green Your Website #

There are many ways we can reduce the environmental impact of the websites and digital products we build, from reducing and optimizing our images to minimizing the amount of data we transfer to implementing a low-energy color scheme. Tom Greenwood’s book, Sustainable Web Design is packed with advice for building low-carbon websites.

When the architectural website Dezeen discovered how polluting their website was, they took steps to massively reduce its carbon footprint, resulting in some huge savings — according to their measurements, equivalent to the carbon sequestered by 96,600 mature trees.

Green Hosting #

Our choice of web host can have a big impact on our organization’s carbon emissions. Consider switching to a host that uses renewable energy. The Green Web Foundation has a directory.

Switch Your Analytics #

Do you really need Google Analytics on every site you build? How about switching to a lower-carbon alternative like Fathom or Cabin instead? As a bonus, you might not need that cookie banner, either.

Developer Toolchain #

Eric Bailey writes in this article for Thoughtbot:

“If I was a better programmer, I’d write a script that shows you the cumulative CO₂ you’ve generated every time you type npm install.”

Clean up your dependencies and remove the ones you no longer need, especially if you’re working on a project or package that will be installed by a lot of developers. Consider whether a static site might serve your needs better than a bloated WordPress project in some instances. (Eric’s article also includes a bunch of other great tips for building more sustainably.)

Hardware And E-Waste #

Several tonnes of carbon go into producing our MacBooks, PCs, tablets, and mobile devices, even before we start using them. Do we really need to upgrade our devices as regularly as we do? We must also consider their disposal, which generates carbon emissions and produces harmful waste. It might be possible to repair a device or, if we need to upgrade, to sell or donate the old one to someone who needs it, extending its useful life.

Gerry McGovern has written and spoken extensively about the problem of e-waste, including his book, World Wide Waste.

Electricity Use #

It’s probably fairly obvious, but reducing our electricity consumption by switching off or powering devices when we don’t need them and switching to a green electricity supplier could make a big difference.

Travel #

Does your team regularly drive or fly for work? It might be helpful to set some organization-level targets for reducing carbon-intensive travel and to look for sustainable alternatives where possible. Driving and flying are among the most polluting activities an individual can engage in.

Larger Organizations #

If you work for a big corporation, the battle to get climate action on the agenda may be uphill — but, on the flip side, your efforts could have a far more wide-ranging impact. Small changes to improve the carbon footprint of a site can have a big impact when that site is used by millions of people. And in an organization of thousands, corporate policies on sustainable travel and electricity use can save a lot of carbon emissions.

Many of the big tech companies have the potential to use their lobbying power for the greater good. As tech workers, we can help push it up the agenda. Check out Climate Voice for some of the ways tech workers are attempting to use their influence.

Screenshot of the 1in5 for 1.5 homepage, with the heading “Urge tech to lobby for climate change”
“1in5 For 1.5” is a campaign by Climate Voice that urges big tech companies to use their lobbying power to positively influence climate policy. (Image credit: Climate Voice) (Large preview)

Spread The Word #

A common argument people make against action on climate change is that individual actions don’t make a difference. There’s a great podcast episode in the How To Save a Planet series called Is Your Carbon Footprint BS? which confronts exactly this dilemma. You could argue that, taken individually, our actions are of little consequence. But all of our actions have the potential to spark action in others and ripple outwards. Dr. Anthony Leiserowitz, who runs the Yale Center for Climate Change Communication, is quoted in the episode saying:

“One of the single most important things that anyone, anyone can do. When people say, ‘What can I do about climate change?’ My answer, first and foremost, is to talk about it.”

By taking action at an organizational level, you’ve already extended your sphere of influence beyond just yourself. Encourage the people working at your company to be vocal about your climate commitments. We have the power to inspire action in others.

Inclusivity, Accessibility And Climate Justice #

However we choose to take action on climate change and sustainability, it’s imperative that no one is excluded. We should make sure our actions don’t overtly or covertly place undue burdens on already-marginalized people, including those with disabilities, people of color, those living in developing countries, people with below-average incomes, or LGBTQ+ people. Climate change is already exacerbating inequalities, with the people causing the least pollution being the ones most at risk from its effects. We must ensure that whatever climate action we take, we’re making fair and equitable decisions that include everyone.

Resources #

  • Jon Gibbins, founder and director of As It Should Be, a UK-based agency helping digital teams design and build accessible and sustainable products and services, recently delivered a talk about accessibility and sustainability. You can watch his talk, Leave No One Behind, on the Green Tech South West website.
  • The Environment Variables podcast from the Green Software Foundation has an episode on Accessibility and Sustainability.
  • Read more about climate justice in this article from Carbon Brief.