
Friday, January 31, 2025

How To Design For High-Traffic Events And Prevent Your Website From Crashing

 Product drops and sales are a great way to increase revenue, but these events can result in traffic spikes that affect a site’s availability and performance. To prevent website crashes, you’ll have to make sure that the sites you design can handle large numbers of server requests at once. Let’s discuss how!

 

Product launches and sales typically attract large volumes of traffic. Too many concurrent server requests can lead to website crashes if you’re not equipped to deal with them. This can result in a loss of revenue and reputation damage.

The good news is that you can maximize availability and prevent website crashes by designing websites specifically for these events. For example, you can switch to a scalable cloud-based web host, or compress/optimize images to save bandwidth.

In this article, we’ll discuss six ways to design websites for high-traffic events like product drops and sales.

How To Design For High-Traffic Events

Let’s take a look at six ways to design websites for high-traffic events, without worrying about website crashes and other performance-related issues.

1. Compress And Optimize Images

One of the simplest ways to design a website that accommodates large volumes of traffic is to optimize and compress images. Images often have very large file sizes, which means they take longer for browsers to download and render. They can also be a huge drain on bandwidth and lead to slow loading times.

You can free up space and reduce the load on your server by compressing and optimizing images. It’s a good idea to resize images to smaller pixel dimensions, too. You can often do this using built-in apps on your operating system.

There are also online optimization tools available like Tinify, as well as advanced image editing software like Photoshop or GIMP.


Image format is also a key consideration. Many designers rely on JPG and PNG, but modern image formats like WebP compress more efficiently, reducing image weight and providing a better user experience (UX).
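If you’d rather automate the conversion in a build step, here’s a minimal sketch using the Node.js sharp library; the file names, target width, and quality setting are placeholder assumptions:

```ts
// A minimal sketch using the "sharp" library (npm install sharp).
// File names, width, and quality are placeholders.
import sharp from "sharp";

async function optimize(input: string, output: string): Promise<void> {
  await sharp(input)
    .resize({ width: 1200, withoutEnlargement: true }) // cap pixel dimensions
    .webp({ quality: 80 }) // re-encode as WebP at 80% quality
    .toFile(output);
}

optimize("hero.jpg", "hero.webp").catch(console.error);
```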

You may even consider installing an image optimization plugin or an image CDN to compress and scale images automatically. Additionally, you can implement lazy loading, which prioritizes the loading of images above the fold and delays those that aren’t immediately visible.
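Natively, lazy loading can be as simple as adding loading="lazy" to an img tag. As a quick script, a sketch might look like this (the hero class marking above-the-fold images is an assumption for illustration):

```ts
// Opt every image except above-the-fold ones into native lazy loading.
// The "hero" class marking above-the-fold images is an assumption.
document
  .querySelectorAll<HTMLImageElement>("img:not(.hero)")
  .forEach((img) => {
    img.loading = "lazy";
  });
```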

2. Choose A Scalable Web Host

The most convenient way to design a high-traffic website without worrying about website crashes is to upgrade your web hosting solution.

Traditionally, when you sign up for a web hosting plan, you’re allocated a pre-defined number of resources. This can negatively impact your website performance, particularly if you use a shared hosting service.

Upgrading your web host ensures that you have adequate resources to serve visitors flocking to your site during high-traffic events. If you’re not prepared for this eventuality, your website may crash, or your host may automatically upgrade you to a higher-priced plan.

Therefore, the best solution is to switch to a scalable web host like Cloudways Autonomous.


This is a fully managed WordPress hosting service that automatically adjusts your web resources based on demand. This means that you’re able to handle sudden traffic surges without the hassle of resource monitoring and without compromising on speed.

With Cloudways Autonomous your website is hosted on multiple servers instead of just one. It uses Kubernetes with advanced load balancing to distribute traffic among these servers. Kubernetes is capable of spinning up additional pods (think of pods as servers) based on demand, so there’s no chance of overwhelming a single server with too many requests.

High-traffic events like sales can also make your site a prime target for hackers. This is because, in high-stress situations, many sites enter a state of greater vulnerability and instability. But with Cloudways Autonomous, you’ll benefit from DDoS mitigation and a web application firewall to improve website security.

3. Use A CDN

As you’d expect, large volumes of traffic can significantly impact the security and stability of your site’s network. This can result in website crashes unless you take the proper precautions when designing sites for these events.

A content delivery network (CDN) is an excellent solution to the problem. You’ll get access to a collection of strategically located servers scattered all over the world. This means that you can reduce latency and speed up your content delivery times, regardless of where your customers are based.

When a user makes a request for a website, they’ll receive content from a server that’s physically closest to their location. Plus, having extra servers to distribute traffic can prevent a single server from crashing under high-pressure conditions. Cloudflare is one of the most robust CDNs available, and luckily, you’ll get access to it when you use Cloudways Autonomous.

You can also find optimization plugins or caching solutions that give you access to a CDN. Some tools like Jetpack include a dedicated image CDN, which is built to accommodate and auto-optimize visual assets.

4. Leverage Caching

When a user requests a website, it can take a long time to load all the HTML, CSS, and JavaScript contained within it. Caching can help your website combat this issue.

A cache functions as a temporary storage location that keeps copies of your web pages on hand (once they’ve been requested). This means that every subsequent request will be served from the cache, enabling users to access content much faster.

The cache mainly deals with static content like pre-rendered HTML, which is much quicker to serve than dynamically generated content. However, you can find caching technologies that accommodate both types of content.

There are different caching mechanisms to consider when designing for high-traffic events. For example, edge caching is generally used to cache static assets like images, videos, or web pages. Meanwhile, database caching enables you to optimize server requests.

If you’re expecting fewer simultaneous sessions (which isn’t likely in this scenario), server-side caching can be a good option. You could even implement browser caching, which stores static assets in the visitor’s browser according to your HTTP headers.
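As an illustration of browser caching via HTTP headers, here’s a minimal sketch using Node.js and Express; the routes and cache durations are placeholder assumptions:

```ts
// A minimal sketch of browser caching via HTTP headers, using Express.
import express from "express";

const app = express();

// Long-lived caching for static assets that rarely change.
app.use(
  "/assets",
  express.static("public/assets", {
    maxAge: "1y", // sends Cache-Control: public, max-age=31536000
    immutable: true, // tells browsers to skip revalidation entirely
  })
);

// Short-lived caching for HTML pages that change more often.
app.get("/", (_req, res) => {
  res.set("Cache-Control", "public, max-age=300"); // cache for 5 minutes
  res.send("<h1>Sale is live!</h1>");
});

app.listen(3000);
```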

There are plenty of caching plugins available if you want to add this functionality to your site, but some web hosts provide built-in solutions. For example, Cloudways Autonomous uses Cloudflare’s edge cache and integrated object cache.

5. Stress Test Websites

One of the best ways to design websites while preparing for peak traffic is to carry out comprehensive stress tests.

This enables you to find out how your website performs in various conditions. For instance, you can simulate high-traffic events and discover the upper limits of your server’s capabilities. This helps you avoid resource drainage and prevent website crashes.

You might have experience with speed testing tools like Pingdom, which assess your website performance. But these tools don’t help you understand how performance may be impacted by high volumes of traffic.

Therefore, you’ll need to use a dedicated stress test tool like Loader.io.


This is completely free to use, but you’ll need to register for an account and verify your website domain. You can then download a verification file and upload it to your server via FTP.

After that, you’ll find three different tests to carry out. Once your test is complete, you can take a look at the average response time and maximum response time, and see how this is affected by a higher number of clients.
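If you want a rough sense of these numbers before reaching for a dedicated tool, here’s a minimal do-it-yourself sketch for Node.js 18+ (which has fetch built in). The URL and client count are placeholders, and you should only ever point something like this at a site you own:

```ts
// A rough DIY load check (illustrative only, not a Loader.io replacement):
// fire N concurrent requests and report average/max response times.
const URL = "https://example.com/"; // placeholder: use a site you own
const CLIENTS = 100;

async function timeRequest(): Promise<number> {
  const start = performance.now();
  await fetch(URL);
  return performance.now() - start;
}

async function main() {
  const times = await Promise.all(
    Array.from({ length: CLIENTS }, timeRequest)
  );
  const avg = times.reduce((a, b) => a + b, 0) / times.length;
  console.log(
    `avg ${avg.toFixed(0)} ms, max ${Math.max(...times).toFixed(0)} ms`
  );
}

main().catch(console.error);
```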

6. Refine The Backend

The final way to design websites for high-traffic events is to refine the WordPress back end.

The admin panel is where you install plugins, activate themes, and add content. The more of these features that you have on your site, the slower your pages will load.

Therefore, it’s a good idea to delete any old pages, posts, and images that are no longer needed. If you have access to your database, you can even go in and remove any archived materials.

On top of this, it’s best to remove plugins that aren’t essential for your website to function. Again, with database access, you can get in there and delete any tables that sometimes get left behind when you uninstall plugins via the WordPress dashboard.

When it comes to themes, you’ll want to opt for a simple layout with a minimalist design. Themes that come with lots of built-in widgets or rely on third-party plugins will likely add bloat to your loading times. Essentially, the lighter your back end, the quicker it will load.

Conclusion

Product drops and sales are a great way to increase revenue, but these events can result in traffic spikes that affect a site’s availability and performance. To prevent website crashes, you’ll have to make sure that the sites you design can handle large numbers of server requests at once.

The easiest way to support fluctuating traffic volumes is to upgrade to a scalable web hosting service like Cloudways Autonomous. This way, you can adjust your server resources automatically, based on demand. Plus, you’ll get access to a CDN, caching, and an SSL certificate. Get started today!

Thursday, January 30, 2025

Svelte 5 And The Future Of Frameworks: A Chat With Rich Harris

 After months of anticipation, debate, and even a bit of apprehension, Svelte 5 arrived earlier this year. Frederick O’Brien caught up with its creator, Rich Harris, to talk about the path that brought him and his team here and what lies ahead.

Svelte occupies a curious space within the web development world. It’s been around in one form or another for eight years now, and despite being used by the likes of Apple, Spotify, IKEA, and the New York Times, it still feels like something of an upstart, maybe even a black sheep. As creator Rich Harris recently put it,

“If React is Taylor Swift, we’re more of a Phoebe Bridgers. She’s critically acclaimed, and you’ve heard of her, but you probably can’t name that many of her songs.”

— Rich Harris

This may be why the release of Svelte 5 in October this year felt like such a big deal. It tries to square the circle of convention and innovation. Can it remain one of the best-loved frameworks on the web while shaking off suspicions that it can’t quite rub shoulders with React, Vue, and others when it comes to scalability? Whisper it, but they might just have pulled it off. The post-launch reaction has been largely glowing, with weekly npm downloads doubling compared to six months ago.

Still, I’m not in the predictions game. The coming months and years will be the ultimate measure of Svelte 5. And why speculate on the most pressing questions when I can just ask Rich Harris myself? He kindly took some time to chat with me about Svelte and the future of web development.

Not Magic, But Magical

Svelte 5 is a ground-up rewrite. I don’t want to get into the weeds here — key changes are covered nicely in the migration guide — but suffice it to say the big one where day-to-day users are concerned is runes. The at-times magic-feeling $: syntax has given way to the more explicit $state, $derived, and $effect.
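For a quick flavor of what that looks like in practice, here’s a minimal counter using runes, a small sketch based on the documented Svelte 5 basics:

```svelte
<script lang="ts">
  // $state replaces implicitly reactive `let` declarations.
  let count = $state(0);

  // $derived replaces the magic `$:` label for computed values.
  let doubled = $derived(count * 2);

  // $effect re-runs whenever the state it reads changes.
  $effect(() => {
    console.log(`count is now ${count}`);
  });
</script>

<button onclick={() => count++}>
  {count} doubled is {doubled}
</button>
```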

A lot of the talk around Svelte 5 included the sentiment that it marks the ‘maturation’ of the framework. To Harris and the Svelte team, it feels like a culmination, with lessons learned combined with aspirations to form something fresh yet familiar.

“This does sort of feel like a new chapter. I’m trying to build something that you don’t feel like you need to get a degree in it before you can be productive in it. And that seems to have been carried through with Svelte 5.”

— Rich Harris

Although raw usage numbers aren’t everything, seeing the uptick in installations has been a welcome signal for Harris and the Svelte team.

“For us, success is definitely not based around adoption, though seeing the number go up and to the right gives us reassurance that we’re doing the right thing and we’re on the right track. Even if it’s not the goal, it is a useful indication. But success is really people building their apps with this framework and building higher quality, more resilient, more accessible apps.”

— Rich Harris

The tenets of a Svelte philosophy outlined by Harris earlier this year reinforce the point:

  1. The web matters.
  2. Optimise for vibes.
  3. Don’t optimise for adoption.
  4. HTML, The Mother Language.
  5. Embrace progress.
  6. Numbers lie.
  7. Magical, not magic.
  8. Dream big.
  9. No one cares.
  10. Design by consensus.

Click the link above to hear these expounded upon, but you get the crux. Svelte is very much a qualitative project. Although Svelte performs well in a fair few performance metrics itself, Harris has long been a critic of metrics like Lighthouse being treated as ends in themselves. Fastest doesn’t necessarily mean best. At the end of the day, we are all in the business of making quality websites.

(Video: Rich Harris – North Star, JSNation US 2024)

Frameworks are a means to that end, and Harris sees plenty of work to be done there.

Software Is Broken

Every milestone is a cause for celebration. It’s also a natural pause in which to ask, “Now what?” For the Svelte team, the sights seem firmly set on shoring up the quality of the web.

“A conclusion that we reached over the course of a recent discussion is that most software in the world is kind of terrible. Things are not good. Half the stuff on my phone just doesn’t work. It fails at basic tasks. And the same is true for a lot of websites. The number of times I’ve had to open DevTools to remove the disabled attribute from a button so that I can submit a form, or been unclear on whether a payment went through or not.”

— Rich Harris

This certainly meshes with my experience and, doubtless, countless others. Between enshittification, manipulative algorithms, and the seemingly endless influx of AI-generated slop, it’s hard to shake the feeling that the web is becoming increasingly decadent and depraved.

“So many pieces of software that we use are just terrible. They’re just bad software. And it’s not because software engineers are idiots. Our main priority as toolmakers should be to enable people to build software that isn’t broken. As a baseline, people should be able to build software that works.”

— Rich Harris

This sense of responsibility for the creation and maintenance of good software speaks to the Svelte team’s holistic outlook and looks set to influence priorities going forward.

Brave New World

Part of Svelte 5 feels like a new chapter in the sense of fresh foundations. Anyone who’s worked in software development or web design will tell you how much of a headache ground-up rewrites are. Rebuilding the foundations is something to celebrate when you pull it off, but it also raises the question: what are the foundations for?

Harris has his eyes on the wider ecosystem around frameworks.

“I don’t think there’s a lot more to do to solve the problem of taking some changing application state and turning it into DOM, but I think there’s a huge amount to be done around the ancillary problems. How do we load the data that we put in those components? Where does that data live? How do we deploy our applications?”

— Rich Harris

In the short to medium term, this will likely translate into some love for SvelteKit, the web application framework built around Svelte. The framework might start having opinions about authentication and databases, an official component library perhaps, and dev tools in the spirit of the Astro dev toolbar. And all these could be precursors to even bigger explorations.

“I want there to be a Rails or a Laravel for JavaScript. In fact, I want there to be multiple such things. And I think that at least part of Svelte’s long-term goal is to be part of that. There are too many things that you need to learn in order to build a full stack application today using JavaScript.”

— Rich Harris

Onward

Although Svelte has been ticking along happily for years, the release of version 5 has felt like a new lease of life for the ecosystem around it. Every day brings new and exciting projects to the front page of the /r/sveltejs subreddit, while this year’s Advent of Svelte has kept up a sense of momentum following the stable release.

Plenty of new Svelte-based projects have caught my eye since the release.

Despite the turbulence and inescapable sense of existential dread surrounding much tech, this feels like an exciting time for web development. The conditions are ripe for lovely new things to emerge.

And as for Svelte 5 itself, what does Rich Harris say to those who might be on the fence?

“I would say you have nothing to lose but an afternoon if you try it. We have a tutorial that will take you from knowing nothing about Svelte or even existing frameworks. You can go from that to being able to build applications using Svelte in three or four hours. If you just want to learn Svelte basics, then that’s an hour. Try it.”

— Rich Harris

What Does AI Really Mean?

 We, as human beings, don’t worry too much about making sure the connections land at the right point. Our brain just works that way, declaratively. However, for building AI, we need to be more explicit. Let’s dive in!

In 2024, Artificial Intelligence (AI) hit the limelight with major advancements. The problem with a term reaching common parlance and so much public attention so quickly is that it becomes ambiguous. While we all have an approximation of what it means to “use AI” in something, it’s not widely understood what infrastructure having AI in your project, product, or feature entails.

So, let’s break down the concepts that make AI tick. How is data stored and correlated, and how are the relationships built in order for an algorithm to learn how to interpret that data? As with most data-oriented architectures, it all starts with a database.

Data As Coordinates

Creating intelligence, whether artificial or natural, works in much the same way: we store chunks of information, and we then connect them. Multiple visualization tools and metaphors show this as a 3-dimensional space with dots connected by lines on a graph. Those connections and their intersections are what make up intelligence. For example, we put together “chocolate is sweet and nice” and “drinking hot milk makes you warm”, and we make “hot chocolate”.

We, as human beings, don’t worry too much about making sure the connections land at the right point. Our brain just works that way, declaratively. However, for building AI, we need to be more explicit. So think of it as a map: for a plane to leave CountryA and arrive at CountryB, it requires a precise system. We have coordinates, our maps have two axes, and they can be represented as a vector: [28.3772, 81.5707].

For our intelligence, we need a more complex system; 2 dimensions will not suffice; we need thousands. That’s what vector databases are. Our intelligence can now correlate terms based on the distance and/or angle between them, create cross-references, and establish patterns in which every term occurs.

A vector database is a specialized database that stores and manages data as high-dimensional vectors, enabling efficient similarity searches and semantic matching.

Querying Per Approximation

As stated in the last section, matching the search terms (your prompt) to the data is an exercise in semantic matching (establishing the pattern in which keywords from your prompt are used within the stored data) and similarity search (the angular or linear distance between entries). That’s a roughly accurate representation. What a similarity search does is treat each vector (thousands of coordinates long) as a point in this weird multi-dimensional space. Finally, to establish similarity between these points, the distances and/or angles between them are measured.

This is one of the reasons why AI isn’t deterministic — we also aren’t — for the same prompt, the search may produce different outputs based on how the scores are defined at that moment. If you’re building an AI system, there are algorithms you can use to establish how your data will be evaluated.

This can produce more precise and accurate results depending on the type of data. There are three main algorithms, and each one performs better for a certain kind of data, so understanding the shape of your data and how each of these concepts correlates with it is important for choosing the correct one. In a very hand-wavy way, here’s a rule of thumb to offer you a clue for each (a code sketch of all three follows the note below):

  • Cosine Similarity
    Measures the angle between vectors, so the magnitude (the actual numbers) matters less. It’s great for text/semantic similarity.
  • Dot Product
    Captures linear correlation and alignment. It’s great for establishing relationships between multiple points/features.
  • Euclidean Distance
    Calculates straight-line distance. It’s good for dense numerical spaces since it highlights the spatial distance.
Note: When working with non-structured data (like text entries: your tweets, a book, multiple recipes, your product’s documentation), cosine similarity is the way to go.
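To make the three measures concrete, here’s a minimal sketch of each, using short plain arrays of numbers in place of real high-dimensional embeddings:

```ts
// Minimal sketches of the three similarity measures over number[] vectors.
type Vec = number[];

const dot = (a: Vec, b: Vec): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

const magnitude = (a: Vec): number => Math.sqrt(dot(a, a));

// Cosine similarity: angle only, magnitude ignored. 1 = same direction.
const cosine = (a: Vec, b: Vec): number =>
  dot(a, b) / (magnitude(a) * magnitude(b));

// Euclidean distance: straight-line distance. 0 = identical points.
const euclidean = (a: Vec, b: Vec): number =>
  Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));

// Real embeddings have hundreds or thousands of dimensions; three shown here.
console.log(cosine([1, 2, 3], [2, 4, 6]));    // 1 (same direction)
console.log(dot([1, 2, 3], [2, 4, 6]));       // 28
console.log(euclidean([1, 2, 3], [2, 4, 6])); // ~3.74
```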

Now that we understand how the data bulk is stored and the relationships are built, we can start talking about how the intelligence works — let the training begin!

Language Models

A language model is a system trained to understand, predict, and finally generate human-like text by learning statistical patterns and relationships between words and phrases in large text datasets. For such a system, language is represented as probabilistic sequences.

In that way, a language model is immediately capable of efficient completion (hence the quote stating that 90% of the code in Google is written by AI — auto-completion), translation, and conversation. Those tasks are the low-hanging fruit of AI because they depend on estimating the likelihood of word combinations, and they improve by reaffirming and adjusting the patterns based on usage feedback (rebalancing the similarity scores).
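To see “probabilistic sequences” in miniature, here’s a toy sketch that counts which word follows which in a tiny corpus and predicts the likeliest next word. Real models work over vastly larger datasets and far richer representations, but the core idea is the same:

```ts
// A toy bigram model: language as probabilistic sequences of words.
const corpus = "hot chocolate is sweet and hot milk makes you warm";

// Count how often each word follows each other word.
const counts = new Map<string, Map<string, number>>();
const words = corpus.split(" ");
for (let i = 0; i < words.length - 1; i++) {
  const next = counts.get(words[i]) ?? new Map<string, number>();
  next.set(words[i + 1], (next.get(words[i + 1]) ?? 0) + 1);
  counts.set(words[i], next);
}

// Predict the highest-probability follower of a given word.
function predict(word: string): string | undefined {
  const next = counts.get(word);
  if (!next) return undefined;
  return [...next.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

console.log(predict("hot")); // "chocolate" (ties broken by insertion order)
```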

Now that we understand what a language model is, we can start classifying models as large and small.

Large Language Models (LLMs)

As the name says, LLMs are trained on large-scale datasets with billions of parameters, sometimes as many as 70 billion. This allows them to be diverse and capable of creating human-like text across different knowledge domains. Think of them as big generalists. This makes them not only versatile but extremely powerful. As a consequence, training them demands a lot of computational work.

Small Language Models (SLMs)

SLMs are trained on smaller datasets, ranging from 100 million to 3 billion parameters. They take significantly less computational effort to train, which makes them less versatile but better suited for specific tasks with more defined constraints. SLMs can also be deployed more efficiently and offer faster inference when processing user input.

Fine-Tuning

Fine-tuning an LLM consists of adjusting the model’s weights through additional specialized training on a specific (high-quality) dataset. Basically, adapting a pre-trained model to perform better in a particular domain or task.

As training iterates through the heuristics within the model, it enables a more nuanced understanding. This leads to more accurate and context-specific outputs without creating a custom language model for each task. On each training iteration, developers tune the learning rate, weights, and batch size while providing a dataset tailored to that particular knowledge area. Of course, each iteration also depends on appropriately benchmarking the output performance of the model.

As mentioned above, fine-tuning is particularly useful for a well-defined task in a niche knowledge area, for example, creating summaries of nutritional scientific articles, correlating symptoms with a subset of possible conditions, and so on.

Fine-tuning is not something that can be done frequently or quickly, since it requires numerous iterations, and it isn’t intended for factual information, especially information that depends on current events or streamed data.

Enhancing Context With Information

Most conversations we have are directly dependent on context; with AI, it isn’t so much different. While there are definitely use cases that don’t entirely depend on current events (translations, summarization, data analysis, etc.), many others do. However, it isn’t quite feasible yet to have LLMs (or even SLMs) being trained on a daily basis.

For this, a new technique can help: Retrieval-Augmented Generation (RAG). It consists of injecting a smaller dataset into the LLM in order to provide it with more specific (and/or current) information. With RAG, the LLM isn’t better trained; it still has all the generalist training it had before. But now, before it generates the output, it receives an injection of new information to use.

Note: RAG enhances the LLM’s context, providing it with a more comprehensive understanding of the topic.

For RAG to work well, data must be prepared and formatted in a way that the LLM can properly digest. Setting it up is a multi-step process:

  1. Retrieval
    Query external data (such as web pages, knowledge bases, and databases).
  2. Pre-Processing
    Information undergoes pre-processing, including tokenization, stemming, and removal of stop words.
  3. Grounded Generation
    The pre-processed retrieved information is then seamlessly incorporated into the pre-trained LLM.

RAG first retrieves relevant information from a database using a query generated by the LLM. Integrating RAG with an LLM enhances its context, and this augmented context enables the model to generate more precise, informative, and engaging responses.

Since it provides access to fresh information via easy-to-update database records, this approach is best suited for data-driven responses. Because this data is context-focused, it also provides more factual accuracy. Think of RAG as a tool that turns your LLM from a generalist into a specialist.
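Here’s a minimal sketch of that retrieve-and-inject flow. The embed() function below is a hypothetical stub standing in for a real embedding model (embedding is covered next), and cosine() is the similarity measure sketched earlier:

```ts
// A minimal RAG sketch: retrieve the most relevant snippets, then prepend
// them to the prompt before sending it to the language model.
interface Doc {
  text: string;
  vector: number[];
}

// Cosine similarity, as sketched in the earlier section.
const cosine = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  return dot / (Math.hypot(...a) * Math.hypot(...b));
};

// Hypothetical placeholder: a real system calls an embedding model here.
async function embed(text: string): Promise<number[]> {
  return [text.length, text.split(" ").length]; // NOT a real embedding
}

async function buildAugmentedPrompt(
  question: string,
  docs: Doc[]
): Promise<string> {
  const queryVector = await embed(question);

  // Retrieval: rank stored entries by similarity to the query.
  const topDocs = [...docs]
    .sort(
      (a, b) => cosine(b.vector, queryVector) - cosine(a.vector, queryVector)
    )
    .slice(0, 3);

  // Grounded generation: inject the retrieved context into the prompt.
  return [
    "Answer using only the context below.",
    ...topDocs.map((d) => `Context: ${d.text}`),
    `Question: ${question}`,
  ].join("\n");
}
```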

Enhancing an LLM’s context through RAG is particularly useful for chatbots, assistants, agents, or other usages where output quality is directly connected to domain knowledge. But while RAG is the strategy for collecting and “injecting” data into the language model’s context, that data must first have its semantic meaning captured in a form the model can use, and that is where embedding comes in.

Embedding

To make data digestible by the LLM, we need to capture each entry’s semantic meaning so the language model can form patterns and establish relationships. This process is called embedding, and it works by creating a static vector representation of the data. Different embedding models offer different levels of precision. For example, embeddings can range from 384 dimensions all the way up to 3,072.

In other words, in comparison to our Cartesian map coordinates (e.g., [28.3772, 81.5707]) with only two dimensions, an embedded entry for an LLM has anywhere from 384 to 3,072 dimensions.
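As a concrete sketch, here’s what requesting an embedding can look like with the OpenAI Node SDK. The model name and its dimension count are assumptions based on OpenAI’s published embedding models, and other providers follow a similar shape:

```ts
// A sketch using the OpenAI Node SDK (npm install openai).
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const response = await client.embeddings.create({
    model: "text-embedding-3-small", // assumed model name
    input: "chocolate is sweet and nice",
  });

  const vector = response.data[0].embedding;
  console.log(vector.length); // 1536 dimensions for this model
}

main().catch(console.error);
```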

Let’s Build

I hope this helped you better understand what these terms mean and the processes that the term “AI” encompasses. This merely scratches the surface of the complexity, though. We still need to talk about AI agents and how all these approaches intertwine to create richer experiences. Perhaps we can do that in a later article — let me know in the comments if you’d like that!

Meanwhile, let me know your thoughts and what you build with this!