What makes the underused Server-Timing
header uniquely powerful among all other response headers? We’ll
rethink the expectation for using it exclusively for timing and see fast
solutions for hard-to-solve monitoring challenges.
In the world of HTTP Headers, there is one header that I believe deserves more air-time and that is the Server-Timing
header.
To me, it’s a must-use in any project where real user monitoring (RUM)
is being instrumented. To my surprise, web performance monitoring
conversations rarely surface Server-Timing
or cover a very shallow understanding of its application — despite it being out for many years.
Part
of that is due to the perceived limitation that it’s exclusively for
tracking time on the server — it can provide so much more value! Let’s
rethink how we can leverage this header. In this piece, we will dive
deeper to show how Server-Timing
headers are so uniquely
powerful, show some practical examples by solving challenging monitoring
problems with this header, and provoke some creative inspiration by
combining this technique with service workers.
Server-Timing
is uniquely powerful, because it is the only
HTTP Response header that supports setting free-form values for a
specific resource and makes them accessible from a JavaScript Browser
API separate from the Request/Response references themselves. This
allows resource requests, including the HTML document itself, to be
enriched with data during its lifecycle, and that information can be
inspected for measuring the attributes of that resource!
The only other header that’s close to this capability is the HTTP Set-Cookie
/ Cookie
headers. Unlike Cookie
headers, Server-Timing
is only on the response for a specific resource where Cookies
are sent on requests and responses for all resources after they’re set
and unexpired. Having this data bound to a single resource response is
preferable, as it prevents ephemeral data about all responses from
becoming ambiguous and contributes to a growing collection of cookies
sent for remaining resources during a page load.
Setting Server-Timing
This
header can be set on the response of any network resource, such as XHR,
fetch, images, HTML, stylesheets, etc. Any server or proxy can add this
header to the request to provide inspectable data. The header is
constructed via a name with an optional description and/or metric value.
The only required field is the name. Additionally, there can be many Server-Timing
headers set on the same response which would be combined and separated via a comma.
A few simple examples:
Important Note: For cross-origin resources, Server-Timing
and other potentially sensitive timing values are not exposed to consumers. To allow these features, we will also need the Timing-Allow-Origin
header that includes our origin or the *
value.
For this article, that’s all we’ll need to start exposing the value and leave other more specific articles to go deeper. MDN docs.
Consuming Server-Timing
Web
browsers expose a global Performance Timeline API to inspect details
about specific metrics/events that have happened during the page
lifecycle. From this API we can access built-in performance API
extensions which expose timings in the form of PerformanceEntries
.
There are a handful of different entry subtypes but, for the scope of this article, we will be concerned with the PerformanceResourceTiming
and PerformanceNavigationTiming
subtypes. These subtypes are currently the only subtypes related to network requests and thus exposing the Server-Timing
information.
For
the top-level HTML document, it is fetched upon user navigation but is
still a resource request. So, instead of having different PerformanceEntries
for the navigation and the resource aspects, the PerformanceNavigationTiming
provides resource loading data as well as additional
navigation-specific data. Because we are looking at just the resource
load data, we will exclusively refer to the requests (navigational docs
or otherwise) simply as resources.
To query performance entries, we have 3 APIs that we can call: performance.getEntries()
, performance.getEntriesByType()
, performance.getEntriesByName()
. Each will return an array of performance entries with increasing specificity.
Finally, each of these resources will have a serverTiming
field which is an array of objects mapped from the information provided in the Server-Timing
header — where PerformanceEntryServerTiming
is supported (see considerations below). The shape of the objects in this array is defined by the PerformanceEntryServerTiming
interface which essentially maps the respective Server-Timing
header metric options: name
, description
, and duration
.
Let’s look at this in a complete example.
A request was made to our data endpoint and among the headers, we sent back the following:
On the client-side, let’s assume this is our only resource loaded on this page:
That covers the fundamental APIs used to access resource entries and the information provided from a Server-Timing
header. For links to more details on these APIs, see the resources section at the bottom.
Now that we have the fundamentals of how to set and use this header/API combo, let’s dive into the fun stuff.
It’s Not Just About Time
From
my conversations and work with other developers, the name
“Server-Timing” impresses a strong connection that this is a tool used
to track spans of time or a detail about a span of time exclusively.
This is entirely justified by the name and the intent of the feature.
However, the spec for this header is very flexible; allowing for values
and expressing information that could have nothing to do with timing or
performance in any way. Even the duration
field has no
predefined unit of measurement — you can put any number (double) in that
field. By stepping back and realizing that the fields available have no
special bindings to particular types of data, we can see that this
technique is also an effective delivery mechanism for any arbitrary data
allowing lots of interesting possibilities.
Examples of non-timing information you could send: HTTP Response status code, regions, request ids, etc. — any freeform data that suits your needs. In some cases, we might send redundant information that might be in other headers already, but that’s ok. As we will cover, accessing other headers for resources quite often isn’t possible, and if it has monitoring value, then it’s ok to be redundant.
No References Required
Due to the design of web browser APIs, there are currently no mechanisms for querying requests and their relative responses after the fact. This is important because of the need to manage memory. To read information about a request or its respective response, we have to have a direct reference to these objects. All of the web performance monitoring software we work with provide RUM clients that put additional layers of monkey patching on the page to maintain direct access to a request being made or the response coming back. This is how they offer drop-in monitoring of all requests being made without us needing to change our code to monitor a request. This is also why these clients require us to put the client before any request that we want to monitor. The complexities of patching all of the various networking APIs and their linked functionality can become very complex very quickly. If there were an easy access mechanism to pull relevant resource/request information about a request, we would certainly prefer to do that on the monitoring side.
To make matters more difficult, this monkey-patching pattern only works for resources where JavaScript is directly used to initiate the networking. For Images, Stylesheets, JS files, the HTML Doc, etc. the methods for monitoring the request/response details are very limited, as usually there is no direct reference available.
This
is where the Performance Timeline API provides great value. As we saw
earlier, it quite literally is a list of requests made and some data
about each of them respectively. The data for each performance entry is
very minimal and almost entirely limited to timing information and some
fields that, depending on their value, would impact how a resource’s
performance is measured relative to other resources. Among the timing
fields, we have direct access to the serverTiming
data.
Putting all of the pieces together, resources can have Server-Timing
headers in their network responses that contain arbitrary data. Those resources can then be easily queried, and the Server-Timing
data can be accessed without a direct reference to the request/response
itself. With this, it doesn’t matter if you can access/manage
references for a resource, all resources can be enriched with arbitrary
data accessible from an easy-to-use web browser API. That’s a very
unique and powerful capability!
Next, let’s apply this pattern to some traditionally tough challenges to measure.
Solution 1: Inspecting Images and Other Asset Responses
Images,
stylesheets, JavaScript files, etc. typically aren’t created by using
direct references to the networking APIs with information about those
requests. For example, we almost always trigger image downloads by
putting an img
element in our HTML. There are techniques for loading these assets which require using JavaScript fetch
/xhr
APIs to pull the data and push it into an asset reference directly.
While that alternate technique makes them easier to monitor, it is
catastrophic to performance in most cases. The challenge is how do we
inspect these resources without having direct networking API references?
To tie this to real-world use cases, it’s important to ask why might we want to inspect and capture response information about these resources? Here are a few reasons:
- We might want to proactively capture details like status codes for our resources, so we can triage any changes.
For example, missing images (404s) are likely totally different issues and types of work than dealing with images returning server errors (500s). - Adding monitoring to parts of our stack that we don’t control.
Usually, teams offload these types of assets to a CDN to store and deliver to users. If they are having issues, how quickly will the team be able to detect the issue? - Runtime or on-demand variations of resources have become more commonplace techniques.
For example, image resizing, automatic polyfilling of scripts on the CDN, etc — these systems can have many limits and reasons for why they might not be able to create or deliver a variation. If you expect that 100% of users retrieve a particular type of asset variation, it’s valuable to be able to confirm that.
This came up at a previous company I worked at where on-demand image resizing was used for thumbnail images. Due to the limitations of the provider, a significant number of users would get worse experiences due to full-size images loading where thumbnails are supposed to appear. So, where we thought >99% of users would get optimal images, >30% would hit performance issues, because images did not resize.
Now that we have some understanding of what might motivate us to inspect these resources, let’s see how Server-Timing
can be leveraged for inspection.
Image HTML:
Image response headers:
Inspecting the image response information:
This
metric was very valuable because, despite returning “happy” responses
(200s), our images weren’t resized and potentially not converted to the
right format, etc. Along with the other performance information on the
entry like download times, we see the status was served as 200
(not triggering our onerror handlers on the element), resizing failed after spending 1.2s
on attempting to resize, and we have a request-id that we can use to
debug this in our other tooling. By sending this data to our RUM
provider, we can aggregate and proactively monitor how often these
conditions happen.
Solution 2: Inspect Resources That Return Before JS Runs
Code used to monitor resources (fetch, XHR, images, stylesheets, scripts, HTML, etc.) requires JavaScript code to aggregate and then send the information somewhere. This almost always means there’s an expectation for the monitoring code to run before the resources that are being monitored. The example presented earlier of the basic monkey patching used to automatically monitor fetch requests is a good example of this. That code has to run before any fetch request that needs to be monitored. However, there are lots of cases, from performance to technical constraints, where we might not be able to or simply should not change the order in which a resource is requested to make it easier to be monitored.
Another very common monitoring technique is to
put event listeners on the page to capture events that might have
monitoring value. This usually comes in the form of onload
or onerror
handlers on elements or using addEventListener
more abstractly. This technique requires JS to have been set before the
event fires, or before the listener itself is attached. So, this
approach still carries the characteristic only monitoring events going
forward, after the monitoring JS is run, thus requiring the JS to
execute before the resources requiring measurement.
Mapping this to real-world use-cases, e-commerce sites put a heavy emphasis on “above the fold” content rendering very quickly — typically deferring JS as much as possible. That said, there might be resources that are impactful to measure, such as successful delivery of the product image. In other situations, we might also decide that the monitoring library itself shouldn’t be in the critical path due to page weight. What are the options to inspect these requests retroactively?
The technique is the same as Solution #1! This is possible because browsers automatically maintain a buffer of all of the Performance Entries (subject to the buffer size limit that can be changed). This allows us to defer JS until later in the page load cycle without needing to add listeners ahead of the resource.
Instead of repeating the Solution #1 example, let’s look at what both retroactive and future inspection of performance entries looks like to show the difference of where they can be leveraged. Please note that, while we are inspecting images in these examples, we can do this for any resource type.
Setting up context for this code, our need is that
we have to ensure our product images are being delivered successfully.
Let’s assume all website images return this Server-Timing
header structure. Some of our important images can happen before our
monitoring script and, as the user navigates, more will continue to
load. How do we handle both?
Image response headers:
Our monitoring logic. We expect this to run after the critical path content of the page.
Inspecting the image response information:
Despite deferring our monitoring script until it was out of the critical path, we are capturing the data for all images which have loaded before our script and will continue to monitor them, as the user continues using the site.
Solution 3: Inspecting the HTML Doc
The final example solution we will look at is related to the ultimate “before JS can run” resource — the HTML document itself. If our monitoring solutions are loaded as JS via the HTML, how can we monitor the delivery of the HTML document?
There is some precedence in monitoring HTML doc delivery. For monitoring response data, the most common setup is to use server logs/metrics/traces to capture this information. That is a good solution but depending on the tooling, the data may be decoupled from RUM data causing us to need multiple tools to inspect our user experiences. Additionally, this practice could also miss metadata (page instance identifiers for example) that allows us to aggregate and correlate information for a given page load — for example, correlating async requests failing when the doc returns certain document response codes.
A common pattern for doing this work is putting the content inside of the HTML content itself. This has to be put into the HTML content, because the JS-based monitoring logic does not have access to the HTML request headers that came before it. This turns our HTML document into a dynamic document content. This may be fine for our needs and allows us to take that information and provide it to our RUM tooling. However, this could become a challenge if our system for HTML delivery is out of our control, or if the system has some assumptions into how HTML delivery must function. Examples of this might be, expecting that the HTML is fully static, such that we can cache it downstream in some deterministic manner — “partially dynamic” HTML bodies are much more likely to be handled incorrectly by caching logic.
Within the HTML delivery process, there could also be
additional data that we want to understand, such as what datacenters
processed the request throughout the chain. We might have a CDN edge
handler that proxies a request from an origin. In this case, we can’t
expect each layer could/should process and inject HTML content. How
might Server-Timing
headers help us here?
Building on
the concepts of Solution #1 and Solution #2, here’s how we can capture
valuable data about the HTML document itself. Keep in mind that any part
of the stack can add a Server-Timing
header to the response, and it will be joined together in the final header value.
Let’s assume we have a CDN edge handler and an origin which can process the document:
CDN added response headers:
Origin added response headers:
Inspecting the HTML response information:
From
this information, our monitoring JavaScript (which could have been
loaded way later) can aggregate where the HTML processing happened,
status codes from the different servers (which can differ for legit
reasons — or bugs), and request identifiers if they need to correlate
this with server logs. It also knows how much time was taken on the
“server” via the cdn_time
duration — “server” time being the total time starting at the first non-user proxy/server that we provide. Using that cdn_time
duration, the already accessible HTML Time-To-First-Byte value and the origin_time
duration, we can determine latency sections more accurately, such as the user latency, the cdn
to origin latency, etc. This is incredibly powerful in optimizing such a
critical delivery point and protecting it from regression.
Combining Server-Timing with Service Workers
Service Workers are scripts that are initialized by the website to sit between the website, the browser, and the network (when available). When acting as a proxy, they can be used to read and modify requests coming from and responses returning to the website. Given service workers are so feature rich, we won’t attempt to cover them in depth in this article — a simple web search will yield a mountain of information about their capabilities. For this article, we will focus on the proxying capability of a service worker — it’s ability to process requests/responses.
The key to combining these tools is knowing that the Server-Timing
header and its respective PerformanceEntry
is calculated after service worker proxying takes place. This allows us to use the service workers to add Server-Timing
headers to responses that can provide valuable information about the request itself.
What type of information might we want to capture within the service worker? As mentioned before, service workers have lots of capabilities, and any one of those actions could produce something valuable to capture. Here are a few that come to mind:
- Is this request served from the service worker cache?
- Is this served from the service worker while off-line?
- What service worker strategy for this request type is being used?
- What version of the service worker is being used?
This is helpful in checking our assumptions about service worker invalidation. - Take values from other headers and put them into a
Server-Timing
header for downstream aggregation.
Valuable when we don’t have the option to change the headers on the request but would like to inspect them in RUM — such is usually the case with CDN providers. - How long has a resource been in service worker cache?
Service workers have to be initialized on the website which is an asynchronous process itself. Furthermore, service workers only process requests within the defined scope. As such, even the basic question of, “is this request processed by the service worker?” can drive interesting conversations on how much we are leaning on its capabilities to drive great experiences.
Let’s dive into how this might look in the code.
Basic JS logic used on the site to initialize the service worker:
Inside of /service-worker.js
, basic request/response proxying:
Requests that are processed from the service worker, now will have a Server-Timing
header appended to their responses. This allows us to inspect that data
via the Performance Timeline API, as we demonstrated in all of our
prior examples. In practice, we likely didn’t add the service worker for
this single need — meaning we already have it instrumented for handling
requests. Adding the one header in 2 places allowed us to measure
status codes for all requests, service worker-based cache-hit ratios,
and how often service workers are processing requests.
Why Use Server-Timing
If We Have Service Workers? #
This is an important question that comes up when discussing combining these techniques. If a service worker can grab all of the header and content information, why do we need a different tool to aggregate it?
The
work of measuring timing and other arbitrary metadata about requests is
almost always, so that we can send this information to a RUM provider
for analysis, alerting, etc. All major RUM clients have 1 or 2 windows
for which we can enrich the data about a request — when the response
happens, and when the PerformanceEntry
is detected. For example, if we make a fetch request, the RUM client captures the request/response details and sends it. If a PerformanceEntry
is observed, the client sends that information as well — attempting to
associate it to the prior request if possible. If RUM clients offer the
ability to add information about that requests/entries, those were the
only windows to do it.
In practice, a service worker may or may
not be activated yet, a request/response may or may not have processed
the service worker, and all service worker data sharing requires async
messaging to the site via postMessage()
API. All of these
aspects introduce race conditions for a service worker to be active,
able to capture data, and then send that data in time to be enriched by
the RUM client.
Contrasting this with Server-Timing
, a RUM client that processes the Performance Timeline API will immediately have access to any Server-Timing
data set on the PerformanceEntry
.
Given
this assessment of service worker’s challenges with enriching
request/response data reliably, my recommendation is that service
workers be used to provide more data and context instead of being the
exclusive mechanism for delivering data to the RUM client on the main
thread. That is, use Server-Timing
and, where needed, use service worker to add more context or in cases where Server-Timing
isn’t supported — if required. In this case, we might be creating
custom events/metrics instead of enriching the original request/response
data aggregation, as we will assume that the race conditions mentioned
will lead to missing the windows for general RUM client enrichment.
Considerations for Server-Timing
Usage
As uniquely powerful as it is, it’s not without important considerations. Here’s a list of considerations based on the current implementation at time of writing:
- Browser Support — Safari does not support putting the
Server-Timing
data into the Performance Timeline API (they do show it in DevTools).
This is a shame, however, given this is not about functionality for users, but instead it’s about improved capabilities for performance monitoring — I side with this not being a blocking problem. With browser-based monitoring, we never expect to measure 100% of browsers/users. Currently, this means we’d look to get ~70-75% support based on global browser usage data. Which is usually more than enough to feel confident that our metrics are showing us good signals about the health and performance or our systems. As mentioned,Server-Timing
is sometimes the only way to get those metrics reliably, so we should feel confident about leveraging this tool.
As mentioned previously, if we absolutely have to have this data for Safari, we could explore using a cookie-based solution for Safari users. Any solutions here would have to be tested heavily to ensure they don’t hinder performance. - If we are looking to improve performance, we want to avoid adding lots of weight to our responses, including headers.
This is a trade-off of additional weight for value added metadata. My
recommendation is that if you’re not in the range 500 bytes or more to
your
Server-Timing
header, I would not be concerned. If you are worried, try varying lengths and measure its impact! - When appending multiple
Server-Timing
headers on a single response, there is a risk of duplicateServer-Timing
metric names. Browsers will surface all of them in theserverTiming
array on thePerformanceEntry
. It’s best to ensure that this is avoided by specific or namespaced naming. If it can’t be avoided, then we would break down the order of events that added each header and define a convention we can trust. Otherwise, we can create a utility that doesn’t blindly addServer-Timing
entries but will also update existing entries if they are already on the Response. - Try to avoid the mistake of misremembering that responses cache the
Server-Timing
values as well. In some cases you might want to filter out the timing-related data of cached responses that, before they were cached, spent time on the server. There are varying ways to detect if the request went to the network with data on thePerformanceEntry
, such asentry.transferSize > 0
, orentry.decodedBodySize > 0
, orentry.duration > 40
. We can also lean into what we’ve learned withServer-Timing
to set a timestamp on the header for comparison.
Wrapping Up
We’ve gone pretty deep into the application of the Server-Timing
Header for use cases that aren’t aligned to the “timing” use case that
this header is generally associated with. We’ve seen its power to add
freeform data about a resource and access the data without needing a
reference to the networking API used to make it. This is a very unique
capability that we leveraged to measure resources of all types, inspect
them retroactively, and even capture data about the HTML doc itself.
Combining this technique with service workers, we can add more
information from the service worker itself or to map response
information from uncontrolled server responses to Server-Timing
for easy access.
I believe that Server-Timing
is so impressively unique that it should be used way more, but I also
believe that it shouldn’t be used for everything. In the past, this has
been a must-have tool for performance instrumentation projects I’ve
worked on to provide impossible to access resource data and identify
where latency is occuring. If you’re not getting value out of having the
data in this header, or if it doesn’t fit your needs — there’s no
reason to use it. The goal of this article was to provide you with a new
perspective on Server-Timing
as a tool to reach for, even if you’re not measuring time.
No comments:
Post a Comment