eclipse.dev pages show stale content due to browser caching
Summary
We release a new version of the Eclipse ESCET website at eclipse.dev/escet. Once the Git repo is synced to the webserver, browsers still show stale content. The problem is that browsers currently decide for themselves when to recheck for content, rather than the webserver telling them when to check for new content.
Steps to reproduce
Visit eclipse.dev/escet for the first time in a browser, at a moment when the website hasn't been updated for several weeks or even a few months (we typically release only once per quarter, at the end of the quarter). Wait for a new release to be made available. Go to the website again. The browser shows the stale, old version, not the new one.
What is the current bug behavior?
Users get notified that a new version is available. They go check it out. They see an old website, old release notes, etc. They have to press F5 to force a refresh. Not all users are that knowledgeable about browsers. Users complain about the website being outdated.
What is the expected correct behavior?
Visiting the website shows the latest version of the website. It is OK if it only does so a few minutes after a new version is released. But not hours, days or weeks later.
We want to send out messages that a new version is available. We can wait a few minutes after deploying the new website to do so. But we don't want to wait hours or days. And when users get the messages, they should actually see the new version of the website.
Relevant logs and/or screenshots
The problem is that currently, the webserver doesn't indicate how the browser may cache, nor when it should revalidate with the webserver. As such, browsers determine this themselves. See for instance this (old) information on how Firefox determines this: https://www-archive.mozilla.org/projects/netlib/http/http-caching-faq.html. It is based on RFC 2616. Basically, the longer a page has gone unmodified, the longer the browser waits before checking with the server whether its cached version of the page is still current. There is more recent information at https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#heuristic_caching, which states that browsers won't recheck for about 10% of the time since the page was last modified ('time_since_it_has_been_modified / 10').
Note that this form of "heuristic caching" is not recommended. https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#heuristic_caching indicates: "Heuristic caching is a workaround that came before Cache-Control support became widely adopted, and basically all responses should explicitly specify a Cache-Control header."
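To get a feel for what the heuristic means for a project that releases once per quarter, here is a minimal Python sketch of the "10% of the time since last modification" rule. The function name and the two header values are hypothetical, chosen only for illustration. Actual browsers may apply different factors or upper limits.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header: str, last_modified_header: str) -> timedelta:
    """Estimate how long a cache may treat a response as fresh when the
    server gives no explicit lifetime: 10% of (Date - Last-Modified)."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    return (date - last_modified) / 10


# Hypothetical example: a page last modified 90 days before it was fetched.
lifetime = heuristic_freshness(
    "Sat, 29 Jun 2024 12:00:00 GMT",  # Date header of the response
    "Sun, 31 Mar 2024 12:00:00 GMT",  # Last-Modified header of the response
)
print(lifetime)  # 9 days, 0:00:00
```

Under these assumptions, a user who visits just before a quarterly release could keep seeing the old website for about nine days without the browser asking the webserver for changes.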
Currently, the following response headers are sent by the webserver:
HTTP/2 200
server: nginx
date: Tue, 09 Apr 2024 18:22:48 GMT
content-type: text/html
content-length: 3527
last-modified: Sun, 31 Mar 2024 13:31:40 GMT
etag: "2e0c-614f4e499302a-gzip"
content-encoding: gzip
expires: Wed, 10 Apr 2024 18:22:07 GMT
x-proxy-cache: HIT
content-security-policy: frame-ancestors 'self'
x-frame-options: SAMEORIGIN
accept-ranges: bytes
X-Firefox-Spdy: h2
The solution is to have the webserver send a Cache-Control response header with its responses. To learn about this header, which is part of the HTTP standard, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control. Other headers like etag and last-modified are already properly included.
Note that setting a maximum age does not mean that after that time passes, the cache is discarded. The response from the webserver is still kept, it is just considered 'stale' rather than 'fresh'. This means the browser will (re)validate with the webserver. It will ask whether the content has changed. It may get the response from the webserver that it has not changed (304 Not Modified), and can keep using the cached response, or it gets the new content. So, the content is not constantly retransmitted. But, the browser does ask for changes, and thus shows current content. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#validation.
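The validation round trip described above can be sketched as a small simulation in Python. Everything here (the Response class, serve, revalidate, and the ETag values) is hypothetical and only models the idea. A real browser sends the cached ETag in an If-None-Match request header and the webserver compares it against the current one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int
    etag: str
    body: Optional[bytes]  # None for 304: no content is retransmitted


def serve(current_etag: str, current_body: bytes,
          if_none_match: Optional[str]) -> Response:
    """Hypothetical server: answer 304 when the client's ETag is still current."""
    if if_none_match == current_etag:
        return Response(304, current_etag, None)
    return Response(200, current_etag, current_body)


def revalidate(cached: Response, current_etag: str, current_body: bytes) -> Response:
    """Hypothetical browser: revalidate a stale cached response."""
    reply = serve(current_etag, current_body, if_none_match=cached.etag)
    if reply.status == 304:
        return cached  # unchanged: keep using the cached copy
    return reply  # changed: replace the cached copy


cached = Response(200, '"v1"', b"old release notes")
same = revalidate(cached, '"v1"', b"old release notes")  # server still at v1
new = revalidate(cached, '"v2"', b"new release notes")   # server moved to v2
print(same.body, new.body)
```

In the unchanged case only a short 304 answer travels over the network, so frequent revalidation is cheap compared to downloading the page again.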
To learn more about HTTP caching, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching.
I don't know which Cache-Control header is best. I'd probably go for Cache-Control: no-cache, to let the browser always check with the webserver. Somewhat confusingly, the response is actually cached, but becomes stale immediately, and the browser must thus always revalidate with the webserver. Alternatively, use something like Cache-Control: max-age=60, must-revalidate, so that the browser does not check with the server for a minute, but after that minute must always recheck with the webserver.
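To compare the two candidate headers, here is a simplified Python model of the decision a cache makes. The function names are hypothetical, and the model deliberately ignores everything except no-cache, no-store and max-age (for instance the expires header and heuristic caching), so treat it as an illustration rather than a description of any particular browser.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg or None
    return directives


def may_reuse_without_revalidation(cache_control: str, age_seconds: int) -> bool:
    """Simplified model: True if a cache may serve the stored response
    without first asking the server."""
    directives = parse_cache_control(cache_control)
    if "no-cache" in directives or "no-store" in directives:
        return False  # always ask the server first
    max_age = directives.get("max-age")
    # Fresh only while the response is younger than max-age.
    return max_age is not None and age_seconds < int(max_age)


print(may_reuse_without_revalidation("no-cache", 0))                      # False
print(may_reuse_without_revalidation("max-age=60, must-revalidate", 30))  # True
print(may_reuse_without_revalidation("max-age=60, must-revalidate", 61))  # False
```

Both options keep the delay before users see a new release within the "few minutes" that is acceptable. The max-age=60 variant saves some revalidation requests for users who browse several pages within a minute.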
If the Eclipse Foundation were to use an Apache webserver, I'd have added a .htaccess file to fix this. But NGINX is used, which doesn't support this: https://www.nginx.com/resources/wiki/start/topics/examples/likeapache-htaccess/. The only solution is for the Eclipse Foundation to update the NGINX server configuration to provide proper response headers.
Priority
- Urgent
- High
- Medium
- Low
Severity
- Blocker
- Major
- Normal
- Low
Impact
Doesn't block anything, but doesn't look good when users see old/stale content. Doesn't reflect well on the project.