10 Varnish Cache mistakes and how to avoid them
The Varnish Book covers Varnish in depth. In this short tutorial we have gathered the most common Varnish Cache mistakes and explain how to avoid them.
UPDATE: this post has been around for quite some time already and some items have become outdated. To address this we’re hosting a webinar on the same topic.
1. Caching Set-Cookie responses
Caching an object with a Set-Cookie header can have devastating effects, as any client requesting the object will get that same cookie set. This can lead to a session transfer, where one visitor ends up in another visitor's session. To stay safe, we generally recommend avoiding return (deliver) in vcl_fetch. If you really do need a return (deliver), check for the presence of Set-Cookie first. By default, Varnish will not cache responses with this header set.
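A minimal sketch of such a check, assuming Varnish 3.x VCL syntax (vcl_fetch and beresp, as used in this post). The URL pattern and the one-hour TTL are illustrative assumptions, not recommendations:

```vcl
sub vcl_fetch {
    # Only short-circuit with return (deliver) when the backend
    # response does NOT try to set a cookie.
    if (req.url ~ "\.(css|js|png|jpg)$" && !beresp.http.Set-Cookie) {
        set beresp.ttl = 1h;   # illustrative value
        return (deliver);
    }

    # No return here: everything else falls through to the built-in
    # vcl_fetch, which refuses to cache responses carrying Set-Cookie.
}
```

Leaving out the final return is deliberate, so the built-in safety logic still runs for every response the condition does not match.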
2. Varying on User-Agent
Many content management systems will issue a “Vary: User-Agent”. This more or less renders the cache useless, as finding two users with the exact same User-Agent string is pretty hard. Normalize the string into a small set of values (for example, one per device class) before the cache lookup and your cache hit rates will increase dramatically.
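One way to normalize the header is to collapse it into a handful of classes in vcl_recv. The sketch below is only an illustration: the two class names and the regular expression are assumptions, and a real site may need more classes or a dedicated device detection library.

```vcl
sub vcl_recv {
    # Collapse the many User-Agent strings into two classes so that
    # "Vary: User-Agent" yields at most two variants per URL.
    if (req.http.User-Agent ~ "(?i)(mobile|android|iphone|ipod)") {
        set req.http.User-Agent = "mobile";
    } else {
        set req.http.User-Agent = "desktop";
    }
}
```

Note that this overwrites the header the backend sees. If the backend needs the original string, copy it to a separate request header (a hypothetical X-Original-UA, say) before rewriting it.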
3. Setting high TTLs without a proper plan
The higher the TTL the better the speed of the website and the better the user experience. Setting a high TTL also reduces the load on the backend which can save you a lot of money. However, if you plan on setting a high TTL you should also have a way to invalidate the contents in the cache as the content changes (such as Varnish Enhanced Cache Invalidation).
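As a rough sketch of pairing a long TTL with an invalidation path, the following uses the HTTP PURGE pattern in Varnish 3.x syntax. The ACL name, the allowed address, and the one-week TTL are assumptions for illustration, and this is plain purging rather than the Enhanced Cache Invalidation product mentioned above:

```vcl
# Hypothetical ACL: only these clients may invalidate content.
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;                  # drop the cached object and its variants
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

sub vcl_fetch {
    # Long TTL, acceptable only because PURGE above can evict early.
    if (beresp.status == 200) {
        set beresp.ttl = 1w;    # illustrative value
    }
}
```

With something like this in place, the application can send a PURGE request for a URL whenever the underlying content changes.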
4. Believing everything you read about Varnish Cache online
There are a lot of tuning tips out there, both for Varnish Cache and for the Linux kernel itself. We’ve seen multiple installations with more or less random settings that we’ve traced back to blog posts where people have been testing various settings. For instance, there are options that work very well on a local area network but break when clients access the website across the internet. Read the documentation and be careful about changing settings without understanding the implications. Get yourself up to speed by downloading The Varnish Book.
5. Failure to monitor Varnish Cache’s ‘Nuked Counter’
Monitoring varnishstat’s n_lru_nuked counter tells you how many times Varnish Cache had to forcibly evict an object from the cache in order to fit new objects. It lets you know whether your cache is starved for storage. If this counter keeps climbing, your working set does not fit in the configured storage and you will benefit from adding more space.
6. Not using custom error messages
In the case that the origin server has fallen over, and Varnish Cache finds it does not have a suitable candidate object to serve to the client, Varnish Cache will respond with the dreaded “Guru Meditation” error response. We recommend you customize this error message (this can be done by editing the response in the VCL subroutine vcl_error) to be more in line with the look and feel of your website. You can have a look at www.varnish-software.com/error for some inspiration. You can even embed images inline in the HTML markup.
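A minimal sketch of a customized error page, assuming Varnish 3.x syntax (vcl_error with the synthetic keyword). The markup, wording, and Retry-After value are placeholders to be replaced with your own branding:

```vcl
sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";
    set obj.http.Retry-After = "30";   # illustrative value

    # Replace the default "Guru Meditation" body with our own page.
    synthetic {"<!DOCTYPE html>
<html>
  <head><title>We will be right back</title></head>
  <body>
    <h1>Sorry, something went wrong</h1>
    <p>The site is temporarily unavailable. Please try again shortly.</p>
    <p>Error "} + obj.status + " " + obj.response + {"</p>
  </body>
</html>
"};
    return (deliver);
}
```

Because the origin may be down when this page is served, keeping all assets inline (as suggested above for images) avoids extra requests that could also fail.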
7. Messing with Accept-Encoding
Varnish 3.0 and later has native support for gzip. This means that there is no longer a need to manually mangle the Accept-Encoding request header in order to cache both compressed and uncompressed versions of responses. Varnish 3.0 and later will handle this automatically, by uncompressing content on the fly when needed. This isn’t the deepest pitfall out there, but keeping your VCL short, sweet and readable is always a good thing.
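With native gzip support, the Accept-Encoding handling in vcl_recv can simply be deleted. About the only gzip-related VCL that may remain is asking Varnish to compress responses the backend sent uncompressed. A small sketch in Varnish 3.x syntax, where the Content-Type pattern is an assumption:

```vcl
sub vcl_fetch {
    # No Accept-Encoding rewriting needed in vcl_recv any more.
    # Optionally let Varnish compress text-like responses itself.
    if (beresp.http.Content-Type ~ "(text|javascript|json|xml)") {
        set beresp.do_gzip = true;
    }
}
```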
8. Failure to understand hit-for-pass
Hit-for-pass is not an intuitive concept. Many users fail to understand how it works and may misconfigure Varnish Cache as a result. Varnish Cache will coalesce multiple requests for the same object into one backend request, putting the other requests on a waiting list. If the response then does something funny, like doing a Set-Cookie, Varnish Cache will create a hit-for-pass object in order to remember that requests to this URL should not be put on the waiting list but simply sent straight to the backend. The default TTL for these objects is 120 seconds. If you set the TTL for hit-for-pass responses to 0 you’ll force serialized access to that URL. With some traffic on that URL, access will be slow as molasses.
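The behaviour described above corresponds roughly to the following vcl_fetch logic, written here in Varnish 3.x syntax as a sketch modelled on the default VCL of that release line:

```vcl
sub vcl_fetch {
    if (beresp.ttl <= 0s ||
        beresp.http.Set-Cookie ||
        beresp.http.Vary == "*") {
        # Remember for two minutes that this URL is uncacheable, so
        # concurrent requests are passed instead of queued.
        # Do not set this to 0s: that would serialize access.
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
    return (deliver);
}
```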
9. Misconfiguration of memory
If you give Varnish Cache too much memory you run the risk of running out of memory. This might be painful, especially on Linux, which has some issues with paging performance. In addition, many users fail to realize that there is a per-object memory overhead on top of the configured storage size.
On the other hand giving Varnish too little memory will most likely result in a very low cache hit rate, giving your users a bad user experience. When we onboard new customers this is one of the things we pay a lot of attention to as the consequences of running out of memory are pretty dire.
10. Failure to monitor syslog
Varnish Cache runs as two separate processes: the management process and the child process. The management process is responsible for keeping the child running, and various other tasks, while the child process does the actual heavy lifting. In the event of a crash, the child process will automatically be started back up again, often so quickly that the downtime is not noticeable.
We recommend monitoring syslog in order to catch these events. Another option is to pay attention to varnishstat’s uptime counter: if it resets to 0, the child process has been restarted. In Varnish 4.0 the management process has its own counters (MGT.*) that can also be used to monitor child restarts.
Good luck! :)