Caching and Page Loading Optimization

February 8, 2012 Pavel Novitsky Tips&Tricks

Today we continue reviewing caching and page loading optimization issues. We start this post with cache proxies and then offer 10 useful recommendations for website creation.

Cache Proxies

Caching in the Web server can be implemented in two ways. The Web server can cache static data in memory, which dramatically reduces the time required to build a page. Some servers, such as G-WAN, Nginx and Squid, are optimized for static data and cache it in memory automatically. Other Web servers let you set up caching through configuration files. Apache, for example, supports the mod_cache module, which provides an in-memory cache for static content (mod_mem_cache) and a disk cache for data collected from dynamic sources (mod_disk_cache).
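As a rough illustration of what caching static data in memory means, here is a minimal Python sketch. It is not how any of the servers named above implement it. The class name StaticFileCache and the disk_reads counter are hypothetical, and reloading a file when its modification time changes is an assumption made for this example.

```python
import os


class StaticFileCache:
    """Keeps file contents in memory and reloads a file only when it changes."""

    def __init__(self):
        self._entries = {}   # path -> (mtime in nanoseconds, file contents)
        self.disk_reads = 0  # counts how often a file is actually read from disk

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # served from memory, no file read needed
        with open(path, "rb") as f:
            data = f.read()
        self.disk_reads += 1
        self._entries[path] = (mtime, data)
        return data
```

Repeated calls to get() for an unchanged file return the copy held in memory, so the file is read from disk only once.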

A separate Web server can also serve as the cache server, a so-called cache proxy server.

The main objective of the cache proxy server is to intercept requests addressed to the real application Web server and answer them with previously saved data from the origin server. Typically, several proxy servers are set up, each geographically distant from the others, so that users connect to the server closest to them. As a result, both the response time and the load on the primary server running the web application are reduced.

Cisco describes the proxy cache like this: “When a browser wishes to retrieve a URL, it takes the host name component and translates that name to an IP address. A HTTP session is opened against that address, and the client requests the URL from the server.

When using a proxy cache, not much is altered in the transaction. The client opens a HTTP session with the proxy cache, and directs the URL request to the proxy cache instead.

If the cache contains the referenced URL it is checked for freshness by comparing with the “Expires:” date field of the content, if it exists, or by some locally defined freshness factor. Stale objects are revalidated with the server, and if the server revalidates the content, the object is remarked as fresh. Fresh objects are delivered to the client as a cache hit. If the cache does not have a local copy of the URL, or the object is stale, this is a cache miss. In this case the cache acts as an agent for the client, opens its own session to the server named in the URL, and attempts a direct transfer to the cache.”
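The flow in the quote above (fresh copy: hit; stale copy: revalidate; no usable copy: miss) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cisco's or Squid's implementation. The names ProxyCache, fetch, revalidate and default_ttl are hypothetical, the two callables stand in for real HTTP requests to the origin server, and restarting the freshness window after a successful revalidation is a simplification.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    expires_at: Optional[float]  # origin's "Expires:" as a Unix timestamp, if any
    stored_at: float             # when the copy was stored or last revalidated


class ProxyCache:
    """Toy proxy cache following the hit / revalidate / miss flow quoted above."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[bytes, Optional[float]]],
        revalidate: Callable[[str], bool],
        default_ttl: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch              # hypothetical: full transfer from the origin
        self._revalidate = revalidate    # hypothetical: True if the origin confirms our copy
        self._default_ttl = default_ttl  # the "locally defined freshness factor"
        self._clock = clock
        self._store: Dict[str, Entry] = {}

    def _is_fresh(self, entry: Entry, now: float) -> bool:
        if entry.expires_at is not None:
            return now < entry.expires_at
        return now - entry.stored_at < self._default_ttl

    def _fetch_and_store(self, url: str, now: float) -> bytes:
        body, expires_at = self._fetch(url)
        self._store[url] = Entry(body, expires_at, now)
        return body

    def get(self, url: str) -> Tuple[bytes, str]:
        now = self._clock()
        entry = self._store.get(url)
        if entry is None:
            return self._fetch_and_store(url, now), "miss"
        if self._is_fresh(entry, now):
            return entry.body, "hit"
        if self._revalidate(url):
            # Simplification: restart the local freshness window from now.
            entry.expires_at = None
            entry.stored_at = now
            return entry.body, "hit"
        return self._fetch_and_store(url, now), "miss"
```

Passing the clock in as a parameter is only a convenience for experimenting with expiry without waiting for real time to pass.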

Most large projects use a proxy cache server in one way or another. For example, the Wikimedia Foundation uses 50 Squid proxy cache servers in three locations around the world to accelerate the delivery of content to users.

10 useful recommendations for website creation:

1. Set the validity period of documents and files according to how often they are updated. You can set a larger max-age value for static data, about a day for pages that are updated daily, and so on.

2. If possible, make each document available at only one URL. Do not pass user data in the URL unless the generated page is intended entirely for one user.

3. When you update a downloadable file, it is advisable to change its name and the references to it, so that users do not mistakenly download an outdated version from the cache. You can do the same when you need to update, for example, an image that might be cached for a long time.

4. Make sure the Last-Modified header matches the actual date the content changed. Do not re-save files and pages if you are not going to change them.

5. Minimize the use of SSL, and use POST requests only where they are needed.

6. Try to avoid situations where the displayed data depends on a cookie. Such pages are difficult to cache. If the page contains elements that differ from user to user, an acceptable solution is to load them with AJAX.

7. Reduce the number of resources that require HTTP authentication. For example, images on a password-protected page normally should not require authentication; move them to a directory with free access. If you want to cache pages that are protected by a password, use the header:

8. Send a Content-Length header giving the length of the response body in bytes (excluding the length of the headers). It lets the client reuse a single connection for several requests.
9. If your server collects request statistics and you are afraid of losing them, leave a small non-cached element on your page, such as a 1×1 pixel image. It will let you see incoming requests from each user, even if the main document was served from the cache. The effect of caching will remain, although it will be slightly reduced by the “extra” request.
10. To check the headers the server sends, use this plugin for Firefox.
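Several of the recommendations above come down to sending the right headers, so here is a hedged Python sketch of recommendations 1, 3, 4, 8 and 9. The function names are hypothetical and the sketch is not tied to any particular web framework. The public flag is an assumption: the header for recommendation 7 does not appear in the text, but HTTP does define Cache-Control: public as permitting shared caches to store responses to authenticated requests.

```python
import hashlib
from email.utils import formatdate


def cache_headers(body: bytes, modified_ts: float, max_age: int,
                  public: bool = False) -> dict:
    """Build caching-related response headers for one document."""
    cache_control = f"max-age={max_age}"
    if public:
        # Assumption: marks an authenticated response as storable by shared caches.
        cache_control = "public, " + cache_control
    return {
        "Cache-Control": cache_control,                          # tip 1
        "Last-Modified": formatdate(modified_ts, usegmt=True),   # tip 4
        "Content-Length": str(len(body)),                        # tip 8
    }


def versioned_name(filename: str, content: bytes) -> str:
    """Derive a file name that changes whenever the content changes (tip 3)."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


def tracking_pixel_headers() -> dict:
    """Headers for the small non-cached element used for statistics (tip 9)."""
    return {"Cache-Control": "no-store"}
```

For example, cache_headers(page, modified_ts, 86400) would suit a page that is updated about once a day, while versioned_name("logo.png", data) yields a new name, and therefore a new URL, each time the image content changes.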

Conclusion: The cache is useful. Cache more often.

Although caching on the web server side is quite effective at reducing the load on a web site, it is not the only kind of cache you can use to optimize your application.

Expect to read about the advantages of application and database caching in speeding up the server on February 15.



2 comments

  1. Nice article! :)
    I’m using Amazon CDN service. Can you advise if I can replace CDN with any proxy server?

  2. @Narayan
    Proxy servers and CDNs differ, and you cannot simply replace one with the other. The main idea of a CDN is to keep content close to your visitors, thus reducing latency. Some CDNs behave like a proxy: they send requests to the origin server if the requested file is not in the content delivery network. But a CDN serves only static content and does not handle dynamic requests.
    Proxy servers can be part of your web server farm (a reverse proxy) or work as a front end in the visitor’s network. Normal proxies act as caches for the visitors who come through them. A reverse proxy essentially offloads the work of communicating with the end visitor from your servers; for example, a visitor with a slow connection would otherwise tie up a server generating a page for longer. Reverse proxies can also be placed in front of multiple servers, either all doing the same thing or different things, with the proxy presenting a single address to the outside world.
    So CDNs and proxy servers are used for different purposes, and it’s hard to suggest anything without knowing the details of your performance needs. If you need help or a consultation, you can contact me at pavel@belvg.com.
