Re: distributed caching

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Fri, 23 Mar 2001 19:05:29 +0100

Theoretically you could build something similar to CSS using digests:
have one central server collect digests from all members, then use ICP
to query this central database, in the same way as CSS.
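
A minimal sketch of what the central side could look like (hypothetical
names; the Bloom filter stands in for a member's cache digest, and
Python is used only for illustration):

    import hashlib

    class Digest:
        """A tiny Bloom filter standing in for a member's cache digest."""
        def __init__(self, bits=8192, hashes=4):
            self.bits = bits
            self.hashes = hashes
            self.bitmap = bytearray(bits // 8)

        def _positions(self, key):
            for i in range(self.hashes):
                h = hashlib.md5(b"%d:" % i + key.encode()).digest()
                yield int.from_bytes(h[:4], "big") % self.bits

        def add(self, key):
            for p in self._positions(key):
                self.bitmap[p // 8] |= 1 << (p % 8)

        def __contains__(self, key):
            return all(self.bitmap[p // 8] & (1 << (p % 8))
                       for p in self._positions(key))

    # Central server state: one digest per member, refetched periodically.
    digests = {}   # peer name -> Digest

    def query(url):
        """Answer an ICP-style query: which members probably hold url?
        Bloom filters give false positives but no false negatives, so
        a listed peer is only a probable hit."""
        return [peer for peer, d in digests.items() if url in d]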

The lack of Vary support in ICP is not that bad yet, but it might become
a problem as the use of varying objects grows in magnitude. Still, not
all is lost, as some of the hits will be correct. And the scheme can be
extended to support Vary:ing objects with a small amount of overhead:
one extra HTTP round trip might be required between the caches on a
"false" hit to learn that the object varies; after that the Vary
information can be included in the digest key.
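
One way the digest key could fold in the Vary information, once that
extra round trip has taught the cache which headers an object varies on
(function and parameter names here are hypothetical):

    def digest_key(url, vary_headers, request_headers):
        """Build a digest key covering the Vary dimensions.

        vary_headers: header names taken from the origin's Vary
        response header (empty until we have learned that the object
        varies). request_headers: dict of the client request's
        headers, keys assumed lowercased.
        """
        parts = [url]
        for name in sorted(h.lower() for h in vary_headers):
            parts.append("%s=%s" % (name,
                                    request_headers.get(name, "").strip()))
        return "\0".join(parts)

Two requests that differ in, say, Accept-Encoding then map to different
digest entries, so the "false" hits disappear for objects whose Vary
set is already known.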

--
Henrik Nordstrom
Squid hacker
Roger Venning wrote:
> 
> I've been thinking just a tiny bit about distributed caching again. Some of
> you might have seen the central squid server concept that SE Net of Adelaide
> had supported work on (http://www.senet.com.au/css). This was essentially
> a centralized cache digest aggregation point, queryable via ICP. I'm not
> sure whether the cost of having a separate, well-memory-resourced box is
> worth the benefits of cache digests (although of course memory has now
> dropped below AU$1 per MB... I'm young, but I can remember when even disk
> was more expensive than that).
> 
> Essentially, for a loose confederation of organisations that are prepared
> to act as siblings, the problems are, in my (largely uninformed;
> _corrections and additions desired_) opinion:
> 
> o benefits of having a large distributed cache are largely negated by the
> fact that no-one is prepared to run in 'proxy-only' mode, and so all caches
> move to a state where they hold the same objects
> 
> o ICP traffic between siblings is O(n^2), although multicast helps by
> halving this (they all have to reply, right?); a rough message count is
> sketched after this list. Unreachable peers impose performance penalties
> on your own clients (admittedly minimised). Slow peers continually impact
> performance. And if your siblings aren't running well-dimensioned links...
> Of course, how many people have got multicast going?
> 
> o Cache digests solve most of the above problems, but suffer from becoming
> outdated, and from accuracy issues, due to the update-interval/size/
> bandwidth-saving tradeoffs.
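> 
> As promised in the ICP point above, a back-of-the-envelope message count
> for a full mesh of n siblings (Python just to make the arithmetic
> concrete):
> 
>     def icp_messages_per_miss(n, multicast=False):
>         """ICP messages triggered by one local cache miss.
> 
>         Unicast: n-1 queries out plus n-1 replies back.
>         Multicast: one query on the group address, but still n-1
>         replies - roughly halving the total, as noted above.
>         """
>         return (1 if multicast else n - 1) + (n - 1)
> 
>     # Every one of the n caches does this for its own misses, so the
>     # unicast mesh as a whole carries n * 2 * (n - 1) messages per
>     # unit of miss traffic, i.e. it grows as O(n^2).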
> 
> In order to overcome the first problem, I think a method of running a cache
> in an intermediate state between 'proxy-only' mode and the normal 'cache
> those objects that are cacheable' mode might be useful. I suggest that this
> could be done by using past popularity as an indication of future
> popularity, so that 'highly popular' objects could migrate into multiple
> positions in a distributed cache, while unpopular objects are left on a
> single cache.
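> 
> A rough sketch of that intermediate mode (the threshold and names are
> hypothetical; Python just for illustration):
> 
>     POPULARITY_THRESHOLD = 3   # hypothetical: local requests before
>                                # an object may replicate to this cache
> 
>     request_counts = {}        # URL -> local request count
> 
>     def should_store_locally(url, fetched_from_sibling):
>         """Between 'proxy-only' and normal caching: keep a copy of an
>         object fetched from a sibling only once it has proven popular
>         with this cache's own clients."""
>         request_counts[url] = request_counts.get(url, 0) + 1
>         if not fetched_from_sibling:
>             return True        # origin fetches cache as normal
>         return request_counts[url] >= POPULARITY_THRESHOLD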
> 
> This could be done by keeping popularity state, by 'inferring' it from last
> access time, or stochastically(?) by simply assigning a 'proxy-only
> probability' - but the number of requests for a single object will normally
> be too low for this last idea to work very successfully, as far as I can
> tell.
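> 
> For comparison, the stochastic variant would keep no per-object state
> at all (again, names and the probability are hypothetical):
> 
>     import random
> 
>     PROXY_ONLY_PROBABILITY = 0.8   # hypothetical tuning knob
> 
>     def should_store_locally_stochastic(fetched_from_sibling):
>         """With probability p, behave as 'proxy-only' for a sibling
>         hit; otherwise take a local copy. An object requested k times
>         locally escapes proxy-only with probability 1 - p**k, which
>         stays small for the low k typical of most objects - hence the
>         doubt expressed above."""
>         if not fetched_from_sibling:
>             return True
>         return random.random() >= PROXY_ONLY_PROBABILITY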
> 
> I think there are elements of the Central Squid Server (CSS) that attack
> the last two points, especially the fact that CSSs could themselves be
> formed into a hierarchy, so that a CSS could be kept regionally. The
> 'proxy-only probability' idea could be implemented separately.
> 
> Finally, all of ICP, cache digests and CSS are based around HTTP/1.0-style
> objects, as recognised by HTCP. Does anyone have estimates of what
> percentage of objects are unable to be located by ICP? (Does this make
> sense?)
> 
> Roger.
> 
> 
> -------------------------------------------------------------
> Roger Venning   \ Do not go gentle into that good night
> Melbourne        \ Rage, rage against the dying of the light.
> Australia <r.venning@bipond.com>                 Dylan Thomas