Re: thoughts on memory usage...

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Wed, 20 Aug 1997 12:15:01 +0200 (EETDST)

On 20 Aug 97 at 10:53, Oskar Pearson wrote:

> > keeping actual URL string in ram. Squid is request driven, and it doesn't
> > care very much of what is in its cache for other times.
> > For URL search all we need is a unique identifier that can be calculated
> > from any given URL. Thus we'd need an algorithm that always gives a unique
> what is 'hash.c' for if this isn't what it's already doing?
>
> Are we essentially keeping the whole URL in ram so that we can eliminate
> hash table collisions?

    If hash.c is good enough, we'd need no entry->url at all; a successful
 hash lookup means we have a URL hit. The request URL would be kept in RAM
 only as long as the request is being serviced.
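 The idea can be sketched in a few lines of Python (illustrative only, not
 Squid code; the dict stands in for hash.c's table, and `store_key`,
 `lookup`, and the stored swap-file value are made-up names):

```python
import hashlib

def store_key(url: str) -> bytes:
    """16-byte MD5 digest used as the table key instead of the URL string."""
    return hashlib.md5(url.encode("ascii")).digest()

# Only the fixed-size digest stays in RAM, not the (avg ~100 byte) URL.
store = {}
store[store_key("http://www.online.ee/index.html")] = "<swap file number>"

def lookup(url: str):
    # The full URL exists in memory only for the duration of this call.
    return store.get(store_key(url))
```

 Note that an MD5 digest is 16 bytes, a bit more than the 6-10 bytes
 mentioned below, but still far less than keeping the whole URL.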

> Possible problems:
>
> The hash table would have to be dynamically sized so that on a 1 gig
> cache it won't use as much ram as on a 10 gig cache. This is since
> you essentially need 1 'hole' in the hash table for each object.

    Sure. Does squid currently use a fixed-size hash table?
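 Sizing the table could be as simple as one bucket per expected object,
 rounded up to a prime. A sketch (the 13 KB mean object size is an
 assumption, as is the function name):

```python
def hash_buckets(cache_bytes: int, mean_obj_bytes: int = 13 * 1024) -> int:
    """Roughly one bucket per expected object, rounded up to a prime."""
    n = max(1, cache_bytes // mean_obj_bytes)

    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    while not is_prime(n):
        n += 1
    return n
```

 With this, a 10-gig cache gets roughly ten times the buckets of a 1-gig
 cache, instead of both sharing one fixed size.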

> > Upon request we'd calc a uniq hash id from URL and make a lookup.
> How unique.... shall we use md5?

    I am not at all familiar with hash algorithms, so I'd better keep quiet
 here.

> > As no place on disk would contain the actual URL for which the id was
> > made, it could be very difficult to change the algorithm if the need arises.
> > Also it would be hard to detect when collisions occur. To double-check,
> > I'd suggest to prepend the URL to any object on disk. Then, when servicing
> Agreed. I think that it's perhaps time that we start keeping extra info
> in the on-disk cache... for example 'URL','headers'(eg cookies),'time
> retrieved'
>
> > In addition, saving URL's with objects gives a way to rebuild all store
> > data from files spread on disks in case swaplog gets trashed or corrupted.
> doing 1/2 a million 'open,seek,close's on every object in the cache will
> be slooooow.

    Well. If you pay buck-for-Meg, and you lose 12 Gigs of cache, you are
 willing to pray to all the gods to make your cache contents return...
    When you change your cache dir structure, e.g. when adding disks, you
 lose the contents. If it is not the job for squid to move files around,
 that's OK, you can write your own script to do that, but right now it is
 impossible - if the swaplog goes, all your cache goes...
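 The prepend-the-URL idea above, and rebuilding the index from it, could
 look like this (a minimal sketch with a made-up length-prefixed header;
 Squid's real on-disk format and file naming differ):

```python
import hashlib
import os
import struct

def write_object(dirpath: str, url: str, body: bytes) -> None:
    """Store an object with its URL prepended: 4-byte length, URL, body."""
    hdr = url.encode("ascii")
    fname = hashlib.md5(hdr).hexdigest()
    with open(os.path.join(dirpath, fname), "wb") as f:
        f.write(struct.pack("!I", len(hdr)))  # network-order URL length
        f.write(hdr)                          # the URL itself
        f.write(body)

def rebuild_index(dirpath: str) -> dict:
    """Recover the url -> swap-file mapping by scanning every file,
    as one would after losing the swaplog."""
    index = {}
    for fname in os.listdir(dirpath):
        with open(os.path.join(dirpath, fname), "rb") as f:
            (n,) = struct.unpack("!I", f.read(4))
            index[f.read(n).decode("ascii")] = fname
    return index
```

 The rebuild does one open/read/close per file, which is exactly the slow
 full scan Oskar objects to - but it only has to run when the swaplog is
 already lost.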

> > In conclusion, if this idea is worth anything, squid RAM usage could
> > drop from average 100 bytes per URL to 6-10, giving more ram and speeding
> > up lookups.
> I think that md5 is the 'way to go'. Using md5 would mean that we essentially
> don't have to worry about collisions... but it's CPU intensive.

    It will be used only once per request; I think this is OK.

 ----------------------------------------------------------------------
  Andres Kroonmaa mail: andre@online.ee
  Network Manager
  Organization: MicroLink Online Tel: 6308 909
  Tallinn, Sakala 19 Pho: +372 6308 909
  Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
 ----------------------------------------------------------------------
