Re: memory-mapped files in Squid

From: Carlos Maltzahn <carlosm@dont-contact.us>
Date: Thu, 21 Jan 1999 16:36:23 -0700 (MST)

On Wed, 20 Jan 1999, Andres Kroonmaa wrote:

    On 20 Jan 99, at 10:30, Carlos Maltzahn <carlosm@mroe.cs.colorado.edu> wrote:
    
>
> I need some implementation help.
>
> The idea is to store all objects <=8K into one large memory mapped file.
    
     The one dead end is that you cannot mmap more than 3GB of a file at a
     time. That's awfully little, and remapping some huge file is a pretty
     expensive system call.
    
That's only true for 32-bit machines, right? In a year or so, we will
probably all be using 64-bit machines, and large proxy servers with huge
disks already use 64-bit architectures. Aside from that, remember that
I'm using this file only to store small objects. According to my traces,
the average size of <=8K objects is about 2.6K. For >8K objects I still
use the normal file system. If 3GB gets too small, one could come up with
a scheme where all objects are initially stored in files, and objects
(<=8K) that produce more than one hit per day move into a memory-mapped
file.
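
Just to make the idea concrete, here is a rough sketch of what I mean.
The names (small_store_open, SLOT_SIZE, NSLOTS) and the fixed-slot layout
are made up for illustration - this is not Squid code, and with an average
object size of 2.6K a real layout would want to sub-allocate inside the 8K
slots rather than waste the rest of each slot:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SLOT_SIZE (8 * 1024)          /* objects <= 8K live here      */
    #define NSLOTS    (256 * 1024)        /* ~2GB of small-object storage */

    static char *small_base;              /* start of the mapped region   */

    /* Map one large, preallocated file once at startup. */
    int small_store_open(const char *path)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;
        small_base = mmap(NULL, (size_t) NSLOTS * SLOT_SIZE,
                          PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                        /* the mapping keeps the file   */
        return small_base == MAP_FAILED ? -1 : 0;
    }

    /* Storing a small object becomes a memcpy into its slot; big objects
     * stay on the ordinary one-file-per-object swapout path. */
    void store_object(const void *data, size_t len, unsigned slot)
    {
        if (len <= SLOT_SIZE)
            memcpy(small_base + (size_t) slot * SLOT_SIZE, data, len);
    }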

     Another drawback is that physical disk I/O becomes pretty
     unpredictable. Under heavy load, disk times go up into the xxx ms
     range, and you can't afford to have the whole squid process blocked
     by page I/O. One solution is fully threaded code, which would be
     quite a huge rewrite.
    
Yes. I found that the 30-second sync in particular wreaks havoc on large
memory-mapped files. But on dedicated machines one could easily change the
sync rate to a much larger interval, say 5 minutes. If the machine crashes,
losing 5 minutes or even an hour of caching is not going to kill me.
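
Rather than touching the system-wide sync interval at all, the flushing
could also be driven from Squid itself with msync(2). A minimal sketch,
assuming the mapped region from the sketch above and a timer in the event
loop (the 5-minute figure is just the example interval from this mail):

    #include <sys/mman.h>

    /* Called from the event loop, e.g. every 5 minutes, instead of
     * relying on the kernel's periodic update. MS_ASYNC schedules the
     * write-back without blocking the process; MS_SYNC would wait for
     * the pages to actually reach disk. */
    void small_store_flush(void *base, size_t len)
    {
        msync(base, len, MS_ASYNC);
    }

Flushing regularly this way would also keep the number of dirty pages
small at any given moment.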

     Unmapping a file with dirty pages adds fuel to the fire - all dirty
     pages are flushed to disk, and as they are paged out directly from
     user space, the whole process is blocked for the duration of the
     unmap.
    
     The bottom line seems to be that mmap is suited to cases where you
     mmap/unmap rarely. For squid this would hit the 3GB limit.
    
Yes. I don't propose to mmap/unmap frequently.

> I'm not familiar with Squid's source code and the various sub systems it
> uses. But ideally one could integrate memory mapped files with Squid's
> memory pools so that loaded parts of the memory mapped file don't have to
> compete with web objects in memory pools.
    
     You are on the right track, but mmap is not the way to go. UFS
     overhead is what makes squid slow; to avoid it, squid would need its
     own FS. As most of the overhead comes from directory structure
     maintenance and free-space handling, those are the most obvious
     areas to work on.
    
Isn't most of the UFS overhead inode management and the fact that you use
individual files? With mmap you have _one_ inode that is constantly used
and therefore cached, and you don't have to worry about file-system-level
fragmentation. So even though a Squid FS would be nice, I disagree that it
is necessary in order to significantly reduce disk I/O.
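
To make the contrast concrete, this is roughly the difference per hit as I
see it (the path and the slot arithmetic are only illustrative):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Per-object file on UFS: every hit pays the path lookup, plus an
     * inode fetch if it is not cached, before any data block is read. */
    static void read_from_ufs(char *buf, size_t len)
    {
        int fd = open("/cache/03/5A/0000C2F1", O_RDONLY); /* made-up path */
        if (fd >= 0) {
            read(fd, buf, len);
            close(fd);
        }
    }

    /* One mmap'd file: the single inode stays hot, locating an object is
     * pointer arithmetic, and the kernel pages the data in as needed. */
    static void read_from_map(char *buf, size_t len, const char *base,
                              unsigned slot, size_t slot_size)
    {
        memcpy(buf, base + (size_t) slot * slot_size, len);
    }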

> Have people tried something similar? What is your experience? Would you
> extend Squid's swapout/swapin modules or Squid's memory pools? What other
> possibilities exist?
    
     "squid-FS and alikes".
     In fact, squid should be rewritten somehow so that it has some
     generalised (object)file io api that could be more easily modified. Then
     it would be possible to experiment with many different algoritms.
     Currently, if you want to try your stuff out, you'd have to rewrite
     pretty much of squid, and then you are your own.

Yup - too bad. I'm currently looking into other open source web
server/proxy software that might be easier to extend. Medusa
(www.nightmare.com) comes to mind...
    
> I ran two trace-driven file system load generators, one similar to Squid's
> current file access, and one with a combination of memory-mapped file
> access for <=8K objects and normal file access for >8K objects. The
> results were very encouraging. So I would really like to implement
> something like this in Squid and see whether it indeed saves a lot of
> disk I/O.
    
     It will, until you go beyond some tons of GBs. Then mmap()ing
     overhead will take over. The main benefit you get is that you avoid
     directory structure indirection overhead, perhaps up to 6 disk I/Os
     per 8K object. That's a lot. But that should be handled by some
     other means.
    
     I dunno what OS you run on, but in my experience, Solaris 2.6 goes
     boo-boo once you try handling >~15GB with 512MB of RAM and 10%
     fragmentation. Only past this point can you really see the drastic
     influence of differing algorithms.
    
I used DUNIX 4.0 with 512MB and three disks with a total of 10GB. The two
days of trace used 4GB for >8K objects and about 1.7GB of the 2GB
memory-mapped file for <=8K objects.

Are you talking about running Squid on a 15GB/512MB system? The effect you
are describing could be because 512MB might not be enough to hold the
metadata of a full 15GB cache without swapping, right? As long as the size
of the metadata depends linearly on the size of the cache, you would run
into this problem in any case.
    
     I'd suggest looking into using just a bunch of plain swapfiles,
     either on top of UFS or on raw partitions. Implement block allocation
     and maintenance yourself, tightly coupled to squid's needs. There is
     one work in progress on a squid-FS that looks very interesting, and
     if you like, I can present my own ideas on the subject some time.
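
Just so I understand the suggestion: a flat swapfile (or several) with
Squid doing its own block bookkeeping? Here is a very rough sketch of what
I imagine for the allocation side; the block size, the in-core bitmap and
the function names are all made up for illustration:

    #define BLOCK_SIZE 4096                  /* data for block b sits at  */
    #define NBLOCKS    (512 * 1024)          /* offset b * BLOCK_SIZE in  */
                                             /* a 2GB swapfile            */

    static unsigned char used[NBLOCKS / 8];  /* one bit per block         */

    /* Claim a contiguous run of nblocks free blocks (first fit); return
     * the first block number, or -1 if no such run is free. */
    static long block_alloc(unsigned nblocks)
    {
        unsigned i, j, run = 0;

        for (i = 0; i < NBLOCKS; i++) {
            if (used[i / 8] & (1u << (i % 8))) {
                run = 0;
            } else if (++run == nblocks) {
                for (j = i + 1 - nblocks; j <= i; j++)
                    used[j / 8] |= 1u << (j % 8);
                return (long) (i + 1 - nblocks);
            }
        }
        return -1;
    }

    /* Release a previously allocated run. */
    static void block_free(unsigned first, unsigned nblocks)
    {
        unsigned j;

        for (j = first; j < first + nblocks; j++)
            used[j / 8] &= ~(1u << (j % 8));
    }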
     
I'd be very interested in finding out more about this. Who is working on
squid-FS?

Carlos