Re: Buffer/String split, take2

From: Adrian Chadd <adrian_at_squid-cache.org>
Date: Wed, 21 Jan 2009 10:23:07 -0500

2009/1/21 Kinkie <gkinkie_at_gmail.com>:

> What I fear from the D&C approach is that we'll end up with lots of
> duplicate code between the 'buffer' classes, to gain a tiny little bit
> of efficiency and semantic clarity. If that approach has to be taken,
> then I'd rather take the variant of the note - in fact that's quite in
> line with what the current (agreeably ugly) code does.

The trouble is that the current, agreeably ugly code, actually works
(for values of "works") right now, and the last thing the project
needs is for that "works" bit to be disturbed too much.

> In my opinion the 'universal buffer' model can be adapted quite easily
> to address different uses by extending its allocation strategy - it's
> a self-contained function of code exactly for this purpose, and it
> could be extended again by using Strategy patterns to do whatever the
> caller wishes. It would be trivial for instance for users to request
> that the underlying memory be allocated by the pageful, or to request
> preallocation of a certain amount of memory if they know they'll be
> using, etc.
> Having a wide interface is a drawback of the Universal approach,

But you don't know how that memory should be arranged. If its just for
strings, then you know the memory should be arranged in whatever makes
sense to minimise memory allocator overheads. In the parsing codepath,
that involves parsing and creating references to an already-allocated
large chunk of RAM, instead of copying into separately allocated
areas. For things like disk IO (and later on, network IO too!) this
may not be as obvious a case. In fact, based on the -provider-
(anonymous? disk? network? some peer module?) you may want to request
pages from -them- to put data into for various reasons, as simply
grabbing an anonymous page from the system allocator and filling it
with data may need -another- copy step later on.

This is why I'm saying that right now, focusing on -just- the String
stuff and the minimum required to do copy-free parsing and copying in
and out of the store is probably the best bet. A "universal" buffer
method is probably over-reaching things. There's a lot of code in
Squid which needs tidying up and whatever we come up and -all- of it
-has- to happen -regardless- of what buffer abstraction(s) we choose.

> Regarding vector i/o, it's almost a no-brainer at a first glance:
> given UniversalBuffer, implement UniversalBufferList and make MemBuf
> use the latter to implement producer-consumer semantics. Then use this
> for writev(). produce and consume become then extremely lightweight
> calls. Let me remind you that currently MemBuf happily memmoves
> contents at each consume, and other producer-consumer classes I could
> find (BodyPipe and StoreEntry) are entirely different beasts, which
> would benefit from having their interfaces changed to use
> UniversalBuffers, but probably not their innards.

And again, what I'm saying here is that a conservative, cautious
approach now is likely to save a lot of risk in the development path
forward.

> Regarding Adrian's proposal, he and I discussed the issue extensively.
> I don't agree with him that the current String will give us the best
> long-term benefits. My expectation is (but we can only know after we
> have at least some extensive use of it) that the cheap substringing
> features of the current UniversalBuffer implementation will give us
> substantial benefits in the long term.
> I agree with him that fixing the most broken parts of the String
> interface is a sensible strategy for merging whatever String
> implementation we end up choosing.

> I fear that if we focus too much on the long-term, we may end up
> losing sight of the medium-term, and thus we risk reaching neither
> because short-term noone does anything. EVERYONE keeps on asserting
> that squid (2 and 3) has low-level issues to be fixed, yet at the same
> time only Adrian does something in squid-2, and I feel I'm the only
> one trying to do something in squid-3 - PLEASE correct me and prove me
> wrong.

*shrug* I think people keep choosing the wrong bits to bite off. I'm
not specifically talking about you Kinkie, this certainly isn't the
only instance where the problem isn't really fully understood.

The problem in my eyes is that noone understands the entire Squid-3
codebase enough to start to understand what needs to happen and begin
engineering an actual path forward. Everyone knows their little
"corner" of the codebase. Squid-3 seems to be plagued by little
mini-projects which focus on specific areas without much knowledge of
how it all holds together, and all kinds of busted behaviour ensues.

> There's another issue which worries me: the current implementation has
> been in the works for 5 months; there have been two extensive reviews,
> two half-rewrites and endless discussions. Now the issue crops up that
> the basic design - whose blueprint has also been available for 5
> months in the wiki - is not good, and that we may end up having to
> basically start from scratch. How can we as a project prevent for the
> future such a waste of resources (in this case, mine) is in my very
> humble opinion an issue key to the very survival of squid as a
> project.

You don't want my public opinion on this. I've been asked not to give
it before. :)

> As a side note, if we end up starting from scratch, it will have to be
> someone else doing it; I would be obviously biased towards the 'wrong'
> design, and I would prefer to concentrate on other things to be done
> which are characterized by a lesser risk of merge rejection.

It is not that you did it "wrong". You gained valuable insight and
hopefully learned a thing (or ten) from the experience. The end goal
is mostly fine. Answering questions about what would be the "correct"
buffer design is not easy. I've gone through trying to do this in
Squid-2 and each time I've contracted my design goals until the steps
were quite small. You saw how much code I needed in the end to convert
the Squid-2 "String" code into a NUL-terminated referenced buffer
thing with zero-ish cost stringDup() functionality. It wasn't all that
much at all. The problem was rewriting some of the crappy code first.

And this boils down to my main beef (that I'm willing to say on a
public squid list right now) with an -actual- main problem. The main
problem exposed here is that although people have large-scale ideas on
what chunks of the codebase -should- be, almost none of the existing
mess is tidied up before new things are shoehorned in and then months
of random, hard to track down instabilities pop up because people
thought the correct path was "add new feature which I want to use,
convert over bits of code to use it, leave it alone." Stuff is never
fully integrated, and we later on find out that perhaps the original
design wasn't quite suitable for the tasks converted over to it.

This -has- not worked in the past for Squid-3. It most likely -will-
not work moving forward, and yet people keep persisting that this is a
valid path forward.

Guys. The codebase sucks, ok? Said sucky bits -need- to be rewritten
before new code goes in, or you will simply have more of the same crap
happen to the project over and over again.

Adrian
Received on Wed Jan 21 2009 - 15:23:19 MST

This archive was generated by hypermail 2.2.0 : Wed Jan 21 2009 - 12:00:26 MST