Re: [RFC] cache architecture

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 26 Jan 2012 14:53:27 +1300

On 26.01.2012 13:48, Alex Rousskov wrote:
> On 01/25/2012 05:32 PM, Amos Jeffries wrote:
>> On 26.01.2012 08:26, Alex Rousskov wrote:
>>> On 01/25/2012 01:20 AM, Henrik Nordström wrote:
>>>> ons 2012-01-25 klockan 15:03 +1300 skrev Amos Jeffries:
>>>>
>>>>> We also need to enumerate how many of these cases are specifically
>>>>> "MUST purge" versus "MUST update". The update case is a lot more
>>>>> lenient to sync issues than purges are.
>>>>
>>>> The case which matters here is that update actions done by a user
>>>> should be immediately visible to the same user after being accepted
>>>> by the requested server.
>>>>
>>>> i.e. POST/PUT/DELETE etc. need to invalidate any cached
>>>> representation of the requested URL, or of the response's
>>>> Content-Location when it names the same host.
>>>
>>> The above matches my expectations, but I do not think it matches
>>> Amos' point of view.
>>
>> I think I understand what you are trying to get at as the problem.
>> But I don't think we can claim it as an HTTP violation when all
>> instances which received the request (or knowledge of it) do purge
>> properly.
>>
>> A cache/worker which is *completely* unaware of the POST/PUT/DELETE
>> having ever happened cannot be held accountable for the result.
>
> From the client and server points of view, the "cache" is a single
> entity with a single address (i.e., "all workers together" or "any
> worker I might end up talking to" if you wish). We can claim ignorance
> of those points of view, but I think that would violate the HTTP
> spirit, if not the letter.
>
> We created workers as an internal performance optimization that has
> nothing to do with HTTP. It is our responsibility to make sure that
> optimization stays internal. If caches are not synchronized, the
> optimization may negatively affect external HTTP agents.

I see you arguing that IPC messaging about purges is a requirement we
imposed on ourselves. I agree, and I focus on IPC so that admins who
disable ICP/HTCP/PURGE do not cause problems.
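To make the IPC requirement concrete, here is a toy model (not Squid's
actual IPC API; Worker, WorkerPool and handleUnsafe are invented names)
of why purge fan-out between workers matters: each worker holds a
private cache, and an unsafe request handled by any one worker purges
the URL in all of them, so the single "cache" the client sees stays
consistent:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One SMP worker with its own private cache (illustrative only).
struct Worker {
    std::map<std::string, std::string> cache; // URL -> stored response body
    bool has(const std::string &url) const { return cache.count(url) != 0; }
};

// The pool of workers that clients see as one proxy address.
struct WorkerPool {
    std::vector<Worker> workers;
    explicit WorkerPool(std::size_t n) : workers(n) {}

    // A POST/PUT/DELETE seen by one worker purges the URL everywhere;
    // the loop stands in for a broadcast IPC purge message.
    void handleUnsafe(std::size_t receivingWorker, const std::string &url) {
        (void)receivingWorker; // which worker got the request is irrelevant
        for (Worker &w : workers)
            w.cache.erase(url);
    }
};
```

Without the broadcast step, a worker that never saw the unsafe request
keeps serving the stale entry, which is exactly the sync problem under
discussion.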

I see no evidence that sharing an IP is any more (or less) of a
violation than each worker having a unique IP and the same FQDN. We
haven't gone around claiming that sibling relationships or popular CDN
hierarchies all violate HTTP, though they hit the same sync problems.

>
> Again, if HTTP has no text defining when two cooperating caches must
> be in sync, then it would be difficult to decide which interpretation
> of the HTTP spirit is "correct".

The new wording in the HTTPbis part 6 draft -18, section 2.5, on
PUT/POST/DELETE/unknown methods explicitly clarifies the spirit:
"
    Note that this does not guarantee that all appropriate responses are
    invalidated. For example, the request that caused the change at the
    origin server might not have gone through the cache where a response
    is stored.
"

Section 2.2, on which responses may be served, uses the wording:
"
    Also, note that unsafe requests might invalidate already stored
    responses; see Section 2.5.
"
*might* invalidate.

Those are two giant loopholes to walk through. The invalidation MUST is
a best-effort benefit for a hierarchy, not a guarantee of removal.
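For concreteness, the per-cache rule those sections describe
(invalidate the effective request URI, plus Location/Content-Location
URIs when they name the same host) can be sketched roughly as follows.
This is an illustration, not Squid code; hostOf and urisToInvalidate
are invented names, and real URI host comparison is more involved:

```cpp
#include <string>
#include <vector>

// Crude host extraction from an absolute URI (illustration only).
static std::string hostOf(const std::string &uri) {
    const std::string::size_type scheme = uri.find("://");
    if (scheme == std::string::npos)
        return "";
    const std::string::size_type start = scheme + 3;
    const std::string::size_type slash = uri.find('/', start);
    return uri.substr(start,
        slash == std::string::npos ? std::string::npos : slash - start);
}

// URIs an individual cache invalidates for an unsafe request: the
// request URI always; Location and Content-Location only when their
// host matches the request's host.
std::vector<std::string> urisToInvalidate(const std::string &requestUri,
                                          const std::string &location,
                                          const std::string &contentLocation) {
    std::vector<std::string> out;
    out.push_back(requestUri);
    const std::string host = hostOf(requestUri);
    if (!location.empty() && hostOf(location) == host)
        out.push_back(location);
    if (!contentLocation.empty() && hostOf(contentLocation) == host)
        out.push_back(contentLocation);
    return out;
}
```

Note the same-host check is exactly why a cross-host Content-Location
is left alone, and none of this reaches a cache that never saw the
unsafe request, which is the loophole the draft text acknowledges.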

With Squid's SMP design being an entire hierarchy inside one box, we
have to adjust our viewpoint to that of hierarchy compliance. The
workers are individually as compliant as ever. We have raised awareness
of the hierarchy-level interaction problems, and we need to fix them
above and beyond the specs, which in word and spirit focus on the
requirements of individual cache instances, not distributed ones or
hierarchies.

What we gain by fixing the problem is improved friendliness,
predictability, and usefulness of Squid's responses, not an improved
compliance level.

>
>> We are wholly dependent on the server or client providing
>> cache-controls that work around the sync issues. So long as those
>> controls and the purge are obeyed correctly *when received*, I call
>> it compliant.
>
> IMO, we are responsible for delivering the right controls to all
> cooperating caches. A client does not know we have many of them
> hiding under a single Squid hood, so it cannot make sure each of the
> caches receives the necessary controls. The client has the right to
> treat all workers as one because the client is talking to a single
> proxy address.
>

Yes.

Amos
Received on Thu Jan 26 2012 - 01:53:41 MST
