Re: Store_url_rewrite 2.7 vs\to 3.head basic review from Eliezer Croitoru on 2012-09-08 (squid-dev)

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Sat, 08 Sep 2012 23:33:52 +0300

On 09/08/2012 08:18 PM, Alex Rousskov wrote:
> On 09/08/2012 06:24 AM, Eliezer Croitoru wrote:
>> (I will respond to the other emails also)
>> A moment before I will get back to the 3.head I want to review what was
>> done on 2.7 vs 3.head before even touching the code.
>>
>> summary:
>>
>> squid 2.7 store_url_rewrite makes use of
>> mem_object->request->storeurl mainly as a null to verify that there
>> wasn't any action.
>> in all the cases of a test that storeurl is null another action was
>> the choice.
>> this avoids the need in any step to even access the original request
>> url if not needed(avoiding harming the original request url) which
>> leads to put code only on key points that needs access to the storeurl.
> Are you saying that Squid2 uses mem_object->request->storeurl as a
> boolean only?
Not exactly.
It stores the rewritten URL, but most of the time the check on it is the
basis for deciding whether to use it for something.
It has only three stages in its life cycle:
NULL
REWRITTEN URL
FREE
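
To illustrate the three stages, here is a minimal sketch in C. It is not the Squid 2.7 code: the struct and the sketch_* function names are hypothetical stand-ins, and the only thing taken from the discussion is the idea that a NULL storeurl means "no rewrite happened, use the original URL".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the request structure; not Squid's real type. */
struct sketch_request {
    const char *url;   /* original request URL, never modified here */
    char *storeurl;    /* stage 1: NULL until a rewrite result arrives */
};

/* Stage 2: keep a private copy of the rewritten URL.
 * An empty or NULL result leaves storeurl as NULL (no rewrite). */
static void sketch_set_storeurl(struct sketch_request *r, const char *rewritten)
{
    assert(r != NULL);
    free(r->storeurl);
    r->storeurl = NULL;
    if (rewritten != NULL && *rewritten != '\0') {
        size_t n = strlen(rewritten) + 1;
        r->storeurl = malloc(n);
        if (r->storeurl != NULL)
            memcpy(r->storeurl, rewritten, n);
    }
}

/* The NULL test: use storeurl when it is set, else the original URL. */
static const char *sketch_store_key_url(const struct sketch_request *r)
{
    assert(r != NULL);
    return r->storeurl != NULL ? r->storeurl : r->url;
}

/* Stage 3: release the rewritten URL and return to the NULL state. */
static void sketch_free_storeurl(struct sketch_request *r)
{
    assert(r != NULL);
    free(r->storeurl);
    r->storeurl = NULL;
}
```

The point of the pattern is that the original URL is only read, never written, so code that does not care about the store URL never has to know it exists.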

>
> It would be useful to know what those "actions" are. The question are:
>
> 1a. Does Squid2 store rewritten URL in the cache?
in the metadata
> 1b. If yes, why?
Why is the question I haven't found the answer to yet...

> 2b. Why does Squid2 store original URL in the cache?
I think this one is for mgr:... (I don't remember the option), which fetches
data on the objects currently in the cache.
I haven't looked at it in depth yet; this is from my first glances at
the code.
This assumption might also explain the meta_storeurl.
I remember there was something about 2.7 where you couldn't learn anything
about the current state of store_url objects in the cache, but could about
the others.
So it might have been meant to serve this mgr: feature that would give you
that data.
>> - about the schema structure using "squid://" which is a bad
>> idea to use since it's a http request.
>> the basic usage should be to use an internal non public domain
>> name for storeurl.
> Do we have to change the original URL schema or domain name when
> rewriting the URL for Store? If yes, why?
We as Squid: NO.
We as users: NO (that's what I meant).
Actually I think it's a bad idea to change the original scheme in the
rewriting process, since we do want to validate the rewritten URL's
authenticity as a URI (for hashing and other stuff if needed).
If anyone uses "squid://", the object will never be accessible for
uses such as HTCP, ICP, or cache peering at all.
The responsibility for choosing the domain name and URL structure after
rewriting is on the proxy maintainer.
It is on US to force him to use what is good for him while using Squid and
all Squid features.

So since store_url_rewrite is attached to an external program that can
send some nasty stuff into Squid, WE restrict it; it does not restrict us.
>> - 302 loops. problem when a url is being rewritten and the same
>> url rewrite result in the same url based on the credentials.
>> case: http://example.com/?arg1=111&arg2=222&arg3=333
>> s/(arg1\=[\d]+).*(arg2\=[\d]+)/http...$1&$2/g
>> then the same url with another arg4=444 will result in the same
>> rewritten url but the arg4 will result in a 302 reply and will
>> be stored by the refresh patterns.
>> this can result the the client requesting the same url in a loop.
> If store URL rewriting causes a loop, does not that imply that the
> rewriting rules are wrong?
Not always, because from what I have seen some vendors use a method like
this:
client->request->proxy

/ start fetching the object into cache
proxy
\ send the client a 302 to exactly the same URL (some are smarter)

client(2nd try)->request->proxy
proxy->reply to the client with the fetched object.

Specifically on YouTube, many ISP cache solutions (other than Squid) do
almost the same, but you see a redirect mark in the URL most of the time.
I have tried to use some redirect mark, but I always got to a point
where a 302 redirect loop was waiting for me around the corner.
To solve that I specifically used ICAP to change only the 302
responses to no-store (thanks to Amos for the idea).
<http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube#ConfigExamples.2BAC8-DynamicContent.2BAC8-YouTube.2BAC8-Discussion.Fixed>
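
The decision made by that response modification can be sketched as below. This is not the ICAP service itself and not Squid code: sketch_cache_control_override is a hypothetical helper that only shows the rule "302 replies get no-store, everything else is left alone".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the Cache-Control value to force onto a
 * reply with the given status, or NULL to leave the reply untouched.
 * Only 302 is affected; other 3xx codes and all 2xx replies pass through. */
static const char *sketch_cache_control_override(int status)
{
    assert(status >= 100 && status <= 599);
    return status == 302 ? "no-store" : NULL;
}
```

Keeping the rule limited to 302 means the 200 replies that carry the actual object are still cached under the store URL.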

> That is, they map URLs with semantically
> different responses to the same URL, which breaks things. How can Squid
> handle that automatically?
If you ask me, the default handling of 302 responses in Squid is
conceptually wrong.
Just as PUT and POST are not cached by default, a 302 response
shouldn't be cached, since it's a moved_temporarily status code, which
means it will become stale very quickly.
Also, most of the time it's a reply smaller than *1KB, so the cache gain
here, even on a 33.6 dialup connection, won't be that significant (302
only, not 200).
The loss from caching a 302 can be a client trying to get a file that has
gone stale on the origin side of the network.
A 301 is a permanent redirect, so it should be cached by default, but a
302... is kind of a stub saying "don't cache me, I'm not supposed to be
cached ever".

*But you must understand that the case is a combination with a very bad
refresh_pattern that forces the cache to hold the 302 response, so it's
not 100% Squid's fault.
>> there was a patch for that but it refers to all 302 replies and
>> is not applied to the current 3 stable.(kind of tested)
> What did the patch change?
In the YouTube case there was a specific patch, and prior to it there was a
recommendation to use:
minimum_object_size 512 bytes
which later became
minimum_object_size 1100 bytes

The fix patch that was used is on the wiki:
http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube#ConfigExamples.2BAC8-DynamicContent.2BAC8-YouTube.2BAC8-Discussion.Fixed

the patch:
Index: src/client_side.c
===================================================================
--- src/client_side.c (revision 134)
+++ src/client_side.c (working copy)
@@ -2408,6 +2408,17 @@
                 is_modified = 0;
         }
      }
+ /* bug fix for 302 moved_temporarily loop bug when using storeurl*/
+ if (mem->reply->sline.status >= 300 && mem->reply->sline.status < 400) {
+ if (httpHeaderHas(&e->mem_obj->reply->header, HDR_LOCATION))
+ if (!strcmp(http->uri,httpHeaderGetStr(&e->mem_obj->reply->header, HDR_LOCATION))) {
+ debug(33, 2) ("clientCacheHit: Redirect Loop Detected: %s\n",http->uri);
+ http->log_type = LOG_TCP_MISS;
+ clientProcessMiss(http);
+ return;
+ }
+ }
+ /* bug fix end here*/
      stale = refreshCheckHTTPStale(e, r);
      debug(33, 2) ("clientCacheHit: refreshCheckHTTPStale returned %d\n", stale);
      if (stale == 0) {

This is a bad idea as written, since it applies to the whole 3xx range
(300-399) and not only to 302.
But it does detect the loop using the headers and not only the 302 status.
In 3.head the code is different, and I don't know yet in which part of
the code this kind of check can be done.
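
A narrower form of the same check, limited to 302, might look like the sketch below. It is standalone C rather than a patch against either code base: sketch_is_302_self_redirect is a hypothetical name, and the caller is assumed to pass the reply status, the request URI, and the Location header value (or NULL when the header is absent).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when a cached reply is a 302 whose
 * Location header is byte-for-byte equal to the request URI, which is
 * the self-redirect loop case; returns 0 otherwise.
 * location may be NULL when the reply has no Location header. */
static int sketch_is_302_self_redirect(int status,
                                       const char *request_uri,
                                       const char *location)
{
    assert(request_uri != NULL);
    if (status != 302)
        return 0;               /* leave 301 and other 3xx replies alone */
    if (location == NULL)
        return 0;               /* no Location header, nothing to compare */
    return strcmp(request_uri, location) == 0;
}
```

On a positive result the caller would treat the hit as a miss, the same way the 2.7 patch above does with clientProcessMiss().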

I have tried using cache_deny allow 302status, but Amos told me it's not
working as expected.

In 3.head the equivalent code is at client_side_reply.cc:456:
clientReplyContext::cacheHit(StoreIOBuffer result)
but at that point it is working with a StoreIOBuffer and not yet a parsed
HTTP reply, so we can't use exactly the same logic there unless we parse
the result there, if at all.

>
> Thank you,
>
> Alex.
NP,
Eliezer
Received on Sat Sep 08 2012 - 20:34:11 MDT