Re: [squid-users] Caching youtube videos and similar...

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 27 Oct 2009 15:37:17 +1300

On Sun, 25 Oct 2009 22:20:18 -0200, Guillermo Javier Nardoni
<gjnardoni_at_yahoo.com.ar> wrote:
> Hello, I'm working on a squid 2.7 box running on debian etch.
>
>
> Squid is compiled from source and it works fine for almost any content.
>
>
> What I would like to cache is youtube content (dynamic content).
> I followed a few tutorials on the web, even those on Squid's website,
> but none of them worked for me.
>
>
> Squid always returns TCP_MISS for any video from youtube and googlemaps
> too.
>
>
> Can you help me to achieve this goal?
>
>
> P.S: I guess the issue is due to the header of youtube site:
>
>
> Cache-Control: max-age=0
>
>
> Should I override this? If so, how?
>
>
> This is my squid.conf file:
>
>
> geryon-svr-001:/usr/local/squid27/etc# cat squid.conf
> cache_dir ufs /squid/cache/squid27 14336 16 256

I *HIGHLY* recommend changing that to "aufs".
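
As a rough sketch, the changed line would look like the one below. It keeps
your path and sizes and only swaps the store type. This assumes your build
included aufs in its --enable-storeio list; if it did not, Squid will
reject the line and you would need to rebuild first.

```
# Same directory, size (MB) and L1/L2 layout as before; only "ufs" becomes "aufs".
# aufs hands disk I/O to helper threads instead of blocking the main process.
cache_dir aufs /squid/cache/squid27 14336 16 256
```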

> cache_log /usr/local/squid27/var/logs/cache.log
> cache_mem 64 MB
> cache_store_log /usr/local/squid27/var/logs/store.log
> access_log /usr/local/squid27/var/logs/access.log squid
> coredump_dir /squid/cache/squid27/cache
>
> acl all src all
> acl esri src 10.0.0.0/8
> acl manager proto cache_object
> acl localhost src 127.0.0.1/32
> acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
>
> acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
>
> acl SSL_ports port 443
> acl Safe_ports port 80 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 # https
> acl Safe_ports port 70 # gopher
> acl Safe_ports port 210 # wais
> acl Safe_ports port 1025-65535 # unregistered ports
> acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 488 # gss-http
> acl Safe_ports port 591 # filemaker
> acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT
>
> acl youtube dstdomain .youtube.com .googlevideo.com .video.google.com
> .video.google.com.ar .video.google.com.au
> acl youtubeip dst 74.125.0.0/16
> acl youtubeip dst 64.15.0.0/16
> acl youtubeip dst 200.49.0.0/16
>
> cache allow youtube
> cache allow youtubeip
> cache allow esri
>
> cache allow all

The above 4 lines of ACL tests per request collapse down to a single
action, since every rule is an "allow" and the last one matches
everything: cache allow all
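
Spelled out, the reduction is roughly this (the commented lines are your
originals, shown only for comparison):

```
# Before: four tests per request, every one of them ending in "allow".
# cache allow youtube
# cache allow youtubeip
# cache allow esri
# cache allow all

# After: one rule with the same outcome.
cache allow all
```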

>
> # These are from http://wiki.squid-cache.org/Features/StoreUrlRewrite
> acl store_rewrite_list dstdomain mt.google.com mt0.google.com
> mt1.google.com mt2.google.com
> acl store_rewrite_list dstdomain mt3.google.com
> acl store_rewrite_list dstdomain kh.google.com kh0.google.com
> kh1.google.com kh2.google.com
> acl store_rewrite_list dstdomain kh3.google.com
> acl store_rewrite_list dstdomain khm.google.com khm0.google.com
> khm1.google.com khm2.google.com
> acl store_rewrite_list dstdomain khm3.google.com
> acl store_rewrite_list dstdomain kh.google.com.ar kh0.google.com.ar
> kh1.google.com.ar kh2.google.com.ar
> acl store_rewrite_list dstdomain kh3.google.com.ar
> acl store_rewrite_list dstdomain khm.google.com.ar khm0.google.com.ar
> khm1.google.com.ar khm2.google.com.ar
> acl store_rewrite_list dstdomain khm3.google.com.ar
> acl store_rewrite_list dstdomain kh.google.com.au kh0.google.com.au
> kh1.google.com.au khm.google.com.au
> acl store_rewrite_list dstdomain kh2.google.com.au kh3.google.com.au
>

NP: the wiki example was created by two people, one in Australia, one in
Argentina. If you are in any other country the Google ccTLD (*.au, *.ar)
will be different, and you need to add your own country's hostnames to that
list for it to catch them.

This is the only thing I can see that directly affects your problem.
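
To illustrate the ccTLD point, here is a sketch for a hypothetical site in
Brazil. The .com.br hostnames are an assumption made by analogy with the
.ar and .au entries, not something I have checked; look at your own
access.log to see which hosts your clients really fetch tiles from.

```
# Hypothetical: google.com.br map tile hosts, mirroring the .ar/.au entries.
acl store_rewrite_list dstdomain kh.google.com.br kh0.google.com.br
acl store_rewrite_list dstdomain kh1.google.com.br kh2.google.com.br
acl store_rewrite_list dstdomain kh3.google.com.br
acl store_rewrite_list dstdomain khm.google.com.br khm0.google.com.br
acl store_rewrite_list dstdomain khm1.google.com.br khm2.google.com.br
acl store_rewrite_list dstdomain khm3.google.com.br
```

The ACL only decides which requests are handed to the helper. The patterns
inside your store_url_rewrite script presumably need to match the same
hostnames as well.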

> # This needs to be narrowed down quite a bit!
> acl store_rewrite_list dstdomain .youtube.com
>
> storeurl_access allow store_rewrite_list
> storeurl_access deny all
>
> storeurl_rewrite_program /usr/local/bin/store_url_rewrite
>
>
> http_access allow manager localhost localnet
> http_access deny manager
> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access deny to_localhost
> http_access allow localnet
> http_access deny all
>
> icp_access allow localnet
> icp_access deny all
>
> http_port 3128 transparent

Doing transparent on a common port is a bad thing for a number of reasons.
You should pick a random port number and use that instead. It's only
relevant between the firewall and Squid, so no other software needs to know
it.
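
Something along these lines, where 3129 is just an arbitrary number picked
for the example:

```
# Optional: keep 3128 as a normal proxy port for explicitly configured browsers.
http_port 3128

# Interception port. 3129 is an arbitrary example; any unused port will do.
# The firewall redirect rule must point at this same port.
http_port 3129 transparent
```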

>
> hierarchy_stoplist cgi-bin ?
> maximum_object_size_in_memory 1024 KB
> memory_replacement_policy lru
> cache_replacement_policy lru
> store_dir_select_algorithm least-load
> max_open_disk_fds 0
> minimum_object_size 0 KB
> maximum_object_size 4194240 KB
> cache_swap_low 90
> cache_swap_high 95
> update_headers on
>
> logformat squid %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt
> logformat squidmime %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %un
> %Sh/%<A %mt [%>h] [%<h]
> logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
> logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st
> "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
>
> logfile_daemon /usr/local/squid27/libexec/logfile-daemon
> logfile_rotate 10
> emulate_httpd_log off
> log_ip_on_direct on
> mime_table /usr/local/squid27/etc/mime.conf
> log_mime_hdrs off
> pid_filename /usr/local/squid27/var/logs/squid.pid
> debug_options ALL,1
> log_fqdn off
> client_netmask 255.255.255.255
> strip_query_terms on
> buffered_logs off
> netdb_filename /usr/local/squid27/var/logs/netdb.state
> ftp_passive on
> ftp_sanitycheck on
> ftp_telnet_protocol on
>
> # diskd_program /usr/local/squid27/libexec/diskd-daemon
> unlinkd_program /usr/local/squid27/libexec/unlinkd
> # pinger_program /usr/local/squid27/libexec/pinger
>
> #Suggested default:
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern -i \.swf$ 10080 90% 999999 override-expire
> override-lastmod reload-into-ims ignore-reload ignore-no-cache
> ignore-private
> refresh_pattern -i \.flv$ 10080 90% 999999 override-expire
> override-lastmod reload-into-ims ignore-reload ignore-no-cache
> ignore-private
> refresh_pattern ^http://sjl-v[0-9]+\.sjl\.youtube\.com 10080 90% 999999
> override-expire override-lastmod reload-into-ims ignore-reload
> ignore-no-cache ignore-private
> refresh_pattern get_video\?video_id 10080 90% 999999 override-expire
> override-lastmod reload-into-ims ignore-reload ignore-no-cache
> ignore-private
> refresh_pattern watch\?v 10080 90% 999999 override-expire
> override-lastmod reload-into-ims ignore-reload ignore-no-cache
> ignore-private
> refresh_pattern youtube\.com/get_video\? 10080 90% 999999
> override-expire override-lastmod reload-into-ims ignore-reload
> ignore-no-cache ignore-private
> refresh_pattern youtube\.com/watch\? 10080 90% 999999 override-expire
> override-lastmod reload-into-ims ignore-reload ignore-no-cache
> ignore-private
> refresh_pattern ^http://www.youtube\.com/get_video\? 10080 90% 999999
> override-expire override-lastmod reload-into-ims ignore-reload
> ignore-no-cache ignore-private
> refresh_pattern ^http://www.youtube\.com/watch\? 10080 90% 999999
> override-expire override-lastmod reload-into-ims ignore-reload
> ignore-no-cache ignore-private
>
>
> quick_abort_min -1 KB
> # quick_abort_max 16 KB
> # quick_abort_pct 95
>
> acl shoutcast rep_header X-HTTP09-First-Line ^ICY.[0-9]
> upgrade_http0.9 deny shoutcast
>
> via on
> cache_vary on
> acl apache rep_header Server ^Apache
> broken_vary_encoding allow apache
> collapsed_forwarding off
>
> refresh_stale_hit 0 seconds
> ie_refresh on
>
> cache_mgr webmaster_at_rosarioip.com.ar
> mail_from guillermo_at_geryon.com.ar
> cache_effective_user nobody
> cache_effective_group nogroup
> icon_directory /usr/local/squid27/share/icons
> error_directory /usr/local/squid27/share/errors/Spanish
> check_hostnames off
> ignore_unknown_nameservers off

Turning that off is generally not a good idea. Despite Squid randomizing
its DNS port by default, if that port can be detected by a malicious
intruder your clients are screwed.
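
The safer setting is the default, so either delete the line or set it
explicitly:

```
# Default behaviour: ignore DNS replies that arrive from an address
# Squid did not send the query to.
ignore_unknown_nameservers on
```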

> fqdncache_size 4096
> cachemgr_passwd 29162959 all

Sigh. Now you have to change that password everywhere you use it. It's
just been archived in public view forever alongside your name.
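
When you change it, and whenever you post a config in future, something
like this is what should go in the public copy. CHANGE_ME is a placeholder,
not a suggested value:

```
# Placeholder only. Put the real secret in the live file and redact it before posting.
cachemgr_passwd CHANGE_ME all
```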

> reload_into_ims on
> pipeline_prefetch on
>
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
>
> refresh_pattern . 0 20% 4320
>
>
>
> Thanks in advance,
>
>
> Guillermo.-
Received on Tue Oct 27 2009 - 02:37:21 MDT