Re: [squid-users] invalid url

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 15 Mar 2004 20:17:09 +0100 (CET)

On Mon, 15 Mar 2004, Jørgen Hovland wrote:

> 0x00c0 352e 313b 202e 4e45 5420 434c 5220 312e 5.1;..NET.CLR.1.
> 0x00d0 312e 3433 3232 290d 0a48 6f73 743a 2077 1.4322)..Host:.w
> 0x00e0 7777 2e6a f872 6765 6e2e 6e75 0d0a 436f ww.j.rgen.nu..Co

This is clearly an invalid HTTP request.

a) 0xf8 is not in the set of valid characters in an HTTP host name.

b) 0xf8 is also not valid UTF-8: that byte never occurs in a well-formed
UTF-8 sequence. Loosely speaking, it can be argued that HTTP should in the
future be modified to support UTF-8 encoding according to the
internationalisation guidelines from the IETF. There is however a lot of
work remaining before that point is officially reached.
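
Points a) and b) can be checked directly against the bytes in the dump. The
following is only a small sketch: the byte string is copied from the Host
header above, try_decode is a hypothetical helper name, and the guess that
the client used ISO-8859-1 is an assumption, not something the dump proves.

```python
# Host header bytes copied from the hex dump above (the part after "Host: ").
raw_host = bytes.fromhex("7777772e6af87267656e2e6e75")


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if data is not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xf8 is outside ASCII, so the name is not a plain host name.
print(try_decode(raw_host, "ascii"))
# 0xf8 cannot appear in well-formed UTF-8 either.
print(try_decode(raw_host, "utf-8"))
# Assumption: the client sent ISO-8859-1, where 0xf8 is the letter "ø".
print(try_decode(raw_host, "iso-8859-1"))
```

Only the last call yields a readable name, which suggests the browser put its
local 8-bit encoding straight into the Host header.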

If you want to send requests like this today, you MUST use a browser
which supports the established IDN transition encoding of non-ASCII host
names. Any other use is outside of any standard, and things will fail.
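
For illustration, this is roughly what an IDN-capable client does with such a
name before it goes on the wire. The sketch uses the "idna" codec from the
Python standard library, which is one implementation of the IDN encoding and
not something Squid itself uses; the host name is the one from the dump.

```python
# Host name from the dump above, as proper Unicode text.
hostname = "www.jørgen.nu"

# Convert each non-ASCII label to its ASCII-compatible ("xn--") form.
ace = hostname.encode("idna")
print(ace)

# The encoding is reversible, so the original name can be recovered for display.
print(ace.decode("idna"))
```

The result consists only of ASCII letters, digits, hyphens and dots, so it is
a valid host name for HTTP and DNS as they stand today.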

> I see that IE sends an url encoded GET line, which is what it is
> supposed to do. I would put my 5 euro in that this is a non-implemented
> squid feature.

Yes and no.

What you are trying to do is outside of any HTTP standard, and in
addition violates the existing standards and guidelines on how it
should be done.

> I'm simply interested in getting things to work. Permitting such domains
> will accomplish this task. I have no control over all the browsers that
> will be using our proxy.

And neither does Squid.

> As you probably are aware of, there are probably more browsers out there
> not IDN capable than capable of IDN.

I am well aware of this, and it is an initial pain which the network has
to go through before IDN gets widely accepted. Until then IDN names are
simply not reliable as a technology and should be seen as a bonus, not a
must. Anyone selling IDN names without making sure the customer knows this
is close to committing fraud.

> Rejecting such domains in a proxy software is not going to help anyone.

There are limits on how much brain damage one adds to the proxy to work
around browsers not following the standards, but if you find reasonable
methods to deal with this brain damage you are welcome to submit a patch.

> The smartest thing would be to automatically translate to IDN in squid
> directly (as an optional choice of course).

We have considered accepting IDN encoding from UTF-8 into Squid, but so
far nobody has submitted any patches for doing so. IDN encoding from other
character encodings is questionable.

If you use --disable-hostname-checks then the IDN encoding can be done in
a redirector helper to Squid if desired, without having to modify Squid.
Or even better, if your DNS supports UTF-8 (or whatever encoding you use)
then no modifications to Squid are needed. The exception is of course if
your client tries to follow the standards when given (according to the
standard) garbage input and %nn-encodes the hostname, in which case this
has to be undone (which can also be done in the redirector helper without
modifying Squid).
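
A redirector helper along those lines could look roughly like the sketch
below. It is untested against a real Squid and rests on several assumptions:
that the helper gets one request per line on stdin with the URL as the first
whitespace-separated field and answers with one URL per line, that Squid was
built with --disable-hostname-checks so the request reaches the helper at
all, and that a host which is not valid UTF-8 is ISO-8859-1 (exactly the
questionable guess mentioned above). The names URL_RE, host_to_ace,
rewrite_url and main are made up for this example.

```python
import re
import sys
from urllib.parse import unquote_to_bytes

# scheme://host followed by the rest (port, path, query).
# Simplification: userinfo and IPv6 literals are not handled specially.
URL_RE = re.compile(rb"^([a-z][a-z0-9+.-]*://)([^/:?#\s]+)(.*)$",
                    re.IGNORECASE | re.DOTALL)


def host_to_ace(raw_host: bytes) -> bytes:
    """Undo %nn encoding, guess the charset, and return the IDN (ASCII) form."""
    raw = unquote_to_bytes(raw_host)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is treated as ISO-8859-1.
        text = raw.decode("iso-8859-1")
    return text.encode("idna")


def rewrite_url(url: bytes) -> bytes:
    """Return url with its host IDN-encoded, or unchanged if that fails."""
    match = URL_RE.match(url)
    if not match:
        return url
    scheme, host, rest = match.groups()
    try:
        return scheme + host_to_ace(host) + rest
    except UnicodeError:
        return url


def main() -> None:
    # One request per line in, one (possibly rewritten) URL per line out.
    for line in sys.stdin.buffer:
        fields = line.split()
        out = rewrite_url(fields[0]) if fields else b""
        sys.stdout.buffer.write(out + b"\n")
        sys.stdout.buffer.flush()
```

To run it as the actual helper process, call main() at the end of the script.
Host names that are already plain ASCII pass through unchanged.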

The use of ISO-8859-X encodings within DNS for host/domain names is a
bastard which in my opinion should never have been let loose on the
Internet. Any DNS operator accepting ISO-8859-X encodings should be
seriously questioned on whether they consider DNS an important
infrastructure for the Internet or just a quick way to earn money with no
respect for the function DNS provides. The fact that the DNS protocol is
encoding-independent does not warrant such abuse of the hostname DNS
scheme (which is anything but encoding-independent).

Regards
Henrik
Received on Mon Mar 15 2004 - 12:17:13 MST