Russian in HTTP


Because the Internet is growing and more and more languages are used to create pages, it is worth understanding how to set up a web server so that it also supports Russian. The information on these pages primarily targets webmasters, the people who maintain web servers. However, general users might also find it useful to learn how to write Russian web pages correctly when they cannot influence their ISP.
My strong belief is that the most correct way to handle multi-language, multi-encoding documents is through interaction between the HTTP server and the HTTP client. The client should report which language it wants to use and which encoding is most suitable (see RFC 2070).
The client-server interaction should look as follows. When initiating a request, the client should send the Accept-Language and Accept-Charset headers, which the server sees as the HTTP_ACCEPT_LANGUAGE and HTTP_ACCEPT_CHARSET variables. The server then replies with one of the following:
	Content-Type:	text/html; charset=koi8-r
	Content-Type:	text/html; charset=windows-1251
	Content-Type:	text/html; charset=x-mac-cyrillic
	Content-Type:	text/html; charset=cp866
	Content-Type:	text/html; charset=iso-8859-5
This allows "smart" browsers (for example, later versions of Netscape Communicator) to automatically switch fonts. Read more on this on Andrei Chernov's pages.
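The negotiation described above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular server: the names SUPPORTED, parse_accept_charset, choose_charset and content_type are made up for this example, the KOI8-R default is an assumption, and the "*" wildcard is deliberately ignored to keep the sketch short.

```python
# Charsets this hypothetical server can produce (the list comes from the text above).
SUPPORTED = ["koi8-r", "windows-1251", "x-mac-cyrillic", "cp866", "iso-8859-5"]


def parse_accept_charset(header):
    """Return a list of (charset, q) pairs from an Accept-Charset value."""
    result = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a charset without an explicit q-value counts as q=1
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        result.append((name.strip().lower(), q))
    return result


def choose_charset(header, default="koi8-r"):
    """Pick the supported charset with the highest q-value, else the default."""
    best, best_q = None, 0.0
    for name, q in parse_accept_charset(header or ""):
        # Strict comparison: on a tie, the charset listed first wins.
        if name in SUPPORTED and q > best_q:
            best, best_q = name, q
    return best or default


def content_type(header):
    """Build the Content-Type value the server would send back."""
    return "text/html; charset=" + choose_charset(header)


if __name__ == "__main__":
    print(content_type("windows-1251;q=0.8, koi8-r;q=0.5"))
```

A client that sends no Accept-Charset at all, or only charsets the server does not know, gets the assumed KOI8-R default here.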
Having words like "please choose an appropriate encoding" on your pages is a really BAD idea; it drives people crazy. When I see these words, I get really mad at those who cannot comply with simple rules that would eliminate all this encoding mess. This is especially true of those who run Microsoft-made HTTP servers. This company apparently thinks it can do whatever is convenient for itself, not for others.
Here is my advice. Get the latest version of Apache and the FLY plug-in module written by Igor Sereda (sereda@spb.runnet.ru). The module recodes documents on the fly from one character set to another on the basis of HTTP_ACCEPT_CHARSET or, if that is not set, of the "User-Agent" field, from which it tries to figure out what platform and OS you are on. I am archiving it on sunsite as well.
If you are using some other HTTP server, you are on your own. I would advise you to get real software, especially if you are using some Microsoft stuff.
There may be different opinions about recoding, but until Unicode is universally supported, I believe we have to live with it. My own opinion, though, is that KOI8-R should be the ONLY encoding for the Russian web, because e-mail and news are transported in this encoding.
If your ISP cannot provide you with the correct server configuration, you might try to use the HTML META tag. This will also tell a smart browser to switch fonts. Here is an example:
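To make the idea of on-the-fly recoding concrete, here is a rough Python sketch. It is NOT the FLY module and does not reproduce its logic: the function names, the User-Agent heuristic and the KOI8-R fallback are all assumptions made for illustration, and q-values in Accept-Charset are ignored for brevity.

```python
# Map the HTTP charset labels used in the text to Python codec names.
PYTHON_CODECS = {
    "koi8-r": "koi8_r",
    "windows-1251": "cp1251",
    "x-mac-cyrillic": "mac_cyrillic",
    "cp866": "cp866",
    "iso-8859-5": "iso8859_5",
}


def guess_charset_from_user_agent(user_agent):
    """Crude, assumed heuristic: guess a charset from the platform name."""
    ua = (user_agent or "").lower()
    if "mac" in ua:
        return "x-mac-cyrillic"
    if "win" in ua:
        return "windows-1251"
    return "koi8-r"  # assumed default for Unix and unknown clients


def pick_target(accept_charset, user_agent):
    """Use the first known charset in Accept-Charset, else guess from User-Agent."""
    for item in (accept_charset or "").split(","):
        name = item.split(";")[0].strip().lower()
        if name in PYTHON_CODECS:
            return name
    return guess_charset_from_user_agent(user_agent)


def recode(body, source, target):
    """Recode a document body (bytes) from one Cyrillic charset to another."""
    text = body.decode(PYTHON_CODECS[source])
    # Characters missing from the target charset are replaced with "?".
    return text.encode(PYTHON_CODECS[target], errors="replace")


if __name__ == "__main__":
    stored = "Привет".encode("koi8_r")  # document kept on disk in KOI8-R
    target = pick_target(None, "Mozilla/4.04 [en] (Win95; I)")
    print(target, recode(stored, "koi8-r", target))
```

Keeping the documents on disk in a single encoding and recoding only on output means there is one master copy of each page to maintain.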
	<Meta HTTP-EQUIV="Content-Type"
	    Content="text/html; charset=koi8-r">
However, this solution is very, very undesirable: it might interfere with caching proxy servers, so you lose potential corporate clients who sit behind a firewall. If the proxy does recoding as well, the document gets recoded but the tag stays, so you end up with a document that is impossible to read unless you save it to the local disk, delete the tag, and load it into the browser again. It is very unlikely that anyone would want to do that.
This is especially painful when the charset is something other than KOI8-R, for example CP1251. Because the Unix version of Netscape (4.04, at least) has a bug in CP1251 handling, you will cut those users off completely if you write CP1251 in the tag. Many Windows-based HTML editors are stupid enough to write those tags, so PLEASE, PLEASE, PLEASE always check the code after you have created it!
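Checking by hand gets tedious, so a small script can do it. The following Python sketch reports any charset declared in a META tag; the names MetaCharsetFinder and find_meta_charsets are invented for this example, and it relies only on the standard html.parser module.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Collect charset declarations found in META HTTP-EQUIV tags."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("http-equiv", "").lower() != "content-type":
            return
        for part in attrs.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.strip().lower() == "charset":
                self.charsets.append(value.strip().lower())


def find_meta_charsets(html_text):
    """Return the list of charsets declared in META tags of an HTML string."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charsets


if __name__ == "__main__":
    page = ('<html><head><Meta HTTP-EQUIV="Content-Type"\n'
            '    Content="text/html; charset=windows-1251"></head></html>')
    print(find_meta_charsets(page))
```

When reading real files whose encoding is unknown, one option is to open them with encoding="latin-1": the tags themselves are plain ASCII, so the scan still works even though the Russian text is not decoded correctly.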