[locale] Fw: unscientific charset survey



But I think the data are not entirely clean: in news (NNTP),
Charset=Windows-1251 almost never appears, and KOI8-R accounts for
the largest share. As is well known, the www.fido7.ru gateway
passes only KOI8-R.


-----Original Message-----
From: Erland Sommarskog <sommar@algonet.se>
To: usefor@rkive.landfield.com <usefor@rkive.landfield.com>
Date: 11 March 2001, 1:25
Subject: unscientific charset survey

This might be of some interest for this group:

From: "Eric A. Hall" <ehall@ehsco.com>
Newsgroups: comp.std.internat,comp.mail.mime,comp.mail.headers
Subject: unscientific charset survey
Date: 06 Mar 2001 17:46:21 GMT
Organization: EHS Company
Lines: 26
Message-ID: <3AA52268.F6A76652@ehsco.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.75 [en] (WinNT; U)
X-Accept-Language: en

I needed some charset distribution numbers and couldn't find any, so I
pointed a perl script at my ISP's news server.

   4,024,487 messages were processed.
   3,389,401 (84%) had no charset defined.
     632,680 (16%) had legal charsets or aliases defined.
       2,406 (0.06%) had illegal charsets defined.

The following had more than 1,000 matches:

   ASCII              400,291
   ISO-8859-1         177,786
   ISO-8859-2          25,704
   KOI8-R              10,228
   ISO-2022-JP          7,677
   Windows-1252         4,718
   BIG5                 2,502
   UTF-8                1,616
   ISO-8859-15          1,064
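
For readers who want to repeat this kind of count, here is a minimal sketch in Python. It is not the perl script mentioned above, which is not shown in the post. The function names tally_charsets and percentages and the "(none)" bucket label are made up for illustration. The sketch assumes the raw article text is already available as strings, and it does not fold aliases together or separate legal from illegal charset names.

```python
from collections import Counter
from email import message_from_string


def tally_charsets(raw_messages):
    """Count the charset parameter of the Content-Type header per message.

    raw_messages is an iterable of raw article texts (headers plus body).
    Messages with no charset parameter are counted under "(none)".
    """
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # get_content_charset() returns the charset lowercased,
        # or None when the header or the parameter is missing.
        charset = msg.get_content_charset()
        counts[charset or "(none)"] += 1
    return counts


def percentages(counts):
    """Convert raw counts to percentages of all messages seen."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample articles, for illustration only.
    samples = [
        "Content-Type: text/plain; charset=us-ascii\n\nhello",
        "Content-Type: text/plain; charset=KOI8-R\n\nhello",
        "Subject: no content type here\n\nhello",
    ]
    counts = tally_charsets(samples)
    for name, share in sorted(percentages(counts).items()):
        print(f"{name:12} {counts[name]:6} {share:5.1f}%")
```

Because the charset names are lowercased before counting, "KOI8-R" and "koi8-r" land in the same bucket; merging true aliases (for example "ASCII" and "us-ascii") would need an extra lookup table.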

Raw data and charts at http://www.ehsco.com/opinion/20010305.html

Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/

Erland Sommarskog, Stockholm, sommar@algonet.se