Fw: unscientific charset survey



 However, it seems to me that the data is not entirely clean, since
Charset=Windows-1251 hardly ever appears in news (NNTP) and the
KOI8-R share is at its highest there. As is well known, the
www.fido7.ru gateway passes only KOI8-R.


From: Erland Sommarskog <sommar@algonet.se>
To: usefor@rkive.landfield.com <usefor@rkive.landfield.com>
Date: 11 March 2001, 1:25
Subject: unscientific charset survey

This might be of some interest for this group:

From: "Eric A. Hall" <ehall@ehsco.com>
Newsgroups: comp.std.internat,comp.mail.mime,comp.mail.headers
Subject: unscientific charset survey
Date: 06 Mar 2001 17:46:21 GMT

I needed some charset distribution numbers and couldn't find any, so I
pointed a perl script at my ISP's news server.

   4,024,487 messages were processed.
   3,389,401 (84%) had no charset defined.
     632,680 (16%) had legal charsets or aliases defined.
       2,406 (.06%) had illegal charsets defined.

The following had more than 1,000 matches:

   ASCII              400,291
   ISO-8859-1         177,786
   ISO-8859-2          25,704
   KOI8-R              10,228
   ISO-2022-JP          7,677
   Windows-1252         4,718
   BIG5                 2,502
   UTF-8                1,616
   ISO-8859-15          1,064
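
A tally like the one above can be sketched in a few lines. This is not
the original perl script, which is not shown in the post; it is a
minimal Python illustration that assumes the messages are already
available as raw text. The function name tally_charsets and the sample
messages are hypothetical. Unlike the survey, it does not fold charset
aliases together or separate legal from illegal names.

```python
from collections import Counter
from email import message_from_string


def tally_charsets(raw_messages):
    """Count the charset parameter of each message's Content-Type header."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # get_content_charset() returns the lowercased charset parameter,
        # or None when the message declares no charset.
        charset = msg.get_content_charset()
        counts[charset.upper() if charset else "(none)"] += 1
    return counts


# Hypothetical sample messages, for illustration only.
samples = [
    "Subject: a\nContent-Type: text/plain; charset=KOI8-R\n\nbody\n",
    'Subject: b\nContent-Type: text/plain; charset="iso-8859-1"\n\nbody\n',
    "Subject: c\n\nbody\n",
    "Subject: d\nContent-Type: text/plain; charset=koi8-r\n\nbody\n",
]

for name, n in tally_charsets(samples).most_common():
    print(f"{name:<12} {n}")
```

Charset names are case-insensitive, so the sketch normalizes them to
upper case before counting; messages without a charset parameter are
grouped under "(none)".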

Raw data and charts at http://www.ehsco.com/opinion/20010305.html

Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/

Erland Sommarskog, Stockholm, sommar@algonet.se