[locale] The koi8 family keeps growing...

    Guten, as they say, Tag!

    Recently I saw a mention on cvs-commit@xfree86.org that support had been
added for a certain charset "koi8-c", authored by our dear Pablo.  At first I
thought it had something to do with Serge Winitzki's old-cyrillic, but no.  So
I wrote to you-know-who, and here is his reply (.gif attached):

------- Forwarded Message Follows ------------------------------------
Subject: Re: Question regarding your koi8-c


On Fri, Nov 03, 2000 at 01:00:01PM +0600, Dmitry Yu. Bolkhovityanov wrote:

>     I've noticed inclusion of this encoding into XFree, and have a question:
> what is it?  I.e., what is the target language/territory?

It includes all the chars of iso-8859-5 but in koi8 positions; it also
includes the Ukrainian ghe, and in the 0x80-0x9f range it includes various
letters needed by languages such as Tatar, Azeri, Tajik,...
When I named it I chose "c" for "Caucasus".

The goal is to have a charset that will cover the needs of those languages
as demand for support appears, while still being compatible with koi8-r, as
the Russian language is likely to also be used by those users (that allows
defining a preference like LANGUAGE=tg:ru to ask for messages in Tajik and,
if they are not available, in Russian).
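The LANGUAGE=tg:ru preference is a colon-separated priority list, roughly as GNU gettext treats the LANGUAGE variable: each language is tried in order, and if none has a translation the original (untranslated) message is shown. A minimal sketch of that lookup order, assuming hypothetical in-memory catalogs in place of real compiled .mo files; `pick_translation` and the catalog layout are illustrative, not gettext's API:

```python
# Sketch of gettext-style fallback driven by a LANGUAGE-like priority list.
# The catalogs and pick_translation() are hypothetical stand-ins; real
# gettext loads a compiled .mo catalog per language instead.
from typing import Dict

Catalogs = Dict[str, Dict[str, str]]


def pick_translation(msgid: str, catalogs: Catalogs, language_env: str) -> str:
    """Return the first translation found, walking the list left to right."""
    for lang in language_env.split(":"):
        if not lang:
            continue  # skip empty entries such as the middle of "tg::ru"
        translated = catalogs.get(lang, {}).get(msgid)
        if translated is not None:
            return translated
    return msgid  # nothing matched: show the untranslated message


# Hypothetical catalogs: Tajik lacks this message, Russian has it.
catalogs: Catalogs = {
    "tg": {},
    "ru": {"File not found": "Файл не найден"},
}

print(pick_translation("File not found", catalogs, "tg:ru"))     # Файл не найден
print(pick_translation("Permission denied", catalogs, "tg:ru"))  # Permission denied
```

Because koi8-c keeps the Russian letters where koi8-r has them, a Russian fallback message stays readable on a koi8-c terminal, which is the point of the tg:ru ordering.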

>     There are plenty of koi8-* encodings, and two of them are already called
> "koi8-c" -- Serge Winitzki's old-cyrillic (with yat, fita and izhitsa)

I was unaware of that.
Is "koi8-c" widely used to name that encoding?

> and koi8-f-compatible file in console tools (koi8c-8x16.psf).

No. That is "koi8-f" (what the file name is doesn't matter)

>     There's been a big discussion in the "cyrfonts" mailing list concerning
> the growing tree (or zoo ;-) of koi8-* charsets, and now another one appears :-).

It is true that the mess with cp1251, iso-8859-5, koi8-r and koi8-u is
a pity, as it would have been possible to put all the chars they define
into a single charset.
My koi8-c, however, is a different thing: it includes chars not found in any
other encoding (other than Unicode), and I needed a charset encoding in order
to start the support for those languages in Linux. I added various chars in
the hope that the encoding will cover all or most of the languages of the
area that are written using the Cyrillic alphabet.

I am attaching a gif showing the 0x80-0xff range of koi8-c (the positions
0x8d, 0x8f, 0x9d, 0x9f are, respectively, capital i with macron,
capital u with macron, small i with macron, small u with macron; the TTF font
used for the image doesn't have those Cyrillic chars...)

    There is some resemblance to Cyrillic-asian, but no more than that: two
letters are missing, and there is only one free position (he forgot to remove
U+2580 and U+2321), so there is no talking this guy into anything sensible.

    At first I didn't get what he meant by "u with macron" etc., but then I
figured it out: he meant "[cyrillic] u with macron" etc., and those characters
really are absent from LucidaSans Unicode (still the most popular Unicode font,
mind you).
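To make the four macron positions concrete, a partial koi8-c decoder can be sketched in Python. Only the bytes 0x8d, 0x8f, 0x9d and 0x9f come from Pablo's description; they are mapped here to the Cyrillic letters U+04E2, U+04EE, U+04E3 and U+04EF. Treating 0xc0-0xff as identical to koi8-r is an assumption based on his statement that koi8-c stays compatible with koi8-r, and every other high byte is left unmapped because the message does not list it. The table and function names are hypothetical:

```python
# Hypothetical, PARTIAL koi8-c decoder sketch.
# Known: the four macron letters Pablo lists for 0x8d/0x8f/0x9d/0x9f.
# ASSUMED: bytes 0xc0-0xff match koi8-r (from his koi8-r compatibility claim).
# Everything else above 0x7f is unknown here and decodes to U+FFFD.

KOI8C_MACRON_LETTERS = {
    0x8D: "\u04E2",  # CYRILLIC CAPITAL LETTER I WITH MACRON
    0x8F: "\u04EE",  # CYRILLIC CAPITAL LETTER U WITH MACRON
    0x9D: "\u04E3",  # CYRILLIC SMALL LETTER I WITH MACRON
    0x9F: "\u04EF",  # CYRILLIC SMALL LETTER U WITH MACRON
}


def decode_koi8c_partial(data: bytes, unknown: str = "\uFFFD") -> str:
    out = []
    for b in data:  # iterating bytes yields ints
        if b < 0x80:
            out.append(chr(b))  # plain ASCII
        elif b in KOI8C_MACRON_LETTERS:
            out.append(KOI8C_MACRON_LETTERS[b])
        elif b >= 0xC0:
            # Assumed koi8-r-compatible letter area; use the stdlib codec.
            out.append(bytes([b]).decode("koi8_r"))
        else:
            out.append(unknown)  # position not described in the message
    return "".join(out)


print(decode_koi8c_partial(b"\x9f"))  # ӯ
```

A real codec would need the full 128-entry table from the XFree86 encoding file; the sketch only shows which parts this thread actually pins down.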

    Interesting: this extra bit of restlessness that certain individuals have is

P.S. By the way, doesn't Pablo happen to have a religious education?  He (a
     Spaniard :) should have been a missionary of the Holy Church in the Middle Ages ;-)
       Dmitry Yu. Bolkhovityanov  |  Novosibirsk, RUSSIA
       phone (383-2)-39-49-56     |  The Budker Institute of Nuclear Physics
                                  |  Lab. 5-13

   ---- File information -----------
     File:  KOI8C.GIF
     Date:  4 Nov 2000, 10:00
     Size:  17539 bytes.
     Type:  Binary