[Fwd: Fwd: Unicode support]

----------  Forwarded message  ----------
Subject: Unicode support
Date: Sat, 10 Jul 1999 19:53:07 +0100
From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>


Peter Novodvorsky wrote on 1999-07-10 16:13 UTC:
> Can anybody tell me, does Unicode work in current 3.9 version?
> I mean Unicode locale support. Does anyone work on it, and
> what still has to be done?

The only Unicode support in 3.9 so far is

  - There are now some ISO10646-1 encoded fonts (6x13 and ClearlyU;
    other fixed and B&H fonts will follow shortly)

  - xterm now has a UTF-8 mode (option -u8), so that together with the
    new fonts you can use Unicode in text-mode applications

More information on this has been collected at

  http://www.cl.cam.ac.uk/~mgk25/unicode.html
  http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html

There is now support for UTF-8 locales in glibc 2.1, and some people have
started extending various GNU tools for UTF-8, but there is unfortunately
still *no* UTF-8 support whatsoever in Xlib, and I do not know of anyone
working on it. X11R6.3 already contained a prototype Unicode locale, but
as far as I understand it is for the old and now deprecated UTF-1
encoding, not for UTF-8. UTF-1 was a draft encoding that used modular
arithmetic and did not use the C1 control character codes in the range
0x80-0x9f. It was never used anywhere, and UTF-8 is now the Unicode
encoding for Unix that is both commonly used and officially recommended
by POSIX standards.
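
For readers who have not looked at the encoding itself, here is a minimal
sketch of a UTF-8 encoder in C. The function name ucs_to_utf8 is made up
for this illustration and is not taken from xterm, glibc, or Xlib. It also
assumes the present-day limit of U+10FFFF with at most four bytes per
character, which is narrower than the original 31-bit definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS code point as UTF-8.
 * out must have room for at least 4 bytes.
 * Returns the number of bytes written, or 0 if the code point is a
 * surrogate (U+D800..U+DFFF) or lies beyond U+10FFFF (assumed limit). */
size_t ucs_to_utf8(uint32_t ucs, unsigned char *out)
{
    assert(out != NULL);

    if (ucs < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)ucs;
        return 1;
    }
    if (ucs < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (ucs >> 6));
        out[1] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 2;
    }
    if (ucs >= 0xD800 && ucs <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (ucs < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (ucs >> 12));
        out[1] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 3;
    }
    if (ucs <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (ucs >> 18));
        out[1] = (unsigned char)(0x80 | ((ucs >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 4;
    }
    return 0;
}
```

Note that the continuation bytes cover 0x80-0xbf, so unlike UTF-1, UTF-8
does produce bytes in the C1 range: U+20AC (EURO SIGN) encodes as
0xe2 0x82 0xac, for example.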

Sun, IBM, SCO, etc. have all already independently added UTF-8 locales
to their respective X distributions, but the X Consortium collapsed
before this could go into the sample implementation, so someone has to
do it again for XFree86. The locale support for UTF-1 (lcutf.c, etc.)
and for the various Japanese multibyte encodings should give some
guidance on how to do it properly.

There is various i18n and X Input Method documentation in the
distribution that looks relevant, but it is not very easy to read and
needs very careful study first. The actual amount of coding necessary
might be very small. We already have a complete keysym -> Unicode
conversion table in xterm (keysym2ucs.c), which can easily be recycled
in an Xlib UTF-8 input method.
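
To give a rough idea of what such a keysym -> Unicode lookup involves,
here is a small sketch in C. It is not the contents of keysym2ucs.c: the
names keysym_to_ucs_sketch and sketch_table are invented for this
example, the table holds only four sample entries, and plain
unsigned long stands in for the Xlib KeySym type so that the code
compiles without the X headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative excerpt only; a real table has many more entries.
 * Entries must stay sorted by keysym for the binary search below. */
struct keysym_ucs {
    unsigned long keysym;
    uint32_t ucs;
};

static const struct keysym_ucs sketch_table[] = {
    { 0x01a1, 0x0104 },  /* Aogonek     -> LATIN CAPITAL LETTER A WITH OGONEK */
    { 0x06c1, 0x0430 },  /* Cyrillic_a  -> CYRILLIC SMALL LETTER A */
    { 0x07e1, 0x03b1 },  /* Greek_alpha -> GREEK SMALL LETTER ALPHA */
    { 0x20ac, 0x20ac },  /* EuroSign    -> EURO SIGN */
};

/* Hypothetical lookup: stores the code point in *ucs_out and returns 1,
 * or returns 0 if the keysym has no entry (e.g. a function key). */
int keysym_to_ucs_sketch(unsigned long keysym, uint32_t *ucs_out)
{
    size_t lo = 0;
    size_t hi = sizeof sketch_table / sizeof sketch_table[0];

    assert(ucs_out != NULL);

    /* Latin-1 keysyms have the same numeric value as their code points. */
    if ((keysym >= 0x20 && keysym <= 0x7e) ||
        (keysym >= 0xa0 && keysym <= 0xff)) {
        *ucs_out = (uint32_t)keysym;
        return 1;
    }

    /* Binary search over the half-open range [lo, hi). */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (sketch_table[mid].keysym < keysym) {
            lo = mid + 1;
        } else if (sketch_table[mid].keysym > keysym) {
            hi = mid;
        } else {
            *ucs_out = sketch_table[mid].ucs;
            return 1;
        }
    }
    return 0;
}
```

An input method would then encode the resulting code point as UTF-8
before handing it to the application, and would simply pass over keysyms
such as Return (0xff0d) for which the lookup reports no character.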

Because Xlib does not yet have a UTF-8 locale, xterm currently uses a
hack in order to convert keysyms directly into UTF-8, bypassing much of
the normal Xlib keyboard mechanics. This should really be done via
XmbLookupString inside Xlib, which should then also allow the use of the
compose key to access many more Unicode characters than the normal
keyboard layout offers.

By the way, a few ideas for full Unicode keyboard support are in

  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14755.pdf

and should also be considered later, when a UTF-8 input method is
implemented.

I also suggest that the UTF-1 code be removed at the same time, because
UTF-1 was never really used and ISO has dumped the corresponding
standard.

Do you want to have a go at it? 

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
--
Peter Novodvorsky,
  Anar kaluva tielyanna!