by Serge Winitzki

Motivation

This work is an effort of Alexy Khrabrov and Serge Winitzki to standardize the rendering of Russian text in Latin letters.

The idea is to provide enough alternative ways of rendering the Russian letters so that people can use different letter combinations according to their liking. However, our major concerns are consistency of coding and the ability to unambiguously and automatically render latinized Russian texts into various "native" forms (using existing coding schemes used on UNIX, DOS or the Macintosh) and vice versa.

At present, people use various latinized phonetic representations of Russian, and these representations disagree with each other. This makes converting such phonetically represented Russian texts into a native coding rather tedious. A standard coding scheme will especially facilitate electronic communication in Russian between those limited to text-mode terminals and keyboards and those using a "native" Russian encoding and a graphical representation ("Russian fonts").

Overview

The Russkaya Latinica coding scheme was designed with you, the user, in mind. (We should add: an English-speaking user.) What seems most intuitive to you is most probably implemented in our scheme. In fact, we are sure that when you typed Russian text using Latin letters you used almost exactly the same coding as the one we propose. The words which are difficult to transliterate are the ones Russkaya Latinica helps you with. Russkaya Latinica isn't just another set of difficult rules to memorize. It's a flexible standard accommodating a wide range of intuitive perceptions, and you definitely won't have to drastically change your typing. At the same time, it's rigorous in the sense that every Russian word can be represented exactly and unambiguously.

To make it all work, I created a Perl script to convert text from Russkaya Latinica to native encodings and back, as well as between various existing native schemes; currently, it supports the KOI-8, ISO, DOS CP-866 ("alternative"), CP-1251 (MS-Windows native), and Macintosh encodings. You will need Perl, a free program, to run this script; read the script itself for further instructions.
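The conversion between native encodings can be illustrated with a short sketch. This is not the Perl script mentioned above; it is a minimal Python illustration that assumes the five schemes correspond to Python's standard codecs `koi8_r`, `iso8859_5`, `cp866`, `cp1251` and `mac_cyrillic`. The function name `recode` and the labels in `NATIVE` are hypothetical.

```python
# Assumed mapping from the scheme names used in the text to Python codec names.
NATIVE = {
    "KOI-8": "koi8_r",
    "ISO": "iso8859_5",
    "DOS": "cp866",
    "Windows": "cp1251",
    "Macintosh": "mac_cyrillic",
}


def recode(data: bytes, src: str, dst: str) -> bytes:
    """Convert natively encoded Russian text from one scheme to another."""
    # Decode to Unicode first, then encode in the target scheme.
    return data.decode(NATIVE[src]).encode(NATIVE[dst])


if __name__ == "__main__":
    koi = "Привет".encode("koi8_r")
    dos = recode(koi, "KOI-8", "DOS")
    print(dos.decode("cp866"))
```

Going through Unicode as an intermediate form means each scheme needs only one table, rather than one table per pair of schemes.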

There is also an older (but functional) software package for the same purpose which may be useful for people with old and slow DOS-only machines (e.g. an 80286 PC).

Getting started

The standard we propose has only a few new rules you have to remember.
  • The letter "tshcha" can be represented by q, by w, or by the combination shch (which by itself only rarely occurs in Russian, e.g. in the word "vesnushchatyj"). We hope you find at least one of those alternatives intuitive enough. If you are used to typing w for v, or typing q for ja, the translator program should have an option to accommodate these alternatives; however, we suggest sticking to v.
  • The letter "oborotnoe E" is e' or e` or `e, as in: e'tot. Also, a perhaps more intuitive symbol @ may be used for both uppercase and lowercase e', as in: @KRAN NE RABOTAET.
  • The hard sign is ~ and the soft sign is ' (or `). For example: ob~ezdit'.
  • You can use the symbols ~, ' and ` for the hard and the soft sign in both uppercase and lowercase words, since they are "malleable" (see below) and will become uppercase in all-uppercase words. For example, in the text "V~EZD TOL'KO DLYA PERSONALA", the hard and the soft signs will be translated to uppercase.

    That's it! Now you know enough to use Russkaya Latinica. Most probably, the phonetic code that feels right to you (if you map it to English phonetics) is compatible with our rules.
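The rules above can be sketched as a table lookup with greedy longest-match scanning. This is only an illustration under stated assumptions: the table is partial (it lists only codes that appear in this text, and omits the `e and e` variants of "oborotnoe E"), it handles lowercase input only, and the names `TABLE` and `to_cyrillic` are hypothetical rather than taken from the authors' script.

```python
# Partial table, assumed for illustration: only codes that appear in this
# text are listed; this is not the authors' complete standard.
TABLE = {
    "shch": "щ", "q": "щ", "w": "щ",
    "e'": "э", "@": "э",
    "~": "ъ", "'": "ь", "`": "ь",
    "zh": "ж", "ch": "ч", "sh": "ш",
    "ja": "я", "ya": "я", "ju": "ю", "yu": "ю",
    "jo": "ё", "yo": "ё", "je": "е",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "j": "й", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т",
    "u": "у", "f": "ф", "h": "х", "c": "ц", "y": "ы",
}


def to_cyrillic(text: str) -> str:
    """Greedy longest-match transliteration of lowercase latinized text."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest code first so that "shch" wins over "sh" + "ch".
        for size in (4, 3, 2, 1):
            chunk = text[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # pass through anything not in the table
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("e'tot", "ob~ezdit'", "zhenwina", "plyushch"):
        print(word, "->", to_cyrillic(word))
```

Because the scan is greedy, shch is always read as the single letter "tshcha"; a word such as "vesnushchatyj", where sh and ch are separate letters, needs the escape character described under "Escape character".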

    Note:

    Examples of usage

    Let us show a few examples to illustrate the proposed scheme.

    Bystraja Ryzhaja Lisa Prygnula Cherez Lenivuju Sobaku.
    (translation of "A Quick Brown Fox Jumped Over the Lazy Dog")

    [Russian text 1]
    Odin brityj anglichanin finiki zheval kak morkov'.

    [Russian text 2]
    Kazhdyj ohotnik zhelajet znat', gde sidit fazan.

    [Russian text 3]
    Zawiwajuwajasya zhenwina zhevala szhizhennyj zhen'shen'.


    [Russian text 4]

    V'juwijsya plyushch ne zakryval vida s verandy na roskoshnyj plyazh,
    raskinuvshijsya na beregu zaliva. Vglyadyvajas' v ob~ektiv binoklya,
    maj'or Wukin skazal:
    - Chto-to malovato chajek segodnya. Mozhet, e'to iz-za holodov?
    - Kakie tam ewyo chajki! - ne otryvajas' ot zhurnala, prognusavil
    Jeryomenko. - Nam by e'togo nashego negodyaja za zhabry sxvatit'...


    Here is an example of a phonetic coding used by Alex Kaplan (of INFO-RUSS). (Note that the text below is almost perfectly compatible with Russkaya Latinica, even though its author has never heard of it. The only problem is with the letter "tshcha".)

    Direktrisa Federal'noj migracionnoj sluzhby g-zha T. Regent izdala prikaz, zapreschajuschij predostavljat' status bezhencev (tochnee, vynuzhdennyh pereselencev) "licam chechenskoj nacional'nosti". Korrespondentu "Moskovskih novostej" ona zajavila, chto prikaz prinjat "pod davleniem sverhu".
    Zakony i konvencii, imejuschie silu zakona, kotorye etim narusheny, mozhno perechisljat' dolgo, i etot perechen' nachinaetsja s Konstitucii RF, zapreschajuschej dazhe "trebovat' ot grazhdan opredelenija ili ukazanija svoej nacional'noj [etnicheskoj] prinadlezhnosti". Zabavno, chto eta stat'ja mozhet byt' priostanovlena pri rezhime ChP (v otlichie ot takih dejstvitel'no fundamental'nyh svobod, kak pravo chastnoj sobstvennosti). Vprochem, ChP ne vvodilos'.

    We hope that you will like Russkaya Latinica enough to start using it. The more people stick to the standard, the easier it will become to communicate.

    Some advanced features

    Malleability

    The letters e' and the soft and hard signs are translated by one character, but what if one needs them uppercase and lowercase? The solution is to introduce the concept of "malleable" characters. The symbols for e' and the soft and hard signs (@, ' and ~) are lowercase by default, but they are transformed to uppercase when surrounded by uppercase letters. The precise definition of what "surrounded" means is found here. Malleability makes it easier to write all-uppercase captions.

    Caveat: The symbols @, ' and ~, when used alone, stand for lowercase letter e' and the lowercase soft and hard signs. If a stand-alone uppercase "E'" is needed, one cannot use a stand-alone "@" because that would be lowercase.

    The standalone uppercase soft and hard signs are never used in a Russian text. Should one need to use them, there is a special way to do this. All malleable symbols become uppercase when preceded by ^ and become lowercase when preceded by _.

    The malleability is akin to the property of two-letter combinations to become uppercase if one of the letters is uppercase. For example, SH, sH and Sh mean the same uppercase letter "sha".
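A minimal sketch of malleability follows. The precise definition of "surrounded" is not reproduced in this text, so the sketch assumes a simple one: a malleable symbol becomes uppercase when every adjacent letter (at least one must exist) is uppercase. The names `resolve_malleable` and `digraph_case` are hypothetical, and only the malleable symbols are translated; other characters are left as typed.

```python
MALLEABLE = {"@": "э", "'": "ь", "`": "ь", "~": "ъ"}


def resolve_malleable(text: str) -> str:
    """Replace malleable symbols, choosing case from the adjacent letters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # ^ forces uppercase and _ forces lowercase for the next symbol.
        if ch in "^_" and nxt in MALLEABLE:
            letter = MALLEABLE[nxt]
            out.append(letter.upper() if ch == "^" else letter)
            i += 2
            continue
        if ch in MALLEABLE:
            prev = text[i - 1] if i > 0 else ""
            # Assumed rule: all adjacent letters must be uppercase.
            neighbours = [c for c in (prev, nxt) if c.isalpha()]
            upper = bool(neighbours) and all(c.isupper() for c in neighbours)
            letter = MALLEABLE[ch]
            out.append(letter.upper() if upper else letter)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def digraph_case(code: str, letter: str) -> str:
    """A digraph is uppercase if any of its Latin letters is uppercase."""
    return letter.upper() if any(c.isupper() for c in code) else letter


if __name__ == "__main__":
    print(resolve_malleable("V~EZD TOL'KO DLYA PERSONALA"))
    print(resolve_malleable("@KRAN"), resolve_malleable("@"))
    print(digraph_case("Sh", "ш"), digraph_case("sh", "ш"))
```

Under this assumed rule a stand-alone @ has no adjacent letters and therefore stays lowercase, which matches the caveat above; ^@ is the way to force the uppercase letter.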

    Escape character

    The backslash character \ is used as an escape character that prevents other symbols from sticking together. For instance, if one writes *major, the letter j will stick to o to form the letter yo. To circumvent this, you can use "j'" instead of "j": maj'or. However, a more straightforward approach is to write maj\or. Generally, a backslash will produce no output but will break digraphs. It also escapes itself, i.e. \\ produces a single backslash.

    Another use of the backslash is to prevent quotes (` and ') from being translated as the soft sign (use \` and \' for quotes; at the beginning of a word, the quote characters ` and ' don't have to be escaped, since no Russian word begins with a soft sign). In fact, all malleable characters - @ ~ ' ` - can be escaped with a backslash, which prevents their translation.
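The escape rules can be sketched as follows. This is an illustration only: the table is a tiny assumed subset, just large enough for the examples in this section, the names `TABLE`, `ESCAPABLE` and `translit` are hypothetical, and the backslash-space toggle is not handled here.

```python
# Tiny assumed table, just enough for the examples in this section.
TABLE = {
    "jo": "ё", "j'": "й", "j": "й", "'": "ь",
    "m": "м", "a": "а", "o": "о", "r": "р",
    "t": "т", "l": "л", "k": "к",
}
# Characters that a backslash turns into literal output.
ESCAPABLE = {"\\", "'", "`", "@", "~"}


def translit(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            nxt = text[i + 1:i + 2]
            if nxt in ESCAPABLE:
                out.append(nxt)  # \\ gives one backslash, \' a literal quote
                i += 2
            else:
                i += 1  # no output; matching restarts after the backslash
            continue
        for size in (2, 1):
            chunk = text[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # pass through unknown characters
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(translit("major"))     # j sticks to o
    print(translit("maj\\or"))   # the text maj\or: the digraph is broken
    print(translit("maj'or"))    # j' used instead of j
```

The backslash never appears in a table key, so a two-character lookup can never span it; that is what breaks the digraph.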

    How to combine Russian and English text

    It is important to be able to include some real Latin letters, or a whole section in another language, in a Russian text. This is accomplished yet again by the backslash. A backslash followed by one space works as a Russian/English toggle. For example:
    - E'to-to kak raz \ fine, \ - skazal ya. - E'to \ OK with me.\
    Note that the backslash-space combination produces no output, not even a space, so you should add spaces yourself where necessary.

    After a backslash-space combination and until the next one, no translation whatsoever is performed on the text. This may be useful for entering English, TeX commands or other text that uses backslashes and special characters. If you need to enter natively coded Russian TeX commands... well, you probably don't need to use Russkaya Latinica for that purpose.
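The toggle can be sketched by splitting the text on the backslash-space combination and translating only every other segment. This is a simplified illustration: the name `apply_toggle` is hypothetical, the `translate` argument stands in for whatever transliteration routine is used, and the sketch does not treat an escaped backslash followed by a space as a special case.

```python
from typing import Callable


def apply_toggle(text: str, translate: Callable[[str], str]) -> str:
    """Translate Russian segments; copy segments between toggles verbatim."""
    # The text starts in Russian mode; each "\ " switches the mode and
    # produces no output itself.
    parts = text.split("\\ ")
    return "".join(
        translate(part) if index % 2 == 0 else part
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    # Brackets mark what would be translated; a real translator goes here.
    mark = lambda segment: "[" + segment + "]"
    print(apply_toggle("E'to-to kak raz \\ fine, \\ - skazal ya.", mark))
```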

    The latest script (see above) supports an option to interpret TeX commands and text within dollar signs (i.e. equations) without translation. This is probably of little use but was requested by a friend of mine.

    What to pay attention to

    When using an intuitive phonetic scheme of their own, people usually don't pay attention to the unambiguous representation of letters. Common examples include: using "ts" instead of a single letter "ce"; "ye" for a single letter "e" (for example, *Yeltsin); "y" for "i kratkoe"; "yo" for both "i kratkoe + o" and the single letter "yo" (as in *major); "sch" for "tshcha" (as in *veschi); and "g" or "j" for "zh". While phonetically obvious, such coding cannot be unambiguously translated into a native Russian coding without tedious editing by hand. A user would probably feel unsure about transliterating these letters anyway, and we suggest using the Russkaya Latinica standard in these difficult cases.
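The ambiguity can be made concrete by enumerating every way a loosely spelled word can be split into codes. The table below is an assumed example of an ad hoc scheme of the kind described above (with "sch" for "tshcha" and "yo" doing double duty); the names `LOOSE` and `readings` are hypothetical.

```python
# Assumed ad hoc scheme, for illustration only.
LOOSE = {
    "sch": "щ", "ch": "ч", "s": "с",
    "yo": "ё", "y": "й", "o": "о",
    "v": "в", "e": "е", "i": "и",
    "m": "м", "a": "а", "r": "р",
}


def readings(word: str, table: dict) -> list:
    """Return every Cyrillic reading of a word under the given table."""
    if not word:
        return [""]
    results = []
    for size in (1, 2, 3):
        head = word[:size]
        if len(head) == size and head in table:
            # Keep this code and recurse on the rest of the word.
            for rest in readings(word[size:], table):
                results.append(table[head] + rest)
    return results


if __name__ == "__main__":
    print(readings("veschi", LOOSE))
    print(readings("mayor", LOOSE))
```

Each of these words yields two readings, and nothing in the spelling says which one was meant; a word containing a character outside the table yields none.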

    Here are some suggestions on how to handle common difficulties of spelling.

