ICSA Guide to Cryptography
Randall Nichols
 $69.95  0-07-913759-8
© 1998 The McGraw-Hill Companies, Inc. All rights reserved.
Any use of this Beta Book is subject to the rules stated in the Terms of Use.

CHAPTER 3

HISTORICAL SYSTEMS I

Cryptograms are roughly divided into Ciphers and Codes. William F. Friedman (1891-1969), the dean of American Cryptography, defined a Cipher message as one produced by applying a method of cryptography to the individual letters of the plain text taken either singly or in groups of constant length. Practically every cipher message is the result of the joint application of a General System (or Algorithm), a method of treatment that is invariable, and a Specific Key, which is variable at the will of the correspondents. The Algorithm controls the exact steps followed under the general system. It is assumed that the general system is known by the correspondents and by the cryptanalyst. What is not generally known is the key structure.

A Code message is a cryptogram which has been produced by using a code book consisting of arbitrary combinations of letters, entire words, or figures substituted for words, partial words, or phrases of plain text. Whereas a cipher system acts upon individual letters or definite groups taken as units, a code deals with entire words or phrases or even sentences taken as units. The process of converting plain text into cipher text is Encipherment. The reverse process of reducing cipher text into plain text is known as Decipherment.

Substitution and Transposition Ciphers Compared

Cipher systems are divided into two classes: substitution and transposition. Modern cipher systems use both substitution and transposition to create secret messages. The fundamental difference between substitution and transposition methods is that in the former the normal or conventional values of the letters of the plain text are changed, without any change in the relative positions of the letters in their original sequences, whereas in the latter only the relative positions of the letters of the plain text in the original sequences are changed, without any changes to the conventional values for the letters. Since the methods of encipherment are radically different in the two cases, the principles involved in the cryptanalysis of both types of ciphers are fundamentally different. It is instructive to be able to differentiate whether a cipher has been enciphered by substitution or transposition.

Simple Substitution

Probably the most popular amateur cipher is the simple substitution cipher (aka Aristocrat). We see them in newspapers. Kids use them to fool teachers, and lovers send them to each other to arrange special meetings. They have been used by the Masons, secret Greek societies, and fraternal organizations. Current gangs in the Southwest use them to do drug deals. They are found in literature such as "The Gold-Bug" by Edgar Allan Poe and "The Adventure of the Dancing Men" by Arthur Conan Doyle. The death threats by the infamous Zodiac killer in San Francisco in the late 1960's were also simple substitutions.

A recurring theme of this book is that all ciphers have a common basis in mathematics and probability theory. The basis language of the cipher doesn't matter as long as it can be characterized mathematically. Mathematics is the common link for deciphering any language cipher. This is also known as the principle of Cryptographic Universality.

Based on mathematical and statistical principles, we can identify the language of the cryptogram and then break open its contents.

Four Basic Operations of Cryptanalysis

William F. Friedman presented the fundamental operations for the solution of practically every cryptogram:

    1. The determination of the language employed in the plain text version.
    2. The determination of the general system of cryptography employed.
    3. The reconstruction of the specific key in the case of a cipher system; the reconstruction, partial or complete, of the code book in the case of a code system; or both the key and the code book in the case of an enciphered code system.
    4. The reconstruction or establishment of the plain text gained from steps 1 - 3.

In some cases, step 2 may precede step 1. This is the classical approach to cryptanalysis.

It may be further reduced to:

    1. Arrangement and rearrangement of data to disclose nonrandom characteristics or manifestations (frequency counts, repetitions, patterns, symmetrical phenomena).
    2. Recognition of the nonrandom characteristics or manifestations when disclosed (via statistics or other techniques).
    3. Explanation of nonrandom characteristics when recognized (by luck, intelligence, or perseverance).

Much of the work is in determining the general system. In the final analysis, the solution of every cryptogram involving a form of substitution depends upon its reduction to monoalphabetic terms (one alphabet or one set of language symbols), if it is not originally in those terms.

A demonstration of the solution of a simple "Aristocrat" substitution may start the process of understanding the science of cryptography.

General Nature of English Language

A working knowledge of the letters, their characteristics, their relations with each other, and their favorite positions in words is very valuable in solving substitution ciphers.

W. F. Friedman was the first to employ the principle that English language letters are mathematically distributed in a uniliteral frequency distribution:

13 9 8 8 7 7 7 6 6 4 4 3 3 3 3 2 2 2 1 1 1 - - - - -

E T A O N I R S H L D C U P F M W Y B G V K Q X J Z

That is, in each 100 letters of text, E has a frequency (or number of appearances) of about 13 and T a frequency of about 9, while K, Q, X, J, and Z appear so seldom that their frequencies are small fractions. Tables 3-1 and 3-2 present a historical view of English data based on military text.

Table 3-1

Hitt’s Military Text – English Data

Basis 20,000 letters of military text:

6 Vowels: A E I O U Y = 40 %

20 Consonants:

5 High Frequency (D N R S T) = 35 %

10 Medium Frequency (B C F G H L M P V W) = 24 %

5 Low Frequency (J K Q X Z) = 1 %

====

100 %

The four vowels A, E, I, O and the four consonants N, R, S, T constitute about 2/3 of normal English plain text.

The most frequent English digraphs are:

TH--50 AT--25 ST--20

ER--40 EN--25 IO--18

ON--39 ES--25 LE--18

AN--38 OF--25 IS--17

RE--36 OR--25 OU--17

HE--33 NT--24 AR--16

IN--31 EA--22 AS--16

ED--30 TI--22 DE--16

ND--30 TO--22 RT--16

HA--26 IT--20 VE--16

The most frequent English trigraphs (three letter combinations):

THE--89 TIO--33 EDT--27

AND--54 FOR--33 TIS--25

THA--47 NDE--31 OFT--23

ENT--39 HAS--28 STH--21

ION--36 NCE--27 MEN--20

Frequency of Initial and Final Letters:

Letters- A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Initial- 9 6 6 5 2 4 2 3 3 1 1 2 4 2 10 2 - 4 5 17 2 - 7 - 3 -

Final - 1 - 10 17 6 4 2 - - 1 6 1 9 4 1 - 8 9 11 1 - 1 - 8 -

Relative Frequencies of Vowels:

A 19.5% E 32.0% I 16.7% O 20.2% U 8.0% Y 3.6%

Average number of vowels per 20 letters, 8.

Table 3-2

Probability of Occurrence of English Letters (Friedman Data)

Letter Probability Letter Probability

A .082 N .067

B .015 O .075

C .028 P .019

D .043 Q .001

E .127 R .060

F .022 S .063

G .020 T .091

H .061 U .028

I .070 V .010

J .002 W .023

K .008 X .001

L .040 Y .020

M .024 Z .001

Letter Groups:

  1. E, having a probability of about 0.127.
  2. T, A, O, I, N, S, H, R, each having probabilities between 0.06 and 0.09.
  3. D, L, having probabilities around 0.04.
  4. C, U, M, W, F, G, Y, P, B, each having probabilities between 0.015 and 0.028.
  5. V, K, J, X, Q, Z, each having probabilities of 0.01 or less.

Letter Groups

A E I O U 38.58%

L N R S T 33.43%

J K Q X Z 1.11%

E T A O N 45.08%

E T A O N I S R H 70.02%

Letter Characteristics and Interactions

Appendix B presents a variety of language data. Several references give letter characteristics. Gaines presents letter contact data for English, German, French, Italian, Spanish and Portuguese. Nichols published data on English and 10 different languages. It is available online at the ACA Crypto Drop Box ACA-L@sage.und.nodak.edu.

Friedman in his Military Cryptanalytics Part I - Volume 1 gives charts showing the lower and upper limits of deviation from theoretical (random) for the number of vowels, high, low and medium frequency consonants, and blanks in distributions for plain text and random text for messages of various lengths.

In Military Cryptanalytics Part I - Volume 2, Friedman gives a veritable potpourri of statistical data on letter frequencies and word characteristics such as digraphs, trigraphs, tetragraphs, grouped letters, relative log data, pattern words, idiomorphic (structural) data, standard endings, initials, foreign language data [German, French, Italian, Spanish, Portuguese and Russian], classification of systems used in concealment, nulls and literals.

Sinkov assigns log frequencies to digraphs to aid in identification. Friedman explains this procedure. Depending on the basis text we choose, we find variations in the frequency of letters. For example, literary English gives slightly different results than frequencies based on military or ordinary English text.

The important concept is that languages may be characterized by their letter behavior. It turns out that similar groups of information, such as vowel relationships with specific consonants, carry through into the cipher text and are potentially identifiable there.

Eyeballing and Aristocrat

While reading the newspaper, you see the following cryptogram. Train your eye to look for wedges or "IN’s" into the cryptogram. Assume that we’re dealing with English and that we have simple substitution. What do we know? Although short, there are several entries for solution. Number the words.

A-1. Elevated thinker. K2 (71) by: LANAKI

    1   2       3         4  5
    FYV YZXYVEF ITAMGVUXV ZE FA

    6    7    8  9   10
    ITAM FYQF MV QDV EJDDAJTUVU

    11 12       13      14
    RO HOEFVDO. *QGRVDF *ESYMVZFPVD

Analysis of A-1

We note that words numbered 2, 3, 7, 10 and 12 have patterns of repeated letters. We could use published lists of pattern words, which are words listed alphabetically, grouped by number of letters in the word and by the pattern of the repeated letters in the group. The solver without access to these pattern word lists can easily make his own, and with experience the common pattern words in newspaper cryptograms are easily remembered. We examine words 1 and 7 and see that the first two letters of word 7 are the same as the first two letters of word 1, and that word 7 begins and ends with the same letter. This suggests that word 7 is the very common pattern word "that" and that word 1 is most likely "the". A quick count shows that "V" occurs 11 times, approximately 15% of the 71 letters of the cryptogram, making it a very likely candidate for "e". Word 6 consists of the same four letters, I T A M, that begin word 3. Note that there is a flow to this cryptogram: The _ _ is? _ _ and? _ _. Titles either help or should be ignored as red herrings. "Elevated" might mean "high", and the "thinker" could be a proper name.
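The counting in this analysis is easy to mechanize. The following is a minimal sketch, not part of the original solution: the names `frequency_count` and `word_pattern` are our own, and the cryptogram is typed in with its 14 word divisions and without the asterisks that mark the last two words.

```python
from collections import Counter

# Cryptogram A-1 split into its 14 numbered words.
WORDS = ("FYV YZXYVEF ITAMGVUXV ZE FA ITAM FYQF MV QDV EJDDAJTUVU "
         "RO HOEFVDO QGRVDF ESYMVZFPVD").split()

def frequency_count(words):
    """Uniliteral frequency count over every letter of the cryptogram."""
    return Counter("".join(words))

def word_pattern(word):
    """Label letters in order of first appearance, e.g. FYQF -> ABCA."""
    labels = {}
    for ch in word:
        labels.setdefault(ch, chr(ord("A") + len(labels)))
    return "".join(labels[ch] for ch in word)

counts = frequency_count(WORDS)
total = sum(counts.values())
letter, hits = counts.most_common(1)[0]
print(letter, hits, total, round(100 * hits / total, 1))  # V 11 71 15.5

# Word 7 has the same repetition pattern as the plain text word "that".
print(word_pattern(WORDS[6]), word_pattern("THAT"))       # ABCA ABCA
```

A pattern word list is simply a dictionary of plain text words indexed by the value that `word_pattern` returns.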

Filling in the cryptogram using [the... that] assumption we have:

    1   2       3         4  5
    the h--he-t -----e--e -- t-
    FYV YZXYVEF ITAMGVUXV ZE FA

    6    7    8  9   10
    ---- that -e a-e --------e-
    ITAM FYQF MV QDV EJDDAJTUVU

    11 12       13      14
    -- ---te--.  a--e-t  --h-e-t-e-
    RO HOEFVDO. *QGRVDF *ESYMVZFPVD

Not bad for a start. Word 5, a two letter word starting with "t", can only be "to". We fill in the substitution of "o" for "A". When we look at the group of words 7, 8, 9 we see that word 9, a three letter word starting with "a" and ending with "e", could be "ace", "ade", "age", "ale", "ape", "are", "ate", "awe", or "axe", and word 8, a two letter word ending in "e", can only represent "be", "he", "me", or "we". Trying "are" for word 9 gives "r" for D, and the ending of word 2 then suggests "s" for E. We add this information to the recovered portion of the cryptogram. No matter how hard we make the process, no matter how hard we scramble the plain text, it is only a puzzle, albeit a difficult one. Note how each wedge leads to the next wedge. Always look for confirmation that your assumptions are correct. Have an eraser ready to step back a step if necessary. Keep a tally of which letters have been placed correctly. Mark unconfirmed guesses with '?'. Piece by piece, we build on the opening wedge.

    1   2       3         4  5
    the h--hest --o--e--e -s to
    FYV YZXYVEF ITAMGVUXV ZE FA

    6    7    8  9   10
    --o- that -e are s-rro---e-
    ITAM FYQF MV QDV EJDDAJTUVU

    11 12       13      14
    -- --ster-.  a--ert  s-h-e-t-er
    RO HOEFVDO. *QGRVDF *ESYMVZFPVD

Now we have some bigger wedges. The s_h is a possible "sch" from German. Word 10 could be "surrounded". Word 2 might be "highest", which goes with the title and gives Z = i. The name could be Albert Schweitzer. Let's try these guesses.

    1   2       3         4  5
    the highest -nowledge is to
    FYV YZXYVEF ITAMGVUXV ZE FA

    6    7    8  9   10
    -now that we are surrounded
    ITAM FYQF MV QDV EJDDAJTUVU

    11 12       13      14
    b- --ster-.  albert  schweitzer
    RO HOEFVDO. *QGRVDF *ESYMVZFPVD

The final message is: The highest knowledge is to know that we are surrounded by mystery. Albert Schweitzer.

We have solved ("cracked") the message, but what do we know about the keying method? In problem A-1, we set up the plain text alphabet as a normal sequence [ A, B, … Z ] and fill in the cipher text letters below it. Note the keyword LIGHT.

plain - a b c d e f g h i j k l m n o p q r s t u v w x y z

cipher - Q R S U V W X Y Z L I G H T A B C D E F J K M N O P

Keyword = LIGHT

Cipher text alphabets are generally mixed for more security, and an easily remembered mnemonic is chosen as the key.

In tougher ciphers, we use the above key recovery procedure to go back and forth between the cryptogram and keying alphabet to yield additional information.
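The recovered alphabet of problem A-1 can be rebuilt from its keyword. The sketch below is our own illustration: the function name `keyed_alphabet` is hypothetical, and the shift of 17 places is simply read off the recovered table, where LIGHT sits under plain j through n. It assumes the usual construction of a keyword followed by the unused letters in normal order.

```python
import string

def keyed_alphabet(keyword, shift):
    """Keyword letters first, then the unused letters in normal order,
    with the whole sequence rotated left by `shift` places."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch not in seen:
            seen.append(ch)
    mixed = "".join(seen)
    return mixed[shift:] + mixed[:shift]

# shift=17 is taken from the recovered table, not derived from the keyword.
CIPHER = keyed_alphabet("LIGHT", 17)
ENCIPHER = str.maketrans(string.ascii_lowercase, CIPHER)
DECIPHER = str.maketrans(CIPHER, string.ascii_lowercase)

print(CIPHER)                                       # QRSUVWXYZLIGHTABCDEFJKMNOP
print("the highest knowledge".translate(ENCIPHER))  # FYV YZXYVEF ITAMGVUXV
print("FYQF MV QDV".translate(DECIPHER))            # that we are
```

Because the rebuilt sequence agrees with every letter recovered from the cryptogram, it also supplies the cipher equivalents of the plain text letters that never occurred in the message.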

To summarize the ‘eyeball’ method (more sophisticated souls call this method "by inspection"):

  1. Look for common letters that appear frequently throughout the message but don't expect an exact correspondence in popularity.
  2. Look for short, common words (the, and, are, that, is, to) and common endings (tion, ing, ers, ded, ted, ess).
  3. Make a guess, try out the substitutions, and keep track of your progress. Look for readability.

There is a popular game show on TV known as "Wheel of Fortune". Recognize the similarities?

Multiliteral Substitution with Single-Equivalent Cipher Alphabets

Monoalphabetic substitution methods are classified as uniliteral and multiliteral systems. Uniliteral systems maintain a strict one-to-one correspondence between the length of the units of the plain and those of the cipher text. Each letter of plain text is replaced by a single character in the cipher text. In multiliteral monoalphabetic substitution systems, this correspondence is no longer one plain to one cipher but may be one plain to two cipher, where each letter of the plain text is replaced by two characters in the cipher text; or one plain to three cipher, where a three-character combination in the cipher text represents a single letter of the plain text. We refer to these systems as uniliteral, biliteral, and triliteral, respectively. Ciphers in which one plain text letter is represented by cipher characters of two or more elements are classed as multiliteral.

Biliteral Ciphers

Friedman gives some interesting examples of biliteral monoalphabetic substitution. Many cipher systems start with a geometric shape.

>>INSERT FIGURE 3-1 HERE<<

Using the square in Figure 3-1, we derive the following cipher alphabet:

Plain : a b c d e f g h i j k l m

Cipher: WW WH WI WT WE HW HH HI HT HT HE IW IH

Plain : n o p q r s t u v w x y z

Cipher: II IT IE TW TH TI TT TE EW EH EI ET EE

The alphabet derived from the cipher square or matrix is referenced by row and column coordinates, respectively. The key to this system is that when a message is enciphered by this biliteral alphabet, the cryptogram is still monoalphabetic in character. A frequency distribution based upon pairs of letters will have all the characteristics of a simple uniliteral distribution for a monoalphabetic substitution cipher.
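A short sketch shows how such a square produces the alphabet listed above. It assumes a 5 x 5 square filled with the normal alphabet row by row (i and j sharing a cell) and the word WHITE as both row and column coordinates, which is what the listed equivalents imply; the function names are our own.

```python
COORDS = "WHITE"                      # row and column indicators
SQUARE = "abcdefghiklmnopqrstuvwxyz"  # 25 cells, filled row by row; j shares i's cell

def encipher(plain):
    """Replace each letter by its row indicator followed by its column indicator."""
    out = []
    for ch in plain.lower():
        ch = "i" if ch == "j" else ch
        if ch not in SQUARE:
            continue                  # skip spaces and punctuation
        row, col = divmod(SQUARE.index(ch), 5)
        out.append(COORDS[row] + COORDS[col])
    return " ".join(out)

def decipher(cipher):
    """Look each two-letter group up by row and column."""
    return "".join(
        SQUARE[COORDS.index(pair[0]) * 5 + COORDS.index(pair[1])]
        for pair in cipher.split()
    )

print(encipher("raids"))           # TH WW HT WT TI
print(decipher("EH HI HT TT WE"))  # white
```

Since every plain text letter always yields the same pair, a frequency count taken over the pairs behaves exactly like a uniliteral count.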

Numbers can be used as effectively as letters in the biliteral cipher. The simplest form is A=01, B=02, C=03,...Z=26. So, the plain text letters have as their equivalents two-digit numbers indicating their position in the normal alphabet. Dinome (two digit) cipher matrices are shown in Figure 3-2:

>>INSERT FIGURE 3-2 HERE<<

Note that frequently-used punctuation marks can be enciphered in the above matrix.

Another four are shown in Figures 3-3 through 3-6:

>>INSERT FIGURES 3-3 to 3-6 HERE<<

It is possible to generate false or pseudo-code or artificial code language by using an enciphering matrix that uses vowels as row indicators and consonants as column indicators.

>>INSERT FIGURE 3-7 HERE<<

Enciphering the word RAIDS would be OCABE FAFOD.

Another subterfuge used to camouflage the biliteral cipher matrix is to append a third character to the row or column indicator. This third character may be produced through the use of cipher matrix shown in Figure 3-8 (wherein A=611, B=612, etc.) or the third character can be the "sum checking" digit which is the non-carrying sum (modulo 10) of the preceding two digits such as trinomes 257, 831, and 662.

>>INSERT FIGURE 3-8 HERE<<
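The "sum checking" digit is simple enough to state in a few lines. This is a minimal sketch with hypothetical function names; it covers only the check digit, not the matrix of Figure 3-8.

```python
def add_check_digit(dinome):
    """Append the non-carrying (modulo 10) sum of the two digits: '25' -> '257'."""
    a, b = int(dinome[0]), int(dinome[1])
    return dinome + str((a + b) % 10)

def is_valid_trinome(trinome):
    """True when the third digit is the sum check of the first two."""
    return (len(trinome) == 3 and trinome.isdigit()
            and add_check_digit(trinome[:2]) == trinome)

print([add_check_digit(d) for d in ("25", "83", "66")])  # ['257', '831', '662']
print(is_valid_trinome("258"))                           # False
```

The check digit adds no information, so the cryptanalyst who recognizes it can discard it and treat the message as an ordinary dinome cipher.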

All the above matrices are bipartite, which means they can be divided into two separate parts that can be clearly defined by row and column indicators. This is the primary weakness of this type of cipher. Sinkov presents a good description of the modulo arithmetic required to solve biliteral cipher challenges.

Biliteral but Not Bipartite

Consider the following cipher matrix:

>>INSERT FIGURE 3-9 HERE<<

We can produce a biliteral cipher alphabet in which the equivalent for any letter in the matrix is the sum of the two coordinates that indicate its cell in the matrix:

Plain A B C D E F G H I J K L M

Cipher 14 20 19 12 22 23 24 10 18 18 25 17 26

Plain N O P Q R S T U V W X Y Z

Cipher 28 29 30 31 13 32 34 16 35 36 37 11 38

A = 9+5 =14, E = 21 + 1 =22

The cipher units are biliteral but they are not bipartite. An equivalent such as 14 has no meaning in its separate digits 1 and 4. Plain text letters whose cipher equivalents begin with 1 may be found in two different rows of the matrix, and those whose equivalents end in 4 appear in three different columns.

Another possibility that lends itself to certain multiliteral ciphers is the use of a word spacer or word separator. The word space might be represented by a value in the matrix; i.e., the separator is enciphered as a value (dinome 39 in Figure 3-4). Alternatively, the word space might be an unenciphered element. Let's break from the theory and look at four interesting multiliteral historical ciphers before discussing the general cryptanalytic attack on the multiliteral cipher.

Trithemian

The abbot Trithemius, born Johann von Heydenberg (1462-1516), invented one of the first multiliteral ciphers. It resembles the later Baconian Cipher and was a means for disguising secret text. His work "Steganographia", written about 1499 and first printed in 1606, describes several systems of 'covered writing.'

His alphabet, modified to include 26 letters of present-day English, is shown in Figure 3-10, below; it consists of all the arrangements of three things taken three at a time, with repetition allowed, or 3^3 = 27 in all.

>>INSERT FIGURE 3-10 HERE<<

The cipher text does not have to be restricted to digits; any groupings of three things taken three at a time will do.

Bacon

Sir Francis Bacon (1561-1626) invented a cipher in which the cipher equivalents are five-letter groups and the resulting cipher is monoalphabetic in character. Bacon uses a 24-letter alphabet with I and J, and U and V, used interchangeably (Table 3-3).

Table 3-3

Bacon’s Biliteral Alphabet

A = aaaaa I/J = abaaa R = baaaa

B = aaaab K = abaab S = baaab

C = aaaba L = ababa T = baaba

D = aaabb M = ababb U/V = baabb

E = aabaa N = abbaa W = babaa

F = aabab O = abbab X = babab

G = aabba P = abbba Y = babba

H = aabbb Q = abbbb Z = babbb

 

Bacon described the steganographic effect of message enfolding in an innocent external message. Suppose we let capitals be the "a" element and lower-case letters represent the "b" elements. The message "All is well with me today" can be made to convey the message "Help."

A L l i s W E l L W I t H m E T o d a Y

a a b b b a a b a a a b a b a a b b b a

H E L P

Thus, Bacon describes several variations on the theme. Note the regularity of construction of Bacon's biliteral alphabet, a feature which permits its reconstruction from memory.
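The regularity of Table 3-3 is that each group is the letter's position in the 24-letter alphabet, counted from zero and written in five binary places with "a" for 0 and "b" for 1. The sketch below rebuilds the table on that observation and recovers the message hidden in the example; the names `bacon_group` and `extract` are our own.

```python
LETTERS = "ABCDEFGHIKLMNOPQRSTUWXYZ"  # 24 letters: J shares I's group, V shares U's

def bacon_group(index):
    """Five-place binary count written with a for 0 and b for 1."""
    return format(index, "05b").replace("0", "a").replace("1", "b")

ALPHABET = {ch: bacon_group(i) for i, ch in enumerate(LETTERS)}
DECODE = {group: ch for ch, group in ALPHABET.items()}

def extract(cover):
    """Read capitals as 'a' and lower-case letters as 'b', five at a time."""
    marks = "".join("a" if ch.isupper() else "b" for ch in cover if ch.isalpha())
    usable = len(marks) - len(marks) % 5
    return "".join(DECODE[marks[i:i + 5]] for i in range(0, usable, 5))

print(ALPHABET["H"], ALPHABET["Z"])        # aabbb babbb
print(extract("ALlis WElLW ItHmE TodaY"))  # HELP
```

Groups beyond babbb are unused in the 24-letter alphabet, so `extract` raises a `KeyError` if a cover text happens to produce one.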

Hayes Ciphers

Probably the most corrupt American presidential election occurred on November 7, 1876, with the election of President Rutherford B. Hayes (Republican). He defeated Samuel Jones Tilden (Democrat). Tilden had won the popular vote by roughly 250,000 votes, but because of frauds surrounding the Electoral College he was deprived of the high office of President. Actually, both candidates were involved with bribery, election tampering, voter fraud, conspiracy and a host of other goodies. Tilden ran on a law and order ticket that credited him with convicting Boss Tweed and the Tweed Ring in New York City, which controlled the city through Tammany Hall. For two years into Hayes's Presidency, the scandals persisted.

With the help of the New York Tribune, Republicans finished off the Tilden 'honesty' horse. They published the Tilden Ciphers and keys. There were about 400 of them, representing substitution and transposition forms. We will revisit the transposition forms at a later juncture. They represented secret and illegal operations by Tilden's men in Florida, Louisiana, South Carolina and Oregon. The decipherments were done by investigators of the Tribune. Here are two examples and their solutions:

GEO. F. RANEY, Tallahassee.

P P Y Y E M N S N Y Y Y P I M A S H N S Y Y S S I T E P A A E

N S H N S P E N S S H N S M M P I Y Y S N P P Y E A A P I E

I S S Y E S H A I N S S S P E E I Y Y S H N Y N S S S Y E P I

A A N Y I T N S S H Y Y S P Y Y P I N S Y Y S S I T E M E I P

I M M E I S S E I Y Y E I S S I T E I E P Y Y P E E I A A S S

I M A A Y E S P N S Y Y I A N S S S E I S S M M P P N S P I N

S S N P I N S I M I M Y Y I T E M Y Y S S P E Y Y M M N S Y Y

S S I T S P Y Y P E E P P P M A A A Y Y P I I T

L' Engle goes up tomorrow. Daniel

Examination of the message discloses a bipartite alphabet cipher with only ten different letters used. Dividing the message into pairs, assigning arbitrary letters to the pairs, and performing a triliteral frequency distribution will yield a solution.

PP YY EM NS NY YY PI MA SH NS YY SS …

A B C D E B F G H D B I …

Message reads:

Have Marble and Coyle telegraph for influential men from Delaware and Virginia. Indications of weakening here. Press advantage and watch board.

Here is another cryptogram using numerical substitutes:

S. PASCO AND E. M. L'ENGLE

84 55 84 25 93 34 82 31 31 75 93 82 77 33 55 42

93 20 93 66 77 66 33 84 66 31 31 93 20 82 33 66

52 48 44 55 42 82 48 89 42 93 31 82 66 75 31 93

DANIEL

There were several messages of this type. They disclosed that only 26 different numbers were used.

Message reads:

Cocke will be ignored, Eagan called in. Authority reliable.

The Tribune experts came up with the following alphabets:

AA = O EN = Y IT = D NS = E PP = H SS = N

AI = U EP = C MA = B NY = M SH = L YE = F

EI = I IA = K MM = G PE = T SN = P YI = X

EM = V IM = S NN = J PI = R SP = W YY = A

--------------------------------------------------------------------

20 = D 33 = N 44 = H 62 = X 77 = G 89 = Y

25 = K 34 = W 48 = T 66 = A 82 = I 93 = E

27 = S 39 = P 52 = U 68 = F 84 = C 96 = M

31 = L 42 = R 55 = O 75 = B 87 = V 99 = J
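With the Tribune alphabets in hand, decipherment is a table lookup. The sketch below is only an illustration with hypothetical helper names; it deciphers the opening groups of the Raney message and the whole of the numerical message, leaving word division to the reader.

```python
def parse(table):
    """Turn 'AA=O EN=Y ...' into {'AA': 'O', 'EN': 'Y', ...}."""
    return dict(item.split("=") for item in table.split())

PAIRS = parse("AA=O EN=Y IT=D NS=E PP=H SS=N AI=U EP=C MA=B NY=M SH=L YE=F "
              "EI=I IA=K MM=G PE=T SN=P YI=X EM=V IM=S NN=J PI=R SP=W YY=A")
DINOMES = parse("20=D 33=N 44=H 62=X 77=G 89=Y 25=K 34=W 48=T 66=A 82=I 93=E "
                "27=S 39=P 52=U 68=F 84=C 96=M 31=L 42=R 55=O 75=B 87=V 99=J")

def decode(groups, table):
    """Look up each space-separated cipher group in the given alphabet."""
    return "".join(table[g] for g in groups.split())

opening = "PP YY EM NS NY YY PI MA SH NS YY SS"
pasco = ("84 55 84 25 93 34 82 31 31 75 93 82 77 33 55 42 "
         "93 20 93 66 77 66 33 84 66 31 31 93 20 82 33 66 "
         "52 48 44 55 42 82 48 89 42 93 31 82 66 75 31 93")

print(decode(opening, PAIRS))   # HAVEMARBLEAN
print(decode(pasco, DINOMES))   # COCKEWILLBEIGNOREDEAGANCALLEDINAUTHORITYRELIABLE
```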

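William F. Friedman correlated these alphabets, with amusing results. The row gives the first element of each cipher pair and the column the second:

           H  I  S  P  A  Y  M  E  N  T
           1  2  3  4  5  6  7  8  9  0
          -----------------------------
    H 1 |  .  .  .  .  .  .  .  .  .  .
    I 2 |  .  .  .  .  K  .  S  .  .  D
    S 3 |  L  .  N  W  .  .  .  .  P  .
    P 4 |  .  R  .  H  .  .  .  T  .  .
    A 5 |  .  U  .  .  O  .  .  .  .  .
    Y 6 |  .  X  .  .  .  A  .  F  .  .
    M 7 |  .  .  .  .  B  .  G  .  .  .
    E 8 |  .  I  .  C  .  .  V  .  Y  .
    N 9 |  .  .  E  .  .  M  .  .  J  .
    T 0 |  .  .  .  .  .  .  .  .  .  .
          -----------------------------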

The blank squares may have contained proper names and money designations. Key = HISPAYMENT for bribery seems to be appropriate.
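The correlation can be checked mechanically: replace each digit of a numerical equivalent by the key letter standing in that position (H=1, I=2, ... N=9, T=0) and compare with the letter-pair alphabet. A minimal sketch, with our own helper names and both Tribune alphabets typed in from the listing above:

```python
KEY = "HISPAYMENT"  # digit 1 -> H, 2 -> I, ..., 9 -> N, 0 -> T

def parse(table):
    """Turn 'AA=O EN=Y ...' into {'AA': 'O', 'EN': 'Y', ...}."""
    return dict(item.split("=") for item in table.split())

PAIRS = parse("AA=O EN=Y IT=D NS=E PP=H SS=N AI=U EP=C MA=B NY=M SH=L YE=F "
              "EI=I IA=K MM=G PE=T SN=P YI=X EM=V IM=S NN=J PI=R SP=W YY=A")
DINOMES = parse("20=D 33=N 44=H 62=X 77=G 89=Y 25=K 34=W 48=T 66=A 82=I 93=E "
                "27=S 39=P 52=U 68=F 84=C 96=M 31=L 42=R 55=O 75=B 87=V 99=J")

def dinome_to_pair(dinome):
    """Replace each digit by its key letter: '20' -> 'IT'."""
    return "".join(KEY[(int(d) - 1) % 10] for d in dinome)

# Dinomes whose key-letter pair does NOT stand for the same plain letter.
mismatches = [d for d, letter in DINOMES.items()
              if PAIRS.get(dinome_to_pair(d)) != letter]
print(mismatches)  # an empty list means the two alphabets are one square

# Print the square, rows and columns in key order, '.' for unused cells.
for row in KEY:
    print(row, " ".join(PAIRS.get(row + col, ".") for col in KEY))
```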

Blue and Gray

One of the most fascinating stories of the American Civil War (1861-65) is about communications using flag telegraphy, also known as the wigwag signal system. Wigwag is a system of positioning a flag (or flags) at various angles to indicate the twenty-six letters of the alphabet. It was created in the mid-1800s by three men working at separate locations: Navy Captain Philip Colomb and Army Captain Francis Bolton in England, and surgeon-inventor Albert J. Myer in America. Myer observed the railroad electromagnetic telegraph, developed by Alexander Bain, and invented a touch method of communication for the deaf and later the wigwag system. He developed companion methods with torches and disks. The name "wigwag" derived from the flag movements.

Three main color combinations were used in flags measuring two, four, and six feet square. The white banners had red, square centers while the black or red flags had white centers. Myer's method required three motions (elements) to be used for each letter. The first position always initiated a message sequence. Motion one went from head to toe and back on the right side. Motion two went from head to toe and back on the left side. Motion three went from head to toe and back in front of the man. Each motion was made quickly. Figure 3-11 indicates the multiliteral alphabet and directional orders to convey a message.

>>INSERT FIGURE 3-11 HERE<<

As the Civil War wore on, Myer increased the wigwag motions to four. This enabled more specialized words and abbreviations to be used. In 1864, Myer invented a similar daytime system with disks. For night signals, Myer applied his system with torches on the signal poles and lanterns. A foot torch was used as a reference point. Thus, the direction of the flying wave could better be seen. Compare this to the semaphore system used by ships at sea when radio silence is necessary.

Myer continuously improved his invention through 1859 and presented his findings gratis to the Union Army (which gave him a lukewarm yawn for his trouble). Edward Porter Alexander, his chief assistant, joined the Confederate Army and used the wigwag system in actual combat. Alexander was able to warn Colonel Nathan Evans at Manassas Junction - Stone Bridge that the Union Army had reached Sudley Ford and was about to surprise General Beauregard's best Division. From his observation tower, Alexander sent the following message to Colonel Evans at the Stone Bridge defenses: "Look out for your left, you are turned."

Colonel Evans turned his cannons and musket fire toward the Federal troops before they could initiate their attack. Alexander was later credited (and decorated) for his vigilance, which led to changes in the tactics of the entire struggle around Manassas Junction. The application of the new signal system had directly influenced the shocking Union defeat that eventful July day.

Myer's signaling system was catapulted into use at the Battle of Gettysburg. General Lee had invaded northern soil in June 1863. His Potomac crossing was relayed by flag system to the War Department. General Joseph Hooker resigned under fire on June 28. General George Meade (of NSA grounds fame, a weathered face if you ever saw one) took over command of the Army of the Potomac. His headquarters were at Taneytown, MD. Startling news came via signalmen on July 1. A skirmish on the Maryland border indicated that General Buford was facing a major force not in Maryland but in Pennsylvania. Lee was himself in command at Gettysburg. Signalmen of each army unit sent out calls for help. Reinforcements from dozens of units several miles away were committed to the fray. By July 1, 73,000 gray and 88,000 blue met in one of history's most decisive battles.

Rarely, if at all, do textbooks even hint that the secret message system of flags affected these history-changing events. Yet the crucial sightings by Union observers directly tipped the scales against Lee's best tactics. In the most famous incident, Captain Castle, on Cemetery Ridge, refused to retreat under the Confederate artillery barrage as General George Pickett charged the "thin blue line". He used a wooden pole and a bedsheet as a makeshift flag to alert Union forces under General Meade, who ordered countermeasures. Pickett's charge was stopped short of breaching the Union lines. General Lee's gamble failed. Previously disregarded flagmen enabled George Meade to enter the shrine of heroes.

Numerical Ciphers

Cipher alphabets whose cipher components consist of numbers are practicable for telegraph or radio transmission. They may take forms corresponding to those employing letters. Standard numerical cipher alphabets are those in which the cipher component is a normal sequence of numbers.

Plain - A B C D E F G H I J K L M

Cipher - 11 12 13 14 15 16 17 18 19 20 21 22 23

Plain - N O P Q R S T U V W X Y Z

Cipher - 24 25 26 27 28 29 30 31 32 33 34 35 36

We could easily have started the cipher alphabet with A= 01, B=02,..., Z=26 with the same results.
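A standard numerical alphabet is nothing more than the letter's position plus a constant. The following sketch (the function names and the `first` parameter are our own) produces either form by changing the starting value:

```python
def encipher(plain, first=11):
    """A -> first, B -> first + 1, ...; each value written as two digits."""
    return " ".join(
        f"{ord(ch) - ord('A') + first:02d}"
        for ch in plain.upper() if "A" <= ch <= "Z"
    )

def decipher(cipher, first=11):
    """Inverse lookup for space-separated two-digit groups."""
    return "".join(chr(int(group) - first + ord("A")) for group in cipher.split())

print(encipher("cab"))            # 13 11 12
print(encipher("cab", first=1))   # 03 01 02
print(decipher("36 15 28 25"))    # ZERO
```

Either starting value leaves the frequency distribution untouched, which is why the two alphabets give "the same results" to the cryptanalyst.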

Mixed numerical cipher alphabets are those that have been keyed by a key word turned into numerical cipher equivalents or have a random combination of two or more digits for each letter of plain text.

Plain - A B C D E F G H I J K L M

Cipher - 76 88 01 67 04 80 66 99 96 96 02 69 90

Plain - N O P Q R S T U V W X Y Z

Cipher - 77 05 87 60 39 79 03 78 68 98 86 70 97

Rather than apply a brute force attack on all combinations of two-digit equivalents in a cipher text produced with the above alphabet, we would try a frequency count, then check for repeated digrams and trigrams, and then solve as a one-for-one substitution without complicating modifications.

Figures 3-3 and 3-4 could be arranged for simple numerical equivalents like this:

>>INSERT FIGURES 3-3a & 3-4a HERE<<

Numerical cipher values lend themselves to treatment by various mathematical processes to further complicate the cipher system in which they are used. These processes, mainly addition or subtraction, may be applied to each cipher equivalent individually, or to the complete numerical cipher message by considering it as one number. The Hill cipher is another good example of the use of mathematical transformation processes on ciphers. See Bauer or Kahn for details. In modern cryptographic systems, the DES family of ciphers combines S-boxes [substitution boxes], which are small non-linear substitution tables, with permutations, and applies them several times over (known as rounds).

One-Time Pad

The question of 'unbreakable' mathematical ciphers might be posed at this juncture. Lets look at the famous one-time pad and see what it offers. The one-time pad is truly an unbreakable cipher system. The one - time pad consists of a nonrepetitive truly random key of letters or characters that is used just once. The key is written on special sheets of paper and glued together in a pad. The sender uses each key letter on the pad to encrypt exactly one plain text letter or character. The receiver has an identical pad and uses the key on the pad, in turn, to decrypt each letter of the cipher text. Each key is used exactly once and for only one message. The sender encrypts the message and destroys the pad's page. The receiver does the same thing after decrypting the message. We use a new message - new page and new key letters/numbers -- each time.

The one-time pad is unbreakable both in theory and in practice. Interception of cipher text does not help the cryptographer break this cipher. No matter how much cipher text the analyst has available, or how much time he had to work on it, he could never solve it.

The reason is that no pattern can be constructed for the key. The perfect randomness of the one-time system nullifies any efforts to reconstruct the key or plain text via any of the following cryptanalytic methods described by Friedman: horizontal or lengthwise analysis, cohesion, re-assembly via Kasiski or Kerckhoffs' columns, repeats or internal framework erection.

Brute force (trial and error) might bring out the true plain text, but it would also yield every other text of the same length, and there is no way to tell which is the right one. It should be noted that the number of possible solutions increases as the message lengthens and rapidly reaches the point where all the computer power in the world, working together, would require decades or centuries to come up with all possible solutions. Only the hindsight of history would enable us to pick the right solution.

Supposing the key were stolen, would this help to predict future keys? No, because a random key has no underlying system to exploit. If it did, it would not be random.

A truly random key sequence XOR’ed with a nonrandom plain text message produces a completely random cipher text message and no amount of computing will change that. The one-time pad can be extended to encryption of binary data by computer. Instead of letters, we use bits.
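A minimal sketch of the binary one-time pad follows. The function name and the sample messages are hypothetical, and `secrets.token_bytes` (the operating system's random source) is only a stand-in for the truly random key the text requires. The last lines illustrate the brute-force point made above: for any candidate message of the same length, some key "decrypts" the cipher text to it.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each message byte with the matching key byte (same call decrypts)."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(plaintext))   # assumption: stand-in for a true random pad
ciphertext = xor_bytes(plaintext, key)
assert xor_bytes(ciphertext, key) == plaintext

# Any other text of the same length is an equally valid "solution":
decoy = b"RETREAT AT SIX"
decoy_key = xor_bytes(ciphertext, decoy)    # the key that would produce the decoy
assert xor_bytes(ciphertext, decoy_key) == decoy
```

Because every same-length message has a matching key, the cipher text alone gives the analyst no way to prefer one reading over another.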

Fresh Key Drawback

The one-time pad has a drawback: the quantity of fresh key required. For military messages in the field (a fluid situation) a practical limit is reached. It is impossible to produce and distribute sufficient fresh key to the units. During World War II, the US Army's European theater HQ transmitted, even before the Normandy invasion, 2 million five-letter code groups a day. It would therefore have consumed 10 million letters of key every 24 hours, the equivalent of a shelf of 20 average books.

Randomness

The real issue for the one-time pad is that the keys must be truly random. Attacks against the one-time pad must be against the method used to generate the key itself. Pseudo random number generators don't count; often they have nonrandom properties. Tests at the ICSA Cryptography Laboratory have confirmed that random number generation based on deterministic machine states needs to be based on at least 20 different state functions to prevent attack on the random number generator(s). More on this issue in a later chapter.

The Structure of Language

Linguistic anthropologists have used cryptography to reconstruct ancient languages by comparing contemporary descendants and in so doing make discoveries about history. Others make inferences about universal features of language, linking them to uniformities in the brain. Still others study linguistic differences to discover varied worldviews and patterns of thought in a multitude of cultures.

The Rosetta Stone, found by the Egyptian Dhautpol and the French officer Pierre-Francois Bouchard near the town of Rosetta in the Nile Delta, gave us a look at the same text in Egyptian hieroglyphs, Demotic and Greek. The fascinating story of its decipherment is covered in Kahn. Of special interest was the final decipherment of the Egyptian writing containing homophones, different signs standing for the same sound.

Until the late 1950’s, linguists thought that the study of language should proceed through a sequence of stages of analysis. The first stage was phonology, the study of sounds used in speech. Phones are speech sounds present and significant in each language. They were recorded using the International Phonetic Alphabet, a series of symbols devised to describe dozens of sounds that occur in different languages.

The next stage was morphology, the study of the forms in which sounds combine to form morphemes: words and their meaningful constituents. A morpheme is the smallest meaningful unit of speech. The word cats has two morphemes, /cat/ and /s/, indicating the animal and plurality. A lexicon is a dictionary of all morphemes. Isolating or analytic languages, like Chinese or Vietnamese, are those in which words are morphologically simple. Agglutinative languages string together successive morphemes; Turkish is a good example of this. Inflectional languages change the form of a word to mark all kinds of grammatical distinctions, such as tense or gender. Indo-European languages tend to be highly inflectional. The next step was to study syntax, the arrangement and order of words in phrases and sentences.

Phonemes and Phones

No language contains all the sounds in the International Phonetic Alphabet. A Phoneme is the smallest unit of distinctive sound. Phonemes lack meaning in themselves but through sound contrasts distinguish meaning. We find them in minimal pairs, words that resemble each other in all but one sound. An example is the minimal pair pit/bit. The /p/ and /b/ are phonemes in English. Another example is bit and beat, which separates the phonemes /I/ and /i/ in English. Friedman describes similar phenomena called homologs and uses them to solve a variety of cryptograms.

Standard (American) English (SE), the region-free dialect of TV network newscasters, has about thirty-five phonemes: at least eleven vowels and twenty-four consonants. The number of phonemes varies from language to language, from fifteen to sixty, averaging between thirty and forty. The number of phonemes also varies between dialects. In American English, vowel phonemes vary noticeably from dialect to dialect. Readers should pronounce the words in Figure 3-12, paying attention to whether they distinguish each of the vowel sounds. Most Americans do not distinguish all of them.

>>INSERT FIGURE 3-12 HERE<<

Phonetic symbols are identified by English words that include them; note that most are minimal pairs.

high front (spread) [i] as in beat

lower high front (spread) [I] as in bit

mid front (spread) [ea] as in bait

lower mid front (spread) [e] as in bet

low front [ae] as in bat

central [ua] as in butt

low back [a] as in pot

lower mid back (rounded) [ou] as in bought

mid back (rounded) [o] as in boat

lower high back (rounded) [U] as in put

high back (rounded) [u] as in boot

Phonetics studies sounds in general -- what people actually say in various languages.

Phonemics is concerned with the sound contrasts of a particular language. In English /b/ and /v/ are phonemes, occurring in minimal pairs such as bat and vat. In Spanish, the contrast between [b] and [v] doesn't distinguish meaning, so they are not separate phonemes. The [b] sound is used in Spanish to pronounce words spelled with either b or v. (Non-phonemic phones are enclosed in brackets.)

In any language, a given phoneme extends over a phonetic range. In English the phoneme /p/ ignores the phonetic contrast between the [pH] in pin and the [p] in spin. How many of you noticed the difference? [pH] is aspirated, so that a puff of air follows the [p]; this is not true of the [p] in spin. To see the difference, light a match and watch the flame as you say the two words. In Chinese, by contrast, [p] and [pH] are separate phonemes: some words are distinguished only by the contrast between an aspirated and an unaspirated [p].

Historical Linguistics

Knowledge of linguistic relationships is often valuable to determine the events of the past 5000 years. By studying contemporary daughter languages, past language features can be reconstructed. Daughter languages descend from the same parent language that has been changing for thousands of years. The original language from which they diverge is called a "protolanguage." French and Spanish are daughter languages of Latin. Language evolves over time into subgroups (closely related taxonomy) but with distinct cultural differences. Figure 3-13 shows the main languages and subgroups of the Indo European language stock.

All these daughter languages have developed out of the protolanguage (Proto-Indo-European) spoken in Northern Europe about 5,000 years ago. [Note subgroupings.] English, a member of the Germanic branch, is more closely related to German and Dutch than it is to Italic or Romance languages such as French and Spanish. However, English shares many linguistic features with French through borrowing and diffusion.

The doctrine of linguistic relativity is central to cryptographic treatment of language ciphers. It states that all known languages and dialects are effective means of communication.

Nichols' Theorem states that if languages are linguistically related, they can be codified, enciphered, deciphered and treated as cryptographic units for analysis and statistical treatment.

>>INSERT FIGURE 3-13 HERE<<

Main Languages of Indo-European Stock

Dead Languages

Figure 3-14 pertains to live languages. Professor Cyrus H. Gordon, in his fascinating book "Forgotten Scripts", shows how cryptography is used to recover ancient writings. He tells the story of the unraveling of each of these ancient languages: Egyptian, Old Persian, Sumer-Akkadian, Hittite, Ugaritic, Eteocretan, Minoan and Eblaite. He specializes in cuneiform and hieroglyphic inscriptions and gives us a glimpse into the ancient societies that gave birth to the Western world.

Cryptographic Threads

There is a common cryptographic thread for most languages. All known writing systems are partly or wholly phonetic, and express the sounds of a particular language. Writing is speech put in visible form, in such a way that any reader instructed in its conventions can reconstruct the vocal message. Writing as "visible speech" was invented about five thousand years ago by Sumerians and almost simultaneously by ancient Egyptians.

The ancient Mayan knew that it was 12 cycles, 18 katuns, 16 tuns, 0 uinals, and 16 kins since the beginning of the Great Cycle. The day was 12 Cib 14 Uo and was ruled by the seventh Lord of the Night. The moon was nine days old. Precisely 5,101 of our years and 235 days had passed. So said the ancient Mayan scribes. We remember the day as 14 May 1989.
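The arithmetic behind the scribes' count can be checked with a short sketch. The day values of the Long Count units are the standard ones (a kin is one day, a uinal 20 kins, a tun 18 uinals, a katun 20 tuns, and the "cycle" of the text 20 katuns). The 365.25-day year is an assumption, chosen because it reproduces the figure of 5,101 years and 235 days quoted above; the function name is hypothetical.

```python
# Days in each Long Count unit; "cycle" is the 144,000-day period.
UNIT_DAYS = {"cycle": 144_000, "katun": 7_200, "tun": 360, "uinal": 20, "kin": 1}

def long_count_days(cycles, katuns, tuns, uinals, kins):
    """Total days elapsed since the start of the Great Cycle."""
    counts = dict(cycle=cycles, katun=katuns, tun=tuns, uinal=uinals, kin=kins)
    return sum(UNIT_DAYS[unit] * n for unit, n in counts.items())

days = long_count_days(12, 18, 16, 0, 16)
# Work in quarter-days so a 365.25-day year (1461 quarter-days) stays in integers.
years, quarter_days = divmod(days * 4, 1461)
print(days)               # 1863376
print(years)              # 5101
print(quarter_days / 4)   # 235.75
```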

Writing Systems

Three kinds of writing systems have been identified: Rebus, which is a combination of logograms and phonetic signs; Syllabic, built on units such as CV (consonant-vowel), as in Cherokee or Inuit; and Alphabetic, which is phonemic, with individual consonants and vowels making up the sounds of the language. Writing systems can also be classified by their signs. Table 3-4 differentiates writing systems by the number of signs used.

Table 3-4

Writing System               No. of Signs

Logographic
  Sumerian                   600+
  Egyptian                   2,500
  Hittite Hieroglyphic       497
  Chinese                    5,000+

"Pure" Syllabic
  Persian                    40
  Linear B                   87
  Cypriote                   56
  Cherokee                   85

Alphabetic or Consonantal
  English                    26
  Anglo-Saxon                31
  Sanskrit                   35
  Etruscan                   20
  Russian                    36
  Hebrew                     22
  Arabic                     28

Linguist Michael D. Coe classifies the entire group of Mayan languages into fourteen daughter divisions of Proto-Mayan, and thirty-one sublanguages from Huastec to Tzuthil. He presents an extraordinary story of applied cryptanalysis and applied linguistics.

Xenocrypts – Language Ciphers

Xenocrypts are foreign language substitutions. Xenocrypts represent a cultural universal expressed at its common denominator: mathematics. It is the author's contention that any language can be learned from its cryptographic building blocks. To understand the building blocks we can look at the underlying structure of language. Furthermore, most languages share the common framework of mathematics and statistics. To be able to solve Xenocrypts, it is only necessary to learn the basic (group) mathematical structure of the language, to use a bidirectional translation dictionary and to recognize the underlying cipher construct.

Ciphers start with the problem of recognizing the language and then the distribution of characters within the particular language. The legendary W. F. Friedman once remarked: "treating the frequency distribution as a statistical curve, when such treatment is possible, is one of the most useful and trustworthy methods in cryptography."

Table 3-5 gives the frequency distributions of ten languages developed from various sources. Frequencies of letters, and their order, are not fixed quantities in any language. Group frequencies, however, are fairly constant in every language. This is the common thread: the linguistic relativity of all languages.

Table 3-5

Xenocryptic Frequency Data, %

16 8 7 6 5 4 2 <1

NORWEGIAN: E RNS T AI LDO GKM UVFHPA' JBO' YAECWXZQ

10 9 7 6 4 3 <2

LATIN: I E UTA SRN OM CPL (balance)

18 8 7 6 5 4 3 2 <1

FRENCH: E AN RSIT UO L D CMP VB F-Y

14 13 12 8 6 5 4 3 2 <1

PORTUGUESE: A E O RS IN DMT UCL P QV (balance)

18 11 8 7 5 4 3 2 <1

GERMAN: E N I RS ADTU GHO LBM CW (balance)

15 12 8 7 5 4 3 1 <1

CATALAN: E A S ILRNT OC DU MP BVQGF (balance)

16 13 8 6 5 4 3 <2

HUNGARIAN: E A T OS LNZ KIM RGU (balance)

13 12 11 9 7 6 5 3 2 <1

ITALIAN: E A I O L NRT SC DMO'U VG (balance)

20 10 7 6 5 4 3 2 <1

DUTCH: E N IAT O DL S GKH UVWBJMPZ (balance)

13 9 8 7 5 4 3 1 <1

SPANISH: EA O S RNI DL CTU MP GYB (balance)

[Special characters are generally not available in classical, or BC (before computer), cryptograms. They are reduced to other constructions, such as a-umlaut = ae.]

Kullback gives the following tables for Monoalphabetic and Digraphic texts for eight languages correlated with the number of letters N in a cryptogram. The constants may be derived from probability data for a given language:

Table 3-6

              Monoalphabetic    Digraphic
              Text              Text

English       0.0661N(N-1)      0.0069N(N-1)
French        0.0778N(N-1)      0.0093N(N-1)
German        0.0762N(N-1)      0.0112N(N-1)
Italian       0.0738N(N-1)      0.0081N(N-1)
Japanese      0.0819N(N-1)      0.0116N(N-1)
Portuguese    0.0791N(N-1)
Russian       0.0529N(N-1)      0.0058N(N-1)
Spanish       0.0775N(N-1)      0.0093N(N-1)

Random Text

Monographic       Digraphic         Trigraphic
.038N(N-1)        .0015N(N-1)       .000057N(N-1)
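These constants are used in Kullback's phi test: the observed phi of a cryptogram, the sum of f(f-1) over the letter frequencies f, is compared with the value expected for monoalphabetic text and with the value expected for random text. The sketch below is a minimal illustration; the function names are hypothetical, and only the monographic constants from Table 3-6 are used.

```python
from collections import Counter

# Monoalphabetic constants from Table 3-6 (a subset) and the random-text constant.
KAPPA_PLAIN = {"English": 0.0661, "French": 0.0778, "German": 0.0762,
               "Italian": 0.0738, "Spanish": 0.0775}
KAPPA_RANDOM = 0.038

def phi_observed(text):
    """Observed phi: sum of f * (f - 1) over the frequency f of each letter."""
    counts = Counter(c for c in text.upper() if c.isalpha())
    return sum(f * (f - 1) for f in counts.values())

def phi_expected(n, kappa):
    """Expected phi for a cryptogram of n letters: kappa * N * (N - 1)."""
    return kappa * n * (n - 1)

def phi_test(text, language="English"):
    """Return (observed, expected if monoalphabetic, expected if random)."""
    n = sum(1 for c in text if c.isalpha())
    return (phi_observed(text),
            phi_expected(n, KAPPA_PLAIN[language]),
            phi_expected(n, KAPPA_RANDOM))
```

An observed value near the language figure suggests a monoalphabetic substitution (or a transposition); a value near the random figure suggests a polyalphabetic or otherwise flattened system. Short cryptograms give unreliable results.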

German Reduction Ciphers - Traffic Analysis

A small sister to cryptanalysis is the science of traffic analysis. Traffic Analysis is the branch of signal intelligence analysis that deals with the study of the external characteristics of signal communications.

The information was used: 1) to effect interception, 2) to aid cryptanalysis, 3) to rate the level and value of intelligence in the absence of the specific message contents and 4) to improve the security in the communication nets. Traffic analysis was a primary reason for the cracking of the German Codes in World War II. {Unfortunately, the same principles worked on the British and American Codes as well.} The German Army was dedicated to unquestioned organization. Paperwork and radio messages must flow to the various military units in a prescribed manner.

Components

Allowing for historical differences in language, procedure signs and signals, there were six standard elements for military radio communications systems. These were: 1) call-up, 2) order of traffic, 3) transmission of traffic, 4) receipting for traffic, 5) corrections and services, and 6) signing off.

In order to ensure proper handling of messages in the field and message center, some information was sent in the clear or using simple coding. This information about routing and accounting was usually at the beginning or end of a message. It included: 1) serial numbers and message center number, 2) group count, 3) file date and time [like a PGP signature], 4) routing system: origin, destination and relay (a distinction is made between action and FYI locations), 5) priority (important information was originally signal flashed, hence the term FLASH message for an urgent message), 6) transmission and delivery procedure, 7) addresses and signatures, 8) special instructions. As a rule, German high-echelon traffic contained most of these items and German low-echelon traffic cut them to a minimum.

The German penchant for organization could be seen in the way they handled serial numbers. Any radio message flowing from division level to soldier in the field would have a reference serial number attached in clear or matrix cipher, by the writer, the HQ message center, the signal center or code room, the "in desk", the transmitter, linkage, and/or operator. The routing system usually consisted of a code and syllabary that represented the location or unit.

You can see where modern E-Mail and word processing systems have made some of this information easier to handle by the portable desk idea but traffic analysis would still apply.

Compare the six traffic elements above to the modern network computer packet. Look at the information flow in terms of the OSI model and notice that all six elements have their corresponding "sisters" in the headers and protocols used to route information.

American cryptographers were adept in determining the German Order of Battle from their cryptonets (e.g., from intercepts from corps to theater). Traffic analysis gave not only the locations but also the communication relationships between units or groups of units in the field. Some German commands were allowed latitude in their compositions of codes and ciphers. This proved to be an exploitable fault in the German security.

Applications to Cryptanalysis

Traffic analysis yields information via Crib messages, Isologs and Chatter. Crib messages assume a partial knowledge of the underlying plain text through recognition of the external characteristics. Command "sitreps" (situation reports) up and down German channels, were especially easy for American "crypees". The origin, serial number range, the cryptonet id, report type, the file date and time, message length and error messages in the clear, gave a clear picture of the German command process. German order of battle, troop dispositions and movements were deduced by traffic analysis.

An Isolog exists when the underlying plain text is encrypted in two different systems. They exist because of relay repetition requirements, book messages to multiple receivers (spamming would have been a definite no-no), or error by the code clerk. American cryptographers were particularly effective in obtaining intelligence by this method.

Traffic analysis boils down to finding the contact relationships among units, tracking their movements, building up the cryptonet authorities, capitalizing on lack of randomness in their structures, and exploiting book and relay cribs. American intelligence was quite successful in this endeavor against the Germans as well as the Japanese in World War II.

ADFGVX

"Weh dem der luegt und Klartext funkt" - Lieutenant Jaeger, German 5th Army. ["Woe to him who lies and radios in the clear"] Jaeger was a German code expert sent to stiffen the German Code discipline in France in 1918. Ironically, the double "e" in Jaeger's name gave US Army traffic analysis experts a fix on code changes in 1918.

ADFGVX is one of the best known field ciphers in the history of cryptology. Originally built on a 5 x 5 matrix with just five coordinate letters, ADFGX, the system was expanded on June 1, 1918 with a sixth letter, V. The letters were chosen for their clarity in Morse: A .-, D -.., F ..-., G --., V ...-, and X -..-.

W. F. Friedman describes one of the first traffic analysis charts regarding battle activity from May to August, 1918 at Marne, and Rheims, France. It was based solely on the ebb and flow of traffic in the ADFGVX cipher. This cipher was restricted to German High Command communications between and among the headquarters of divisions and army corps.

The ADFGVX cipher was considered secure because it combined both a good substitution (a bipartite or two-part fractionating encipherment) and an excellent transposition in one system. During the eight-month history of this cipher, only 10 keys were recovered by the Allies (in 10 days of heavy traffic) and fifty percent of the messages on these days were read. These intercepts brought about the reversal of the German advances (15 divisions) under Ludendorff at Montdidier and Compiegne, about 50 miles North of Paris. Solution by the famed French Captain Georges Painvin was based on just two specialized cases. No general solution for the cipher was found by the Allies. In 1933, William Friedman and the SIS found a general solution. French General Givierge, of the Deuxieme Bureau, also published a solution to the general case.

The June 3 message that Painvin cracked which changed the course of World War I:

From German High Command in Remaugies: Munitionierung beschleunigen Punkt Soweit nicht eingesehen auch bei Tag or "Rush Munitions Stop Even by day if not seen."

Cipher text starts: CHI-126: FGAXA XAXFF FAFFA AVDFA GAXFX FAAAG

This told the Allies where and when the bombardment preceding the next major German push was planned.

ADFGVX Cryptanalysis

According to William F. Friedman, there were only three viable ways to attack this cipher (at the time). The first method required 2 or more messages with identical plain text beginnings to uncover the transposition. Under the second method, 2 or more messages with identical plain text endings were required to break the flat distribution shield of the substitution part of the cipher (the flatter the distribution, the more random the ciphertext and hence the more difficult the cracking of the cipher). The German addiction to stereotyped phraseology was so prevalent in all German military communications that in each day's traffic, messages with similar endings and beginnings were found (sometimes both). The third method required messages with the exact same number of letters. Painvin used the first two methods when he cracked the 5-letter ADFGX version in April, 1918.

Lest we underestimate the difficulty of this cipher, we might step behind Painvin's shoulders as he worked. At 4:30 am on March 21, 6000 guns opened fire on the Allied line at the Somme. Five hours later, 62 German Divisions pushed forward on a 40-mile front. Radio traffic increased dramatically. Painvin had just a few intercepts in the ADFGX cipher, and the longer ones had been split in three parts to prevent anagramming.

Five letters, therefore, a checkerboard? Maybe. Simple monoalphabetic cipher? No, too flat of a distribution.

An oddity begins to show: the first parts of some messages contain identical bits and pieces of text, in the same order, in the cryptograms. Painvin feels the oddity most likely results from beginnings transposed according to the same key, giving identical tops to the columns of the transposition tableau. Painvin sections the cryptograms by timeframe:

chi-110: (1) ADXDA (2) XGFXG (3) DAXXGX (4) GDADFF

chi-114: (1) ADXDD (2) XGFFD (3) DAXAGD (4) GDGXD

He does this with 20 blocks to reconstruct the transposition key. Using the principle of long columns to the left, he finds segments 3, 6, 14 and 18 to the left, with the balance clustered to the right. Using other messages with common (repeated) endings, he segments the columns to the left. Correctly? No. He uses 18 additional intercepts to juxtapose 60 letters (AA's, AD's, etc.). Using a frequency count, he finds a monoalphabetic substitution. He finds that columns 5-8 and 8-5 are inverted.

Painvin sets up a skeleton checkerboard - he assumes correctly the order to be side-top:

A D F G X

A

D e

F

G

X

Since the message was 20 letters, the order might be side-top, repeated, meaning side coordinates would fall on the 1st, 3rd, 5th positions during encipherment, so he separates them by frequency characteristics. In 48 hours of incredible labor, Painvin pairs the correct letters and builds the checkerboard, solving the toughest field cipher the world had yet seen. It is a cipher that defends itself by fractionation: the breaking up of plain text letter equivalents into pieces, with the consequent dissipation of their ordinary characteristics. The transposition further scatters these characteristics in a particularly effective fashion, while dulling the clues that normally help to reconstruct a transposition.
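To make the two stages concrete, here is a minimal sketch of the five-letter ADFGX version: a checkerboard substitution read side-top, followed by a single columnar transposition. Several details are assumptions made for illustration only: the square is filled from a keyword (an empty keyword gives the plain alphabet, with I and J sharing a cell), the transposition key "CAB" is hypothetical, and the function names are invented here. The keys actually used in 1918 are not reproduced.

```python
COORDS = "ADFGX"

def make_square(keyword):
    """Keyed 25-letter alphabet for the 5 x 5 checkerboard (I and J share a cell)."""
    seen = []
    for ch in (keyword + "ABCDEFGHIKLMNOPQRSTUVWXYZ").upper().replace("J", "I"):
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def fractionate(plaintext, square):
    """Replace each letter by its side coordinate followed by its top coordinate."""
    out = []
    for ch in plaintext.upper().replace("J", "I"):
        if ch.isalpha():
            row, col = divmod(square.index(ch), 5)
            out.append(COORDS[row] + COORDS[col])
    return "".join(out)

def columnar(text, key):
    """Write text in rows under the key; read columns in alphabetical key order."""
    order = sorted(range(len(key)), key=lambda i: (key[i], i))
    return "".join(text[i::len(key)] for i in order)

def adfgx_encipher(plaintext, square_keyword, transposition_key):
    square = make_square(square_keyword)
    return columnar(fractionate(plaintext, square), transposition_key)

print(fractionate("attack", make_square("")))   # AAGGGGAAAFDX
print(adfgx_encipher("attack", "", "CAB"))      # AGADGGAXAGAF
```

After the transposition, the two halves of each plain text letter no longer sit side by side, which is why a simple frequency count of cipher pairs fails until the columns have been put back in order.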

Arabian Contributors to Cryptology

Dr. Ibrahim A. Al-Kadi gave an outstanding 1990 paper to the Swedish Royal Institute of Technology in Stockholm regarding the Arabic contributions to cryptology.

Dr. Al-Kadi reported on the Arabic scientist Abu Yusuf Yaqub ibn Is-haq ibn as Sabbah ibn 'omran ibn Ismail Al-Kindi, who authored a book on cryptology, the "Risalah fi Istikhraj al-Mu'amma" (Manuscript on Deciphering Cryptographic Messages), circa 850 AD. Al-Kindi introduced cryptanalysis techniques, classification of ciphers, Arabic Phonetics and Syntax and, most importantly, described the use of several statistical techniques for cryptanalysis. [This book apparently antedates other cryptology references by 300 years. It also predates writings on probability and statistics by Pascal and Fermat by nearly 800 years.]

Dr. Al-Kadi also reported on the mathematical writings of Al-Khwarizmi (780-847), who introduced common technical terms such as zero, cipher, algorithm, algebra and Arabic numerals. The decimal number system and the concept of zero were originally developed in India.

In the early ninth century the Arabs translated Brahmagupta's "Siddhanta" from Sanskrit into Arabic. The new numerals were quickly adopted throughout the Islamic Empire from China to Spain. Translations of Al-Khwarizmi's book on arithmetic by Robert of Chester, John of Halifax and the Italian Leonardo of Pisa, aka Fibonacci, strongly advocated the use of Arabic numerals over the previous Roman Standard Numerals (I, V, X, L, C, D, M).

The Roman system was very cumbersome because there was no concept of zero (or empty space). The concept of zero, which we all think of as natural, was anything but natural in medieval Europe. In Sanskrit, the zero was called "sunya" or "empty". The Arabs translated the Indian term into the Arabic equivalent "sifr". Europeans adopted the concept and symbol but not the name, transforming it into the Latin equivalents "cifra" and "cephirium" {Fibonacci did this}. The Italian equivalents of these words were "zefiro", "zefro" and "zevero". The latter was shortened to "Zero".

The French formed the word "chiffre" and conceded the Italian word "zero". The English used "zero" and "Cipher" from the word ciphering as a means of computing. The Germans used the words "ziffer" and "chiffer".

The concept of zero or sifr or cipher was so confusing and ambiguous to common Europeans that in arguments people would say "talk clearly and not so far fetched as a cipher". Cipher came to mean concealment of clear messages or simply encryption. Dr. Al-Kadi concluded that the Arabic word sifr, for the digit zero, developed into the European technical term for encryption.

Nihilist Substitution

For some reason, Russian prisoners were not allowed computers in their cells. This might have been because at the time computers were too big to be hidden under their shirts. Russian prisoners were forbidden to communicate with each other. To outwit their jailers they invented a "knock" system to indicate the rows and columns of a simple checkerboard (a Polybius square, 5 x 5 for English or 6 x 6 for 35 Russian letters). For example:

     1    2    3    4    5
1    U    N    I/J  T    E
2    D    S    A    O    F
3    M    R    C    B    G
4    H    K    L    P    Q
5    V    W    X    Y    Z

Keyword = United States of America

Repeated letters in the keyword were used only once in the square the first time encountered.

Plain text:   g  o  t  a  c  i  g  a  r  e  t  t  e
Cipher text:  35 24 14 23 33 13 35 23 32 15 14 14 15
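The knock table above can be rebuilt and checked with a short sketch. The function names are hypothetical; the keyword and the plain text are the ones used in the example, with I and J sharing a cell.

```python
def keyed_square(keyword):
    """Keyword letters first (repeats dropped), then the rest of the alphabet."""
    seen = []
    for ch in (keyword + "ABCDEFGHIKLMNOPQRSTUVWXYZ").upper().replace("J", "I"):
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def to_numbers(plaintext, square):
    """Each letter becomes row digit then column digit (both counted from 1)."""
    numbers = []
    for ch in plaintext.upper().replace("J", "I"):
        if ch.isalpha():
            row, col = divmod(square.index(ch), 5)
            numbers.append(10 * (row + 1) + (col + 1))
    return numbers

def to_letters(numbers, square):
    """Inverse lookup: row digit and column digit back to the letter."""
    return "".join(square[(n // 10 - 1) * 5 + (n % 10 - 1)] for n in numbers)

square = keyed_square("United States of America")
cipher = to_numbers("got a cigarette", square)
print(cipher)   # [35, 24, 14, 23, 33, 13, 35, 23, 32, 15, 14, 14, 15]
print(to_letters(cipher, square))   # GOTACIGARETTE
```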

Prisoners memorized the proper numbers and "talked" at about 10-15 words per minute. One of the advantages was that it afforded communication by a great variety of media: anything that could be dotted, knotted, pierced, flashed or made to indicate numerals in any way could be used. The innocuous letter was always suspicious.

Cipher text letters were indicated by the number of letters written together; breaks in count by spaces in handwriting; upstrokes, downstrokes, thumbnail prints, all subtly used to bootleg secrets in and out of prisons. The system was universal in penal institutions. American POW's used it in Vietnam. Transposition of the keyword provided a further mixed alphabet:

B L A C K S M I T H
D E F G N O P Q R U
V W X Y Z

Taken off by columns:

B D V L E W A F X C G Y K N Z S O M P I Q T R H U

the Polybius square would be:

     1    2    3    4    5
1    B    D    V    L    E
2    W    A    F    X    C
3    G    Y    K    N    Z
4    S    O    M    P    I
5    Q    T    R    H    U
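The take-off by columns can be sketched as follows. The function name is hypothetical; the keyword BLACKSMITH and the 25-letter alphabet without J are those of the example.

```python
def transposed_alphabet(keyword, alphabet="ABCDEFGHIKLMNOPQRSTUVWXYZ"):
    """Write the keyword, put the unused letters in rows beneath it,
    then read the block off column by column."""
    kw = []
    for ch in keyword.upper().replace("J", "I"):
        if ch in alphabet and ch not in kw:
            kw.append(ch)
    seq = kw + [c for c in alphabet if c not in kw]
    width = len(kw)
    return "".join("".join(seq[col::width]) for col in range(width))

mixed = transposed_alphabet("BLACKSMITH")
rows = [mixed[i:i + 5] for i in range(0, 25, 5)]
print(mixed)   # BDVLEWAFXCGYKNZSOMPIQTRHU
print(rows)    # ['BDVLE', 'WAFXC', 'GYKNZ', 'SOMPI', 'QTRHU']
```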

The Nihilists, so named for their opposition to the czarist regime, added a repeating numerical keyword (KW), which made the cipher a periodic cipher similar to the Vigenere (discussed later) but with additional weaknesses.

Let the KW = ARISE = 22 53 45 41 15

Plain:      b  o  m  b  w  i  n  t  e  r  p  a  l  a  c  e
Numerical:  11 42 43 11 21 45 34 52 15 53 44 22 14 22 25 15
Key:        22 53 45 41 15 22 53 45 41 15 22 53 45 41 15 22
Cipher:     33 95 88 52 36 67 87 97 56 68 66 75 59 63 40 37

or, divided into five-number groups:

33958 85236 67879 75668 66755 96340 37774

nulls=774
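The additive step is plain addition of the key number to the plain text number, position by position, with the keyword repeated as needed; for instance, the second group works out to 42 + 53 = 95. A minimal sketch, with hypothetical function names and the BLACKSMITH square written out as a 25-letter string:

```python
SQUARE = "BDVLEWAFXCGYKNZSOMPIQTRHU"   # rows BDVLE / WAFXC / GYKNZ / SOMPI / QTRHU

def coords(text, square=SQUARE):
    """Two-digit row/column number of each letter (I and J share a cell)."""
    out = []
    for ch in text.upper().replace("J", "I"):
        if ch.isalpha():
            row, col = divmod(square.index(ch), 5)
            out.append(10 * (row + 1) + (col + 1))
    return out

def nihilist_encipher(plaintext, keyword, square=SQUARE):
    """Add the repeating key numbers to the plain text numbers."""
    key = coords(keyword, square)
    return [p + key[i % len(key)] for i, p in enumerate(coords(plaintext, square))]

def nihilist_decipher(numbers, keyword, square=SQUARE):
    """Subtract the repeating key numbers and look the letters up again."""
    key = coords(keyword, square)
    diffs = [n - key[i % len(key)] for i, n in enumerate(numbers)]
    return "".join(square[(d // 10 - 1) * 5 + (d % 10 - 1)] for d in diffs)

cipher = nihilist_encipher("bomb winter palace", "ARISE")
print(cipher)
# [33, 95, 88, 52, 36, 67, 87, 97, 56, 68, 66, 75, 59, 63, 40, 37]
print(nihilist_decipher(cipher, "ARISE"))   # BOMBWINTERPALACE
```

Because the same five key numbers recur every five positions, the cipher is periodic, which is the weakness the text alludes to.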

Nihilist Transposition

A simpler form of the Nihilist is a double transposition. The plain text was written in by rows (or diagonals); a keyword switched the rows; the same or a different keyword switched the columns; and the resulting cipher text was removed by columns or by one of forty (40) or more routes out of the square.

ex: KW = SCOTIA or 524631

Plain: let us hear from you at once about jewels xxx

   Transpose by Columns                 Transpose by Rows

      S C O T I A
      5 2 4 6 3 1                           1 2 3 4 5 6

   1  S E U H T L  (let us h)       S  5    E U J W T O
   2  R A F O R E                   C  2    R A F O R E
   3  A Y U T O M                   O  4    A N E B C O
   4  A N E B C O                   T  6    X L X X S E
   5  E U J W T O                   I  3    A Y U T O M
   6  X L X X S E                   A  1    S E U H T L

X is a bad choice for nulls because it is obviously a low frequency letter and therefore gives positional information about the keyword length.

The resulting cryptogram:

E U J W T O R A F O R E A N E B C O X L X X S E A Y U T O M S E U H T L.

(message length and 5th group are entries to solution)
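The two transposition steps can be reproduced with a short sketch. The function names are hypothetical. The plain text used here is the one spelled out by the rows of the tableau (let us hear from you at once about jewels, padded with three x nulls to fill the 6 x 6 square), and the cipher text is taken off by rows, as in the cryptogram shown.

```python
def key_order(keyword):
    """Number the keyword letters alphabetically: SCOTIA -> [5, 2, 4, 6, 3, 1]."""
    ranked = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    order = [0] * len(keyword)
    for rank, pos in enumerate(ranked, start=1):
        order[pos] = rank
    return order

def nihilist_transpose(plaintext, keyword):
    """Write in by rows, rearrange columns by the key, then rows by the same key."""
    n = len(keyword)
    letters = [c for c in plaintext.upper() if c.isalpha()]
    if len(letters) != n * n:
        raise ValueError("plain text (with nulls) must exactly fill the square")
    order = key_order(keyword)
    rows = [letters[i * n:(i + 1) * n] for i in range(n)]
    # Column step: position k of every row receives the letter from position order[k].
    rows = [[row[j - 1] for j in order] for row in rows]
    # Row step: the same rearrangement applied to whole rows.
    rows = [rows[j - 1] for j in order]
    return "".join("".join(row) for row in rows)

text = "let us hear from you at once about jewels xxx"
print(key_order("SCOTIA"))                 # [5, 2, 4, 6, 3, 1]
print(nihilist_transpose(text, "SCOTIA"))
# EUJWTORAFOREANEBCOXLXXSEAYUTOMSEUHTL
```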

Clues to cryptanalysis of the Nihilist systems were reconstructing the routes, evenness of distribution of vowels, period determination and digram/trigram frequency in cipher text.

Chinese Cryptography

Dr. Dan August found that the Four-Corner System and the Chinese Phonetic Alphabet System lend themselves to manual cryptographic treatment. His treatment of these two systems is easier to understand than some military texts on the subject. Let a message in Chinese be X1, X2, X3, ..., Xn, where Xi represents a character. The code for Xi is the vector union of three sets, v1, v2, and v3: v1 is a single-digit code for tone, v2 is a four- or five-digit Four Corner representation code, and v3 is a six-digit phonetic code representing three phonetic symbols by two digits each.

$$X_i = \bigcup_{j=1}^{3} v_j \qquad \text{(Eq. 13-1)}$$

This union is called an asymmetric code. The Four Corner System encodes characters into several generic shapes. Each character is broken into four (4) quadrants, and each quadrant is assigned the digit of the generic shape that best corresponds to its actual shape.

The Chinese Phonetic Alphabet is Pinyin with symbols instead of English letters. Each symbol corresponds to one of 37 ordered phonetic sounds. The 21 initials, 3 medials and 13 finals are a unique ordered set, a true alphabet. The strength of encryption of Chinese is dependent on the specific Chinese character encoding scheme. Three cases are:

  1. Phonetic Alphabet Only: The cipher must include both a transposition (to hide cohesion and positional limitations) and a substitution (to hide the frequency patterns.)
  2. Four Corner System: The cipher can be based on ring operations [performed on codewords rather than characters, either on an individual basis or over the whole message; the name comes from the algebraic operations involving integers mod 10 or mod 37] which super-encipher the encoded text.
  3. Combination of Methods 1) and 2): A text encoded by a combination of both methods will need a cipher employing both transposition and substitution. The transposition needs to mix up the symbols within codewords and the message itself. This prevents a bifurcated analysis.

In Chinese there is more dependence between encoding and enciphering operations than in English. The choice of the encoding system influences the type of enciphering operations.

Historical Perspectives - China

China appears to have had a much-delayed entry into the cipher business. Partially because so many Chinese did not read or write, and partially because the language was so complex, Chinese cryptography was limited until the 19th century. But there were seeds:

The Chinese strategist Sun Tzu (500 b.c.) recommended a true but small code, which limited the plaintext to 40 elements and assigned them to the first 40 characters of a poem, forming a substitution table. Richard Deacon describes a method of code encryption, which the secret society Triads used in the early 1800's. The Tongs in San Francisco used the same system. This method limited the plaintext space and based codewords on multiples of three.

The "Inner Ring" techniques taught to "Sa Bu Nims" (teachers) by the masters of Korean Tae Kwon Do (which came from the Ancient Tae Kwan and before that Kung Fu) were passed on by means of codeword transposition ciphers. In 1885, Sun Yat-Sen used codes to transmit information by telegraph. During WWII, Herbert Yardley taught Kuomintang (Chinese Nationalist) soldiers to cryptanalyze Japanese ciphers. However, the Japanese had already outpaced the Chinese in cryptanalytical abilities.

Japan's Chuo tokujobu (Central Bureau Of Signal Intelligence) was responsible for crypto-communication and signal intelligence, including cryptanalysis, translation, interception, and direction finding against the Soviet Union, China and Britain. It began operations in 1921.

In May 1928, the Angohan (Codes and Ciphers Office) obtained excellent results in intercepting and decoding Chinese codes during the Sino-Japanese clash at Tsinan between Chiang Kai-shek's Northern Expeditionary Army and the IJA (Imperial Japanese Army).

The warlord Chang Tso-lin was murdered in June 1928. Angohan succeeded in decoding "Young Marshal" Chang Hsueh-liang's secret communications and made a substantial contribution to the understanding of the warlord politics of Manchuria.

The Angohan not only mastered the basics of Chinese codes and ciphers but also broke the codes of the Nanking Government and of the Chinese Legation in Tokyo.

The Chinese codes in 1935 were called "Mingma". They were basically made up of four-digit numbers. The Chinese did not encode the name of either the sender or receiver, nor the date or the time of the message. The China Garrison Army's Tokujohan office was able to disclose the composition, strength, and activities of Chiang Kai-shek's branch armies, such as those led by Sung Che-yuan and Chang Hsueh-liang. It was not able to decode the Chinese Communist or Air Force messages.

By the time of the 1937 Sino-Japanese War, Japanese cryptanalytical experts had greatly expanded their knowledge of the Chinese system of codes and ciphers and improved their decoding skills. About 80% of what was intercepted was then decoded. This included military and diplomatic codes but not the Communist code messages.

The Kuomintang upgraded their Mingma codes in 1938. They adopted a different system, called tokushu daihon (special codebook) in Japanese, which was complicated by mixing in compound words. By October 1940, Chiang Kai-shek's main forces were using a repeating key system. This stumped the Japanese cryptanalysts for a short time, but they returned to a 75% decoding level during the war. They continued to make great contributions to major military operations in China.

The Japanese broke the Kuomintang codes during the Chungyuang Operation in the Southern Shansi or Chungt'iao Mountain Campaign. In February 1941, significant penetration of Communist signal traffic was obtained.

The tokujo operations against the North China Area Army and the Chinese Communist codes were a tragic failure. The IJA's China experts held a highly negative image of the Chinese.

This may have prejudiced their attitude towards intelligence estimates of China and the Chinese, which in turn adversely affected their operational (crypto-intelligence) thinking on China in general.

When the Sian mutiny broke out and Chiang Kai-shek was kidnapped in December 1936, Major General Isogai (IJA's leading expert in COMINT for China) toasted (more like roasted) the demise of Chiang. Colonel Kanji Ishiwara (Japan's chief military strategist) deplored the incident because he felt China was on the brink of unity because of Chiang Kai-shek's efforts. He considered the ability to read Chiang's codes just a matter of doing the business of war.

Wrap-Up

The history of cryptography holds many surprises and innovations. Of special interest is the Universality of Cryptography as applied to all languages. This important concept holds that all languages, both living and dead, may be characterized by their letter, sound or symbolic behavior. It turns out that similar groups of information, such as vowel relationships with specific consonants, carry through into the cipher text and are potentially identifiable there. This is true even of the most difficult cipher text.

The doctrine of linguistic relativity is central to cryptographic treatment of language ciphers. It states that all known languages and dialects are effective means of communication.

Nichols’ Theorem states that if they are linguistically related, they can be codified, enciphered, deciphered and treated as cryptographic units for analysis and statistical treatment.

Figure 3-1

W H I T E

....................

W . A B C D E

.

H . F G H IJ K

.

I . L M N O P

.

T . Q R S T U

.

E . V W X Y Z

Figure 3-2

1 2 3 4 5 6 7 8 9 0

.................................

1 . A B C D E F G H I J

2 . K L M N O P Q R S T

3 . U V W X Y Z . , : ;

 

Figure 3-3

5 6 7 8 9 0

....................

1 . A B C D E F

2 . G H IJ K L M

3 . N O P Q R S

4 . T UV W X Y Z

Figure 3-4

1 2 3 4 5 6 7 8 9

............................

1 . A B C D E F G H I

2 . J K L M N O P Q R

3 . S T U V W X Y Z *

Figure 3-5

M U N I C H

....................

B . A 7 E 5 R M

E . G 1 N Y B 2

R . C 3 D 4 F 6

L . H 8 I 9 J 0

I . K L O P Q S

N . T U V W X Z

Figure 3-6

A B C D E F G H I

.............................

A . A D G J M P S V Y

B . B E H K N Q T W Z

C . C F I L O R U X 1

D . 2 3 4 5 6 7 8 9 0

Figure 3-7

B C D F G

..............

A . A B C D E

E . F G H IJ K

I . L M N O P

O . Q R S T U

U . V W X Y Z

 

Figure 3-8

1 2 3 4 5

..................

61 . A B C D E

72 . F G H IJ K

83 . L M N O P

94 . Q R S T U

05 . V W X Y Z

where: A = 611, B = 612, X = 053

 

Figure 3-9

1 2 3 4 5

..................

09 . H Y D R A

15 . U L IJ C B

21 . E F G K M

27 . N O P Q S

33 . T V W X Z

Figure 3-10

A - 111 G - 131 M - 221 S - 311 Y - 331

B - 112 H - 132 N - 222 T - 312 Z - 332

C - 113 I - 133 O - 223 U - 313 * - 333

D - 121 J - 211 P - 231 V - 321

E - 122 K - 212 Q - 232 W - 322

F - 123 L - 213 R - 233 X - 323
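The assignments in Figure 3-10 follow a regular pattern: the 27 symbols are counted off in base 3, with each base-3 digit written as 1, 2 or 3. The Python sketch below reproduces the table from that pattern; the function names `to_trigram` and `from_trigram` are illustrative choices, not names from the source.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*"  # 27 symbols, as in Figure 3-10


def to_trigram(ch):
    """Return the three-digit group for one symbol (digits 1 to 3)."""
    i = ALPHABET.index(ch.upper())
    return f"{i // 9 + 1}{i // 3 % 3 + 1}{i % 3 + 1}"


def from_trigram(code):
    """Invert to_trigram: map a group such as '211' back to its symbol."""
    a, b, c = (int(d) - 1 for d in code)
    return ALPHABET[9 * a + 3 * b + c]


print(to_trigram("A"), to_trigram("J"), to_trigram("Z"))  # 111 211 332
print(from_trigram("333"))                                # *
```

The mixed assignment of Figure 3-11 does not follow this pattern and would need an explicit lookup table instead.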

 

Figure 3-11

A - 112 H - 312 O - 223 V - 222

B - 121 I - 213 P - 313 W - 311

C - 211 J - 232 Q - 131 X - 321

D - 212 K - 323 R - 331 Y - 111

E - 221 L - 231 S - 332 Z - 113

F - 122 M - 132 T - 133

G - 123 N - 322 U - 233

Myers Signal Directions

3 - End of a word

33 - End of a sentence

333 - End of message

22.22.22.3 - Signal of assent. Message understood

22.22.22.333 - Cease signaling

121.121.121.3 - Repeat

212121.3 - Error

211.211.211.3 - Move a little to the right

221.221.221.3 - Move a little to the left

 

Figure 3-3a

1 2 3 4 5

................

1 . A B C D E

2 . F G H IJ K

3 . L M N O P

4 . Q R S T U

5 . V W X Y Z

where: A = 11, R = 42, Z = 55

Figure 3-4a

1 2 3 4 5 6 7 8 9

............................

1 . A B C D E F G H I

2 . J K L M N O P Q R

3 . S T U V W X Y Z *
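The row-then-column reading of the 5 x 5 square in Figure 3-3a (A = 11, R = 42, Z = 55) can be sketched in Python as follows. The names `SQUARE`, `encipher` and `decipher` are illustrative, and the sketch assumes that J is folded into I and that characters outside the square are simply dropped.

```python
# Rows of the 5 x 5 square from Figure 3-3a; I and J share one cell.
SQUARE = ["ABCDE", "FGHIK", "LMNOP", "QRSTU", "VWXYZ"]


def encipher(plaintext):
    """Replace each letter by its row digit followed by its column digit."""
    groups = []
    for ch in plaintext.upper():
        if ch == "J":
            ch = "I"
        for r, row in enumerate(SQUARE, start=1):
            c = row.find(ch)
            if c != -1:
                groups.append(f"{r}{c + 1}")
                break
    return " ".join(groups)


def decipher(ciphertext):
    """Map space-separated two-digit groups back to letters (J reads as I)."""
    return "".join(
        SQUARE[int(g[0]) - 1][int(g[1]) - 1] for g in ciphertext.split()
    )


print(encipher("ARZ"))       # 11 42 55
print(decipher("11 42 55"))  # ARZ
```

Squares with other row and column labels, such as the keyword coordinates of Figure 3-1, differ only in the labels substituted for the digits.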

Figure 3-12

Vowel Phonemes, Standard American English, According to Height of Tongue and Tongue Position in Front, Center and Back of Mouth

[Vowel chart. Vertical axis: tongue height (High, Mid, Low). Horizontal axis: tongue position (Front, Central, Back). Rows from high to low, as transcribed: i u / I U / ea ua o / e ou / ae a.]

Figure 3-13

Chomsky Model For Message From Speaker to Hearer or Writer on Both Sides

SPEAKER: Thought (meaning, semantic component) . Deep structure sentence . Transformational rule . Surface-structure sentence . Sounds (phonological component)

HEARER: Sounds (phonological component) . Surface-structure sentence . Transformational rule . Deep structure sentence . Thought (meaning, semantic component)

Figure 3-13a

Main Languages of Indo-European Stock

INDO-EUROPEAN

  CELTIC
    o Welsh
    o Irish
    o Scots Gaelic
    o Breton

  ITALIC
    ROMANCE
      Latin
        o Catalan
        o French
        o Italian
        o Portuguese
        o Provencal
        o Rumanian
        o Spanish

  GERMANIC
    West
      o Dutch
      o English
      o Flemish
      o Frisian
      o German
      o Yiddish
    North
      o Danish
      o Icelandic
      o Norwegian
      o Swedish

  HELLENIC
    Ancient Greek
      Greek

  Albanian

  Armenian

Figure 3-14

Main Languages of Indo-European Stock

INDO-EUROPEAN

  CELTIC
    o Welsh
    o Irish
    o Scots Gaelic
    o Breton

  ITALIC
    ROMANCE
      Latin
        o Catalan
        o French
        o Italian
        o Portuguese
        o Provencal
        o Rumanian
        o Spanish

  GERMANIC
    West
      o Dutch
      o English
      o Flemish
      o Frisian
      o German
      o Yiddish
    North
      o Danish
      o Icelandic
      o Norwegian
      o Swedish

  HELLENIC
    Ancient Greek
      Greek

  Albanian

  Armenian

  INDO-IRANIAN
    o Old Persian
    o Persian
    o SANSKRIT
      o Bengali
      o Hindi
      o Punjabi
      o Urdu

  BALTIC
    o Latvian
    o Lithuanian

  SLAVIC
    o Bulgarian
    o Czech
    o Macedonian
    o Polish
    o Russian
    o Serbo-Croatian
    o Slovak
    o Slovenian
    o Ukrainian
