Breaking a CÊsar Cipher
Introduction
This presents two different methods of breaking a CÊsar cipher using statistics. The ciphertext is
VRVRVHDPVWUHVV; the frequency counts are:
D 1 0.0714 R 2 0.1429 W 1 0.0714
H 2 0.1429 U 1 0.0714
P 1 0.0714 V 6 0.4286
As a preliminary, we note that the unicity distance of a CÊsar cipher is roughly 1.5, so 14 characters
should be more than enough to find a unique key and decryption.
Method #1: frequency counts
The most frequent letters in English are ETAONRISH. The following table summarizes the overlap,
assuming the key is the number indicated:
key
ct
letters
key
ct
letters
key
ct
letters
key
ct
letters
0
2
RH
7
4
VHPU
13
3
RVU
20
2
HU
1
2
PU
8
3
VPW
14
3
VHW
21
1
VD
2
3
VPW
9
2
RW
15
4
HDPW
22
3
DPW
3
6
VRHDWU
10
2
RD
16
3
HDU
23
1
P
4
3
VRW
11
2
DP
17
2
VR
24
2
RP
5
1
W
12
2
DU
18
2
WU
25
3
RHD
6
1
Y
19
2
WU
The key value 3 leaps out. Notice that had we counted differences, the same thing would happen, as the
differences are the same as the counts.
Method #2: correlations
For this, we need frequency counts of the letters in plaintext, and for this purpose (since we don't have the
plaintext handy) we simply assume the plaintext follows the 1-gram, independent model. Note that this is
usually a poor assumption since in English the letters are not independent, but we'll try it and see what
happens.
The relevant equation is:
S0æt<26Ft(c)p(t-k) = 0.0714[p(3-k) + p(15-k) + p(20-k) + p(22-k)] +
0.1429[p(7-k) + p(17-k)] + 0.4286p(21-k)
and the correlations over the values of k come out to be:
key
r
key
r
key
r
key
r
0
0.0281
7
0.0590
13
0.0550
20
0.0252
1
0.0295
8
0.0425
14
0.0513
21
0.0500
2
0.0639
9
0.0331
15
0.0388
22
0.0253
3
0.0762
10
0.339
16
0.0400
23
0.0186
4
0.0514
11
0.0229
17
0.0763
24
0.0252
5
0.0160
12
0.0183
18
0.0330
25
0.0341
6
0.0214
19
0.0312
The values 3 and 17 are just about equal, so both must be tried.
Answer
The key is 3; the message is SOSOSEAMSTRESS.