Breaking a Cæsar Cipher
Introduction
This presents two different methods of breaking a Cæsar cipher
using statistics. The ciphertext is VRVRVHDPVWUHVV;
the frequency counts are:
D | 1 | 0.0714 | R | 2 | 0.1429 | W | 1 | 0.0714 |
H | 2 | 0.1429 | U | 1 | 0.0714 | |
P | 1 | 0.0714 | V | 6 | 0.4286 | |
As a preliminary, we note that the unicity distance of a Cæsar
cipher is roughly 1.5, so 14 characters should be more than enough to
find a unique key and decryption.
Method #1: frequency counts
The most frequent letters in English are ETAONRISH.
The following table summarizes the overlap,
assuming the key is the number indicated:
key | ct | letters |
key | ct | letters |
key | ct | letters |
key | ct | letters |
0 | 2 | RH |
7 | 4 | VHPU |
13 | 3 | RVU |
20 | 2 | HU |
1 | 2 | PU |
8 | 3 | VPW |
14 | 3 | VHW |
21 | 1 | VD |
2 | 3 | VPW |
9 | 2 | RW |
15 | 4 | HDPW |
22 | 3 | DPW |
3 | 6 | VRHDWU |
10 | 2 | RD |
16 | 3 | HDU |
23 | 1 | P |
4 | 3 | VRW |
11 | 2 | DP |
17 | 2 | VR |
24 | 2 | RP |
5 | 1 | W |
12 | 2 | DU |
18 | 2 | WU |
25 | 3 | RHD |
6 | 1 | Y |
| | |
19 | 2 | WU |
| | |
The key value 3 leaps out. Notice that had we counted differences, the
same thing would happen, as the differences are the same as the
counts.
Method #2: correlations
For this, we need frequency counts of the letters in plaintext, and for
this purpose (since we don't have the plaintext handy) we simply assume
the plaintext follows the 1-gram, independent model. Note that this is
usually a poor assumption since in English the letters are not
independent, but we'll try it and see what happens.
The relevant equation is:
SUM0<=t<26Ft(c)p(t-k) =
0.0714[p(3-k) + p(15-k) + p(20-k) + p(22-k)] +
0.1429[p(7-k) + p(17-k)] + 0.4286p(21-k)
and the correlations over the values of k come out to be:
key | r | key | r |
key | r | key | r |
0 | 0.0281 | 7 | 0.0590 |
13 | 0.0550 | 20 | 0.0252 |
1 | 0.0295 | 8 | 0.0425 |
14 | 0.0513 | 21 | 0.0500 |
2 | 0.0639 | 9 | 0.0331 |
15 | 0.0388 | 22 | 0.0253 |
3 | 0.0762 | 10 | 0.339 |
16 | 0.0400 | 23 | 0.0186 |
4 | 0.0514 | 11 | 0.0229 |
17 | 0.0763 | 24 | 0.0252 |
5 | 0.0160 | 12 | 0.0183 |
18 | 0.0330 | 25 | 0.0341 |
6 | 0.0214 | | |
19 | 0.0312 | | |
The values 3 and 17 are just about equal, so both must be tried.
Answer
The key is 3; the message is SOSOSEAMSTRESS
You can get this document in
ASCII text,
Framemaker+SGML version 5.5,
PDF (for Acrobat 3.0 or later),
or
Postscript.
Send email to
cs253@csif.cs.ucdavis.edu.
Department of Computer Science
University of California at Davis
Davis, CA 95616-8562
Page last modified on 1/14/99