
Introduction:

Cryptography uses mathematics to encrypt and decrypt data, enabling users to store and
transmit confidential information across networks so that no one other than the intended
recipient can understand it. Cryptography has been used throughout history; wars are a prime
example, since orders must be delivered safely without being intercepted by the opposing
force. Nowadays, with the internet being an essential part of our lives, valuable information
such as credit card numbers, passwords and private messages is constantly being transferred
online. Therefore, ciphers are needed to conceal the content of this information and reduce
the risk of an information leak. When data is encrypted by the sender for transmission, it
must be decrypted by the recipient.

Decryption is the process of reverting the ciphertext to the original message. It can be done
either by a recipient who knows the encryption system and the key used by the sender, or by
using deciphering techniques. Deciphering without the key that is normally required is known
as cryptanalysis. This usually involves knowing how the encryption system works and deducing
the key through an attack model suited to that system, such as a brute-force attack, a
man-in-the-middle attack, or frequency analysis.

I am drawn to this topic particularly because, when I look at news from around the world,
reports of data leaks and of hackers releasing individuals’ data are prevalent. So how is
our data being protected? How tough is it to access another person’s encrypted data? When
we use social media or just browse the internet in general, we tend to overlook these
concerns. As someone who is planning to study computer science in the future, I find the idea
of learning how our information is stored and delivered exciting and informative.

Caesar Cipher (Substitution Cipher):


The simple substitution cipher has been in use for many centuries. It encrypts the plaintext
by swapping each letter or number in the initial message with a different symbol. The Caesar
cipher is one of the simplest and best-known examples of a substitution cipher.
To explain the Caesar cipher simply, each letter in the English alphabet can be assigned a
number from 0 to 25, which gives the table below:

A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

The key used for encryption is denoted by “k”, and each letter is shifted from position x →
position x + k. For example, if k = 5, then A, at position 0, moves to position 0 + 5 = 5,
which is the letter F. The complete encryption table is as follows:
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0  1  2  3  4
In this example, the letter “x” has an initial position of 23, so with a shift of 5 its new
position would be 23 + 5 = 28. But there are only 26 letters in the alphabet, so the only
possible positions are 0 to 25, since “a” starts at position 0. Therefore, whenever the new
position is equal to or larger than 26, 26 is subtracted from it. As a result, the encryption
of a letter with shift k can be described mathematically as a modular arithmetic function:
e(x) = (x + k) (mod 26)
The decryption formula, given knowledge of the key, is:
d(x) = (x - k) (mod 26)
So in the example above, the new position of “x” is (23 + 5) (mod 26) = 2, and the letter “x”
is replaced by the letter “c”. Here is an example of text encrypted with the Caesar cipher:

Shift number: 5
Plaintext: Hi My name is Hoang. This is a Test
Encrypted: Mn Rd sfrj nx Mtfsl. Ymnx nx f Yjxy
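
The two formulas above can be turned into a short program. The following is a minimal sketch in Python, not a definitive implementation; the function names (shift_letter, caesar_encrypt, caesar_decrypt) are chosen here for illustration, and the sketch assumes that spaces and punctuation are left unchanged and that upper and lower case are preserved, as in the example above.

```python
import string

ALPHABET = string.ascii_uppercase  # A..Z at positions 0..25


def shift_letter(ch: str, k: int) -> str:
    """Shift one character by k positions, wrapping around with mod 26."""
    if ch.upper() not in ALPHABET:
        return ch  # spaces and punctuation are left unchanged (assumption)
    x = ALPHABET.index(ch.upper())
    shifted = ALPHABET[(x + k) % 26]
    return shifted if ch.isupper() else shifted.lower()


def caesar_encrypt(text: str, k: int) -> str:
    """e(x) = (x + k) mod 26, applied letter by letter."""
    return "".join(shift_letter(ch, k) for ch in text)


def caesar_decrypt(text: str, k: int) -> str:
    """d(x) = (x - k) mod 26, the inverse of caesar_encrypt."""
    return caesar_encrypt(text, -k)


print(caesar_encrypt("Hi My name is Hoang. This is a Test", 5))
print(caesar_decrypt("Mn Rd sfrj nx Mtfsl. Ymnx nx f Yjxy", 5))
```

Decryption reuses the encryption function with a negative shift, because Python's % operator always returns a value from 0 to 25 here, so no separate "subtract 26" step is needed.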

Cracking the Caesar Cipher:

One way to crack the Caesar cipher is to use the Chi-squared statistic, which measures how
similar two categorical probability distributions are. The smaller the value of the
Chi-squared statistic, the more similar the distributions. When the value is 0, the two
distributions are the same. The Chi-squared statistic is written as:

$$\chi^2(C, E) = \sum_{i=A}^{Z} \frac{(C_i - E_i)^2}{E_i}$$

where C_A is the count of the letter A, and E_A is the expected count of the letter A.
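
As a rough illustration of the formula, the statistic can be computed with a few lines of Python. This is only a sketch: the function name chi_squared and the use of dictionaries keyed by letter are assumptions made here. The single-letter call uses the count 13 and expected count 1.403 from the worked example for the letter H later in this section.

```python
def chi_squared(counts: dict[str, float], expected: dict[str, float]) -> float:
    """Sum (C_i - E_i)^2 / E_i over every letter that has an expected count."""
    total = 0.0
    for letter, e in expected.items():
        c = counts.get(letter, 0)  # a letter that never appears has count 0
        total += (c - e) ** 2 / e
    return total


# Single-letter illustration: observed count 13 against expected count 1.403.
print(round(chi_squared({"H": 13}, {"H": 1.403}), 3))

# Identical distributions give a value of 0.
print(chi_squared({"A": 2, "B": 1}, {"A": 2, "B": 1}))
```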

Using the Chi-squared statistic, we can decipher the ciphertext with each of the 25 possible
keys, which range from 1 to 25. For each key, we compute the total Chi-squared value by
comparing the count of every letter with the expected count of that letter in English. The
key with the smallest total value is the key we are looking for.

The paragraph I have chosen is “hello this is an example paragraph that i made. My
name is Hoang and I’m trying to break this code through Frequency Analysis.” With the
key of 8, I encrypted it to “pmttw bpqa qa iv mfiuxtm xiziozixp bpib q uilm. Ug vium qa Pwivo
ivl Qu bzgqvo bw jzmis bpqa kwlm bpzwcop Nzmycmvkg Ivitgaqa.”

First, we must find the frequency of each letter in our ciphertext. This can be done easily
with one of the frequency analysis tools available on multiple websites. The site that I used
is named Crypto Corner. The 2nd and 3rd rows represent the count and the percentage frequency
of each letter in the paragraph, respectively.
The expected percentage frequency of each letter in English can also be found on this
site:
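
The letter-counting step described above can also be done locally. The following is a small Python sketch that counts the letters of the ciphertext given earlier; it is not the method used by the Crypto Corner site, and the variable names are illustrative. It assumes that case is ignored and that only the letters A to Z are counted.

```python
from collections import Counter

ciphertext = ("pmttw bpqa qa iv mfiuxtm xiziozixp bpib q uilm. Ug vium qa Pwivo "
              "ivl Qu bzgqvo bw jzmis bpqa kwlm bpzwcop Nzmycmvkg Ivitgaqa.")

# Keep only letters, ignoring case, spaces and punctuation.
letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
counts = Counter(letters)
total = len(letters)

# One line per letter that occurs: count and percentage frequency.
for letter, count in sorted(counts.items()):
    print(f"{letter}: {count:2d}  {100 * count / total:5.1f}%")
```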

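To find the expected count of a letter, we multiply its expected percentage frequency by the
length of the paragraph. For example, the expected percentage frequency of the letter “E” is
12.9%. Using a length of 23, the expected count of that letter is 12.9% * 23 = 2.967. (Note
that 23 is the number of words in the paragraph rather than the number of letters. Because
every expected count is scaled by the same factor and the total number of letters is the same
for every key, this changes the size of the Chi-squared values but not which key gives the
smallest one.) The expected counts of the other letters are calculated in the same way, as
shown below: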

A       B       C       D       E       F       G       H       I       J       K       L       M
1.886   1.587   0.644   0.989   2.967   0.506   0.46    1.403   1.61    0.0345  0.184   0.92    0.552

N       O       P       Q       R       S       T       U       V       W       X       Y       Z
1.541   1.725   0.437   0.023   1.38    1.449   2.093   0.644   0.23    0.552   0.0345  0.46    0.0161
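
The arithmetic behind each entry in this table is a single multiplication. As a minimal sketch (the function name expected_count is an assumption made here), the value for the letter E can be reproduced from the 12.9% frequency and the length of 23 used above:

```python
def expected_count(percent: float, length: int) -> float:
    """Expected number of occurrences for a frequency given in percent."""
    return percent / 100 * length


# Letter E: 12.9% of a length of 23.
print(round(expected_count(12.9, 23), 3))
```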

So if the key is 1, the deciphered paragraph would be “olssv aopz pz hu lehtwsl whyhnyhwo
aoha p thkl. Tf uhtl pz Ovhun huk Pt ayfpun av iylhr aopz jvkl aoyvbno Mylxblujf Huhsfzpz”
and the frequency would be:

Then we calculate the Chi-squared value for each letter of the alphabet and add the values
together. For example, the Chi-squared value of the letter H is:
$$\chi^2(H) = \frac{(13 - 1.403)^2}{1.403} = 95.859$$

And the Chi-squared value of the whole alphabet is:

$$\chi^2 = \sum_{i=A}^{Z} \frac{(C_i - E_i)^2}{E_i} = 3020.58$$

We repeat the process for all 25 possible keys. The results are in the table below:
Key Chi-squared value
1 3020.58
2 5351.949
3 4956.596
4 3107.096
5 3610.645
6 4091.501
7 4091.018
8 400.1052
9 12938.08
10 3081.843
11 3082.494
12 2832.334
13 6895.31
14 6896.264
15 3952.318
16 6386.332
17 6404.043
18 10958.59
19 2797.69
20 1883.688
21 2666.465
22 2657.74
23 3334.621
24 3881.132
25 8830.289

Looking at the Chi-squared values, the value for the key of 8 is 400.1052, which is
noticeably smaller than the values for the other keys, and 8 is also the key that was chosen
at the beginning. In conclusion, we have recovered the key to the Caesar cipher without being
told it.
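
The whole search over the 25 keys can be sketched in Python as follows. This is an illustration under stated assumptions rather than the exact procedure used to build the table: the expected counts are copied from the expected-count table earlier in this section, the ciphertext is the one encrypted with the key of 8, and the names EXPECTED, decrypt and chi_squared are chosen here for the example.

```python
from collections import Counter
import string

# Expected count of each letter A..Z, taken from the table in this section.
EXPECTED = dict(zip(string.ascii_uppercase, [
    1.886, 1.587, 0.644, 0.989, 2.967, 0.506, 0.46, 1.403, 1.61, 0.0345,
    0.184, 0.92, 0.552, 1.541, 1.725, 0.437, 0.023, 1.38, 1.449, 2.093,
    0.644, 0.23, 0.552, 0.0345, 0.46, 0.0161,
]))

CIPHERTEXT = ("pmttw bpqa qa iv mfiuxtm xiziozixp bpib q uilm. Ug vium qa Pwivo "
              "ivl Qu bzgqvo bw jzmis bpqa kwlm bpzwcop Nzmycmvkg Ivitgaqa.")


def decrypt(text: str, k: int) -> str:
    """Apply d(x) = (x - k) mod 26 to every letter (output is upper case)."""
    out = []
    for ch in text.upper():
        if ch in EXPECTED:
            out.append(chr((ord(ch) - ord("A") - k) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces and punctuation are kept as they are
    return "".join(out)


def chi_squared(text: str) -> float:
    """Total Chi-squared value of the letter counts in text against EXPECTED."""
    counts = Counter(ch for ch in text if ch in EXPECTED)
    return sum((counts[letter] - e) ** 2 / e for letter, e in EXPECTED.items())


# Score every candidate key and keep the one with the smallest value.
scores = {k: chi_squared(decrypt(CIPHERTEXT, k)) for k in range(1, 26)}
best_key = min(scores, key=scores.get)
print(best_key, round(scores[best_key], 1))
print(decrypt(CIPHERTEXT, best_key))
```

With these inputs the smallest value should belong to the key of 8, in line with the table above; small differences in the decimals can come from rounding in the expected counts.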

Conclusion:
Because of the simple nature of the encryption system, the cipher is relatively easy to
crack. Without understanding the meaning of the words, one can still break this cipher using
the Chi-squared statistic as shown, and the calculations can be done by a machine in a flash.
And if one can examine the meaning of the words, it would take at most 25 tries, looking for
common words such as “the” and “my” or single-letter words like “a” and “I”.
The cipher can be made more complex by using a different type of key. You can assign each
letter to a different random letter. This method is called a monoalphabetic substitution
cipher. There are then 26! possible keys. Trying every one of them is not practical, but
because each plaintext letter is always replaced by the same ciphertext letter, the letter
frequencies are preserved and a machine can still attack the cipher using frequency analysis
and the Chi-squared statistic.
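
To make the idea of a random-letter key concrete, here is a small Python sketch of a monoalphabetic substitution. It is only an illustration: the helper names random_key and substitute are assumptions, the random module is used for demonstration and is not suitable for real security, and a shuffled alphabet may by chance leave a few letters mapped to themselves.

```python
import math
import random
import string


def random_key(rng: random.Random) -> dict[str, str]:
    """Map each letter A..Z to a letter of a shuffled alphabet."""
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))


def substitute(text: str, key: dict[str, str]) -> str:
    """Replace every letter using the key; other characters are unchanged."""
    return "".join(key.get(ch, ch) for ch in text.upper())


rng = random.Random(0)  # fixed seed so the example is repeatable
key = random_key(rng)
inverse = {cipher: plain for plain, cipher in key.items()}

secret = substitute("THIS IS A TEST", key)
print(secret)
print(substitute(secret, inverse))  # decrypting with the inverse key
print(math.factorial(26))           # number of possible keys, 26!
```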

One way to counter a frequency analysis attack is to increase the block size of the cipher,
that is, the number of units that are encrypted at once. For example, the Vigenère cipher, a
polyalphabetic cipher, uses a word or a random string of characters as its key instead of a
single shift, which makes it much more difficult to find the key.
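
As a rough sketch of how a keyword replaces the single shift, the Vigenère cipher can be written in Python as below. The function name vigenere_encrypt is chosen here for illustration, and the sketch assumes the keyword contains only the letters A to Z and that non-letters in the plaintext do not use up a keyword letter. The sample message and keyword are a commonly used textbook pair, not taken from this essay.

```python
from itertools import cycle


def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Shift each letter by the position of the next keyword letter."""
    shifts = cycle(ord(k) - ord("A") for k in keyword.upper())
    out = []
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            x = ord(ch) - ord("A")
            out.append(chr((x + next(shifts)) % 26 + ord("A")))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return "".join(out)


print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))
```

Because the same plaintext letter is shifted by different amounts depending on its position, the single-alphabet letter counts used against the Caesar cipher no longer line up directly with English frequencies.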
