Solve picoCTF's RSA Challenge Sum-O-Primes

Posted on 2022-08-20 Edited on 2023-09-15 In Technical Know-how Views: Symbols count in article: 10k Reading time ≈ 9 mins.

By chance, I came across a picoCTF RSA challenge called Sum-O-Primes. This problem is not difficult, you can do it by knowing the basics of the RSA algorithm. In addition, if you are familiar with the history of the evolution of the RSA algorithm, you can find a second ingenious fast solution.

picoCTF Project

picoCTF is a free computer security education program created by security and privacy experts at Carnegie Mellon University. It uses original content built on the CTF (Capture the Flag) framework to provide a variety of challenges. It provides participants with valuable opportunities to systematically learn cybersecurity knowledge and gain practical experience.

The collection of practice questions for picoCTF is called picoGym. The general problem solution is to search or decipher a string in the format "picoCTF{...}" from the given information, that is, the flag to be captured. As shown in the figure below, picoGym currently contains 271 cybersecurity challenge exercises, covering general skills, cryptography, reverse engineering, forensics, and other fields.

Sum-O-Primes Challenge

There are 50 cryptography-related challenges in picoGym, one of which is Sum-O-Primes. The task of this challenge is simple and explained as follows:

We have so much faith in RSA we give you not just the product of the primes, but their sum as well!

gen.py

output.txt

That is, we not only give the product of the two prime numbers used by RSA but also tell you their sum. How are these given? You need to discover by yourself from the rest of the information. After clicking the two links and downloading the file, open the first Python file:

gen.py

#!/usr/bin/python

from binascii import hexlify
from gmpy2 import mpz_urandomb, next_prime, random_state
import math
import os
import sys

if sys.version_info < (3, 9):
    import gmpy2
    math.gcd = gmpy2.gcd
    math.lcm = gmpy2.lcm

FLAG  = open('flag.txt').read().strip()
FLAG  = int(hexlify(FLAG.encode()), 16)
SEED  = int(hexlify(os.urandom(32)).decode(), 16)
STATE = random_state(SEED)

def get_prime(bits):
    return next_prime(mpz_urandomb(STATE, bits) | (1 << (bits - 1)))

p = get_prime(1024)
q = get_prime(1024)

x = p + q
n = p * q

e = 65537

m = math.lcm(p - 1, q - 1)
d = pow(e, -1, m)

c = pow(FLAG, e, n)

print(f'x = {x:x}')
print(f'n = {n:x}')
print(f'c = {c:x}')

If you have basic Python programming skills and understand the principles of the RSA algorithm, you should be able to read the above program quickly. What it does is:

Open the file flag.txt to read the content. Then use the hexlify and int functions to convert it to an integer and store the result in a variable FLAG.
Call the function get_prime to generate two prime numbers, store their sum in x and their product in n. Then assign 65537 to e and calculate the RSA private exponent d.
Use standard pow functions to perform modular exponentiation, which implements RSA encryption to encrypt plaintext FLAG into ciphertext c.
Print out x, n, and c.

Open the second file, which is apparently the output of the first program in Python:

output.txt

1
2
3

x = 154ee809a4dc337290e6a4996e0717dd938160d6abfb651736d9f5d524812a659b310ad1f221196ee8ab187fa746a1b488a4079cddfc5db08e78be0d96c83c01e9bb42420b40d6f0ad9f220633459a6dc058bb01c517386bfbd2d4811c9b08558b0e05534768581a74884758d15e15b4ef0dbd6a338bf1f52eed4f137957737d2
n = 6ce91e471f1df651b0d275d6d5522703feecdd77e7821a2caf9514104c059781c1b2e64772d9220addd657ecbd4e6cb8b5941608f6ab54bd5760074a5cd5854920439422192d2ee8912f1ebcc0d97714f209ee2a22e2da60e071541cb7e0772373cfea71831673378ee6432e63abfd14db0d4aa601928923253f9edd419ce96f4d68ce0aa3e6d6b530cd46eefbdac93038ce949c9dd2e573a47471cf8223f88b96e00a92f4d47fd277c42c4075b5e99b41a9f279f442bc0d533b9ddc50592e369e7026b3f7afaa8edf8972f0c3055f4de67a0eea963f099a32e1539de1d1727abadd9235f66371998ec883d1f89b8d907270842818cae49cd5c7f906c4752e81
c = 48b89662b9718fb391c96527272bf74c27810edaca09b63e694af9d11608010b1db9aedd1c867849371121941a1ccac610f7b28b92fa2f981babe816e6d3ecfab83514ed7e18e2b23fc3b96c7002ff47da897e9f2a9cb1b4e245396589e0b72affb73568a2016031555d2a46557919e44a15cd43fe9e1881d40dce1d1e36625e63b1472d3c317898102943072e06d79688c96b6ee2e584002c66497a9cdc48c38aa0548a7bc4fed9b4c23fcd493f38ece68788ef37a559b7f20c6941fcf8e567d9f50807259a7f11fa7a01d3125a1f7609cd94781f224ec8351605354b11c6b078fe015826342c3271ee3af4b99bb0a538b1e6b845594ee6546be8abd22ef2bd

Once you understand the meaning of the question, you can make a judgment immediately —— if you can decrypt the ciphertext c and retrieve the plaintext FLAG, you can get the original content of flag.txt, that is, capture the flag.

Conventional Solution

RSA decryption requires a private key exponent d. Referring to the steps of the RSA algorithm below, it is obvious that this demands integer factorization for large prime numbers p and q first.

Choose two large prime numbers \(p\) and \(q\), compute \(n=pq\)
Compute Carmichael function \(\lambda(n)=\operatorname{lcm}(p − 1, q − 1)\) the product, \(\operatorname{lcm}\) is a function to find the least common multiple
Choose any number \(e\) that is less than and coprime to \(\lambda(n)\), then compute \(d\), the modular multiplicative inverse of \(e\) regarding \(\lambda(n)\), \(d\equiv e^{-1}\pmod {\lambda(n)}\)
\((n,e)\) is the RSA public key, \((n,d)\) the RSA private key
Use the public key to encrypt the plaintext \(m\), the formula is \(c\equiv m^e\pmod n\)
Use the private key to decrypt the ciphertext \(c\), the formula is \(m\equiv c^d\pmod n\)

From here, the challenge becomes a problem that, knowing the sum and product of two large prime numbers known, find these two large prime numbers. That is, to solve a system of quadratic linear equations

\[ \left\{ \begin{aligned} p+q &=n \\ p*q &=x \end{aligned} \right. \]

Using the knowledge of elementary mathematics, the above equations can be transformed into a quadratic equation \[p^2 - x * p + n = 0\]

Obviously, \(p\) and \(q\) are its two roots. According to the quadratic formula

\[(p,q)={\frac {x}{2}}\pm {\sqrt {\left({\frac {x}{2}}\right)^{2}-n}}\]

We can get \(p\) and \(q\). The rest of the work is easy. The code to compute \(d\) from \(p\) and \(q\) can be copied directly from lines 28, 30, and 31 in gen.py. The final complete Python problem-solving code is as follows:

import math

file = open('output.txt', 'r')
Lines = file.readlines()
file.close()

x = int((Lines[0].split())[2], 16) # x = p + q
n = int((Lines[1].split())[2], 16) # n = p * q
c = int((Lines[2].split())[2], 16) # Ciphertext

def solve_rsa_primes(s: int, m: int) -> tuple:
    '''
    Solve RSA prime numbers (p, q) from the quadratic equation
    p^2 - s * p + m = 0 with the formula p = s/2 +/- sqrt((s/2)^2 - m)

    Input: s - sum of primes, m - product of primes
    Output: (p, q)
    '''
    half_s = s >> 1
    tmp = math.isqrt(half_s ** 2 - m)
    return int(half_s + tmp), int(half_s - tmp);  

# Now run with the real input
p, q = solve_rsa_primes(x, n)
m = math.lcm(p - 1, q - 1)
e = 65537
d = pow(e, -1, m)
FLAG = pow(c, d, n)
print(FLAG.to_bytes((FLAG.bit_length() + 7) // 8, 'big'))

The above program defines a general function solve_rsa_primes to solve two large prime numbers. After it gets d, the same pow function is called to decrypt, and finally the plaintext is converted from a large integer to a byte sequence and printed out. The result of running this program is

1	b'picoCTF{pl33z_n0_g1v3_c0ngru3nc3_0f_5qu4r35_92fe3557}'

BINGO! Capture the Flag successfully!

Note: The function solve_rsa_primes calls math.isqrt to compute the integer square root of the given integer. This is indispensable! If it is written incorrectly with math.sqrt, the following overflow error will occur

>>>
=============== RESTART: /Users/zixi/Downloads/Sum-O-Primes.py ==============
Traceback (most recent call last):
  File "/Users/zixi/Downloads/Sum-O-Primes.py", line 35, in <module>
    p, q = solve_rsa_primes(x, n)
  File "/Users/zixi/Downloads/Sum-O-Primes.py", line 31, in solve_rsa_primes
    tmp = math.sqrt(int(half_s ** 2 - m))
OverflowError: int too large to convert to float

This error happens because math.sqrt uses floating-point arithmetic but fails to convert large integers to floating-point numbers.

Quick Solution

The conventional solution to this problem has to solve a quadratic equation, so the integer square root operation is essential. Is there a solution that doesn't need a square root operation? The answer is yes.

In the original RSA paper, the public exponent \(e\) and the private exponent \(d\) have the relationship as the following equation

\[d⋅e≡1\pmod{\varphi(n)}\]

Here the modular is the Euler's totient function \(\varphi(n)=(p-1)(q-1)\). Since \(\varphi(N)\) is always divisible by \(\lambda(n)\), any d satisfying the above also satisfies \(d⋅e≡1\pmod{\lambda(n)}\), thus the private exponent is not unique. Although the calculated \(d>\lambda(n)\), the square root operation can be avoided when applied to the Sum-O-Primes problem. This is because \[ \begin{aligned} \varphi(n)&=(p-1)(q-1)\\ &=pq-(p+q)+1\\ &=n-x+1 \end{aligned} \]

Hereby the formula for computing the private exponent becomes

\[ \begin{aligned} d&≡e^{-1}\pmod{\varphi(n)}\\ &≡e^{-1}\pmod{(n-x+1)} \end{aligned} \]

Now that \(n\) and \(x\) are readily available, this method does not require finding \(p\) and \(q\) first, and naturally, there is no need for a square root operation. The Python code for this new solution is very concise

d1 = pow(e, -1, n - x + 1)
FLAG = pow(c, d1, n)
print(FLAG.to_bytes((FLAG.bit_length() + 7) // 8, 'big'))

print("d = ", d)
print("d1 = ", d1)
assert(d1>d)
print("d1/d = ", d1/d)

To compare these two solutions, 4 lines of print and assert statements are added at the end. The execution result of this code is

>>>
=============== RESTART: /Users/zixi/Downloads/Sum-O-Primes.py ==============
b'picoCTF{pl33z_n0_g1v3_c0ngru3nc3_0f_5qu4r35_92fe3557}'
d =  1590433953643304448870807755026766943237397482033766155980367645454600169745357277163199312196609495875891431590581528929277583062406061101224041553945564552302546648687338536694903918084325519368961617691238793972703013656395301935576994660878296156727353260699130612675943209520489312860964899655070852366584778594425834982623831654304915478835573020874834723387183369976749895237126850604587166433366381884290402338703266523462767765540527102747754912478720160791675179128443712374832507705614160658601242723842366612805686436771142338154848447759947887908800687914418476358484536216953925324788380823429735298973
d1 =  11901952834426939436403812982514571575614906347331071933175950931208083895179963694981295931167346168378938101218143770786299673201984563299831132533757316974157649670783507276616478666261648674806749337918514985951832847720617452268824430679672778783943236259522437088812130196067329355430038927225825521934485847159262037514154059696664148362902872186817856316128403800463106817000251243818717005827615275821709043532925457271839955998044684537152992871171338447136672661193487297988293156428071068861346467230927990425182893890027896377626007826573834588309038513191969376781172191621785853174152547091371818954913
d1/d =  7.483462489694971

As shown above, this solution also succeeds in capturing the flag. The \(d\) value (d1) calculated by the new solution is more than 7 times that of the conventional solution.

Click here to download all the code of this article: Sum-O-Primes.py.gz