I deleted the production database by accident

Today at around 10:45pm, after a couple of glasses of red wine, we deleted the production database by accident 😨.

Thankfully our database is a managed database from DigitalOcean, which means that DigitalOcean automatically do backups once a day. After 5 minutes of hand-wringing and panic, we went into maintenance mode and were able to restore a backup. At around 11:15pm, 30 minutes after the disaster, we went back online. However, 7 hours of scoreboard data was gone forever 😵. We are extremely sorry about this.

What happened?

It’s tempting to blame the disaster on the couple of glasses of red wine. However, the function that wiped the database was written whilst sober. It’s a function that deletes the local database and creates all the required tables from scratch. This evening, whilst we were doing some late-evening coding, the function connected to the production database and wiped it. Why? This is something we’re still trying to figure out.

Here is the code that caused the disaster:

def database_model_create():
    """Only works on localhost to prevent catastrophe"""
    database = config.DevelopmentConfig.DB_DATABASE
    user = config.DevelopmentConfig.DB_USERNAME
    password = config.DevelopmentConfig.DB_PASSWORD
    port = config.DevelopmentConfig.DB_PORT
    local_db = PostgresqlDatabase(database=database, user=user, password=password, host='localhost', port=port)
    local_db.drop_tables([Game, Player, Round, Score, Order])
    local_db.create_tables([Game, Player, Round, Score, Order])
    print('Initialized the local database.')

Note that host is hardcoded to 'localhost', so the function should never connect to any machine other than the developer's machine. And yet it did. We’re too tired to figure it out right now. The gremlins won this time.
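
A hardcoded host only protects us if the connection that actually runs the DROP statements is the one we built. Here is a minimal sketch of a stricter guard, not something we run today. The names LOCAL_HOSTS, UnsafeDatabaseError and assert_safe_to_wipe are hypothetical, and the sketch assumes the caller can read the host back from the connection that is really in use.

```python
# Hypothetical guard: none of these names exist in our codebase.
LOCAL_HOSTS = frozenset({"localhost", "127.0.0.1", "::1"})


class UnsafeDatabaseError(RuntimeError):
    """Raised when a destructive operation targets a non-local database."""


def assert_safe_to_wipe(host, environment):
    """Raise unless both the environment and the host look local.

    `host` should be read back from the connection that will actually run
    the DROP statements, not from the value we meant to pass in.
    """
    if environment != "development":
        raise UnsafeDatabaseError(
            f"refusing to wipe in environment {environment!r}"
        )
    if host not in LOCAL_HOSTS:
        raise UnsafeDatabaseError(
            f"refusing to wipe database on host {host!r}"
        )
```

Calling assert_safe_to_wipe(host, environment) as the first line of a function like database_model_create would stop it before any table is dropped. With peewee, one place worth checking (an assumption we have not verified against our own setup) is the database each model is bound to through its Meta class, since that may not be the same database object the function creates.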

What have we learned? Why won’t this happen again?

We’ve learned that a function that deletes your database is too dangerous to have lying around. The problem is, you can never really test its safety mechanisms properly, because testing them would mean pointing a gun at the production database.
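
One partial way around this, sketched under assumptions: if the wipe function receives its database as an argument, the guard logic can be exercised against a fake that only records calls. FakeDatabase, wipe_local_database and guard_blocks_remote_host are hypothetical names for illustration, and the fake's host attribute stands in for whatever the real connection reports.

```python
class FakeDatabase:
    """Stand-in that records calls instead of touching a real server."""

    def __init__(self, host):
        self.host = host
        self.dropped = []

    def drop_tables(self, tables):
        # Record what would have been dropped; nothing is deleted.
        self.dropped.extend(tables)


def wipe_local_database(db, tables):
    """Drop tables only when the injected database reports a local host."""
    if db.host not in ("localhost", "127.0.0.1", "::1"):
        raise RuntimeError(f"refusing to drop tables on host {db.host!r}")
    db.drop_tables(tables)


def guard_blocks_remote_host():
    """Return True if a remote host is refused and nothing is dropped."""
    remote = FakeDatabase("db.example.com")
    try:
        wipe_local_database(remote, ["game", "player"])
    except RuntimeError:
        return remote.dropped == []
    return False
```

This only checks the guard's logic. It does not prove that a real driver connects where we think it does, which is exactly the part that bit us.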

The truth is, we can never be 100% sure that something like this won’t happen again. Computers are just too complex and there are days when the complexity gremlins win. However, we will figure out what went wrong and ensure that that particular error doesn’t happen again.

Again, we are very sorry. Good night.

Photo by Dawn Armfield on Unsplash
