Modelling many users making simultaneous hotel reservations in the database

Stephen: We can sign up every user ... in the world

Me: ... I wonder with our current code, what happens with that many users attempting to making reservations? :thinking_face:

The problem

What happens when n many users are attempting to reserve space on a package? If two users perform the reservation check at the same time, then they'll see the same number of rooms available and both be allowed to make a reservation.

If we generalize to n many users making simultaneous requests, it follows that we are quite likely to create more reservations than we actually have inventory for.

The data setup

Currently we have RoomInventory which has a date and a quantity, this represents how much stock we have in total.

We use RoomReservations which have a foreign key to Room, a check in and checkout date and a quantity of requested rooms.

To see if we have something in stock, we subtract reservations from inventory.

The problem lies with the fact that the data does not prevent us from overselling (e.g we could decrement the RoomInventory so it only ever represents current available, but that would likely cause ops confusion)

Database Locks

Max suggested that we have some form of table locking when checking reservations

We want some form of locking, the table level is probably excessive as only one person could ever make a reservation at a time, even if the reservation they're making is for unrelated inventory. So it's probably a good start to see what the docspostgres explicit locking on say.

Postgres and django allows to use a SELECT FOR UPDATE

FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; ...

Row Locking RoomInventory (not chosen)

This means if we lock the RoomInventory, then any other select for update will have to wait, So from https://docs.djangoproject.com/en/2.2/ref/models/querysets/#select-for-update our code would look something like:

    RoomInventory.objects.select_for_update(the nights we're interested in, skip_locked=True)
    with transaction.atomic():
        do_an_inventory_check(on the nights we're interested in)
        if we have stock then:
            create_reservation()

This will mean we can only create Reservations when we hold the lock for the inventory we're interested in.

But what about existing Reservations, I can just up the quantity?

This needs some testing, but we can add a ManyToMany from the RoomReservation to RoomInventory that stores the nights that that reservation is for.

When you lock an object, it also locks the objects that it's related to. When we lock RoomInventory hopefully that'd lock any other RoomReservations on that inventory.

SELECT FOR UPDATE locking other tables in PostgreSQL - Database Administrators Stack Exchange

We'd would need to test for a many to many

`skip_locked?`

This allows us to other transactions to select_for_update other RoomInventory if there is not currently a lock on those rows.

So we can create inventory for different room inventory concurrently.

Potential Problems

What's the performance implications of doing this?
What timeout issues, live/dead/infinite lock scenarios could occur?
many to many does not mean the reservation is locked? perhaps just the intermediate table?

Summary

lock the RoomInventory using select_for_update(skip_locked=True) for the duration of the inventory check and the creation of the reservation (i.e when we're going through the ambassador flow)

add a RoomReservation.used_inventory ManyToMany(RoomInventory) to prevent editing of existing Reservations at the same time that we're creating a new one.

Room Tokens and locking on a token level

Another idea would be to use foreign key constraints with a RoomInventoryToken model to represent rooms assigned to a reservation, when we want constraints on the number of rooms being sold, it's often best to use database constraints to enforce this, instead of something more fragile in code. So if we add a model:

    class RoomInventoryToken(BaseModel):
        room_inventory = ForeignKey(room_inventory)
        reservation = ForegnKey(Reservation, null=True)

Each token represents a room, (we use token to avoid people thinking we've built rooming) and once it's used, it gets attached to a Reservation.

We'd want for a given list of RoomInventory (days of check in for a hotel) to select requested_quantity unused token(s), this can be achieved by

using a GROUP_BY room_inventory
and window function to give each non-null row a row number
select for update (with skip locked), the one of these results.

This has a few advantages over inventory locking.

we only lock RoomInventoryTokens so multiple people can purchase the same inventory at once

we cannot over sell as we cannot assign tokens to more than one reservation (we'd need to be careful of say not giving away a token after it'd already been assigned)

We'd need to do some other stuff like create n tokens for RoomInventory.quantity and keep that in sync or preferably just remove RoomInventory.quantity altogether.

So, did it work and how'd the code turn out?

Pretty good! Of all the parts of the code that have had to deal with a relative amount of high traffic, this code has performed well, hasn't needed to be changed and our problems have been elsewhere such as pricing calculation.

def get_room_inventory_tokens(
    room: Room, requested_quantity: int, check_in: date, check_out: date
) -> List[StockToken]:
    """Fetch the next unused inventory tokens

    MUST be used inside an `transaction.atomic()` due to the use of select_for_update

    This prevents multiple people from buying the same inventory at the same time.
    RoomInventory represents a night you can stay at a hotel
    e.g "29th Feb 2000 at the TravelLodge Milton Keynes"

    A token represents an instance of a room, if we have 100 tokens, we have 100
    rooms in stock, e.g:
        "100 rooms for the 29th feb 2000 at the travellodge milton keynes"

    we use SELECT FOR UPDATE SKIP LOCKED to lock the tokens, so only 1 transaction
    try to buy the tokens at once, since we skip locked tokens, we won't have a case
    where one transactions buys the tokens and another follows an overwrites the
    same ones immediately after
    see https://www.postgresql.org/docs/9.5/sql-select.html#SQL-FOR-UPDATE-SHARE
    """
    # Find all the room inventory within the check in checkout date range
    room_inventory = RoomInventory.objects.filter(
        room=room, date__gte=check_in, date__lt=check_out
    ).values_list('id', flat=True)
    # Use a common table expression to row number the current free tokens
    # https://www.postgresql.org/docs/12/queries-with.html
    # This is the
    #     WITH cte as (
    #         SELECT RowNumber as row_number OVER room_inventory_id
    #     )
    # part of the CTE
    cte = With(
        # we filter out any used reservations, so only want free inventory
        StockToken.objects.filter(reservation__isnull=True)
        .filter(room_inventory__in=room_inventory)
        .annotate(
            row_number=Window(
                expression=RowNumber(), partition_by=F('room_inventory_id')
            )
        )
    )

    # We need requested_quantity rooms for each night, so the total
    # number of tokens is the number of nights * requested_quantity
    num_inventory = len(room_inventory)
    total_requested_tokens = num_inventory * requested_quantity

    # This is the second part of the CTE
    tokens = (
        cte.join(StockToken.objects.all(), id=cte.col.id)
        .with_cte(cte)
        .annotate(row_number=cte.col.row_number)
        # we order by row, so our results looks a bit like
        #
        # room_inventory_id         | row number    | token_id
        # milton_keynes_29th_mar    | 1             | tkn_a
        # milton_keynes_30th_mar    | 1             | tkn_r
        # milton_keynes_31th_mar    | 1             | tkn_y
        # milton_keynes_29th_mar    | 2             | tkn_o
        # milton_keynes_30th_mar    | 2             | tkn_d
        # milton_keynes_31th_mar    | 2             | tkn_w
        .order_by('row_number')
        # and we LIMIT the number of tokens we want to
        # total_requested_tokens == number of nights * number of requested rooms
        # and we skip any currently locked tokens, so we only try to buy free ones
        .select_for_update(skip_locked=True)[:total_requested_tokens]
    )
    _check_tokens_are_valid(tokens, room_inventory, requested_quantity)
    return tokens

There are quite a few improvements I'd like to make if I get the time.

Detecting whether the code is currently in a transaction and erroring if it isn't https://stackoverflow.com/questions/18544033/how-to-tell-if-your-select-query-is-within-a-transaction-or-not . I'd rather catch errors at dev time rather than run time.

Looking into the performance of skip_locked, it stands to reason the more transactions that are occuring simultaneously, the more rows the n + 1 th transaction will have to scan over. It hasn't been a problem so far, but I'd rather get ahead of that now.

The code looks a bit scary, this is a combination of django not having CTEs built-in and having to use a seperate library, many developers aren't used to it. Infact when we wanted to use this solution elsewhere, more junior members copy-pasted without really understanding as they were too scared to refactor it. So some improvements are possible.

Educating developers that the Token table doesn't take much space. I gave this explanation to another developer.

fk to room inventory 1 uuid 128 bits (16 bytes)

fk to room reservation 1 uuid of 128 bits

date created date updated. 2 * 8 bytes

I'm guestimating a standard meme image takes up more space than all the tokens we have in the database