Uses for Redis in your Django application

In my work as a web developer, I am used to relying on relational databases like PostgreSQL for all kinds of persistence. But as I learn better practices for developing scalable systems, I discover better tools for some of those jobs. One of them is the in-memory key-value store, and Redis in particular.

I will list some of the uses that I have found Redis to be a good choice for in my projects. If you're like me and new to using in-memory stores for something other than caching, then maybe you will find this article interesting.

Most if not all of the examples described here apply to other key-value stores, like Memcached. Many of them are also useful in other Python web frameworks and even in non-Python projects. But Redis and Django are the tools I'm most familiar with, hence my choice.

Caching

Of course, an obvious use is to cache data, like requests to external APIs or heavy database queries. Other examples below assume you have set up caching in Redis somewhat like this:

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        }
    }
}
Django’s cache framework | Django documentation | Django

User sessions

By default Django stores user sessions in the DB, which works well. But keep in mind that Django creates a new session row whenever a visitor's session is first written to (and on every request if SESSION_SAVE_EVERY_REQUEST is enabled). That means that over time sessions will inflate your DB size.

You can take care of this by running manage.py clearsessions on a schedule. Or you could use Redis, which cleans up expired sessions by itself. It also keeps them in memory, which can shave a couple of milliseconds off your app's response time.

Configuring Django to use Redis for sessions is just one line in your settings, assuming you use Redis for caching as well:

SESSION_ENGINE = "django.contrib.sessions.backends.cache"
How to use sessions | Django documentation | Django
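If losing sessions on a Redis restart would be a problem (users getting logged out), Django also ships a hybrid backend that writes sessions through to the DB while serving reads from the cache:

```python
# Reads come from the cache, writes go to both the cache and the DB,
# so sessions survive a cache flush at the cost of keeping the DB writes.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```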

Tracking which users are online

If your application has a social element, it will probably have to display a little green dot near avatars or usernames of online users. One way to do this is to track the time of each user's last action on your site: i.e. when they sent their last HTTP request.

Initially my go-to practice would be to add a date-time field on the user model:

class User(AbstractBaseUser):
    # other fields
    last_action_time = models.DateTimeField(null=True)

Then to add a middleware which updates this field on each request from an authenticated user:

class LastActionMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.user.is_authenticated:
            request.user.last_action_time = timezone.now()
            request.user.save(update_fields=['last_action_time'])

        return self.get_response(request)

And finally, to compare this field to the current time to find out if the user is online:

class User(AbstractBaseUser):
    # fields and methods
    
    @property
    def is_online(self) -> bool:
        if not self.last_action_time:
            return False
            
        time_since_last_action = timezone.now() - self.last_action_time
        return time_since_last_action.total_seconds() < settings.ONLINE_TIMEOUT

The disadvantage of this method is that every authenticated request causes a write to the DB. As the number of users grows, these writes can degrade the performance of your application.

It's pretty easy to use an in-memory cache for this feature instead. I would encapsulate the last-action tracking logic in a separate module, like users.services.lastaction:

from datetime import datetime
from typing import Optional

from django.conf import settings
from django.core.cache import cache
from django.utils import timezone


def _get_last_action_cache_key(user_id: int) -> str:
    return f"user:{user_id}:last_action"


def set_last_action(user_id: int, last_action: Optional[datetime] = None):
    last_action = last_action or timezone.now()
    cache_key = _get_last_action_cache_key(user_id)
    # Expire the key after ONLINE_TIMEOUT so stale entries clean themselves up
    cache.set(cache_key, last_action, settings.ONLINE_TIMEOUT)


def get_last_action(user_id: int) -> Optional[datetime]:
    cache_key = _get_last_action_cache_key(user_id)
    return cache.get(cache_key)

Then call set_last_action in the middleware and get_last_action in the user model.
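Because set_last_action stores the timestamp with an ONLINE_TIMEOUT expiry, is_online no longer needs any date arithmetic: if the key still exists, the user is online. A runnable sketch of that idea, using a plain dict with stored deadlines to stand in for Redis TTLs (in the project, the timeout argument to cache.set does the expiring for you):

```python
import time
from typing import Optional

ONLINE_TIMEOUT = 300  # seconds; stands in for settings.ONLINE_TIMEOUT

# A dict holding (timestamp, deadline) pairs stands in for Redis keys
# with TTLs, so this sketch runs without Django or Redis.
_store = {}


def set_last_action(user_id: int, now: Optional[float] = None) -> None:
    now = time.time() if now is None else now
    _store[f"user:{user_id}:last_action"] = (now, now + ONLINE_TIMEOUT)


def get_last_action(user_id: int, now: Optional[float] = None) -> Optional[float]:
    now = time.time() if now is None else now
    entry = _store.get(f"user:{user_id}:last_action")
    if entry is None or now >= entry[1]:
        # Missing or "expired" key: the user has been idle too long
        return None
    return entry[0]


def is_online(user_id: int, now: Optional[float] = None) -> bool:
    # The key's presence alone means the last action was recent enough;
    # in the model this becomes the is_online property.
    return get_last_action(user_id, now) is not None
```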

Pre-computing heavy DB queries

This idea is somewhat the same as caching, except that we don't expect the value to expire by itself and instead we update it whenever needed.

An example of how Redis can be used in this case is to index a category tree of an online catalog. Let's say you have a category model:

class Category(models.Model):
    name = models.CharField(max_length=255)
    parent = models.ForeignKey('catalog.Category', null=True, on_delete=models.CASCADE, related_name='child_set')

It can be pretty tricky to query a tree-like structure like this with an RDBMS. If you need to find all descendants of a given category (to list all nested products), you have to write a recursive query, which will scan the parent_id index many times, especially if your catalog has hundreds of categories with many levels of hierarchy.

Another way to do this is to index the category tree yourself into a JSON-serializable dictionary. For example:

DESCENDANT_INDEX_KEY = 'category:descendant_index'


def update_category_descendant_index() -> Dict[int, List[int]]:
    descendant_index = _build_descendant_index()
    # timeout=None means the key never expires; we refresh it explicitly
    cache.set(DESCENDANT_INDEX_KEY, descendant_index, timeout=None)
    return descendant_index


def _build_descendant_index() -> Dict[int, List[int]]:
    """
    Returns a dictionary which maps each category ID to a list
    of all its descendant IDs.
    """
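The body of _build_descendant_index is left out above; one way to write it, sketched here as a pure function over (id, parent_id) pairs (which you could fetch with Category.objects.values_list('id', 'parent_id')):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_descendant_index(
    pairs: Iterable[Tuple[int, Optional[int]]],
) -> Dict[int, List[int]]:
    """Map each category ID to the IDs of all its descendants.

    `pairs` is an iterable of (id, parent_id) tuples, e.g. the result of
    Category.objects.values_list('id', 'parent_id').
    """
    children: Dict[int, List[int]] = defaultdict(list)
    ids: List[int] = []
    for category_id, parent_id in pairs:
        ids.append(category_id)
        if parent_id is not None:
            children[parent_id].append(category_id)

    def descendants(category_id: int) -> List[int]:
        # Depth-first walk: each child followed by its own descendants
        result: List[int] = []
        for child_id in children[category_id]:
            result.append(child_id)
            result.extend(descendants(child_id))
        return result

    return {category_id: descendants(category_id) for category_id in ids}
```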

Each time an administrator updates or deletes a category, call update_category_descendant_index so that the changes are reflected in the catalog.

Then, when you need to get the list of category descendants, you just call get_category_descendants:

def get_category_descendant_index() -> Dict[int, List[int]]:
    descendant_index = cache.get(DESCENDANT_INDEX_KEY)
    if descendant_index is not None:
        return descendant_index
    return update_category_descendant_index()


def get_category_descendants(category_id: int) -> List[int]:
    descendant_index = get_category_descendant_index()
    return descendant_index.get(category_id, [])

Counting views for a page

If you're incrementing the view count for pages straight in the DB, you're facing the same problem as with the online user tracking: each request causes a write in the DB and that may negatively impact the performance in the long run.

class PageView(generic.View):
    def get(self, request, pk):
        # Incrementing on the DB side to be safe from race conditions
        Page.objects.filter(pk=pk).update(view_count=models.F('view_count') + 1)
        page = get_object_or_404(Page, pk=pk)
        return render(request, 'page.html', {'page': page})
        

To go easy on your DB and your disk, you could store view counts in the key-value store:

# In the view counting module, e.g. pages.services.viewcount

def get_view_count_cache_key(page_pk: int) -> str:
    return f'pages:{page_pk}:view_count'


def increment_view_count(page_pk: int):
    cache_key = get_view_count_cache_key(page_pk)
    # cache.incr raises ValueError for a missing key, so create it first;
    # add() is a no-op if the key already exists
    cache.add(cache_key, 0)
    cache.incr(cache_key)
    

def get_view_count(page_pk: int) -> int:
    cache_key = get_view_count_cache_key(page_pk)
    return cache.get(cache_key, default=0)

If it's important to count views for each date or for each client location separately, you can put this information right in the key:

def get_view_count_cache_key(page_pk: int, day: date, location: str) -> str:
    return f"pages:{page_pk}:{day.strftime('%Y-%m-%d')}:{location}:view_count"

If you need the counts to be persistent (in-memory storage is not immune to data loss), you can run a daily or hourly background job that copies the view count of each page into the DB.
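A sketch of that flushing logic, with plain dicts standing in for the cache and the pages table so it runs on its own (in the project you would iterate known page IDs and use Page.objects.filter(pk=pk).update(view_count=F('view_count') + n), then reset the cache key):

```python
from typing import Dict


def flush_view_counts(cache: Dict[str, int], db_counts: Dict[int, int]) -> None:
    """Move per-page view counters from the cache into the DB totals.

    `cache` maps keys like 'pages:<pk>:view_count' to counters;
    `db_counts` stands in for the view_count column of the pages table.
    """
    for key in [k for k in cache if k.endswith(':view_count')]:
        pk = int(key.split(':')[1])
        # Add the in-memory counter to the persisted total, then reset it
        db_counts[pk] = db_counts.get(pk, 0) + cache.pop(key)
```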


Cover photo by Kevin Oetiker on Unsplash