Here, I fill in four links in a Task model in Django Admin:
When I click "Save" in Django Admin, this is what happens:
This article details these concepts with Python:
I also explain how to use them with Docker.
If you are more practical than theoretical, you can go straight to testing the webapp and exploring the code in the GitHub repository django-selenium-docker.
Django Admin is a webapp that allows you to quickly create administration sites. This tool relies on Django models and the ModelAdmin class.
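As a rough sketch (the real Task model lives in the django-selenium-docker repository; the field names here are hypothetical), a model and its ModelAdmin registration could look like this:
from django.contrib import admin
from django.db import models


class Task(models.Model):
    # Hypothetical fields: the four links filled in from Django Admin
    link_1 = models.URLField()
    link_2 = models.URLField()
    link_3 = models.URLField()
    link_4 = models.URLField()


@admin.register(Task)
class TaskAdmin(admin.ModelAdmin):
    list_display = ('id', 'link_1', 'link_2', 'link_3', 'link_4')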
To run Django with Docker, I use:
This combination is necessary for what follows because I use Django Channels.
Here is the code of my Dockerfile:
FROM python:3.9.2-slim-buster
# Run the app as a non-root user
RUN useradd app
EXPOSE 8000
ENV PYTHONUNBUFFERED=1 \
    PORT=8000
WORKDIR /app
COPY --chown=app:app . .
RUN pip install "daphne==3.0.2"
RUN pip install -r requirements.txt
USER app
# Serve the ASGI application with Daphne
CMD set -xe; daphne -b 0.0.0.0 -p 8000 app.app.asgi:application
Here is the entry point of our Django app with ASGI:
import os
from django.core.asgi import get_asgi_application
from channels.routing import ProtocolTypeRouter
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'settings.production')
django_asgi_app = get_asgi_application()
application = ProtocolTypeRouter({
    "http": django_asgi_app,
})
Then, I connect a PostgreSQL database in the Django settings like this:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'app',
        'USER': 'app_user',
        'PASSWORD': 'changeme',
        # With the docker compose below, a container should use the database
        # service name 'app_db' as the host instead of 'localhost'
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
I can now run my services with docker compose like this:
version: "3.3"
services:
app_db:
container_name: app_db
image: postgres:13.1
environment:
- POSTGRES_USER=app_user
- POSTGRES_PASSWORD=changeme
- POSTGRES_DB=app_db
volumes:
- app_db:/var/lib/postgresql/13.1/main
ports:
- "5432:5432"
networks:
- app_network
restart: on-failure
app:
container_name: app
build: ./
depends_on:
- app_db
ports:
- "8001:8000"
image: app
networks:
- app_network
restart: on-failure
networks:
app_network:
volumes:
app_db:
For Django Admin static file management, I use CloudFront + S3, where I have already collected the static files.
This allows me to keep the docker compose configuration simple, with this single line in the Django settings:
STATIC_URL = 'https://static.snoweb.fr/'
I don't use media files in this project.
If you want to use them, a good solution is the django-storages package with:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
This technique also simplifies the docker compose.
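For example, assuming an S3 bucket and credentials provided via environment variables (the bucket name and region below are hypothetical), the django-storages settings could look like this:
import os

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
# Hypothetical bucket and region: adapt them to your own AWS setup
AWS_STORAGE_BUCKET_NAME = 'app-media'
AWS_S3_REGION_NAME = 'eu-west-3'
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')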
Selenium is a set of tools and libraries for automating web browsers.
There are many ways to run Selenium with Docker. To use the official images, go to docker-selenium.
Here, I reuse my previous Dockerfile by adding the dependencies and the Chrome installation:
FROM python:3.9.2-slim-buster
RUN useradd app
EXPOSE 8000
ENV PYTHONUNBUFFERED=1 \
    PORT=8000
WORKDIR /app
COPY --chown=app:app . .
RUN apt-get update --yes --quiet
# Installs the dependencies used by Chrome and Selenium
RUN apt-get install --yes --quiet --no-install-recommends \
    gettext \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libatspi2.0-0 \
    libcairo2 \
    libcups2 \
    libdbus-1-3 \
    libdrm2 \
    libgbm1 \
    libgdk-pixbuf2.0-0 \
    libglib2.0-0 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libx11-6 \
    libxcb1 \
    libxcomposite1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxkbcommon0 \
    libxrandr2 \
    libxshmfence1 \
    wget \
    xdg-utils \
    netcat \
    xvfb \
    && rm -rf /var/lib/apt/lists/*
# Install Chrome
RUN dpkg -i ./bin/google-chrome.deb
RUN pip install "daphne==3.0.2"
RUN pip install -r requirements.txt
USER app
CMD set -xe; daphne -b 0.0.0.0 -p 8000 app.app.asgi:application
You need to manually download the google-chrome.deb package (for example from https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) and place it in your project's bin/ folder.
To launch Chrome from Python, I use a class that inherits from the Chrome driver. Here is an overview:
import os

from django.conf import settings
from selenium import webdriver

USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
WINDOW_SIZE = "1200x1000"


class Browser(webdriver.Chrome):
    timeout = 15

    def __init__(self):
        # Adds the executable folder to PATH so that chromedriver is available
        path_bin = str(settings.BASE_DIR / 'bin')
        if path_bin not in os.environ["PATH"]:
            os.environ["PATH"] += os.pathsep + path_bin
        # Customise Chrome options
        chrome_options = webdriver.ChromeOptions()
        # Allows you to launch Chrome without a window
        chrome_options.add_argument('--headless')
        # Required to run Chrome inside a container
        chrome_options.add_argument("--no-sandbox")
        # Fixes issues with the small /dev/shm in Docker
        chrome_options.add_argument("--disable-dev-shm-usage")
        # Allows you to customise the USER_AGENT
        chrome_options.add_argument(f"user-agent={USER_AGENT}")
        # Allows you to change the size of the window
        chrome_options.add_argument(f"window-size={WINDOW_SIZE}")
        super().__init__(options=chrome_options)
Here is how to use this class to make Chrome visit the LinkedIn website:
browser = Browser()
browser.get("https://www.linkedin.com/")
Find all possible commands in the Selenium Python documentation.
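For instance, here is a small sketch of common driver calls (the CSS selector is hypothetical):
from selenium.webdriver.common.by import By

browser = Browser()
browser.get("https://www.linkedin.com/")
# Read the page title
print(browser.title)
# Locate an element with a hypothetical CSS selector and read its text
link = browser.find_element(By.CSS_SELECTOR, "a.nav__button-secondary")
print(link.text)
# Capture a screenshot as base64, as used later with WebSockets
screenshot_b64 = browser.get_screenshot_as_base64()
browser.quit()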
Celery is a Python library that allows you to run long tasks with a worker. Here are some examples of uses:
The worker's role is to take the workload off the webapp so the user is not kept waiting.
Celery needs a broker to transport messages between the different services.
Here, we use RabbitMQ as the broker.
It is also possible to use Redis. However, some Celery features will then be limited.
Here is the configuration I use in the Django settings:
# RabbitMQ
# With the docker compose below, the broker URL needs the credentials,
# e.g. 'pyamqp://app_user:changeme@localhost:5672//'
APP_BROKER_URL = 'pyamqp://'
# Allows you to select where to store the result of a task. Here, I use our
# PostgreSQL database via the django-celery-results package
APP_RESULT_BACKEND = 'django-db'
# Allows you to track the start of a task
APP_TASK_TRACK_STARTED = True
# Allows you to add a time limit (12 hours)
APP_TASK_TIME_LIMIT = 12 * 60 * 60
# Number of simultaneous tasks
APP_WORKER_CONCURRENCY = 1
To launch Celery, I define a Celery app in a Django app like this:
import os

from django.conf import settings
from celery import Celery

# Adds an environment variable for the worker: here, the Django settings module
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings.production")

app = Celery('app.core')
# Reads every setting with the APP prefix we defined earlier in the Django settings
app.config_from_object(settings, namespace='APP')
# Purges messages left in the queues by a previous run
app.control.purge()
# Searches for tasks in the installed apps
app.autodiscover_tasks()
I then use this app to define a task.
Here is a simplified example of a task that finishes with an "OK" result and is retried up to 3 times if an exception is raised, with an increasing interval between retries:
from app.core.tasks_app import app


@app.task(bind=True, max_retries=3)
def task_example(self):
    try:
        # Celery finishes its task here with an "OK" result
        return "OK"
    except Exception as exc:
        # An exception was raised: Celery retries the task after a growing delay
        self.retry(exc=exc, countdown=5 * self.request.retries)
Here is the code to launch this task from anywhere:
from app.core.tasks import task_example
task_example.delay()
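delay() returns an AsyncResult, so when you need the outcome you can poll its state or block until the value is available. A short sketch, assuming the django-db result backend configured above:
from app.core.tasks import task_example

result = task_example.delay()
# With task_track_started enabled, the status can show 'STARTED' while running
print(result.status)
# Blocks until the task finishes and returns "OK"
print(result.get(timeout=10))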
Celery provides a "start_worker" function to easily test a task.
Here is how to run the previously defined task in a test with Django:
from celery.contrib.testing.worker import start_worker
from django.test import SimpleTestCase

from app.core.tasks import app, task_example


class WorkerTest(SimpleTestCase):
    celery_worker = None
    databases = '__all__'

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Starts an in-process Celery worker for the duration of the tests
        cls.celery_worker = start_worker(app, perform_ping_check=False)
        cls.celery_worker.__enter__()

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()
        cls.celery_worker.__exit__(None, None, None)

    def test_run_script(self):
        task_example.delay()
Here, I reuse the previous docker compose and add the RabbitMQ and worker services:
version: "3.3"
services:
app_rabbitmq:
container_name: app_rabbitmq
hostname: rabbitmq
image: rabbitmq:latest
ports:
- "5672:5672"
networks:
- app_network
restart: on-failure
environment:
- RABBITMQ_DEFAULT_USER=app_user
- RABBITMQ_DEFAULT_PASS=changeme
app_db:
container_name: app_db
image: postgres:13.1
environment:
- POSTGRES_USER=app_user
- POSTGRES_PASSWORD=changeme
- POSTGRES_DB=app_db
volumes:
- app_db:/var/lib/postgresql/13.1/main
ports:
- "5432:5432"
networks:
- app_network
restart: on-failure
app_worker_core:
command: sh -c "celery -A app.core worker -l info"
container_name: app_worker_core
depends_on:
- app
- app_db
- app_rabbitmq
hostname: app_worker_core
image: app
networks:
- app_network
restart: on-failure
app:
container_name: app
build: ./
depends_on:
- app_db
- app_rabbitmq
ports:
- "8001:8000"
image: app
networks:
- app_network
restart: on-failure
networks:
app_network:
volumes:
app_db:
Django Channels is a simple way to use WebSockets with Django. It uses the ASGI application that I configured earlier.
Here are some examples of uses:
Django Channels needs channel layers to communicate with other services.
Here, I use Redis, configured in the Django settings with the channels_redis package:
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {
            # With the docker compose below, use the Redis service and its
            # password instead, e.g. "redis://:changeme@app_redis:6379/0"
            "hosts": [('127.0.0.1', 6379)],
        },
    },
}
Django Channels uses consumer classes. They are the interface to the WebSocket and handle, for example, the following actions:
Here is an example of a consumer that:
import json

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer


class TaskConsumer(WebsocketConsumer):
    task_key_composer = None

    def connect(self):
        # Extracts the key from the URL route and joins the matching group
        self.task_key_composer = self.scope['url_route']['kwargs']['key_composer']
        async_to_sync(self.channel_layer.group_add)(
            self.task_key_composer,
            self.channel_name
        )
        self.accept()

    def disconnect(self, close_code):
        # Leaves the group when the client disconnects
        async_to_sync(self.channel_layer.group_discard)(
            self.task_key_composer,
            self.channel_name
        )

    def sync_function(self, event):
        # Handles 'sync_function' messages sent to the group and forwards them to the client
        self.send(text_data=json.dumps({
            'screenshot_b64': event['screenshot_b64']
        }))
I then attach this consumer to a route so that the client can reach it like this:
from django.urls import path
from app.core.consumers import TaskConsumer
websocket_urlpatterns = [
    path('ws/task/<key_composer>/', TaskConsumer.as_asgi()),
]
I then add this route to the previously defined ASGI app:
import os

from django.core.asgi import get_asgi_application
from channels.routing import ProtocolTypeRouter, URLRouter

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'settings.production')
django_asgi_app = get_asgi_application()

# These imports must come after get_asgi_application() so that Django is set up first
from channels.auth import AuthMiddlewareStack
from app.core.urls import websocket_urlpatterns

application = ProtocolTypeRouter({
    "http": django_asgi_app,
    "websocket": AuthMiddlewareStack(
        URLRouter(
            websocket_urlpatterns
        )
    ),
})
To write to the WebSocket from Python, here is the code to use:
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

channel_layer = get_channel_layer()
async_to_sync(channel_layer.group_send)(
    'my_key_composer',
    {
        # 'type' is the name of the consumer method that handles the message
        'type': 'sync_function',
        'screenshot_b64': browser.get_screenshot_as_base64()
    }
)
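Putting these pieces together, here is a hedged sketch of a Celery task that browses a page and pushes a screenshot to the group (the task name, URL, and the app.core.browser module path are assumptions; the real implementation is in the repository):
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

from app.core.browser import Browser  # hypothetical module path for the Browser class
from app.core.tasks_app import app


@app.task(bind=True, max_retries=3)
def task_screenshot(self):
    # Hypothetical task: open a page and stream a screenshot over the channel layer
    browser = Browser()
    channel_layer = get_channel_layer()
    try:
        browser.get("https://www.linkedin.com/")
        async_to_sync(channel_layer.group_send)(
            'my_key_composer',
            {
                'type': 'sync_function',
                'screenshot_b64': browser.get_screenshot_as_base64(),
            }
        )
    finally:
        # Always close Chrome, even if the task fails
        browser.quit()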
Here is the client-side (front-end) example from the django-selenium-docker webapp:
<script>
    const screenshot = document.querySelector('#screenshoot-my_key_composer');
    const wsProtocol = location.protocol !== 'https:' ? 'ws' : 'wss';
    const socket = new WebSocket(
        wsProtocol + '://'
        + window.location.host
        + '/ws/task/my_key_composer/'
    );
    socket.onmessage = function (e) {
        const data = JSON.parse(e.data);
        screenshot.src = 'data:image/png;base64, ' + data.screenshot_b64;
    };
    socket.onclose = function (e) {
        console.error('Task socket closed unexpectedly');
    };
</script>
Then, I update my docker compose with the Redis channel layer like this:
version: "3.3"
services:
app_rabbitmq:
container_name: app_rabbitmq
hostname: rabbitmq
image: rabbitmq:latest
ports:
- "5672:5672"
networks:
- app_network
restart: on-failure
environment:
- RABBITMQ_DEFAULT_USER=app_user
- RABBITMQ_DEFAULT_PASS=changeme
app_redis:
container_name: app_redis
networks:
- app_network
image: redis:latest
command: redis-server --requirepass changeme
ports:
- "6379:6379"
restart: on-failure
app_db:
container_name: app_db
image: postgres:13.1
environment:
- POSTGRES_USER=app_user
- POSTGRES_PASSWORD=changeme
- POSTGRES_DB=app_db
volumes:
- app_db:/var/lib/postgresql/13.1/main
ports:
- "5432:5432"
networks:
- app_network
restart: on-failure
app_worker_core:
command: sh -c "celery -A app.core worker -l info"
container_name: app_worker_core
depends_on:
- app
- app_db
- app_rabbitmq
- app_redis
hostname: app_worker_core
image: app
networks:
- app_network
restart: on-failure
app:
container_name: app
build: ./
depends_on:
- app_db
- app_rabbitmq
- app_redis
ports:
- "8001:8000"
image: app
networks:
- app_network
restart: on-failure
networks:
app_network:
volumes:
app_db:
If you use WebSockets with the app over HTTPS, you must use the WSS (WebSocket Secure) protocol.
Here is an example of such a configuration with Nginx and Let's Encrypt:
upstream app {
    server 127.0.0.1:8001;
}

server {
    server_name example.com;
    listen 443 ssl http2;
    client_max_body_size 50m;

    location / {
        include proxy_params;
        proxy_pass http://app;
    }

    location /ws/ {
        proxy_pass http://app;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}

server {
    if ($host = example.com) {
        return 301 https://$host$request_uri;
    }
    server_name example.com;
    listen 80;
    return 404;
}
In this article, we have seen many concepts to use in a web application with Python:
Find the code of this example webapp in the django-selenium-docker repository.