You have built a Flask application. It handles routes, talks to a database, renders templates, and works perfectly on your laptop. Now comes the part that separates hobby projects from production software: deployment.
Deployment is the process of taking your application from a development environment — where you run flask run and hit localhost:5000 — to a production environment where real users access it over the internet, 24 hours a day, under unpredictable load, with zero tolerance for data loss or security breaches.
The gap between development and production is enormous. In development, you have one user (yourself), debug mode is on, the database is local, and if the server crashes you just restart it. In production, you might have thousands of concurrent users, secrets must be locked down, the database needs connection pooling and backups, the server must survive crashes and restart automatically, and every request must be served over HTTPS.
This tutorial covers every aspect of deploying a Flask application to production. We will work through the deployment stack from the inside out: preparing your application code, configuring a production WSGI server, setting up a reverse proxy, containerizing with Docker, deploying to cloud platforms, and building CI/CD pipelines. By the end, you will have a complete, repeatable deployment workflow that you can use for any Flask project.
Before we dive into specifics, here is the high-level checklist every Flask deployment must address:

- Disable debug mode and load all secrets from environment variables
- Run the app under a production WSGI server (Gunicorn or uWSGI)
- Put a reverse proxy (Nginx) in front to serve static files and terminate HTTPS
- Use a production database with connection pooling, migrations, and backups
- Set up structured logging and error tracking
- Automate testing and deployment with a CI/CD pipeline
Let us work through each of these systematically.
Production readiness starts in your application code. Before you think about servers, containers, or cloud platforms, your Flask app itself must be configured correctly.
This is the single most critical deployment rule. Flask’s debug mode enables the interactive debugger, which allows anyone who can trigger an error to execute arbitrary Python code on your server. It also enables the reloader, which watches your files for changes and restarts the process — unnecessary overhead in production.
# NEVER do this in production
app.run(debug=True)   # Interactive debugger exposed to the internet

# Correct: debug off, or better yet, don't use app.run() at all
app.run(debug=False)
In production, you will not call app.run() at all. A WSGI server like Gunicorn imports your application object directly. But if your code has debug=True anywhere, make sure it is controlled by an environment variable.
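One way to enforce that rule (the helper name here is illustrative, not from the original app) is to make debug strictly opt-in via an environment variable:

```python
import os

def debug_enabled() -> bool:
    # Debug is opt-in: only an explicit "1" or "true" enables it
    return os.environ.get("FLASK_DEBUG", "").lower() in ("1", "true")

# Development only -- Gunicorn imports the app object and never calls app.run():
# app.run(debug=debug_enabled())
```

With this guard, a production host that simply never sets `FLASK_DEBUG` can never accidentally expose the debugger.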
Professional Flask applications use configuration classes to separate development, testing, and production settings. This pattern keeps sensitive production values out of your code and makes it easy to switch environments.
# config.py
import os
class Config:
"""Base configuration shared across all environments."""
SECRET_KEY = os.environ.get("SECRET_KEY", "fallback-dev-key-change-me")
SQLALCHEMY_TRACK_MODIFICATIONS = False
MAX_CONTENT_LENGTH = 16 * 1024 * 1024 # 16 MB upload limit
class DevelopmentConfig(Config):
"""Local development settings."""
DEBUG = True
SQLALCHEMY_DATABASE_URI = os.environ.get(
"DATABASE_URL",
"sqlite:///dev.db"
)
class TestingConfig(Config):
"""Test suite settings."""
TESTING = True
SQLALCHEMY_DATABASE_URI = "sqlite:///:memory:"
WTF_CSRF_ENABLED = False
class ProductionConfig(Config):
"""Production settings - all secrets from environment variables."""
DEBUG = False
TESTING = False
SQLALCHEMY_DATABASE_URI = os.environ["DATABASE_URL"] # No fallback; crash if missing
SECRET_KEY = os.environ["SECRET_KEY"] # No fallback; crash if missing
SESSION_COOKIE_SECURE = True
SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SAMESITE = "Lax"
PREFERRED_URL_SCHEME = "https"
config_by_name = {
"development": DevelopmentConfig,
"testing": TestingConfig,
"production": ProductionConfig,
}
Notice that ProductionConfig uses os.environ["DATABASE_URL"] without a fallback. This is intentional. If the environment variable is not set, the application crashes immediately at startup with a clear KeyError. This is far better than silently connecting to a wrong database or running with a default secret key.
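If you want a friendlier failure than a bare KeyError, a small helper (hypothetical, not part of the config above) keeps the fail-fast behavior while naming the missing variable:

```python
import os

def require_env(name: str) -> str:
    """Fail fast at startup, but with an actionable error message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Required environment variable {name} is not set")
    return value

# In ProductionConfig you could then write:
#   SECRET_KEY = require_env("SECRET_KEY")
```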
Load the correct configuration in your application factory:
# app/__init__.py
import os
from flask import Flask
from config import config_by_name
def create_app(config_name=None):
if config_name is None:
config_name = os.environ.get("FLASK_ENV", "development")
app = Flask(__name__)
app.config.from_object(config_by_name[config_name])
# Initialize extensions
from app.extensions import db, migrate, ma
db.init_app(app)
migrate.init_app(app, db)
ma.init_app(app)
# Register blueprints
from app.routes import api_bp
app.register_blueprint(api_bp, url_prefix="/api")
return app
Your requirements.txt must pin every dependency to an exact version. Without pinning, a new install might pull a different version of a library that introduces breaking changes or security vulnerabilities.
# Generate pinned requirements from your current environment
pip freeze > requirements.txt
A pinned requirements.txt looks like this:
# requirements.txt
Flask==3.1.0
Flask-SQLAlchemy==3.1.1
Flask-Migrate==4.0.7
gunicorn==23.0.0
psycopg2-binary==2.9.10
python-dotenv==1.0.1
marshmallow==3.23.1
redis==5.2.1
For more robust dependency management, consider using pip-tools. You write a requirements.in with your direct dependencies, and pip-compile generates a fully pinned requirements.txt with all transitive dependencies and hash verification.
In development, Flask serves static files from the static/ directory. In production, this is inefficient — Flask is a Python application server, not a file server. Nginx (or a CDN) should serve static files directly, bypassing your Python process entirely. We will configure this in the Nginx section.
For now, make sure your static files are organized:
myapp/
├── app/
│   ├── static/
│   │   ├── css/
│   │   ├── js/
│   │   └── images/
│   ├── templates/
│   └── ...
Flask’s built-in development server is single-threaded, not optimized for performance, and has no process management. It is designed for one thing: local development convenience. Running it in production is like driving a go-kart on the highway — it technically moves forward, but it is not built for the conditions.
A production WSGI server handles multiple concurrent requests using worker processes or threads, manages worker lifecycle (restarting crashed workers), and is tuned for throughput and reliability.
Gunicorn (Green Unicorn) is the most popular WSGI server for Python applications. It uses a pre-fork worker model: a master process spawns multiple worker processes, each handling requests independently. If a worker crashes, the master spawns a replacement.
# Install Gunicorn
pip install gunicorn
# Run with default settings (1 worker)
gunicorn "app:create_app()"

# Specify host and port
gunicorn --bind 0.0.0.0:8000 "app:create_app()"

# Multiple workers
gunicorn --workers 4 --bind 0.0.0.0:8000 "app:create_app()"
The string "app:create_app()" tells Gunicorn to import the app module and call create_app() to get the WSGI application object. If your app object is a module-level variable, use "app:app" or "wsgi:app".
The number of workers determines how many concurrent requests your server can handle. The general formula is:
workers = (2 * CPU_CORES) + 1
On a 4-core machine, that is 9 workers. Each worker is a separate OS process with its own memory space, so more workers means more memory usage. Monitor your server’s memory and adjust accordingly.
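The formula is trivial to sanity-check in code:

```python
def recommended_workers(cpu_cores: int) -> int:
    # Gunicorn's rule of thumb: (2 x cores) + 1
    return 2 * cpu_cores + 1

# recommended_workers(4) -> 9, matching the example above
```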
For production, use a configuration file instead of command-line arguments:
# gunicorn.conf.py
import multiprocessing

# Server socket
bind = "0.0.0.0:8000"

# Worker processes
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "sync"
worker_connections = 1000
timeout = 30
keepalive = 2

# Logging
accesslog = "-"   # stdout
errorlog = "-"    # stderr
loglevel = "info"

# Process naming
proc_name = "myapp"

# Server mechanics
daemon = False
pidfile = None
umask = 0
tmp_upload_dir = None

# Restart workers after this many requests (prevents memory leaks)
max_requests = 1000
max_requests_jitter = 50

# Preload application code before forking workers
preload_app = True
# Run with config file
gunicorn -c gunicorn.conf.py "app:create_app()"
Gunicorn supports different worker types for different workloads: sync workers (the default) for typical request/response apps, threaded workers when requests spend time blocked on I/O, gevent workers for many concurrent I/O-bound connections, and Uvicorn workers for async Flask (async def views).

# Threaded workers (4 threads per worker)
gunicorn --workers 4 --threads 4 --bind 0.0.0.0:8000 "app:create_app()"

# Gevent workers
pip install gevent
gunicorn --workers 4 --worker-class gevent --bind 0.0.0.0:8000 "app:create_app()"

# Uvicorn workers (for async Flask)
pip install uvicorn
gunicorn --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 "app:create_app()"
uWSGI is an alternative WSGI server with more features and more complexity. It supports the same pre-fork model but adds protocol-level optimizations, built-in caching, and its own process management.
# Install uWSGI
pip install uwsgi

# Run Flask app
uwsgi --http 0.0.0.0:8000 --wsgi-file wsgi.py --callable app --processes 4 --threads 2
uWSGI is powerful but has a steeper learning curve. For most Flask deployments, Gunicorn is the simpler and more common choice. Choose uWSGI if you need its specific features (e.g., built-in caching, spooler for background tasks, or the uwsgi protocol for Nginx communication).
In production, you do not expose Gunicorn directly to the internet. Instead, you put Nginx in front of it as a reverse proxy. Nginx handles several responsibilities that Gunicorn should not: terminating TLS, serving static files, adding security headers, enforcing upload size limits, and buffering slow clients so workers are not tied up.
# /etc/nginx/sites-available/myapp
server {
listen 80;
server_name myapp.example.com;
# Redirect all HTTP to HTTPS
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name myapp.example.com;
# SSL certificates (managed by Certbot)
ssl_certificate /etc/letsencrypt/live/myapp.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/myapp.example.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
# Serve static files directly
location /static/ {
alias /var/www/myapp/app/static/;
expires 30d;
add_header Cache-Control "public, immutable";
}
# Proxy all other requests to Gunicorn
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_redirect off;
# Timeouts
proxy_connect_timeout 60s;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
}
# Client upload size limit
client_max_body_size 16M;
# Logging
access_log /var/log/nginx/myapp_access.log;
error_log /var/log/nginx/myapp_error.log;
}
# Enable the site
sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/
sudo nginx -t                 # Test configuration
sudo systemctl reload nginx
When Nginx forwards requests to Gunicorn, Flask sees the request as coming from 127.0.0.1 instead of the actual client. The X-Forwarded-For and X-Forwarded-Proto headers carry the original client information. Tell Flask to trust these headers:
from werkzeug.middleware.proxy_fix import ProxyFix

app = create_app()
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1, x_prefix=1)
Let’s Encrypt provides free TLS certificates. Certbot automates the entire process:
# Install Certbot
sudo apt install certbot python3-certbot-nginx

# Obtain and install certificate (auto-configures Nginx)
sudo certbot --nginx -d myapp.example.com

# Certbot sets up auto-renewal. Verify it:
sudo certbot renew --dry-run
Certbot modifies your Nginx configuration to add SSL directives and sets up a systemd timer that renews the certificate automatically before its 90-day validity period ends.
Docker packages your application, its dependencies, and its runtime environment into a single, portable image. This eliminates the “works on my machine” problem — if it runs in the Docker container locally, it runs the same way in production.
# Dockerfile
FROM python:3.12-slim
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
# Create a non-root user
RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /sbin/nologin appuser
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends gcc libpq-dev && \
rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
# Run with Gunicorn
CMD ["gunicorn", "-c", "gunicorn.conf.py", "app:create_app()"]
Key decisions in this Dockerfile:

- python:3.12-slim — The slim variant is much smaller than the full image (150 MB vs 1 GB) while still including essential system libraries
- PYTHONDONTWRITEBYTECODE=1 — Prevents Python from creating .pyc files in the container
- PYTHONUNBUFFERED=1 — Ensures print statements and log messages appear immediately in Docker logs
- Copying requirements.txt before the application code — The dependency layer is rebuilt only when requirements.txt changes, not on every code change

Multi-stage builds produce smaller production images by separating the build environment from the runtime environment:
# Dockerfile.multistage
# Stage 1: Build
FROM python:3.12-slim AS builder
WORKDIR /app
RUN apt-get update && \
apt-get install -y --no-install-recommends gcc libpq-dev && \
rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: Production
FROM python:3.12-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
RUN groupadd -r appuser && useradd -r -g appuser -d /app -s /sbin/nologin appuser
# Install only runtime dependencies (no gcc, no build tools)
RUN apt-get update && \
apt-get install -y --no-install-recommends libpq5 && \
rm -rf /var/lib/apt/lists/*
# Copy installed packages from builder
COPY --from=builder /install /usr/local
WORKDIR /app
COPY . .
USER appuser
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
CMD ["gunicorn", "-c", "gunicorn.conf.py", "app:create_app()"]
The builder stage installs GCC and builds any C extensions (like psycopg2). The production stage only copies the compiled packages, leaving the build tools behind. This can reduce your image size by 200+ MB.
Always include a .dockerignore to keep unnecessary files out of the image:
# .dockerignore
__pycache__
*.pyc
*.pyo
.git
.gitignore
.env
.env.*
*.md
.pytest_cache
.mypy_cache
.coverage
htmlcov/
venv/
.venv/
docker-compose*.yml
Dockerfile*
.dockerignore
tests/
docs/
*.log
Docker Compose orchestrates multiple containers. A typical Flask production stack includes the application, a database, and a cache:
# docker-compose.yml
version: "3.9"
services:
web:
build: .
ports:
- "8000:8000"
environment:
- FLASK_ENV=production
- DATABASE_URL=postgresql://myapp:secretpassword@db:5432/myapp
- SECRET_KEY=${SECRET_KEY}
- REDIS_URL=redis://redis:6379/0
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
restart: unless-stopped
volumes:
- app-static:/app/app/static
networks:
- backend
db:
image: postgres:16-alpine
environment:
- POSTGRES_DB=myapp
- POSTGRES_USER=myapp
- POSTGRES_PASSWORD=secretpassword
volumes:
- postgres-data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U myapp"]
interval: 10s
timeout: 5s
retries: 5
restart: unless-stopped
networks:
- backend
redis:
image: redis:7-alpine
command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
volumes:
- redis-data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
restart: unless-stopped
networks:
- backend
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx/nginx.conf:/etc/nginx/conf.d/default.conf:ro
- app-static:/var/www/static:ro
- ./certbot/conf:/etc/letsencrypt:ro
- ./certbot/www:/var/www/certbot:ro
depends_on:
- web
restart: unless-stopped
networks:
- backend
volumes:
postgres-data:
redis-data:
app-static:
networks:
backend:
driver: bridge
# Build and start all services
docker compose up -d --build

# View logs
docker compose logs -f web

# Run database migrations
docker compose exec web flask db upgrade

# Stop all services
docker compose down

# Stop and remove all data (careful!)
docker compose down -v
Your Flask application needs a health check endpoint that Docker, load balancers, and monitoring tools can hit:
# app/routes/health.py
from flask import Blueprint, jsonify
from app.extensions import db
health_bp = Blueprint("health", __name__)
@health_bp.route("/health")
def health_check():
"""Basic health check - is the app running?"""
return jsonify({"status": "healthy"}), 200
@health_bp.route("/health/ready")
def readiness_check():
"""Readiness check - can the app handle requests?
Checks database connectivity and other dependencies.
"""
checks = {}
# Check database
try:
db.session.execute(db.text("SELECT 1"))
checks["database"] = "connected"
except Exception as e:
checks["database"] = f"error: {str(e)}"
return jsonify({"status": "unhealthy", "checks": checks}), 503
return jsonify({"status": "healthy", "checks": checks}), 200
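The readiness pattern generalizes beyond the database: run each dependency probe, collect the results, and return 503 if any fails. A framework-free sketch of that aggregation logic (function names are illustrative):

```python
def run_readiness_checks(probes: dict) -> tuple[int, dict]:
    """probes maps a dependency name to a zero-arg callable that raises on failure."""
    results, healthy = {}, True
    for name, probe in probes.items():
        try:
            probe()
            results[name] = "connected"
        except Exception as exc:
            # Report the failure in the response body instead of crashing the endpoint
            results[name] = f"error: {exc}"
            healthy = False
    return (200 if healthy else 503, results)
```

Keeping the aggregation separate from Flask makes it easy to add probes (Redis, an upstream API) without touching the route.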
With your application containerized, you have multiple options for where to run it. Each cloud platform offers different tradeoffs between control, simplicity, and cost.
AWS offers several services for Flask deployment, ranging from fully managed to bare metal:
The simplest AWS option. Elastic Beanstalk handles provisioning, load balancing, auto-scaling, and monitoring. You deploy your code, and AWS manages the infrastructure.
# Install EB CLI
pip install awsebcli

# Initialize Elastic Beanstalk in your project
eb init -p python-3.12 myapp --region us-east-1

# Create an environment and deploy
eb create production

# Deploy updates
eb deploy

# Open in browser
eb open
Elastic Beanstalk looks for an application.py file or a Procfile to know how to run your app:
# Procfile (for Elastic Beanstalk)
web: gunicorn -c gunicorn.conf.py "app:create_app()"
For Docker-based deployments with more control. You push your Docker image to ECR (Elastic Container Registry) and define how ECS runs it. ECS handles scaling, networking, and load balancing. More configuration than Elastic Beanstalk, but more flexibility.
# Build and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
docker build -t myapp .
docker tag myapp:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
Full control. You provision a virtual server, SSH in, install everything yourself, and manage updates. This is the most work but gives you complete control over the environment. Use this when you have specific requirements that managed services cannot accommodate.
Heroku is the fastest path from code to production. It is a Platform-as-a-Service (PaaS) that abstracts away all infrastructure concerns.
# Procfile (required by Heroku)
web: gunicorn "app:create_app()"
# runtime.txt (specify Python version)
python-3.12.8
# Deploy to Heroku
heroku create myapp-production
heroku addons:create heroku-postgresql:essential-0
heroku config:set SECRET_KEY=your-production-secret-key
heroku config:set FLASK_ENV=production
git push heroku main

# Run migrations
heroku run flask db upgrade

# View logs
heroku logs --tail
Heroku automatically detects Python applications, installs dependencies from requirements.txt, and runs the command specified in Procfile. It handles HTTPS, load balancing, and zero-downtime deploys.
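One Heroku quirk worth guarding against: the Postgres add-on sets DATABASE_URL with a postgres:// scheme, which SQLAlchemy 1.4+ no longer accepts (it requires postgresql://). A common fix is to normalize the URL at config time (sketch; the helper name is illustrative):

```python
import os

def normalized_database_url() -> str:
    # Heroku's add-on sets a postgres:// scheme, but SQLAlchemy 1.4+
    # only recognizes postgresql://
    url = os.environ.get("DATABASE_URL", "")
    if url.startswith("postgres://"):
        url = url.replace("postgres://", "postgresql://", 1)
    return url
```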
DigitalOcean App Platform sits between Heroku’s simplicity and AWS’s flexibility. It supports both Dockerfile-based and buildpack-based deployments, connects directly to your GitHub repository, and auto-deploys on push.
# .do/app.yaml
name: myapp
services:
- name: web
github:
repo: yourusername/myapp
branch: main
build_command: pip install -r requirements.txt
run_command: gunicorn "app:create_app()"
environment_slug: python
instance_count: 2
instance_size_slug: professional-xs
envs:
- key: FLASK_ENV
value: production
- key: SECRET_KEY
type: SECRET
value: your-secret-key
- key: DATABASE_URL
scope: RUN_TIME
value: ${db.DATABASE_URL}
databases:
- name: db
engine: PG
version: "16"
| Factor | Heroku | AWS EB | AWS ECS | DigitalOcean |
|---|---|---|---|---|
| Setup complexity | Low | Medium | High | Low |
| Control | Limited | Medium | High | Medium |
| Cost (small app) | $5-25/mo | $15-50/mo | $20-60/mo | $5-25/mo |
| Auto-scaling | Yes | Yes | Yes | Yes |
| Docker support | Yes | Yes | Native | Yes |
| Free tier | No | Yes (12 months) | Yes (12 months) | No |
Your development SQLite database will not work in production. Production databases need concurrent access, connection pooling, automated backups, and replication. PostgreSQL is the standard choice for Flask applications.
Every database query requires a connection. Opening and closing connections for each request is expensive. Connection pooling maintains a pool of reusable connections.
SQLAlchemy (which Flask-SQLAlchemy wraps) includes a built-in connection pool. Configure it for production:
# config.py - ProductionConfig
class ProductionConfig(Config):
SQLALCHEMY_DATABASE_URI = os.environ["DATABASE_URL"]
# Connection pool settings
SQLALCHEMY_ENGINE_OPTIONS = {
"pool_size": 20, # Maximum number of persistent connections
"max_overflow": 10, # Extra connections allowed beyond pool_size
"pool_timeout": 30, # Seconds to wait for a connection from the pool
"pool_recycle": 1800, # Recycle connections after 30 minutes
"pool_pre_ping": True, # Test connections before using them
}
pool_pre_ping=True is especially important. It tests each connection before handing it to your application. If the connection has gone stale (e.g., the database restarted), SQLAlchemy transparently creates a new one instead of giving you a broken connection that causes an error on your user’s request.
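One more sizing consideration: each Gunicorn worker process holds its own pool, so the worst case is workers × (pool_size + max_overflow) open connections. A quick check against your database's limit:

```python
def max_db_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    # Each worker process maintains an independent SQLAlchemy pool
    return workers * (pool_size + max_overflow)

# 9 workers with pool_size=20 and max_overflow=10 could open
# 9 * (20 + 10) = 270 connections, well past PostgreSQL's default
# max_connections of 100 -- tune pool_size or max_connections accordingly
```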
Flask-Migrate (powered by Alembic) tracks database schema changes as versioned migration scripts. This is essential in production because you cannot drop and recreate tables — you have real data.
# Generate a migration after changing models
flask db migrate -m "add user email column"

# Review the generated migration in migrations/versions/
# Then apply it
flask db upgrade

# Rollback if something goes wrong
flask db downgrade
Always review generated migrations before applying them. Alembic does its best to detect changes, but it can miss things (especially column renames, which it detects as a drop + create). Treat migrations as code that deserves code review.
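For example, if you rename a column, the autogenerated script typically contains a drop_column/add_column pair that would destroy the data; hand-edit it into an alter_column rename instead. A sketch of the corrected migration body (table and column names are hypothetical; this fragment only runs inside an Alembic migration, not standalone):

```python
from alembic import op

def upgrade():
    # Autogenerate produced op.drop_column + op.add_column here,
    # which would lose the data; a rename preserves it
    op.alter_column("users", "email_addr", new_column_name="email")

def downgrade():
    op.alter_column("users", "email", new_column_name="email_addr")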
Automate PostgreSQL backups with a cron job:
#!/bin/bash
# backup.sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/postgres"
DB_NAME="myapp"
mkdir -p "$BACKUP_DIR"
pg_dump -U myapp -h localhost "$DB_NAME" | gzip > "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.sql.gz"
# Keep only last 30 days of backups
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +30 -delete
echo "Backup completed: ${DB_NAME}_${TIMESTAMP}.sql.gz"
# Add to crontab (daily at 2 AM)
0 2 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1
If you are using a managed database (AWS RDS, DigitalOcean Managed Databases), automated backups are built in. Configure the retention period and test your restore procedure regularly.
In production, print() statements are not logging. You need structured, configurable logging that writes to files or external services, includes severity levels, and gives you enough context to debug problems at 3 AM without SSH access to the server.
# app/logging_config.py
import logging
import logging.handlers
import os
def configure_logging(app):
"""Configure application logging for production."""
# Remove default Flask handler
app.logger.handlers.clear()
# Set log level from environment
log_level = os.environ.get("LOG_LEVEL", "INFO").upper()
app.logger.setLevel(getattr(logging, log_level))
# Console handler (for Docker/container logs)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)
# Format: timestamp - logger name - level - message
formatter = logging.Formatter(
"[%(asctime)s] %(name)s %(levelname)s in %(module)s: %(message)s",
datefmt="%Y-%m-%d %H:%M:%S"
)
console_handler.setFormatter(formatter)
app.logger.addHandler(console_handler)
# File handler with rotation (for VM deployments)
if os.environ.get("LOG_TO_FILE"):
file_handler = logging.handlers.RotatingFileHandler(
"logs/app.log",
maxBytes=10_000_000, # 10 MB
backupCount=10
)
file_handler.setLevel(logging.INFO)
file_handler.setFormatter(formatter)
app.logger.addHandler(file_handler)
# Suppress noisy loggers
logging.getLogger("werkzeug").setLevel(logging.WARNING)
logging.getLogger("sqlalchemy.engine").setLevel(logging.WARNING)
app.logger.info("Logging configured at %s level", log_level)
For production systems that feed logs into aggregation services (ELK stack, Datadog, CloudWatch), JSON-formatted logs are easier to parse and query:
# app/logging_config.py (JSON variant)
import json
import logging
from datetime import datetime, timezone
class JSONFormatter(logging.Formatter):
"""Format log records as JSON for log aggregation services."""
def format(self, record):
log_entry = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"module": record.module,
"function": record.funcName,
"line": record.lineno,
"message": record.getMessage(),
}
if record.exc_info:
log_entry["exception"] = self.formatException(record.exc_info)
# Include extra fields if present
if hasattr(record, "request_id"):
log_entry["request_id"] = record.request_id
if hasattr(record, "user_id"):
log_entry["user_id"] = record.user_id
return json.dumps(log_entry)
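A quick way to sanity-check a formatter like this is to feed it a synthetic LogRecord and parse the output back as JSON. The sketch below uses a trimmed-down formatter so it stands alone:

```python
import json
import logging
from datetime import datetime, timezone

class MiniJSONFormatter(logging.Formatter):
    # Trimmed-down variant of the formatter above, for illustration only
    def format(self, record):
        return json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        })

record = logging.LogRecord("app", logging.INFO, "app.py", 1,
                           "user %s logged in", ("alice",), None)
entry = json.loads(MiniJSONFormatter().format(record))
# entry["message"] == "user alice logged in"
```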
# app/middleware.py
import time
import uuid
from flask import g, request, current_app
def register_request_hooks(app):
"""Register before/after request hooks for logging."""
@app.before_request
def before_request():
g.request_id = str(uuid.uuid4())[:8]
g.start_time = time.time()
@app.after_request
def after_request(response):
duration = time.time() - g.start_time
current_app.logger.info(
"request_completed",
extra={
"request_id": g.request_id,
"method": request.method,
"path": request.path,
"status": response.status_code,
"duration_ms": round(duration * 1000, 2),
"ip": request.remote_addr,
}
)
response.headers["X-Request-ID"] = g.request_id
return response
Sentry captures exceptions in real time, groups them, tracks their frequency, and provides full stack traces with local variable values. It is the industry standard for production error tracking.
pip install "sentry-sdk[flask]"
# app/__init__.py
import os
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration
def create_app(config_name=None):
# Initialize Sentry before creating the app
if os.environ.get("SENTRY_DSN"):
sentry_sdk.init(
dsn=os.environ["SENTRY_DSN"],
integrations=[FlaskIntegration()],
traces_sample_rate=0.1, # 10% of requests for performance monitoring
environment=os.environ.get("FLASK_ENV", "production"),
)
app = Flask(__name__)
# ... rest of factory
The twelve-factor app methodology (12factor.net) establishes that configuration should be stored in the environment, not in code. This principle is fundamental to modern deployment.
In development, environment variables are managed with .env files. The python-dotenv package loads these into the environment automatically.
pip install python-dotenv
# .env (NEVER commit this file)
FLASK_ENV=development
SECRET_KEY=dev-secret-key-not-for-production
DATABASE_URL=postgresql://localhost:5432/myapp_dev
REDIS_URL=redis://localhost:6379/0
SENTRY_DSN=
LOG_LEVEL=DEBUG
# wsgi.py (entry point)
from dotenv import load_dotenv

load_dotenv()  # Load .env file before anything else

from app import create_app

app = create_app()
Critical rule: Never commit .env files to version control. Add them to .gitignore. Provide a .env.example with placeholder values so developers know which variables are needed.
# .env.example (commit this file)
FLASK_ENV=development
SECRET_KEY=change-me-to-a-random-string
DATABASE_URL=postgresql://localhost:5432/myapp_dev
REDIS_URL=redis://localhost:6379/0
SENTRY_DSN=
LOG_LEVEL=DEBUG
The twelve factors most relevant to Flask deployment:

- Config — stored in the environment, never in code
- Dependencies — explicitly declared and isolated (requirements.txt, virtual environments)
- Logs — treated as event streams written to stdout
- Admin processes — run as one-off commands (flask db upgrade, management commands)

Continuous Integration and Continuous Deployment automates testing and deployment. Every push to your repository triggers a pipeline that tests your code, builds a Docker image, and deploys it to production. No manual steps, no “I forgot to run the tests” moments.
# .github/workflows/deploy.yml
name: Test, Build, Deploy
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16-alpine
env:
POSTGRES_DB: myapp_test
POSTGRES_USER: myapp
POSTGRES_PASSWORD: testpassword
ports:
- 5432:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: "pip"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest pytest-cov
- name: Run tests
env:
DATABASE_URL: postgresql://myapp:testpassword@localhost:5432/myapp_test
SECRET_KEY: test-secret-key
FLASK_ENV: testing
run: |
pytest --cov=app --cov-report=xml -v
- name: Upload coverage
uses: codecov/codecov-action@v4
with:
file: ./coverage.xml
build:
needs: test
runs-on: ubuntu-latest
if: github.event_name == 'push'
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- name: Log in to Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: |
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
deploy:
needs: build
runs-on: ubuntu-latest
if: github.event_name == 'push'
steps:
- name: Deploy to production server
uses: appleboy/ssh-action@v1
with:
host: ${{ secrets.DEPLOY_HOST }}
username: ${{ secrets.DEPLOY_USER }}
key: ${{ secrets.DEPLOY_SSH_KEY }}
script: |
cd /opt/myapp
docker compose pull web
docker compose up -d --no-deps web
docker compose exec -T web flask db upgrade
docker image prune -f
This pipeline has three stages:

1. Test — runs on every push and pull request. Spins up a disposable PostgreSQL service container and runs the test suite with coverage.
2. Build — runs only on pushes to main (not PRs). Builds the Docker image and pushes it to GitHub Container Registry.
3. Deploy — runs after a successful build. SSHes into the production server, pulls the new image, restarts the web container, and applies database migrations.

Let us put everything together into a complete, production-ready deployment. This is the full stack you would use for a real Flask application.
myapp/
├── app/
│   ├── __init__.py          # Application factory
│   ├── extensions.py        # SQLAlchemy, Migrate, etc.
│   ├── models/
│   ├── routes/
│   │   ├── api.py
│   │   └── health.py
│   ├── static/
│   └── templates/
├── migrations/              # Flask-Migrate / Alembic
├── tests/
├── nginx/
│   └── nginx.conf
├── .env.example
├── .dockerignore
├── .github/
│   └── workflows/
│       └── deploy.yml
├── config.py
├── docker-compose.yml
├── docker-compose.prod.yml
├── Dockerfile
├── gunicorn.conf.py
├── requirements.txt
└── wsgi.py
# wsgi.py
import os
from dotenv import load_dotenv
load_dotenv()
from app import create_app
app = create_app(os.environ.get("FLASK_ENV", "production"))
# docker-compose.prod.yml
version: "3.9"

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile
    expose:
      - "8000"
    environment:
      - FLASK_ENV=production
      - DATABASE_URL=postgresql://myapp:${DB_PASSWORD}@db:5432/myapp
      - SECRET_KEY=${SECRET_KEY}
      # The Redis service below requires a password, so include it in the URL
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
      - SENTRY_DSN=${SENTRY_DSN}
      - LOG_LEVEL=INFO
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - internal

  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=myapp
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./backups:/backups
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - internal

  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - internal

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/conf.d/default.conf:ro
      - static-files:/var/www/static:ro
      - ./certbot/conf:/etc/letsencrypt:ro
      - ./certbot/www:/var/www/certbot:ro
    depends_on:
      - web
    restart: unless-stopped
    networks:
      - internal

volumes:
  postgres-data:
  redis-data:
  static-files:

networks:
  internal:
    driver: bridge
# nginx/nginx.conf
upstream flask_app {
    server web:8000;
}

# Rate limiting zone
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 80;
    server_name myapp.example.com;

    # Allow Let's Encrypt challenge
    location /.well-known/acme-challenge/ {
        root /var/www/certbot;
    }

    # Redirect everything else to HTTPS
    location / {
        return 301 https://$server_name$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name myapp.example.com;

    ssl_certificate /etc/letsencrypt/live/myapp.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/myapp.example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
    add_header Content-Security-Policy "default-src 'self'" always;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;

    # Gzip compression
    gzip on;
    gzip_types text/plain text/css application/json application/javascript text/xml;
    gzip_min_length 1000;

    # Static files served directly by Nginx
    location /static/ {
        alias /var/www/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # API routes with rate limiting
    location /api/ {
        limit_req zone=api burst=20 nodelay;
        proxy_pass http://flask_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # All other routes
    location / {
        proxy_pass http://flask_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    client_max_body_size 16M;
}
# gunicorn.conf.py
import multiprocessing
import os
# Server socket
bind = "0.0.0.0:8000"
# Workers
workers = int(os.environ.get("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))
worker_class = os.environ.get("GUNICORN_WORKER_CLASS", "sync")
worker_connections = 1000
timeout = 120
keepalive = 5
# Logging
accesslog = "-"
errorlog = "-"
loglevel = os.environ.get("LOG_LEVEL", "info").lower()
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'
# Process management
max_requests = 1000
max_requests_jitter = 50
preload_app = True
graceful_timeout = 30
# Hook: log when workers start and stop
def on_starting(server):
    server.log.info("Gunicorn master starting")


def post_fork(server, worker):
    server.log.info("Worker spawned (pid: %s)", worker.pid)


def worker_exit(server, worker):
    server.log.info("Worker exited (pid: %s)", worker.pid)
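The workers setting above follows Gunicorn's documented heuristic of (2 x num_cores) + 1 sync workers. As a quick sanity check of what that formula yields on different machine sizes:

```python
import multiprocessing


def default_workers(cores: int) -> int:
    """Gunicorn's suggested default: (2 x cores) + 1 workers."""
    return cores * 2 + 1


# A 2-core VM gets 5 sync workers, a 4-core VM gets 9
for cores in (2, 4, multiprocessing.cpu_count()):
    print(cores, "cores ->", default_workers(cores), "workers")
```

Treat this as a starting point, not a law: I/O-bound apps often benefit from more workers or an async worker class, while memory-hungry apps may need fewer.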
# scripts/check_production.py
"""Pre-deployment production readiness checker."""
import os
import sys


def check_production_readiness():
    checks = []
    errors = []

    # 1. Check required environment variables
    required_vars = ["SECRET_KEY", "DATABASE_URL", "FLASK_ENV"]
    for var in required_vars:
        if os.environ.get(var):
            checks.append(f"[PASS] {var} is set")
        else:
            errors.append(f"[FAIL] {var} is not set")

    # 2. Check debug mode
    flask_env = os.environ.get("FLASK_ENV", "")
    if flask_env == "production":
        checks.append("[PASS] FLASK_ENV is 'production'")
    else:
        errors.append(f"[FAIL] FLASK_ENV is '{flask_env}', expected 'production'")

    # 3. Check SECRET_KEY is not a default
    secret = os.environ.get("SECRET_KEY", "")
    weak_secrets = ["dev", "secret", "change-me", "default", "password"]
    if any(weak in secret.lower() for weak in weak_secrets):
        errors.append("[FAIL] SECRET_KEY appears to be a default/weak value")
    elif len(secret) < 32:
        errors.append(f"[FAIL] SECRET_KEY is too short ({len(secret)} chars, need 32+)")
    else:
        checks.append("[PASS] SECRET_KEY looks strong")

    # 4. Check database URL is not SQLite
    db_url = os.environ.get("DATABASE_URL", "")
    if "sqlite" in db_url:
        errors.append("[FAIL] DATABASE_URL uses SQLite (not suitable for production)")
    else:
        checks.append("[PASS] DATABASE_URL is not SQLite")

    # Print results
    print("\n=== Production Readiness Check ===\n")
    for check in checks:
        print(f"  {check}")
    for error in errors:
        print(f"  {error}")
    print(f"\n  Passed: {len(checks)}, Failed: {len(errors)}\n")

    if errors:
        print("  RESULT: NOT READY FOR PRODUCTION\n")
        sys.exit(1)
    else:
        print("  RESULT: READY FOR PRODUCTION\n")
        sys.exit(0)


if __name__ == "__main__":
    check_production_readiness()
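A SECRET_KEY that passes the length and weakness checks above is one line of standard library away:

```python
import secrets

# 32 random bytes -> 64 hex characters, comfortably past the 32-char minimum
key = secrets.token_hex(32)
print(key)
```

Generate it once, store it in your secrets manager or .env file, and never commit it to Git.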
Scaling is the art of handling more traffic without degrading performance. There are two approaches, and you will eventually use both.
Vertical scaling means giving your server more resources — more CPU, more RAM, faster disks. It is the simplest approach: upgrade your VM from 2 cores to 8 cores, and Gunicorn spawns more workers. But vertical scaling has a ceiling. A single machine can only get so big, and it is still a single point of failure.
Horizontal scaling means running multiple instances of your application behind a load balancer. This is the standard approach for production systems.
┌─────────────┐
│ Internet │
└──────┬──────┘
│
┌──────▼──────┐
│ Load Balancer│
│ (Nginx) │
└──┬───┬───┬──┘
│ │ │
┌────────▼┐ ┌▼────────┐ ┌▼────────┐
│ Flask #1 │ │ Flask #2 │ │ Flask #3 │
│ Gunicorn │ │ Gunicorn │ │ Gunicorn │
└────┬─────┘ └───┬─────┘ └───┬─────┘
│ │ │
┌────▼───────────▼────────────▼────┐
│ PostgreSQL + Redis │
└───────────────────────────────────┘
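With multiple Flask instances, the Nginx upstream block from earlier simply lists each backend. A sketch, assuming hostnames web1 through web3 (these are illustrative and not defined in the compose file above):

```nginx
upstream flask_app {
    least_conn;          # send each request to the backend with the fewest active connections
    server web1:8000;
    server web2:8000;
    server web3:8000;
}
```

Nginx round-robins by default; least_conn is a common alternative when request durations vary widely.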
Horizontal scaling requires your application to be stateless. That means:

Sessions live in a shared store (Redis or the database), not in process memory or on local disk.
Uploaded files go to shared or object storage (e.g., S3), not the local filesystem.
Caches are shared (Redis), so every instance sees the same data.
# app/cache.py
import hashlib
import json
import os
from functools import wraps

import redis
from flask import current_app

redis_client = redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379/0"))


def cache_response(timeout=300, key_prefix="view"):
    """Decorator to cache Flask view responses in Redis."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # Build a deterministic key. Python's built-in hash() is randomized
            # per process, which would defeat caching across Gunicorn workers.
            arg_digest = hashlib.sha256(f"{args}{sorted(kwargs.items())}".encode()).hexdigest()
            cache_key = f"{key_prefix}:{f.__name__}:{arg_digest}"

            # Try to get from cache
            cached = redis_client.get(cache_key)
            if cached:
                current_app.logger.debug("Cache hit: %s", cache_key)
                return json.loads(cached)

            # Execute function and cache result
            result = f(*args, **kwargs)
            redis_client.setex(cache_key, timeout, json.dumps(result))
            current_app.logger.debug("Cache miss, stored: %s", cache_key)
            return result
        return wrapper
    return decorator
# Usage in a route
@api_bp.route("/products")
@cache_response(timeout=60)
def get_products():
    products = Product.query.all()
    return [p.to_dict() for p in products]
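One subtlety worth dwelling on: the cache key must come out identical in every worker process, and Python's built-in hash() is randomized per process via PYTHONHASHSEED. A stable digest built with hashlib avoids this; the helper below is a small standalone illustration of the idea (the function name is mine, not from the app above):

```python
import hashlib


def stable_cache_key(key_prefix: str, func_name: str, args: tuple, kwargs: dict) -> str:
    """Build a cache key that is identical in every worker process."""
    # sha256 of the stringified arguments is stable across processes,
    # unlike hash(), which is seeded differently per interpreter
    arg_digest = hashlib.sha256(f"{args}{sorted(kwargs.items())}".encode()).hexdigest()
    return f"{key_prefix}:{func_name}:{arg_digest}"


# Same inputs always produce the same key, in any process
print(stable_cache_key("view", "get_products", (), {"page": 1}))
```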
# Store sessions in Redis instead of signed cookies
pip install Flask-Session
# config.py
import os

import redis


class ProductionConfig(Config):
    SESSION_TYPE = "redis"
    SESSION_REDIS = redis.from_url(os.environ["REDIS_URL"])
    SESSION_PERMANENT = False
    SESSION_USE_SIGNER = True
A Content Delivery Network serves your static files from edge servers around the world, reducing latency for users far from your origin server. Popular options include CloudFront (AWS), Cloudflare, and Fastly.
# config.py
class ProductionConfig(Config):
    CDN_DOMAIN = os.environ.get("CDN_DOMAIN", "")

# app/__init__.py
# In templates, use the CDN domain for static assets
@app.context_processor
def inject_cdn():
    return {"cdn_domain": app.config.get("CDN_DOMAIN", "")}
<!-- In Jinja2 templates -->
{% if cdn_domain %}
<link rel="stylesheet" href="https://{{ cdn_domain }}/static/css/style.css">
{% else %}
<link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">
{% endif %}
These are the mistakes I see most often in Flask deployments. Every one of them has caused production outages.
Running with debug=True exposes the Werkzeug interactive debugger. Anyone who can trigger an exception can execute arbitrary Python code on your server. This is not a theoretical risk — it is a trivially exploitable remote code execution vulnerability.
# NEVER in production
app.run(debug=True)

# Always check
assert not app.debug, "Debug mode must be off in production"
# BAD: Secret in source code, visible in Git history forever
app.config["SECRET_KEY"] = "my-super-secret-key-2024"

# GOOD: Secret from environment
app.config["SECRET_KEY"] = os.environ["SECRET_KEY"]
Even if you delete the hardcoded secret in a later commit, it remains in your Git history. Anyone with repository access can find it. If this has already happened, rotate the secret immediately.
Without health checks, your load balancer and container orchestrator have no way to know if your application is actually working. A process can be running but unable to handle requests (e.g., database connection lost). Health checks let the infrastructure detect and replace unhealthy instances automatically.
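A minimal pair of endpoints covers the two questions infrastructure asks: "is the process alive?" and "can it do real work?". The sketch below is illustrative; the blueprint name and the placeholder dependency check are mine, not from the project layout above:

```python
from flask import Blueprint, jsonify

health_bp = Blueprint("health", __name__)


@health_bp.route("/health")
def health():
    # Liveness: the process is up and can serve a request
    return jsonify(status="ok"), 200


@health_bp.route("/ready")
def ready():
    # Readiness: verify critical dependencies. A real app would ping the
    # database here, e.g. db.session.execute(text("SELECT 1")).
    try:
        checks = {"database": "ok"}  # placeholder for a real dependency check
        return jsonify(status="ready", checks=checks), 200
    except Exception as exc:
        return jsonify(status="unavailable", error=str(exc)), 503
```

Point Docker's HEALTHCHECK and the load balancer at /ready so traffic only reaches instances whose dependencies are actually reachable.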
All traffic must be encrypted. No exceptions. Credentials, session tokens, and user data are all visible in plain HTTP. Let’s Encrypt makes this free. There is no excuse.
SQLite does not support concurrent writes. When two Gunicorn workers try to write simultaneously, one gets a “database is locked” error. Use PostgreSQL or MySQL.
Without connection pooling, every request opens a new database connection and closes it when done. Under load, you exhaust the database’s connection limit. SQLAlchemy’s pool is configured by default, but you should tune pool_size and max_overflow for your workload.
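With Flask-SQLAlchemy, pool tuning goes through the SQLALCHEMY_ENGINE_OPTIONS config key. The numbers below are illustrative starting points, not universal values:

```python
# config.py (fragment)
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 10,        # persistent connections kept open per worker process
    "max_overflow": 20,     # extra connections allowed under burst load
    "pool_pre_ping": True,  # test connections before use; discards stale ones
    "pool_recycle": 1800,   # recycle connections older than 30 minutes
}
```

Remember that each Gunicorn worker process gets its own pool, so the database sees up to workers x (pool_size + max_overflow) connections; size accordingly against your database's connection limit.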
If you log to a file without rotation, the file grows until it fills the disk. Use RotatingFileHandler or, better yet, log to stdout and let Docker/systemd handle it.
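If you do log to a file, stdlib rotation is only a few lines (the file path and size limits below are illustrative):

```python
import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

# Keep at most 5 backup files of 10 MB each; the oldest is deleted automatically
handler = RotatingFileHandler("app.log", maxBytes=10 * 1024 * 1024, backupCount=5)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.info("Application started")
```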
When deploying a new version, the old process must finish handling in-flight requests before shutting down. Gunicorn handles this correctly with SIGTERM by default, but make sure your deployment process sends the right signal and waits for the graceful timeout.
Follow the twelve-factor methodology. It was written by engineers at Heroku who deployed millions of applications. The principles are battle-tested and apply to every Flask deployment.
Every aspect of your infrastructure should be defined in version-controlled files:
Dockerfile — Application container
docker-compose.yml — Service orchestration
nginx.conf — Reverse proxy configuration
gunicorn.conf.py — WSGI server configuration
.github/workflows/deploy.yml — CI/CD pipeline

If your production server dies, you should be able to recreate the entire environment from these files. No manual server configuration. No tribal knowledge. Everything is documented in code.
Users should never see an error page during a deployment. Strategies:

Rolling updates — replace instances one at a time behind the load balancer.
Blue-green deployment — run the new version alongside the old, then switch traffic over.
Health-checked restarts — only route traffic to a new instance after its health check passes, and let the old one finish in-flight requests before it stops.
Pin and audit your Python dependencies (pip-audit for vulnerability scanning)
Scan your container images (docker scout, trivy)

# Scan for known vulnerabilities in your dependencies
pip install pip-audit
pip-audit

# Scan Docker image
docker scout cves myapp:latest
Deployment is not a one-time event. It is an ongoing practice. Your deployment infrastructure evolves with your application. Start with the basics — Gunicorn, Nginx, Docker, CI/CD — and add sophistication as your needs grow. The patterns in this tutorial will serve you from your first production deployment to your thousandth.