# Lambda Video Transcoder

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

🔥 A serverless AWS Lambda solution for on-the-fly video transcoding, transcription, and adaptive streaming (HLS & DASH).

This project provides a robust and scalable way to process video files uploaded to an S3 bucket. When a video is uploaded, a Lambda function is triggered to:
1. **Probe Video:** Determine resolution and other metadata.
2. **Transcode to Multiple Resolutions:** Create different quality levels suitable for adaptive bitrate streaming (e.g., 1080p, 720p, 480p).
3. **Generate HLS & DASH Playlists:** Create manifest files for Apple HLS and MPEG-DASH streaming.
4. **Create Sprite Sheet:** Generate a thumbnail sprite sheet for video scrubbing previews.
5. **Transcribe Audio:** Use Amazon Transcribe to generate a text transcription of the video's audio.
6. **Stream Content (Optional):** A secondary Lambda handler can be exposed via API Gateway to serve the transcoded video segments and playlists directly from S3, enabling byte-range requests for efficient streaming.

## Features
- **Serverless Architecture:** Leverages AWS Lambda for compute, S3 for storage, and Amazon Transcribe for audio-to-text.
- **Adaptive Bitrate Streaming:** Outputs HLS and DASH formats for wide compatibility across devices.
- **Automated Transcription:** Integrates with Amazon Transcribe.
- **Thumbnail Sprite Generation:** For enhanced video player UIs.
- **Multiple Deployment Options:** Supports both traditional .zip deployment with Lambda Layers and Docker container image deployment.
- **Customizable Presets:** Easily configure video output resolutions and bitrates.

## Project Structure
```
.
├── Dockerfile                  # For Docker-based Lambda deployment
├── LICENSE                     # Project license
├── README.md                   # This file
├── requirements.txt            # Root Python dependencies (if any, for local dev)
├── events/
│   └── basic.json              # Sample event for local testing
├── src/
│   └── transcoder/
│       ├── __init__.py
│       ├── app.py              # Core Lambda function logic
│       └── requirements.txt    # Dependencies for the Lambda function
└── tests/
    ├── __init__.py
    └── test_handler.py         # Unit tests (example)
```

## Prerequisites
- AWS Account
- AWS CLI installed and configured
- Docker installed (for Docker-based deployment)
- Python 3.9+ (for local development and .zip deployment)
- Access to static builds of FFmpeg and ffprobe (for .zip/Layer deployment; the Dockerfile downloads them automatically for container deployment)

## Deployment Options 🚀

You can deploy this application to AWS Lambda using either a traditional .zip archive with Lambda Layers or a Docker container image.

## Option 1: Deployment using .zip archive and Lambda Layers

This is the traditional method for deploying Lambda functions.

**1. Prepare Lambda Layer for ffmpeg/ffprobe:**
  - Download static builds of `ffmpeg` and `ffprobe` compatible with the Lambda runtime's Amazon Linux version (e.g., Amazon Linux 2 for Python 3.9/3.11). A common source is [John Van Sickle's FFmpeg builds](https://johnvansickle.com/ffmpeg/).
  - Create the following folder structure. Layers are extracted under `/opt`, so a top-level `bin/` folder places the binaries at `/opt/bin`, which is where `app.py` expects them:
    ```
    bin/
      ffmpeg
      ffprobe
    ```
  - Ensure `ffmpeg` and `ffprobe` are executable (`chmod +x ffmpeg ffprobe`).
  - Zip the `bin` folder (e.g., into `ffmpeg-layer.zip`).
  - In the AWS Lambda console, create a new Layer and upload `ffmpeg-layer.zip`. Note the Layer ARN. (A scripted version of these steps follows.)

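Here is a minimal sketch of that script, assuming the amd64 static build from johnvansickle.com and a layer named `ffmpeg` (adapt the URL, name, and runtimes to your setup):

```bash
# Download the static build and stage the binaries under bin/
curl -L https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz -o ffmpeg.tar.xz
tar -xJf ffmpeg.tar.xz
mkdir -p bin
cp ffmpeg-*-static/ffmpeg ffmpeg-*-static/ffprobe bin/
chmod +x bin/ffmpeg bin/ffprobe

# Zip the bin/ folder and publish the layer; note the LayerVersionArn in the output
zip -r ffmpeg-layer.zip bin
aws lambda publish-layer-version \
  --layer-name ffmpeg \
  --zip-file fileb://ffmpeg-layer.zip \
  --compatible-runtimes python3.11
```
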
**2. Package Your Application Code:**
  - Your application code is in `src/transcoder/app.py`.
  - If you have Python dependencies beyond `boto3` (which is included in the Lambda Python runtime), as listed in `src/transcoder/requirements.txt`, install them into a package directory:
    ```bash
    pip install -r src/transcoder/requirements.txt -t ./package
    ```
  - Create a .zip file containing `app.py` (copied from `src/transcoder/`) and the *contents* of the `package` directory, all at the root of the archive:
    ```bash
    # Zip the installed dependencies so they sit at the root of the archive
    (cd package && zip -r ../lambda_function.zip .)
    # Copy app.py to the project root and add it to the archive root
    cp src/transcoder/app.py ./app.py
    zip lambda_function.zip app.py
    rm app.py  # Clean up the copied file
    # If you only have app.py (and boto3 is sufficient):
    # zip lambda_function.zip app.py
    ```
    Ensure `app.py` is at the root of the zip; if it sits under `src/transcoder/` inside the archive, the handler path must reflect that. For simplicity, keep `app.py` at the root.

**3. Create Lambda Function (.zip file):**
  - In the AWS Lambda Console, click **"Create function"**.
  - Choose **"Author from scratch"**.
  - **Function name:** Enter a descriptive name.
  - **Runtime:** Select a Python version (e.g., Python 3.11 or as supported).
  - **Architecture:** Choose `x86_64` or `arm64` based on your ffmpeg build and preference.
  - **Permissions:** Create a new execution role with basic Lambda permissions, or choose an existing one. This role will be modified later.
  - Click **"Create function"**.
  - **Upload code:** In the "Code source" section, upload your `lambda_function.zip`.
  - **Handler:** Set the handler to `app.lambda_handler` (assuming `app.py` is at the root of your zip and named `app.py`).
  - **Layers:** Add the ffmpeg/ffprobe Lambda Layer you created in step 1. (An equivalent AWS CLI call follows.)

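This is a sketch of the one-shot CLI equivalent; the function name, role ARN, and layer ARN are placeholders to replace with your own:

```bash
aws lambda create-function \
  --function-name video-transcoder \
  --runtime python3.11 \
  --architectures x86_64 \
  --handler app.lambda_handler \
  --role arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/YOUR_LAMBDA_EXECUTION_ROLE \
  --layers arn:aws:lambda:your-aws-region:YOUR_AWS_ACCOUNT_ID:layer:ffmpeg:1 \
  --zip-file fileb://lambda_function.zip
```
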
## Option 2: Deployment using Docker Container Image

This method packages your application and dependencies, including FFmpeg, into a Docker image.

**1. Review/Update Dockerfile:**
  - The provided `Dockerfile` in the repository is a good starting point.
    ```dockerfile
    # Use the AWS Lambda Python 3.13 base image (Amazon Linux 2023)
    FROM public.ecr.aws/lambda/python:3.13

    # Install dependencies via microdnf (if any beyond base image + ffmpeg)
    RUN microdnf update -y && \
        microdnf install -y tar xz && \
        microdnf clean all

    # Download static build of FFmpeg
    RUN curl -L https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz \
          -o /tmp/ffmpeg.tar.xz && \
        tar -xJf /tmp/ffmpeg.tar.xz -C /tmp && \
        mkdir -p /opt/bin && \
        cp /tmp/ffmpeg-*-static/ffmpeg /opt/bin/ && \
        cp /tmp/ffmpeg-*-static/ffprobe /opt/bin/ && \
        chmod +x /opt/bin/ffmpeg /opt/bin/ffprobe && \
        rm -rf /tmp/*

    # Copy application code and any other necessary files
    # If you have a requirements.txt specific to the transcoder (src/transcoder/requirements.txt):
    COPY src/transcoder/requirements.txt ${LAMBDA_TASK_ROOT}/requirements.txt
    RUN pip install -r ${LAMBDA_TASK_ROOT}/requirements.txt -t ${LAMBDA_TASK_ROOT}

    COPY src/transcoder/app.py ${LAMBDA_TASK_ROOT}/app.py
    # Ensure your app.py uses /opt/bin/ffmpeg and /opt/bin/ffprobe

    # Set the Lambda handler (filename.handler_function)
    CMD ["app.lambda_handler"]
    ```
  - Ensure the `FFMPEG` and `FFPROBE` paths in `src/transcoder/app.py` are set to `/opt/bin/ffmpeg` and `/opt/bin/ffprobe` respectively (this is the default in the current `app.py`).
  - If `src/transcoder/requirements.txt` contains dependencies beyond `boto3`, keep the `COPY` and `RUN pip install` lines in the `Dockerfile`; if it is empty, those lines can be removed.

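For reference, a resilient way to resolve the binary paths in `app.py` is to check `/opt/bin` first and fall back to the system PATH for local runs. This is only a sketch of the pattern, not the project's exact code; the `FFMPEG_PATH`/`FFPROBE_PATH` environment variables are hypothetical:

```python
import os
import shutil

# Prefer the static binaries shipped at /opt/bin (layer or container image);
# fall back to an env var override or whatever is on PATH for local development.
FFMPEG = (
    os.environ.get("FFMPEG_PATH")  # hypothetical override for local testing
    or ("/opt/bin/ffmpeg" if os.path.exists("/opt/bin/ffmpeg") else shutil.which("ffmpeg"))
)
FFPROBE = (
    os.environ.get("FFPROBE_PATH")  # hypothetical override for local testing
    or ("/opt/bin/ffprobe" if os.path.exists("/opt/bin/ffprobe") else shutil.which("ffprobe"))
)
```
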
**2. Build and Push Docker Image to Amazon ECR:**
  - **Install AWS CLI and Docker:** Ensure they are installed and configured locally.
  - **Create ECR Repository (if it doesn't exist):**
    ```bash
    aws ecr create-repository --repository-name your-lambda-repo-name --image-scanning-configuration scanOnPush=true --region your-aws-region
    ```
    Replace `your-lambda-repo-name` and `your-aws-region`.
  - **Authenticate Docker to your ECR registry:**
    ```bash
    aws ecr get-login-password --region your-aws-region | docker login --username AWS --password-stdin YOUR_AWS_ACCOUNT_ID.dkr.ecr.your-aws-region.amazonaws.com
    ```
    Replace `your-aws-region` and `YOUR_AWS_ACCOUNT_ID`.
  - **Build Docker Image:** Navigate to the root directory of your project (where the `Dockerfile` is located) and run:
    ```bash
    docker build -t your-lambda-repo-name .
    ```
  - **Tag Docker Image for ECR:**
    ```bash
    docker tag your-lambda-repo-name:latest YOUR_AWS_ACCOUNT_ID.dkr.ecr.your-aws-region.amazonaws.com/your-lambda-repo-name:latest
    ```
  - **Push Docker Image to ECR:**
    ```bash
    docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.your-aws-region.amazonaws.com/your-lambda-repo-name:latest
    ```

**3. Create Lambda Function (Container Image):**
  - In the AWS Lambda Console, click **"Create function"**.
  - Select **"Container image"**.
  - **Function name:** Enter a descriptive name.
  - **Container image URI:** Click **"Browse images"** and select the image you pushed to ECR (e.g., `your-lambda-repo-name:latest`).
  - **Architecture:** Choose `x86_64` (matching the ffmpeg build in the Dockerfile).
  - **Permissions:** Create a new execution role or choose an existing one. This role will be modified.
  - Click **"Create function"**. (An equivalent AWS CLI call follows.)

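A sketch of the CLI equivalent for a container-image function, with placeholder names and ARNs to replace:

```bash
aws lambda create-function \
  --function-name video-transcoder \
  --package-type Image \
  --code ImageUri=YOUR_AWS_ACCOUNT_ID.dkr.ecr.your-aws-region.amazonaws.com/your-lambda-repo-name:latest \
  --role arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/YOUR_LAMBDA_EXECUTION_ROLE \
  --architectures x86_64
```
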
## Common Configuration Steps (for both deployment options)

**1. IAM Permissions:**
  - Go to the IAM console and find the execution role associated with your Lambda function.
  - Attach policies that grant the following permissions (a CLI sketch for attaching an inline policy follows this list):
    - **AmazonS3FullAccess** (or, preferably, a more restrictive policy granting `s3:GetObject` on the source bucket and `s3:PutObject` on the destination bucket/prefix). Example inline policy for S3 (note: `s3:PutObjectAcl` is only needed if you set ACLs on output objects):
      ```json
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::YOUR_SOURCE_BUCKET_NAME/*"
          },
          {
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::YOUR_DESTINATION_BUCKET_NAME/*"
          }
        ]
      }
      ```
      Replace `YOUR_SOURCE_BUCKET_NAME` and `YOUR_DESTINATION_BUCKET_NAME`. The `processed/` prefix is handled by the application logic.
    - **AmazonTranscribeFullAccess** (or a more restrictive policy granting `transcribe:StartTranscriptionJob`). Example inline policy for Transcribe:
      ```json
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "transcribe:StartTranscriptionJob",
            "Resource": "*"
          }
        ]
      }
      ```
    - **AWSLambdaBasicExecutionRole** (usually added by default): Allows writing logs to CloudWatch (`logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents`).

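If you manage the role from the CLI instead, an inline policy like the S3 example above can be attached as sketched here; the role and policy names are placeholders, and `s3-policy.json` is the policy document saved locally:

```bash
aws iam put-role-policy \
  --role-name YOUR_LAMBDA_EXECUTION_ROLE \
  --policy-name transcoder-s3-access \
  --policy-document file://s3-policy.json
```
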
**2. Lambda Function Configuration:**
  - In the Lambda function console, navigate to the **"Configuration"** tab.
  - **General configuration:**
    - **Memory:** Increase memory; video processing is memory-intensive. Start with **2048 MB** or **4096 MB** and adjust based on execution logs and performance.
    - **Ephemeral storage:** `/tmp` defaults to 512 MB and can be raised to 10,240 MB (10 GB) for both .zip and container image deployments. Increase it if your FFmpeg processes need more temporary disk space.
    - **Timeout:** Increase the timeout; video processing can be slow. Start with **5 minutes** (300 seconds) or go up to the maximum of **15 minutes** (900 seconds). Monitor and adjust.
  - **Environment Variables (Optional):**
    - `LANGUAGE_CODE`: Defaults to "en-US" in `app.py`. Override it here for other languages supported by Amazon Transcribe.
    - Any other custom environment variables your application might need.

The same settings can be applied from the CLI, as sketched below.

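This sketch uses placeholder values; adjust the function name and sizes to your workload:

```bash
aws lambda update-function-configuration \
  --function-name video-transcoder \
  --memory-size 4096 \
  --timeout 900 \
  --ephemeral-storage '{"Size": 4096}' \
  --environment 'Variables={LANGUAGE_CODE=en-US}'
```
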
**3. Triggers:**

**A. S3 Trigger (for the `process_video` function):**
  - In the Lambda console for your function, go to **"Function overview"** and click **"Add trigger"**.
  - Select **"S3"**.
  - **Bucket:** Choose the S3 bucket where raw videos will be uploaded.
  - **Event type:** Select **"All object create events"** or be more specific (e.g., `PUT`, `POST`, `CompleteMultipartUpload`).
  - **Prefix (Optional):** Restrict the trigger to uploads under a specific folder (e.g., `uploads/`).
  - **Suffix (Optional):** Restrict the trigger to specific file types (e.g., `.mp4`).
  - Acknowledge the recursive invocation warning if your Lambda writes back to the same bucket (this app writes to a `processed/` prefix, which avoids direct recursion as long as the trigger prefix excludes it).
  - Click **"Add"**. (A CLI sketch of the same wiring follows.)

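Unlike the console flow, the CLI does not add the invoke permission automatically, so that grant comes first in this sketch (ARNs and names are placeholders):

```bash
# Allow S3 to invoke the function
aws lambda add-permission \
  --function-name video-transcoder \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::YOUR_SOURCE_BUCKET_NAME

# Register the bucket notification for .mp4 uploads
aws s3api put-bucket-notification-configuration \
  --bucket YOUR_SOURCE_BUCKET_NAME \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:your-aws-region:YOUR_AWS_ACCOUNT_ID:function:video-transcoder",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".mp4"}]}}
    }]
  }'
```
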
**B. API Gateway Trigger (for the `stream_handler` function):**
  - This allows HTTP(S) access to serve the HLS/DASH manifests and video segments.
  - In the Lambda console for your function, go to **"Function overview"** and click **"Add trigger"**.
  - Select **"API Gateway"**.
  - Choose **"Create an API"**.
  - Select **"HTTP API"** (recommended for simplicity and cost) or **"REST API"**.
  - **Security:** For initial testing, **"Open"** is fine. For production, implement appropriate authorization (e.g., IAM, a Lambda authorizer, or an API key).
  - **API name, Deployment stage:** Configure as needed.
  - **Route:** The `stream_handler` expects `bucket` and `key` as query string parameters. A common route might be `/stream` or `/videos/{proxy+}`; ensure the integration passes query parameters through.
  - Click **"Add"**. Note the **API endpoint URL** provided after creation; this URL is used to access your video streams. (A CLI quick-create sketch follows.)

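API Gateway's HTTP API "quick create" is also available from the CLI. This sketch creates an API whose `$default` route proxies all requests (query strings included) to the function; the function still needs an `aws lambda add-permission` grant for `apigateway.amazonaws.com`, and the name and ARN here are placeholders:

```bash
aws apigatewayv2 create-api \
  --name video-stream-api \
  --protocol-type HTTP \
  --target arn:aws:lambda:your-aws-region:YOUR_AWS_ACCOUNT_ID:function:video-transcoder
```
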
## Testing

**1. Testing `process_video` (S3 Trigger):**
  - Upload a video file (e.g., an MP4) to the S3 bucket and prefix you configured as the trigger.
  - Monitor the Lambda function's execution in Amazon CloudWatch Logs.
  - Check your destination S3 bucket (under the `processed/` prefix) for the output HLS files, DASH files, `sprite.png`, and the transcription JSON.

**2. Testing `stream_handler` (API Gateway Trigger):**
  - Once `process_video` has run successfully, you can test streaming.
  - Construct the URL using your API Gateway endpoint and the S3 key for a manifest or segment.
  - Example for the HLS master playlist (replace placeholders):
    `YOUR_API_ENDPOINT/stream?bucket=YOUR_S3_BUCKET_NAME&key=processed/YOUR_VIDEO_BASENAME/hls/master.m3u8`
  - Example for a video segment (replace placeholders):
    `YOUR_API_ENDPOINT/stream?bucket=YOUR_S3_BUCKET_NAME&key=processed/YOUR_VIDEO_BASENAME/hls/720p/seg_000_720p.ts`
  - You can use a tool like `curl` (see the examples below), a web browser, or an HLS/DASH test player (such as VLC or an online player) to access these URLs.
  - Check the CloudWatch Logs for both API Gateway and the Lambda function if you encounter issues.

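Example `curl` invocations (placeholders as above; the byte-range request assumes the `stream_handler` forwards `Range` headers to S3, as the feature list describes):

```bash
# Fetch the HLS master playlist
curl "YOUR_API_ENDPOINT/stream?bucket=YOUR_S3_BUCKET_NAME&key=processed/YOUR_VIDEO_BASENAME/hls/master.m3u8"

# Fetch the first kilobyte of a segment via a byte-range request
curl -H "Range: bytes=0-1023" \
  "YOUR_API_ENDPOINT/stream?bucket=YOUR_S3_BUCKET_NAME&key=processed/YOUR_VIDEO_BASENAME/hls/720p/seg_000_720p.ts" \
  -o segment_part.ts
```
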
## Important Notes
- **FFmpeg Static Builds:** Ensure the FFmpeg static build used (in the Layer or the Docker image) is compatible with the Lambda execution environment (Amazon Linux 2 for older Python runtimes, Amazon Linux 2023 for the `public.ecr.aws/lambda/python:3.13` base image). The `Dockerfile` downloads a common amd64 static build.
- **Costs:** Be mindful of AWS costs: S3 (storage, requests, data transfer), Lambda (invocations, duration, memory), ECR (storage), API Gateway (requests, data transfer), and Amazon Transcribe (transcription minutes).
- **Error Handling & Logging:** The provided code has basic error handling. For production, implement more robust error handling, use structured logging, and consider configuring a Dead-Letter Queue (DLQ) for the Lambda function to capture failed invocations.
- **Large Files & Long Processing:** For very large video files or long processing times, Lambda's 15-minute timeout or the `/tmp` storage limit (512 MB by default, configurable up to 10 GB) may be insufficient. In such scenarios, consider AWS Batch for asynchronous, long-running jobs or AWS Elemental MediaConvert for a managed media conversion service.
- **Idempotency:** Consider whether parts of your workflow need to be idempotent, especially if retries occur.
- **Security:**
  - Adhere to the principle of least privilege for IAM roles. Grant only the necessary permissions.
  - Secure your API Gateway endpoint with appropriate authentication and authorization mechanisms for production.
  - Regularly update dependencies, including the base Docker image and FFmpeg.

## Local Development & Testing (Conceptual)

While full end-to-end testing requires AWS services, you can test parts of the `app.py` logic locally:

1. **Setup:**
   * Ensure Python 3.9+ is installed.
   * Install dependencies: `pip install -r src/transcoder/requirements.txt boto3 moto` (moto mocks AWS services).
   * Have FFmpeg and ffprobe installed locally and accessible on your PATH, or adjust the `FFMPEG`/`FFPROBE` paths in `app.py` for local testing.
2. **Mocking AWS Services:**
   * Use `moto` to mock S3 and Transcribe calls during local unit tests (see the test sketch at the end of this section).
3. **Sample Event:**
   * Use `events/basic.json` (you may need to create or modify it to represent an S3 event) to simulate a Lambda invocation.
   * Example `events/basic.json` for an S3 trigger:
     ```json
     {
       "Records": [
         {
           "s3": {
             "bucket": {
               "name": "your-local-test-bucket"
             },
             "object": {
               "key": "sample.mp4"
             }
           }
         }
       ]
     }
     ```
4. **Running `app.lambda_handler`:**
   * Write a small Python script to load the event and call `app.lambda_handler(event, None)`:

```python
# Example local_runner.py (place in project root)
import json
import sys

sys.path.append('src/transcoder')  # Make the app module importable
from app import lambda_handler

if __name__ == '__main__':
    with open('events/basic.json', 'r') as f:
        event = json.load(f)

    # --- Setup for local testing ---
    # 1. Create a dummy sample.mp4 or use a small test video.
    # 2. Manually create ./temp_s3_bucket/sample.mp4 if your code expects to download it,
    #    OR modify app.py to read a local file path directly for local testing.
    # 3. FFmpeg/ffprobe must be on PATH, or the paths in app.py adjusted.
    #
    # This local run won't touch actual S3/Transcribe unless you configure boto3
    # with real credentials, or use moto for mocking. For a fuller local S3
    # simulation, tools like LocalStack can be used.

    print("Simulating Lambda invocation locally...")
    result = lambda_handler(event, None)
    print("\nLambda Output:")
    print(json.dumps(result, indent=2))
```
This local setup is primarily for unit/integration testing of the Python logic, not for testing FFmpeg processing in the exact Lambda environment; for that, deploying to AWS is necessary.

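For the moto-based unit tests mentioned in step 2 above, a minimal sketch might look like the following. It assumes moto >= 5 (which exposes the unified `mock_aws` decorator) and only mocks the AWS calls; FFmpeg still runs locally, so depending on your setup the handler may fail on the fake object and you may want to assert on intermediate behavior instead:

```python
# tests/test_handler_mocked.py -- a sketch, not the project's actual test suite
import json
import sys

import boto3
from moto import mock_aws

@mock_aws
def test_lambda_handler_with_mocked_s3():
    # Stand up a fake bucket and the object referenced by events/basic.json
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="your-local-test-bucket")
    s3.put_object(Bucket="your-local-test-bucket", Key="sample.mp4", Body=b"fake video bytes")

    with open("events/basic.json") as f:
        event = json.load(f)

    # Import inside the mock so any boto3 clients app.py creates at import time are mocked
    sys.path.append("src/transcoder")
    from app import lambda_handler

    result = lambda_handler(event, None)
    assert result is not None  # adapt to your handler's real return shape
```
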
## Contributing
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.