# Lambda Video Transcoder

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

🔥 A serverless AWS Lambda solution for on-the-fly video transcoding, transcription, and adaptive streaming (HLS & DASH).
This project provides a robust and scalable way to process video files uploaded to an S3 bucket. When a video is uploaded, a Lambda function is triggered to:

1. **Probe Video:** Determine resolution and other metadata.
2. **Transcode to Multiple Resolutions:** Create quality levels suitable for adaptive bitrate streaming (e.g., 1080p, 720p, 480p).
3. **Generate HLS & DASH Playlists:** Create manifest files for Apple HLS and MPEG-DASH streaming.
4. **Create Sprite Sheet:** Generate a thumbnail sprite sheet for video scrubbing previews.
5. **Transcribe Audio:** Use Amazon Transcribe to generate a text transcription of the video's audio.
6. **Stream Content (Optional):** A secondary Lambda handler can be exposed via API Gateway to serve the transcoded video segments and playlists directly from S3, supporting byte-range requests for efficient streaming.
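The probe step (1) mostly comes down to parsing ffprobe's JSON output. A minimal sketch, assuming `ffprobe` is available on the PATH — the helper names here are illustrative, not the actual `app.py` API:

```python
import json
import subprocess

def parse_probe(data):
    """Pull the fields the transcoder cares about out of ffprobe's JSON."""
    stream = data["streams"][0]
    return {
        "width": stream["width"],
        "height": stream["height"],
        "duration": float(data["format"]["duration"]),
    }

def probe_video(path, ffprobe="ffprobe"):
    """Run ffprobe and return resolution/duration metadata for a video."""
    cmd = [
        ffprobe, "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height:format=duration",
        "-of", "json", path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_probe(json.loads(out))
```

The resolution returned here is what drives which output renditions (1080p/720p/480p) are worth generating.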
## Features

- **Serverless Architecture:** Leverages AWS Lambda for compute, S3 for storage, and Amazon Transcribe for audio-to-text.
- **Adaptive Bitrate Streaming:** Outputs HLS and DASH formats for wide compatibility across devices.
- **Automated Transcription:** Integrates with Amazon Transcribe.
- **Thumbnail Sprite Generation:** For enhanced video player UIs.
- **Multiple Deployment Options:** Supports both traditional .zip deployment with Lambda Layers and Docker container image deployment.
- **Customizable Presets:** Easily configure video output resolutions and bitrates.
## Project Structure

```
.
├── Dockerfile               # For Docker-based Lambda deployment
├── LICENSE                  # Project license
├── README.md                # This file
├── requirements.txt         # Root Python dependencies (if any, for local dev)
├── events/
│   └── basic.json           # Sample event for local testing
├── src/
│   └── transcoder/
│       ├── __init__.py
│       ├── app.py           # Core Lambda function logic
│       └── requirements.txt # Dependencies for the Lambda function
└── tests/
    ├── __init__.py
    └── test_handler.py      # Unit tests (example)
```
## Prerequisites

- AWS Account
- AWS CLI installed and configured
- Docker installed (for Docker-based deployment)
- Python 3.9+ (for local development and .zip deployment)
- Static builds of FFmpeg and ffprobe (for .zip/Layer deployment; the Dockerfile downloads them automatically)
## Deployment Options 🚀

You can deploy this application to AWS Lambda either as a traditional .zip archive with Lambda Layers or as a Docker container image.
## Option 1: Deployment using .zip archive and Lambda Layers

This is the traditional method for deploying Lambda functions.

**1. Prepare Lambda Layer for ffmpeg/ffprobe:**
- Download static builds of `ffmpeg` and `ffprobe` compatible with the Lambda runtime's Amazon Linux version (e.g., Amazon Linux 2 for Python 3.9/3.11). A common source is [John Van Sickle's FFmpeg builds](https://johnvansickle.com/ffmpeg/).
- Create the following folder structure (layer contents are extracted under `/opt`, so `bin/` becomes `/opt/bin`, which is where `app.py` expects the binaries):
  ```
  bin/
    ffmpeg
    ffprobe
  ```
- Ensure `ffmpeg` and `ffprobe` are executable (`chmod +x bin/ffmpeg bin/ffprobe`).
- Zip the `bin` folder (e.g., `zip -r ffmpeg-layer.zip bin`).
- In the AWS Lambda console, create a new Layer and upload `ffmpeg-layer.zip`. Note the Layer ARN.
**2. Package Your Application Code:**
- Your application code is in `src/transcoder/app.py`.
- If you have Python dependencies beyond `boto3` (which is included in the Lambda Python runtime), as listed in `src/transcoder/requirements.txt`, install them into a package directory:
  ```bash
  pip install -r src/transcoder/requirements.txt -t ./package
  ```
- Create a .zip file containing `app.py` and the *contents* of the `package` directory, all at the root of the archive:
  ```bash
  # Dependencies must sit at the zip root, so zip from inside ./package
  cd package && zip -r ../lambda_function.zip . && cd ..
  # Add app.py at the zip root (-j junks the src/transcoder/ path)
  zip -j lambda_function.zip src/transcoder/app.py
  # If boto3 alone is sufficient, the single zip -j command above is enough.
  ```
  Ensure `app.py` ends up at the root of the zip; otherwise the handler path must reflect its location inside the archive.
**3. Create Lambda Function (.zip file):**
- In the AWS Lambda Console, click **"Create function"**.
- Choose **"Author from scratch"**.
- **Function name:** Enter a descriptive name.
- **Runtime:** Select a Python version (e.g., Python 3.11 or as supported).
- **Architecture:** Choose `x86_64` or `arm64`, matching your ffmpeg build.
- **Permissions:** Create a new execution role with basic Lambda permissions, or choose an existing one. This role will be modified later.
- Click **"Create function"**.
- **Upload code:** In the "Code source" section, upload your `lambda_function.zip`.
- **Handler:** Set the handler to `app.lambda_handler` (assuming `app.py` is at the root of your zip).
- **Layers:** Add the ffmpeg/ffprobe Lambda Layer you created in step 1.
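For reference, a stripped-down sketch of the shape of the `app.lambda_handler` entry point that the handler setting above points at — the actual `app.py` does the real work where the comment sits:

```python
import urllib.parse

def lambda_handler(event, context):
    """Entry point matching the 'app.lambda_handler' handler setting."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # ... download from S3, probe, transcode, upload results here ...
        processed.append({"bucket": bucket, "key": key})
    return {"statusCode": 200, "processed": processed}
```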
## Option 2: Deployment using Docker Container Image

This method packages your application and all dependencies, including FFmpeg, into a single Docker image.

**1. Review/Update Dockerfile:**
- The provided `Dockerfile` in the repository is a good starting point.
  ```dockerfile
  # Use the AWS Lambda Python 3.13 base image (Amazon Linux 2023)
  FROM public.ecr.aws/lambda/python:3.13

  # Install tools needed to unpack the FFmpeg archive
  RUN microdnf update -y && \
      microdnf install -y tar xz && \
      microdnf clean all

  # Download a static build of FFmpeg
  RUN curl -L https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz \
        -o /tmp/ffmpeg.tar.xz && \
      tar -xJf /tmp/ffmpeg.tar.xz -C /tmp && \
      mkdir -p /opt/bin && \
      cp /tmp/ffmpeg-*-static/ffmpeg /opt/bin/ && \
      cp /tmp/ffmpeg-*-static/ffprobe /opt/bin/ && \
      chmod +x /opt/bin/ffmpeg /opt/bin/ffprobe && \
      rm -rf /tmp/*

  # Install Python dependencies specific to the transcoder
  COPY src/transcoder/requirements.txt ${LAMBDA_TASK_ROOT}/requirements.txt
  RUN pip install -r ${LAMBDA_TASK_ROOT}/requirements.txt -t ${LAMBDA_TASK_ROOT}

  # Copy application code
  COPY src/transcoder/app.py ${LAMBDA_TASK_ROOT}/app.py
  # Ensure your app.py uses /opt/bin/ffmpeg and /opt/bin/ffprobe

  # Set the Lambda handler (filename.handler_function)
  CMD ["app.lambda_handler"]
  ```
- Ensure the `FFMPEG` and `FFPROBE` paths in `src/transcoder/app.py` are set to `/opt/bin/ffmpeg` and `/opt/bin/ffprobe` respectively (this is the default in the current `app.py`).
- If `src/transcoder/requirements.txt` is empty or `boto3` alone is sufficient, you can remove the `COPY` and `RUN pip install` lines from the `Dockerfile`.
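One way to keep the same `app.py` working both in Lambda (where the binaries live in `/opt/bin`) and locally is a small path resolver. This is a sketch; the `FFMPEG_PATH`/`FFPROBE_PATH` environment variables are hypothetical, not part of the current `app.py`:

```python
import os
import shutil

def resolve_tool(name, default_dir="/opt/bin"):
    """Prefer an env override, then the Lambda layer/container path,
    then whatever is on PATH (for local development)."""
    candidate = os.environ.get(f"{name.upper()}_PATH",
                               os.path.join(default_dir, name))
    if os.path.isfile(candidate):
        return candidate
    return shutil.which(name) or candidate

FFMPEG = resolve_tool("ffmpeg")
FFPROBE = resolve_tool("ffprobe")
```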
**2. Build and Push Docker Image to Amazon ECR:**
- **Install AWS CLI and Docker:** Ensure they are installed and configured locally.
- **Create ECR Repository (if it doesn't exist):**
  ```bash
  aws ecr create-repository --repository-name your-lambda-repo-name --image-scanning-configuration scanOnPush=true --region your-aws-region
  ```
  Replace `your-lambda-repo-name` and `your-aws-region`.
- **Authenticate Docker to your ECR registry:**
  ```bash
  aws ecr get-login-password --region your-aws-region | docker login --username AWS --password-stdin YOUR_AWS_ACCOUNT_ID.dkr.ecr.your-aws-region.amazonaws.com
  ```
  Replace `your-aws-region` and `YOUR_AWS_ACCOUNT_ID`.
- **Build Docker Image:** From the project root (where the `Dockerfile` lives), run:
  ```bash
  docker build -t your-lambda-repo-name .
  ```
- **Tag Docker Image for ECR:**
  ```bash
  docker tag your-lambda-repo-name:latest YOUR_AWS_ACCOUNT_ID.dkr.ecr.your-aws-region.amazonaws.com/your-lambda-repo-name:latest
  ```
- **Push Docker Image to ECR:**
  ```bash
  docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.your-aws-region.amazonaws.com/your-lambda-repo-name:latest
  ```
**3. Create Lambda Function (Container Image):**
- In the AWS Lambda Console, click **"Create function"**.
- Select **"Container image"**.
- **Function name:** Enter a descriptive name.
- **Container image URI:** Click **"Browse images"** and select the image you pushed to ECR (e.g., `your-lambda-repo-name:latest`).
- **Architecture:** Choose `x86_64` (matching the ffmpeg build in the Dockerfile).
- **Permissions:** Create a new execution role or choose an existing one. This role will be modified later.
- Click **"Create function"**.
## Common Configuration Steps (for both deployment options)

**1. IAM Permissions:**
- Go to the IAM console and find the execution role associated with your Lambda function.
- Attach policies that grant the following permissions:
  - **AmazonS3FullAccess** (or, preferably, a more restrictive policy granting `s3:GetObject` on the source bucket and `s3:PutObject` on the destination bucket/prefix).
    Example inline policy for S3 (`s3:PutObjectAcl` is optional, only needed if you set object ACLs):
    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject"
          ],
          "Resource": "arn:aws:s3:::YOUR_SOURCE_BUCKET_NAME/*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:PutObjectAcl"
          ],
          "Resource": "arn:aws:s3:::YOUR_DESTINATION_BUCKET_NAME/*"
        }
      ]
    }
    ```
    Replace `YOUR_SOURCE_BUCKET_NAME` and `YOUR_DESTINATION_BUCKET_NAME`. The `processed/` prefix is handled by the application logic.
  - **AmazonTranscribeFullAccess** (or a more restrictive policy granting `transcribe:StartTranscriptionJob`).
    Example inline policy for Transcribe:
    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "transcribe:StartTranscriptionJob",
          "Resource": "*"
        }
      ]
    }
    ```
  - **AWSLambdaBasicExecutionRole** (usually added by default): Allows writing logs to CloudWatch (`logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents`).
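If you template infrastructure in code, the S3 inline policy above can be generated rather than hand-edited per bucket. A small sketch (the helper name is hypothetical); pass `with_acl=True` only if you actually set object ACLs:

```python
import json

def s3_inline_policy(source_bucket, dest_bucket, with_acl=False):
    """Build the least-privilege S3 inline policy described above."""
    put_actions = ["s3:PutObject"] + (["s3:PutObjectAcl"] if with_acl else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["s3:GetObject"],
             "Resource": f"arn:aws:s3:::{source_bucket}/*"},
            {"Effect": "Allow",
             "Action": put_actions,
             "Resource": f"arn:aws:s3:::{dest_bucket}/*"},
        ],
    }

print(json.dumps(s3_inline_policy("my-raw-videos", "my-processed-videos"), indent=2))
```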
**2. Lambda Function Configuration:**
- In the Lambda function console, navigate to the **"Configuration"** tab.
- **General configuration:**
  - **Memory:** Increase memory; video processing is memory-intensive. Start with **2048 MB** or **4096 MB** and adjust based on execution logs and performance.
  - **Ephemeral storage:** The default `/tmp` size is 512 MB; it can be raised up to 10 GB (for both .zip and container deployments) if your FFmpeg processes need more temporary disk space.
  - **Timeout:** Increase the timeout; video processing can be slow. Start with **5 minutes** (300 seconds) or go up to the maximum of **15 minutes** (900 seconds). Monitor and adjust.
- **Environment Variables (Optional):**
  - `LANGUAGE_CODE`: Defaults to `en-US` in `app.py`. Override it here for other languages supported by Amazon Transcribe.
  - Any other custom environment variables your application might need.
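The environment-variable pattern above can be centralized in one spot in the code. A sketch — only `LANGUAGE_CODE` is a documented variable here; `OUTPUT_PREFIX` is a hypothetical example of the same pattern:

```python
import os

DEFAULTS = {
    "LANGUAGE_CODE": "en-US",       # documented default in app.py
    "OUTPUT_PREFIX": "processed/",  # hypothetical illustration
}

def env_config():
    """Merge environment-variable overrides over the defaults."""
    return {key: os.environ.get(key, default) for key, default in DEFAULTS.items()}
```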
**3. Triggers:**

**A. S3 Trigger (for `process_video` function):**
- In the Lambda console for your function, go to **"Function overview"** and click **"Add trigger"**.
- Select **"S3"**.
- **Bucket:** Choose the S3 bucket where raw videos will be uploaded.
- **Event type:** Select **"All object create events"** or be more specific (e.g., `PUT`, `POST`, `CompleteMultipartUpload`).
- **Prefix (Optional):** Trigger only for uploads to a specific folder (e.g., `uploads/`).
- **Suffix (Optional):** Trigger only for specific file types (e.g., `.mp4`).
- Acknowledge the recursive invocation warning if your Lambda writes back to the same bucket (this app writes to a `processed/` prefix, which avoids direct recursion as long as the trigger's prefix excludes it).
- Click **"Add"**.
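Prefix/suffix filters are configured on the trigger itself, but a defensive in-code check is cheap insurance against recursive invocation if the filters ever change. A sketch (the suffix list is illustrative):

```python
VIDEO_SUFFIXES = (".mp4", ".mov", ".mkv")
OUTPUT_PREFIX = "processed/"

def should_process(key):
    """Filter S3 event keys: skip our own output and non-video files."""
    if key.startswith(OUTPUT_PREFIX):
        return False  # avoid re-triggering on our own output
    return key.lower().endswith(VIDEO_SUFFIXES)
```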
**B. API Gateway Trigger (for `stream_handler` function):**
- This allows HTTP(S) access to serve the HLS/DASH manifests and video segments.
- In the Lambda console for your function, go to **"Function overview"** and click **"Add trigger"**.
- Select **"API Gateway"**.
- Choose **"Create an API"**.
- Select **"HTTP API"** (recommended for simplicity and cost) or REST API.
- **Security:** For initial testing, **"Open"** is fine. For production, implement appropriate authorization (e.g., IAM, Lambda authorizer, API key).
- **API name, Deployment stage:** Configure as needed.
- **Route:** The `stream_handler` expects `bucket` and `key` as query string parameters. A common route might be `/stream` or `/videos/{proxy+}`. Ensure the integration passes query parameters through.
- Click **"Add"**. Note the **API endpoint URL** provided after creation; it is used to access your video streams.
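The byte-range half of a handler like `stream_handler` mostly comes down to parsing the client's `Range` header and forwarding it to `s3.get_object(..., Range=...)`. A sketch of just the parsing (the S3 call is omitted and the helper name is illustrative):

```python
import re

def parse_range(header, size):
    """Parse a 'bytes=start-end' Range header against an object size.
    Returns an inclusive (start, end) pair, or None for no/invalid range."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header or "")
    if not m or (not m.group(1) and not m.group(2)):
        return None
    if m.group(1):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else size - 1
    else:
        # Suffix form 'bytes=-N': the last N bytes of the object.
        start, end = size - int(m.group(2)), size - 1
    return max(start, 0), min(end, size - 1)
```

The resulting pair maps directly onto the S3 request, e.g. `Range=f"bytes={start}-{end}"`, and onto the `Content-Range` response header.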
## Testing

**1. Testing `process_video` (S3 Trigger):**
- Upload a video file (e.g., an MP4) to the S3 bucket and prefix you configured as the trigger.
- Monitor the Lambda function's execution in Amazon CloudWatch Logs.
- Check your destination S3 bucket (under the `processed/` prefix) for the output HLS files, DASH files, `sprite.png`, and the transcription JSON.

**2. Testing `stream_handler` (API Gateway Trigger):**
- Once `process_video` has run successfully, you can test streaming.
- Construct the URL from your API Gateway endpoint and the S3 key of a manifest or segment.
- Example for the HLS master playlist (replace placeholders):
  `YOUR_API_ENDPOINT/stream?bucket=YOUR_S3_BUCKET_NAME&key=processed/YOUR_VIDEO_BASENAME/hls/master.m3u8`
- Example for a video segment (replace placeholders):
  `YOUR_API_ENDPOINT/stream?bucket=YOUR_S3_BUCKET_NAME&key=processed/YOUR_VIDEO_BASENAME/hls/720p/seg_000_720p.ts`
- Use `curl`, a web browser, or an HLS/DASH test player (such as VLC or an online player) to access these URLs.
- Check CloudWatch Logs for API Gateway and the Lambda function if you encounter issues.
## Important Notes
- **FFmpeg Static Builds:** Ensure the FFmpeg static build used (in the Layer or the Docker image) is compatible with the Lambda execution environment (Amazon Linux 2 for older Python runtimes, Amazon Linux 2023 for the `public.ecr.aws/lambda/python:3.13` base image). The `Dockerfile` downloads a common amd64 static build.
- **Costs:** Be mindful of AWS costs: S3 (storage, requests, data transfer), Lambda (invocations, duration, memory), ECR (storage), API Gateway (requests, data transfer), and Amazon Transcribe (transcription minutes).
- **Error Handling & Logging:** The provided code has basic error handling. For production, implement more robust error handling, use structured logging, and consider Dead-Letter Queues (DLQs) for failed invocations.
- **Large Files & Long Processing:** For very large videos or long processing times, Lambda's 15-minute timeout or the 10 GB ephemeral storage ceiling may be insufficient. In such scenarios, consider AWS Batch for asynchronous, long-running jobs or AWS Elemental MediaConvert as a managed media conversion service.
- **Idempotency:** Consider whether parts of your workflow need to be idempotent, especially if retries occur.
- **Security:**
  - Adhere to the principle of least privilege for IAM roles. Grant only the necessary permissions.
  - Secure your API Gateway endpoint with appropriate authentication and authorization for production.
  - Regularly update dependencies, including the base Docker image and FFmpeg.
## Local Development & Testing (Conceptual)

While full end-to-end testing requires AWS services, you can test parts of the `app.py` logic locally:

1. **Setup:**
   * Ensure Python 3.9+ is installed.
   * Install dependencies: `pip install -r src/transcoder/requirements.txt boto3 moto` (Moto mocks AWS services).
   * Have FFmpeg and ffprobe installed locally and accessible in your PATH, or adjust the `FFMPEG`/`FFPROBE` paths in `app.py` for local testing.
2. **Mocking AWS Services:**
   * Use `moto` to mock S3 and Transcribe calls during local unit tests.
3. **Sample Event:**
   * Use `events/basic.json` (create or modify it to represent an S3 event) to simulate a Lambda invocation.
   * Example `events/basic.json` for an S3 trigger:
     ```json
     {
       "Records": [
         {
           "s3": {
             "bucket": {
               "name": "your-local-test-bucket"
             },
             "object": {
               "key": "sample.mp4"
             }
           }
         }
       ]
     }
     ```
4. **Running `app.lambda_handler`:**
   * Write a small Python script that loads the event and calls `app.lambda_handler(event, None)`:

   ```python
   # Example local_runner.py (place in project root)
   import json
   import sys

   sys.path.append('src/transcoder')  # Add app module to path
   from app import lambda_handler

   if __name__ == '__main__':
       with open('events/basic.json', 'r') as f:
           event = json.load(f)

       # --- Setup for local testing ---
       # 1. Create a dummy sample.mp4 or use a small test video.
       # 2. Either place the file where your code expects to download it,
       #    or modify app.py to read a local file path directly.
       # 3. FFmpeg/ffprobe must be on PATH, or the paths in app.py adjusted.
       #
       # This local run won't touch real S3/Transcribe unless boto3 is
       # configured with real credentials, or moto/localstack is used to mock.

       print("Simulating Lambda invocation locally...")
       result = lambda_handler(event, None)
       print("\nLambda Output:")
       print(json.dumps(result, indent=2))
   ```

   This local setup is primarily for unit/integration testing of the Python logic, not for testing FFmpeg processing in the exact Lambda environment. For that, deploying to AWS is necessary.
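The unit-test side can start even smaller than moto: plain `unittest` against the pure event-handling logic, no AWS mocking needed. A sketch — the `extract_upload` helper here is illustrative, not the actual `app.py` API:

```python
import unittest

def extract_upload(event):
    """Pull (bucket, key) out of the first record of an S3 event."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

class TestExtractUpload(unittest.TestCase):
    SAMPLE = {"Records": [{"s3": {
        "bucket": {"name": "your-local-test-bucket"},
        "object": {"key": "sample.mp4"},
    }}]}

    def test_bucket_and_key(self):
        bucket, key = extract_upload(self.SAMPLE)
        self.assertEqual(bucket, "your-local-test-bucket")
        self.assertEqual(key, "sample.mp4")

if __name__ == "__main__":
    unittest.main()
```

Once the pure logic is covered, moto can take over for the S3 download/upload paths.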
## Contributing
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
