🔥 A serverless AWS Lambda solution for on-the-fly video transcoding, transcription, and adaptive streaming (HLS & DASH), with a simple Flask-based frontend for uploads, status tracking, and video management.
This project provides a robust and scalable way to process video files. It includes:
1. **S3 Event Trigger:** A Lambda function is triggered when a video is uploaded to an S3 bucket.
2. **Video Processing Pipeline:**
    * **Content-Addressable Storage:** Uses the MD5 hash of the video file as a unique ID to prevent duplicate processing.
    * **Transcode to Multiple Resolutions:** Creates different quality levels (e.g., 1080p, 720p, 480p) for adaptive bitrate streaming.
    * **Generate HLS & DASH Playlists:** Creates manifest files for Apple HLS and MPEG-DASH.
    * **Create Dynamic Sprite Sheet:** Generates a thumbnail sprite sheet for video scrubbing previews.
    * **Transcribe Audio (Optional):** Uses Amazon Transcribe to generate subtitles.
3. **Flask Backend with REST API:**
    * Handles large file uploads using S3 multipart uploads.
    * Provides API endpoints to list, check status, and delete videos.
    * Serves video content using S3 presigned URLs.
4. **Direct Access via Lambda Function URL:** The Flask app is served via a Lambda Function URL, providing direct public HTTP access without needing an API Gateway.

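The content-addressable step above can be sketched in a few lines. This is a minimal illustration, not the project's actual code — the helper name `compute_process_id` is an assumption:

```python
import hashlib

def compute_process_id(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Stream the file through MD5 in 8 MB chunks so large videos
    never have to be held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the same bytes always yield the same hex digest, re-uploading an identical file maps to the already-processed output prefix (`processed/<process_id>/`) instead of kicking off a second transcode.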
## Features
- **Serverless Architecture:** Leverages AWS Lambda, S3, and Amazon Transcribe.
- **Content-Addressable:** Video processing is based on file content (MD5 hash), making it idempotent.
- **Large File Support:** Handles large video uploads efficiently using S3 multipart uploads.
- **Adaptive Bitrate Streaming:** Outputs HLS and (optionally) DASH formats.
- **Automated Transcription:** Integrates with Amazon Transcribe (can be disabled).
- **Dynamic Thumbnail Sprite Generation:** Creates a sprite sheet for rich player seeking previews.
- **REST API:** Provides endpoints for managing videos, suitable for a modern frontend.
- **Video Deletion:** API endpoint to delete a video and all its associated assets from S3.
- **Docker-based Deployment:** Simplified deployment using a container image.

## Project Structure
│   └── templates/
│       └── index.html   # HTML for the upload frontend
└── tests/
    └── test_handler.py
```

## Prerequisites
- Docker installed
- An S3 bucket to store uploads and transcoded files.

## Configuration

The Lambda function is configured using environment variables. Set these in the Lambda function's configuration page in the AWS Console.

| Variable | Description | Default |
|---|---|---|
| `BUCKET_NAME` | **Required.** The name of the S3 bucket for uploads and transcoded files. | `None` |
| `LAMBDA_FUNCTION_URL` | The public URL of the Lambda function. Required for generating correct links in HLS/DASH manifests. | `""` |
| `GENERATE_DASH` | Set to `"true"` to generate MPEG-DASH manifests alongside HLS. | `"true"` |
| `GENERATE_SUBTITLES` | Set to `"true"` to enable video transcription with Amazon Transcribe. | `"true"` |
| `THUMBNAIL_WIDTH` | The width of the generated thumbnails in the sprite sheet. | `1280` |
| `LOG_LEVEL` | The logging level for the application. | `"INFO"` |
| `SPRITE_FPS` | The frame rate (frames per second) to use for generating the thumbnail sprite. | `1` |
| `SPRITE_ROWS` | The number of rows in the thumbnail sprite sheet. | `10` |
| `SPRITE_COLUMNS` | The number of columns in the thumbnail sprite sheet. | `10` |
| `SPRITE_INTERVAL` | The interval in seconds between frames captured for the thumbnail sprite. | `1` |
| `SPRITE_SCALE_W` | The width to scale each thumbnail to in the sprite sheet. | `180` |

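To see how the `SPRITE_*` settings interact, here is a small sketch (not the project's actual code) that locates the thumbnail for a given playback time. The thumbnail height of 101 px is an assumption — roughly 16:9 at the default 180 px width:

```python
def sprite_cue(t: float, interval: int = 1, rows: int = 10,
               cols: int = 10, scale_w: int = 180, scale_h: int = 101):
    """Return (sheet_index, x, y) for the thumbnail covering time t.

    Each sprite sheet holds rows*cols frames, one frame every
    `interval` seconds, each scale_w pixels wide.
    """
    frame = int(t // interval)       # which thumbnail overall
    per_sheet = rows * cols          # frames per sprite sheet
    sheet = frame // per_sheet       # which sheet file
    idx = frame % per_sheet          # position within that sheet
    x = (idx % cols) * scale_w       # pixel offset, left-to-right
    y = (idx // cols) * scale_h      # then top-to-bottom
    return sheet, x, y

# e.g. t=205s with the defaults: frame 205 -> sheet 2, x=900, y=0
```

These coordinates are what a player consumes from the generated WebVTT file as `sheet_2.jpg#xywh=900,0,180,101`-style cues.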
## Deployment Guide 🚀

This project is designed for deployment as a Docker container image to AWS Lambda.

**3. Set Environment Variables:**

- In the Lambda function's configuration page, go to the **"Configuration"** tab and then **"Environment variables"**.
- Add the environment variables listed in the **Configuration** section above. `BUCKET_NAME` is required.

**4. Add S3 Triggers:**

- In the function's **"Function overview"** panel, click **"+ Add trigger"**.
- **Trigger 1 (For Video Uploads):**
  - Select **"S3"** as the source.
  - Choose your bucket (`BUCKET_NAME`).
  - **Event type:** `All object create events`.
  - **Prefix:** `uploads/`
  - Acknowledge the recursive invocation warning and click **"Add"**.
- Click **"+ Add trigger"** again.
- **Trigger 2 (For Processed File Events):**
  - Select **"S3"** as the source.
  - Choose your bucket (`BUCKET_NAME`).
  - **Event type:** `All object create events`.
  - **Prefix:** `processed/`
  - **Suffix:** `.json`
  - Acknowledge the recursive invocation warning and click **"Add"**. This trigger handles events for JSON files in the `processed/` directory, such as transcription job results from Amazon Transcribe. The Lambda function code is designed to handle these events without causing infinite loops.
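Both triggers invoke the same function, so the handler has to route each S3 event by object key. A sketch of that routing logic (the names are illustrative, and the real code applies further checks — e.g. distinguishing its own `manifest.json` writes from Transcribe results — that are not shown here):

```python
from urllib.parse import unquote_plus

def route_s3_key(key: str) -> str:
    """Decide how to handle an S3 object-created event by key.

    Only uploads/ objects start transcoding, and only .json files
    under processed/ are inspected; every other processed/ artifact
    the function wrote itself is ignored, which is what prevents
    infinite self-invocation.
    """
    key = unquote_plus(key)  # keys arrive URL-encoded in S3 events
    if key.startswith("uploads/"):
        return "process_video"
    if key.startswith("processed/") and key.endswith(".json"):
        return "handle_processed_json"
    return "ignore"
```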

**5. Create Function URL:**

- In the function's configuration page, go to the **"Configuration"** tab and then **"Function URL"**.
- Click **"Create function URL"**.
- **Auth type:** `NONE`.
- **CORS:** Configure CORS to allow access from your frontend's domain. For testing, you can enable it for all origins.
- Click **"Save"**.
- Copy the generated Function URL and set it as the `LAMBDA_FUNCTION_URL` environment variable.

## IAM Permissions

Your Lambda execution role needs the following permissions. Attach these policies to the role.

1. **S3 Access:** Full access to the specific S3 bucket used by the function.

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "s3:*",
               "Resource": [
                   "arn:aws:s3:::YOUR_BUCKET_NAME",
                   "arn:aws:s3:::YOUR_BUCKET_NAME/*"
               ]
           }
       ]
   }
   ```

2. **AWS Transcribe Access:** (if `GENERATE_SUBTITLES` is enabled)
   - `transcribe:StartTranscriptionJob`
   - `transcribe:GetTranscriptionJob`
   - The managed policy `AmazonTranscribeFullAccess` can be used for simplicity.

3. **CloudWatch Logs:** The default `AWSLambdaBasicExecutionRole` policy is usually sufficient for logging. It grants:
   - `logs:CreateLogGroup`
   - `logs:CreateLogStream`
   - `logs:PutLogEvents`

## S3 Bucket CORS Configuration

To allow the frontend to perform multipart uploads directly to S3 and to support video streaming, you need to configure Cross-Origin Resource Sharing (CORS) on your S3 bucket.

Go to your S3 bucket in the AWS Console, select the **Permissions** tab, and in the **Cross-origin resource sharing (CORS)** section, paste the following JSON configuration:

```json
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "PUT",
            "POST",
            "DELETE",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "ETag"
        ],
        "MaxAgeSeconds": 3000
    }
]
```

## API Endpoints

The Flask application provides several API endpoints for interaction.

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/create_multipart_upload` | Initializes a multipart upload and returns presigned URLs for each chunk. |
| `POST` | `/complete_multipart_upload` | Finalizes the multipart upload after all chunks are uploaded. |
| `GET` | `/status/<video_id>` | Gets the detailed processing status of a specific video. |
| `GET` | `/api/videos` | Returns a list of all successfully processed videos. |
| `GET` | `/api/transcoding_status` | Returns a list of videos that are currently in the "processing" state. |
| `DELETE` | `/api/video/<video_id>` | Deletes a video and all its associated files (HLS, DASH, sprites, etc.). |
| `GET` | `/stream/<path:key>` | Redirects to a presigned S3 URL to stream video content. |

## How It Works

1. **Upload:** A user uploads a video file via the frontend. For large files, the frontend uses the multipart upload endpoints to upload the file in chunks directly to the `uploads/` prefix in the S3 bucket.
2. **Trigger:** The S3 `put` event triggers the Lambda function.
3. **Processing (`process_video`):**
    * The function downloads the source video.
    * It calculates the file's MD5 hash, which becomes the `process_id`. This ensures that if the same file is uploaded again, it won't be re-processed.
    * A redirect file (`processed/<original_filename>.json`) is created to map the original name to the `process_id`.
    * A `manifest.json` is created in `processed/<process_id>/` to track the state.
    * The video is transcoded into multiple resolutions using FFmpeg. HLS (and optionally DASH) files are generated.
    * A thumbnail sprite sheet and VTT file are created for scrubbing previews.
    * All artifacts are uploaded to the `processed/<process_id>/` directory in S3.
    * The final `manifest.json` is updated with the status `processing_complete` and paths to all assets.
4. **Event Handling for Processed Files:** The second S3 trigger is configured for `.json` files in the `processed/` directory. This allows the function to react to events like the completion of an Amazon Transcribe job. The function's logic is designed to handle these events appropriately and avoid infinite recursion from files it generates itself.
5. **Status Check:** The frontend polls the `/status/<video_id>` endpoint to monitor the progress from `processing` to `processing_complete`.
6. **Playback:** Once complete, the frontend can retrieve the list of videos from `/api/videos` and play them using the HLS or DASH manifest URLs. The `/stream/` endpoint provides the necessary presigned URLs for the player to access the video segments from S3 securely.
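As an illustration of the transcoding step, the FFmpeg invocation for a single HLS rendition looks roughly like this. It is a sketch with assumed bitrates and flags — the exact arguments live in `app.py`:

```python
def hls_rendition_cmd(src: str, height: int, v_bitrate: str, out_dir: str) -> list[str]:
    """Build an FFmpeg argument list for one HLS rendition.

    scale=-2:<height> preserves aspect ratio while keeping the width
    even, which libx264 requires.
    """
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-b:v", v_bitrate,
        "-c:a", "aac", "-b:a", "128k",
        "-hls_time", "6",                 # ~6-second media segments
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/{height}p_%03d.ts",
        f"{out_dir}/{height}p.m3u8",
    ]

# One call per quality level, e.g.:
# for h, br in [(1080, "5000k"), (720, "2800k"), (480, "1400k")]:
#     subprocess.run(hls_rendition_cmd("in.mp4", h, br, "/tmp/out"), check=True)
```

The per-rendition playlists are then referenced from a master `.m3u8` so players can switch bitrates adaptively.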
## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.