This project implements a lightweight FPGA-based image compression pipeline optimized for real-time video encoding on resource-constrained platforms such as micro and insect-sized drones. The design captures YCbCr video from an OV7670 camera sensor, compresses it using a JPEG-like DCT-based algorithm, and outputs the compressed bitstream via UART for transmission or storage.
- β YCbCr 4:2:2 Direct Camera Input - SCCB-configured OV7670 camera interface
- β Time-Shared Compression Pipeline - Single DCT/quantizer/zigzag/RLE core services Y, Cb, Cr channels sequentially
- β Low Resource Utilization - Optimized for Artix-7 FPGA (Arty S7-50 board)
- β Real-Time Processing - Supports QVGA/VGA resolution at 15-30 fps
- β Modular IP Architecture - 13 independent IP blocks designed for Vivado Block Design
- β Comprehensive Testbenches - Full simulation coverage for verification
- β Power Efficient - Minimal dynamic power consumption suitable for battery-powered platforms
π· OV7670 Camera (YCbCr 4:2:2)
β
[SCCB Controller] - Configures camera via I2C-like protocol
β
[Camera Interface] - Captures pixel data and sync signals
β
[YCbCr Parser] - Separates interleaved YCbCr into Y, Cb, Cr streams
β
[Block Buffers] - Accumulates 8Γ8 pixel blocks (3 instances: Y, Cb, Cr)
β
[FSM Controller] - Time-shared scheduling of compression pipeline
β
[Shared Compression Pipeline]:
DCT β Quantizer β Zig-Zag β RLE Encoder
β
[Bitstream Mux] - Combines Y, Cb, Cr compressed data with channel markers
β
[UART TX] - Transmits compressed bitstream at 115200 baud
β
π» PC/Storage (Python decoder receives and decompresses images)
Instead of three parallel DCT/quantizer/zigzag/RLE pipelines (which would consume 3Γ resources), this design uses one shared compression core that processes blocks from each channel sequentially:
- Y channel (luminance): Full resolution, uses luminance quantization table
- Cb channel (blue-difference): Reduced by camera (4:2:2), uses chroma quantization table
- Cr channel (red-difference): Reduced by camera (4:2:2), uses chroma quantization table
Benefits:
- ~66% reduction in LUT/FF/BRAM usage vs. parallel pipelines
- Lower power consumption (critical for drones)
- Sufficient throughput for real-time video (pipelining masks sequential processing)
- Maintains high compression quality
- Xilinx Arty S7-50 (Artix-7 XC7A35T FPGA)
- 100 MHz on-board oscillator
- USB UART interface for bitstream output
- OV7670 (or compatible OV76xx sensor)
- Configurable via SCCB (Serial Camera Control Bus, I2C-compatible)
- Outputs YCbCr 4:2:2 pixel data
- Typical XCLK: 10-24 MHz (driven by FPGA via Clocking Wizard at 24 MHz)
- 2Γ 4.7kΞ© Pull-up Resistors (for SCCB SDA/SCL lines)
- 24 MHz Oscillator (optional if FPGA-driven; we use FPGA CLK Wizard)
- Breadboard & Jumper Wires for prototyping
- USB Cable for UART communication and board power
Arty S7-50 Pin β OV7670 Pin
=====================================
GPIO (XCLK out) β XCLK
GPIO (VSYNC in) β VSYNC
GPIO (HREF in) β HREF
GPIO (PCLK in) β PCLK
GPIO[7:0] (in) β D[7:0]
GPIO (SDA) β SIOD
GPIO (SCL) β SIOC
GND β GND
3.3V β VCC
Configures the OV7670 camera via SCCB (Serial Camera Control Bus).
Inputs:
clk- System clock (100 MHz)rst- Active-high resetstart_config- Trigger camera configuration
Outputs:
config_done- Asserted when all camera registers are writtenconfig_busy- Indicates ongoing configurationsccb_sda- Bidirectional data line (open-drain)sccb_scl- Serial clock line (open-drain)
Configuration Registers Set:
- Output format: YCbCr 4:2:2
- Resolution: QVGA (320Γ240) or VGA (640Γ480)
- Frame rate, clock dividers, and color parameters
Testbench: tb_sccb_controller.v
Captures raw pixel data and synchronization signals from OV7670.
Inputs:
pix_clk- Internal pixel processing clock (100 MHz)rst- Active-high resetenable- Enable pixel capture (typically fromconfig_done)cam_pclk- Camera pixel clock (async, from OV7670)cam_vsync- Vertical sync (async)cam_href- Horizontal reference (async)cam_data[7:0]- Camera 8-bit data bus
Outputs:
pixel_stream- Valid pixel indicator and datapixel_out[7:0]- Captured pixel valueframe_start- Asserted on VSYNC rise (frame boundary)line_start- Asserted on HREF rise (line boundary)pixel_valid- Pixel is ready for downstream processingcapturing- Status: currently capturing frame
Features:
- CDC (Clock Domain Crossing) synchronization for async camera signals
- Robust edge detection for frame/line markers
- Pixel buffering on HREF assertion
Testbench: tb_camera_interface.v
Parses interleaved YCbCr 4:2:2 data into separate Y, Cb, Cr streams.
Inputs:
clk- Synchronous clock (100 MHz)rst- Active-high resetpixel_stream_in[7:0]- Incoming interleaved pixel datapixel_valid- Incoming pixel is validline_start/frame_start- Frame/line boundary markers
Outputs:
y_stream_out[7:0]- Luminance pixelscb_stream_out[7:0]- Cb (blue-difference) pixelscr_stream_out[7:0]- Cr (red-difference) pixelschannel_valid[2:0]- Valid signal per channel
Parsing Logic:
Converts Y0 Cb0 Y1 Cr0 Y2 Cb1 Y3 Cr1 ... (4:2:2 format) into three separate streams.
Testbench: tb_ycbcr_parser.v
Accumulates incoming pixels into 8Γ8 blocks for DCT processing.
Inputs:
clk- System clockrst- Active-high resetenable- Enable accumulationpixel_stream_in[7:0]- Input pixelpixel_valid- Pixel is validblock_read_ack- Downstream acknowledges block consumption
Outputs:
block_stream[1023:0]- Complete 8Γ8 block (64 pixels Γ 16-bit)pixel_count[5:0]- Current pixel count in block (0-63)buffer_full- Block is complete and readyblock_ready- Used by FSM for scheduling
Features:
- Dual-port BRAM for simultaneous write (from camera) and read (to DCT)
- Pixel counter tracks fill level
- Ready signal on 64-pixel boundary (8Γ8 block completion)
Testbench: tb_block_buffer.v
Performs 2D Discrete Cosine Transform on 8Γ8 pixel blocks.
Inputs:
clk- System clock (100 MHz)rst- Active-high resetblock_in[1023:0]- 8Γ8 pixel block (64 Γ 16-bit)block_valid- Input block is validstart- Begin DCT computation
Outputs:
dct_out[1023:0]- 8Γ8 DCT coefficients (64 Γ 16-bit fixed-point)dct_valid- Output coefficients are validdone- DCT computation completebusy- Currently processing
Algorithm:
- Row-wise 1D DCT (8 transforms of 8 values each)
- Column-wise 1D DCT (8 transforms of 8 values each)
- Fixed-point arithmetic with 14-bit fractional part
- Optimized via precomputed cosine lookup tables
Latency: ~130 cycles for full 8Γ8 block
Testbench: tb_dct2d.v
Applies quantization to DCT coefficients using switchable tables.
Inputs:
clk- System clockrst- Active-high resetcoeff_stream_in[15:0]- DCT coefficient (fixed-point)start- Begin quantizationenable- Enable processingquant_table_select[1:0]- Select quantization table:00: Luminance (Y channel)01: Chrominance (Cb/Cr channels)
Outputs:
quant_stream_out[15:0]- Quantized coefficientquant_out[1023:0]- Full quantized blockquant_valid- Output is validcoeff_ready- Ready for next coefficientdone- Quantization completebusy- Currently processing
Quantization Strategy:
- JPEG-like quantization tables (luminance has finer granularity, chroma more aggressive)
- Reduces precision of high-frequency components for compression
- Scalable quality adjustment via table presets
Testbench: tb_quantizer.v
Reorders quantized coefficients into JPEG-style zig-zag scan order.
Inputs:
clk- System clockrst- Active-high resetblock_in[1023:0]- 8Γ8 quantized blockblock_valid- Input block is validstart- Begin zig-zag reorderingenable- Enable processing
Outputs:
zigzag_stream_out[15:0]- Coefficient in zig-zag orderzigzag_out[1023:0]- Full reordered blockzigzag_valid- Output stream is validblock_ready- Ready for next blockdone- Reordering completebusy- Currently processing
Zig-Zag Pattern: Traverses coefficients from low-frequency (DC) to high-frequency (AC) components, clustering zeros for efficient RLE encoding.
Testbench: tb_zigzag.v
Run-Length Encodes sequences of zeros and coefficient pairs.
Inputs:
clk- System clockrst- Active-high resetstream_in[15:0]- Coefficient from zig-zagstream_valid- Input coefficient is validenable- Enable encodingstart- Begin encoding
Outputs:
bitstream_out- Variable-length encoded bitsbits_valid- Output bits are validbit_count[4:0]- Number of valid output bits (1-32)done- Encoding completebusy- Currently processing
Encoding Scheme:
- Huffman-like variable-length codes for (run, amplitude) pairs
- Efficient zero compression (runs of zeros encoded as escape codes)
- Example: 15 zeros followed by value 42 β specialized code vs. individual codes
Testbench: tb_rle_encoder.v
Manages time-shared scheduling of the compression pipeline across Y, Cb, Cr channels.
Inputs:
clk- System clockrst- Active-high resetenable- Global enabley_buffer_ready- Y block is availablecb_buffer_ready- Cb block is availablecr_buffer_ready- Cr block is availablepipeline_done- Compression pipeline finished current block
Outputs:
channel_select[1:0]- Select channel for pipeline:00: Y channel01: Cb channel10: Cr channel
quant_table_select[1:0]- Route to quantizer (Y luminance, Cb/Cr chroma)block_read_enable[2:0]- Trigger read from corresponding bufferpipeline_start- Trigger DCT/quant/zigzag/RLE sequence
Scheduling Algorithm: Round-robin through channels, ensuring each block is processed without starvation.
State: IDLE
β
Check Y_buffer_ready β if ready, process Y block
β
Check Cb_buffer_ready β if ready, process Cb block
β
Check Cr_buffer_ready β if ready, process Cr block
β
Loop back to IDLE
Testbench: tb_fsm_controller.v
Combines compressed data from Y, Cb, Cr channels with frame/block markers.
Inputs:
clk- System clockrst- Active-high resetrle_bitstream[31:0]- Compressed data from RLE encoderbits_valid[4:0]- Number of valid bitschannel_id[1:0]- Which channel data came fromframe_marker- Frame start/end boundary
Outputs:
output_bitstream[31:0]- Final compressed frame dataoutput_valid- Output data is validoutput_ready- Downstream can accept data
Multiplexing:
- Interleaves Y, Cb, Cr compressed blocks in JPEG-like order
- Adds sync markers for frame/block boundaries (helps decoder resynchronize)
Testbench: tb_bitstream_mux.v
Serializes compressed bitstream over UART for PC reception.
Inputs:
clk- System clock (100 MHz)rst- Active-high resetdata_in[7:0]- Byte to transmitdata_valid- Input byte is valid
Outputs:
tx- UART transmit line (connects to USB/serial adapter)data_ready- Ready to accept next byte
Configuration:
- Baud rate: 115200 (standard for Arty S7 USB UART)
- Data bits: 8
- Stop bits: 1
- Parity: None
Testbench: tb_uart_tx.v
Clocking Wizard (clk_wiz_0):
Input: 100 MHz (Arty S7 on-board oscillator)
Outputs:
clk_out1 (100 MHz) β Main processing clock for all IP
clk_out2 (24 MHz) β Camera XCLK (to OV7670)
clk_out3 (50 MHz) β Optional control/UART clock
locked β Clock stability indicator
Reset Distribution:
External Reset Button β clk_wiz_0/reset
clk_wiz_0/locked & ~external_reset β Active-high reset to all IPs
| Source Block | β | Destination Block | Signal(s) |
|---|---|---|---|
| clk_wiz_0 | β | All IPs | clk, reset |
| sccb_controller_0 | β | camera_interface_0 | config_done β enable |
| camera_interface_0 | β | ycbcr_parser_0 | pixel_out, pixel_valid |
| ycbcr_parser_0 | β | block_buffer (3Γ) | Y/Cb/Cr streams |
| block_buffer (all 3) | β | fsm_controller_0 | buffer_ready signals |
| fsm_controller_0 | β | dct2d_0 | block, start, enable |
| dct2d_0 | β | quantizer_0 | dct_out, dct_valid |
| quantizer_0 | β | zigzag_0 | quant_out, quant_valid |
| zigzag_0 | β | rle_encoder_0 | zigzag_out, zigzag_valid |
| rle_encoder_0 | β | bitstream_mux_0 | bitstream, bits_valid |
| bitstream_mux_0 | β | uart_tx_0 | output_bitstream, valid |
| uart_tx_0 | β | External Pin | tx (UART) |
# Clock Input (100 MHz)
set_property PACKAGE_PIN E3 [get_ports clk_in]
set_property IOSTANDARD LVCMOS33 [get_ports clk_in]
# Reset Button
set_property PACKAGE_PIN D9 [get_ports reset_btn]
set_property IOSTANDARD LVCMOS33 [get_ports reset_btn]
set_property PULLUP true [get_ports reset_btn]
# UART Interface
set_property PACKAGE_PIN D10 [get_ports uart_tx]
set_property IOSTANDARD LVCMOS33 [get_ports uart_tx]
set_property PACKAGE_PIN A9 [get_ports uart_rx]
set_property IOSTANDARD LVCMOS33 [get_ports uart_rx]
# Camera XCLK Output (24 MHz from clk_wiz_0/clk_out2)
set_property PACKAGE_PIN T11 [get_ports cam_xclk]
set_property IOSTANDARD LVCMOS33 [get_ports cam_xclk]
# SCCB Interface (Open-Drain)
set_property PACKAGE_PIN T10 [get_ports sccb_sda]
set_property IOSTANDARD LVCMOS33 [get_ports sccb_sda]
set_property PULLUP true [get_ports sccb_sda]
set_property PACKAGE_PIN R10 [get_ports sccb_scl]
set_property IOSTANDARD LVCMOS33 [get_ports sccb_scl]
set_property PULLUP true [get_ports sccb_scl]
# Camera Data Interface
set_property PACKAGE_PIN U11 [get_ports cam_pclk]
set_property IOSTANDARD LVCMOS33 [get_ports cam_pclk]
set_property PACKAGE_PIN R11 [get_ports cam_vsync]
set_property IOSTANDARD LVCMOS33 [get_ports cam_vsync]
set_property PACKAGE_PIN P10 [get_ports cam_href]
set_property IOSTANDARD LVCMOS33 [get_ports cam_href]
set_property PACKAGE_PIN N10 [get_ports cam_data[0]]
set_property PACKAGE_PIN M10 [get_ports cam_data[1]]
set_property PACKAGE_PIN L10 [get_ports cam_data[2]]
set_property PACKAGE_PIN K11 [get_ports cam_data[3]]
set_property PACKAGE_PIN J11 [get_ports cam_data[4]]
set_property PACKAGE_PIN K10 [get_ports cam_data[5]]
set_property PACKAGE_PIN J10 [get_ports cam_data[6]]
set_property PACKAGE_PIN H11 [get_ports cam_data[7]]
for {set i 0} {$i < 8} {incr i} {
set_property IOSTANDARD LVCMOS33 [get_ports cam_data[$i]]
}
# Debug LEDs (optional)
set_property PACKAGE_PIN H17 [get_ports debug_led[0]]
set_property PACKAGE_PIN K15 [get_ports debug_led[1]]
set_property IOSTANDARD LVCMOS33 [get_ports debug_led[0]]
set_property IOSTANDARD LVCMOS33 [get_ports debug_led[1]]
# Timing Constraints
create_clock -period 10.0 -name clk100 [get_ports clk_in]
set_input_delay -clock clk100 3.0 [get_ports cam_pclk]
set_input_delay -clock clk100 3.0 [get_ports cam_vsync]
set_input_delay -clock clk100 3.0 [get_ports cam_href]
set_input_delay -clock clk100 3.0 [get_ports cam_data]- Vivado 2021.2 or later (free WebPACK license available)
- Xilinx Arty S7-50 board
- OV7670 Camera Module with breakout board
- Python 3.x (for decoder script)
- Serial/USB adapter (already on Arty S7)
git clone https://github.com/yourusername/drone-image-encoder.git
cd drone-image-encodercd vivado
vivado -source create_project.tcl -mode batch- Open
vivado/project.xprin Vivado - In Block Design:
- Add all 13 IP blocks
- Connect as per pin table above
- Run "Validate Design"
- Generate HDL wrapper
# In Vivado Tcl console
run_synth
run_impl
write_bitstreamopen_hw_manager
connect_hw_server
program_hw_devices [get_hw_devices xc7a35t_0]Arty S7 OV7670
=======================
PIN_T11 β XCLK
PIN_T10 β SIOD (SDA)
PIN_R10 β SIOC (SCL)
PIN_U11 β PCLK
PIN_R11 β VSYNC
PIN_P10 β HREF
PIN_N10-H11β D[7:0]
GND β GND
3.3V β VCC
cd python
python3 image_decoder.py --port /dev/ttyUSB0 --baud 115200 --output frame.jpgcd vivado
vivado -source run_all_sims.tcl -mode batch# In Vivado Tcl console
open_project vivado/project.xpr
# Simulate camera_interface
launch_simulation
run_test tb_camera_interface
view_waveforms
# Simulate DCT pipeline
run_test tb_dct2d
view_waveforms- Camera Interface: VSYNC β HREF toggles β pixel_valid pulses
- Block Buffers: Pixel accumulation β buffer_full on 64th pixel
- DCT: ~130 cycles latency, then valid output
- Quantizer: Streaming coefficient output
- Zig-Zag: 64 coefficients in zig-zag order
- RLE: Variable-length encoded output
- UART: Serial transmission at 115200 baud
| Metric | Value |
|---|---|
| Resolution Support | QVGA (320Γ240), VGA (640Γ480) |
| Frame Rate | 15-30 fps |
| Compression Ratio | 3-5Γ typical JPEG |
| Image Quality (PSNR) | 32-36 dB |
| FPGA Utilization | ~45% LUTs on Artix-7 |
| Power Consumption | ~2-3W (typical) |
| UART Throughput | 115200 baud (14.4 KB/s) |
| Latency (frame) | ~33-67 ms (30-15 fps) |
Solution:
- Check SCCB pull-ups (4.7kΞ© on SDA/SCL)
- Verify XCLK output (24 MHz on oscilloscope)
- Use I2C scanner to probe camera address (0x42 OV7670)
Solution:
- Confirm PCLK, VSYNC, HREF toggling correctly
- Check camera data lines (D[7:0]) for proper voltage levels
- Verify camera_interface
enablesignal is high
Solution:
- Verify all clock domains synchronized
- Check quantization table selection matches channel
- Ensure block_ready handshakes firing correctly
Solution:
- Confirm baud rate 115200 on both FPGA and PC
- Check USB cable and serial adapter
- Verify UART TX pin is correctly mapped
Located in python/image_decoder.py, the decoder:
- Receives compressed bitstream from UART
- Parses frame/block markers to identify Y/Cb/Cr blocks
- Decodes RLE to recover zig-zag coefficients
- Performs inverse zig-zag reordering
- Dequantizes coefficients using inverse tables
- Computes inverse 2D DCT to recover 8Γ8 blocks
- Reconstructs YCbCr image and converts to RGB
- Saves output as PNG/JPG file
Usage:
python3 image_decoder.py \
--port /dev/ttyUSB0 \
--baud 115200 \
--timeout 30 \
--output frame.jpg \
--display- Architecture: See
docs/ARCHITECTURE.md - IP Block Details: See
docs/IP_SPECIFICATIONS.md - Testbench Guide: See
docs/SIMULATION.md - Hardware Setup: See
docs/HARDWARE_SETUP.md - Python Decoder: See
python/DECODER.md
This project is licensed under the MIT License β see LICENSE file for details.
Contributions are welcome! Please:
- Fork the repository
- Create feature branch (
git checkout -b feature/AmazingFeature) - Commit changes (
git commit -m 'Add AmazingFeature') - Push to branch (
git push origin feature/AmazingFeature) - Open Pull Request
- Authors: Shankar & Harivenkatesh
- Email: shankarm2023@gmail.com / harivenkatesh1006@gmail.com
- GitHub Issues: Report bugs or request features
- Discussions: Join community discussions
- Xilinx for Vivado and Artix-7 FPGA documentation
- OV7670 Community for camera technical resources
- JPEG Committee for DCT/quantization standards
- Open-source FPGA community for tools and inspiration
| Phase | Status | Deliverables |
|---|---|---|
| Phase 1: Design & Planning | β Complete | Architecture, IP specs, testbenches |
| Phase 2: IP Development | β Complete | 13 IP blocks, full simulation |
| Phase 3: Block Design Integration | β Complete | Vivado BD, constraints, synthesis |
| Phase 4: Hardware Testing | π In Progress | FPGA bitstream, camera interface |
| Phase 5: Python Decoder | β Complete | Full image reconstruction |
| Phase 6: Optimization | β³ Future | Power reduction, higher framerates |
- Adaptive quantization based on image content
- Configurable resolution switching (QVGA/VGA on-the-fly)
- H.264/H.265 codec support
- Real-time image preview on FPGA HDMI output
- Machine learning-based compression optimization
- Multi-camera support
- Power gating for idle periods
Last Updated: November 2025
Version: 1.0.0