McByte
[CVPRW 2025] McByte official implementation code.

Please note: This is the release of the first, academic version of McByte, with a runnable demo included. A more optimized and engineered version is coming soon. Stay tuned!


No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond

Tomasz Stanczyk, Seongro Yoon, Francois Bremond

arXiv: 2506.01373

Inria, 3IA Côte d'Azur

Abstract

Multi-object tracking (MOT) is essential for sports analytics, enabling performance evaluation and tactical insights. However, tracking in sports is challenging due to fast movements, occlusions, and camera shifts. Traditional tracking-by-detection methods require extensive tuning, while segmentation-based approaches struggle with track processing. We propose McByte, a tracking-by-detection framework that integrates a temporally propagated segmentation mask as an association cue to improve robustness without per-video tuning. Unlike many existing methods, McByte does not require training, relying solely on pre-trained models and object detectors commonly used in the community. Evaluated on SportsMOT, DanceTrack, SoccerNet-tracking 2022 and MOT17, McByte demonstrates strong performance across sports and general pedestrian tracking. Our results highlight the benefits of mask propagation for a more adaptable and generalizable MOT approach.

(Figure: CVPR poster main diagram)

Added gradually

  • Cleaning of the code
  • Public release of the first, academic version
  • Installation and setup instructions
  • 🔥 Directly usable demo 🔥
  • Video input file processing
  • Using custom/oracle detections
  • Numerical evaluation instructions
  • More optimized and engineered version

Installation and models

Please follow the complete guideline in INSTALLATION.md.

🔥 Demo 🔥

Simply run the command - no training required:

python tools/demo_track.py --path path/to/your/input

Input is a frame folder or a video file.
Output will be located in: McByte/YOLOX_outputs/yolox_x_mix_det/track_vis/date_time_stamp. A folder with the processed frames and a log file (see the section below) will be created there. The text output file (tracking records per frame in MOT format) will be created outside that folder.
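For reference, the tracking records follow the MOT (MOTChallenge-style) convention; a sketch of what each line typically looks like is shown below (field order assumed from the standard format - verify against your own output):

frame, track_id, bb_left, bb_top, bb_width, bb_height, score, -1, -1, -1
1, 2, 1338.0, 418.0, 167.0, 380.0, 0.92, -1, -1, -1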

More arguments:

  • --demo - demo input type. Recognized values: image (frames), video. Default: image.
  • --vis_type - visualization type; it enables saving separately: frames with masks and tracklets, frames with detections, and frames with tracklets before the Kalman filter update. Skipping the visualization while keeping the text (records) output is also possible. Recognized values: full (frame/image input only), basic, no_vis. Default: basic.
  • --start_frame_no - number of the first frame to start from (assuming your frames are ordered), e.g., if you want to start tracking from the middle of the sequence rather than from the beginning. Default: 1.
  • -f | --exp_file - the name of the YOLOX detector experiment (architecture and parameters) file. Although several are available, we recommend staying with the default: exps/example/mot/yolox_x_mix_det.py.
  • -c | --ckpt - the name of the object detector pretrained weights file (the checkpoint). It must match the architecture from the experiment file above (e.g. YOLOX X). Default: pretrained/yolox_x_sports_mix.pth.tar.
  • --det_path - path to a text file with detections. Default: None. If specified, the detector-related arguments will not be considered. See the expected format here or slightly adjust yours to match. A combined usage example follows this list.
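For example, to process a video file while skipping the visualization output entirely (the input path below is a placeholder):

python tools/demo_track.py --demo video --path path/to/your/video.mp4 --vis_type no_vis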

Additional note: The YOLOX object detector model pretrained on SportsMOT, as provided by the dataset authors (used as the default setting above), performs very well in the considered sports settings - soccer, basketball, volleyball 🔥

For a complete list of arguments, run:

python tools/demo_track.py --help

Logging functionality 🔍👀

During the run, a logging file will be created and located in: McByte/YOLOX_outputs/yolox_x_mix_det/track_vis/date_time_stamp/logging_info.txt.
This file contains per-frame information on all tracklets and detections, their confidence scores, tracklet states, associations, and the cost matrix per step, before and after the mask update. Together with the per-frame detection output and the tracklets-before-Kalman-filter output (--vis_type full, see above), it is a powerful tool to dig into the details, verify the fine-grained behaviour of the tracker and its associations per frame, and further adjust the algorithm to your needs (e.g. if you want to change some hyper-parameters yourself). This functionality was very useful in developing McByte, and we hope it will serve interested users as well.
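As a starting point for such an analysis, below is a minimal Python sketch that summarizes the text (records) output per frame, assuming the standard MOT layout described above (the file path is a placeholder; adjust the column indices if your output differs):

import csv
from collections import defaultdict

# Count how many tracklets are present in each frame of a McByte output file.
tracks_per_frame = defaultdict(set)
with open("path/to/tracking_output.txt") as f:  # placeholder path
    for row in csv.reader(f):
        frame_no, track_id = int(row[0]), int(row[1])
        tracks_per_frame[frame_no].add(track_id)

for frame_no in sorted(tracks_per_frame):
    print(f"frame {frame_no}: {len(tracks_per_frame[frame_no])} tracklets")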

Mask resolution and memory

If you face memory issues (e.g. GPU limitations), you can reduce the internal mask representation size used by the mask temporal propagation component.

To do so, open the file McByte/Cutie/cutie/config/eval_config.yaml and find the parameter max_internal_size (line 23). By default it is set to -1, which means the internal mask representation (used to compute the final propagated mask) has the same dimensions as your input frames/video. You can decrease the size by replacing -1 with some fraction of the shorter side of your input. E.g., if your input has dimensions of 1920x1080, you can try 540 (50% of 1080), which reduces the total size by a factor of 4 (2x smaller height, 2x smaller width). Non-integer factors also work, e.g. sqrt(2) ≈ 1.41: setting the value to 1080/1.41 ≈ 766 reduces the total size by a factor of 2.
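For the 1920x1080 example above, the edit would look like this (a sketch of the relevant line only; the rest of the file stays unchanged):

# McByte/Cutie/cutie/config/eval_config.yaml
max_internal_size: 540  # default: -1 (full input resolution); 540 = 50% of 1080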

Note that as the internal mask representation becomes smaller, the mask accuracy may decrease. Although the mask is used only as an association cue in McByte, and small inaccuracies are handled within the tracker design, significant size reductions may degrade overall performance, particularly with small objects and crowds.

Acknowledgement

A large part of the McByte code was borrowed from ByteTrack, YOLOX and Cutie. The SAM model was smoothly incorporated, and some additional parts were borrowed from BoT-SORT. Many thanks to all the authors for their amazing work!

Usage and citation

If you find this work useful, consider leaving us a star ⭐ and please cite it in your research paper as follows:

@InProceedings{Stanczyk_2025_CVPR,
    author    = {Stanczyk, Tomasz and Yoon, Seongro and Bremond, Francois},
    title     = {No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {6039-6048}
}
