Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
- Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo
MS COCO
| Model | Test Size | APtest | AP50test | AP75test | batch 1 fps | batch 32 average time |
|---|---|---|---|---|---|---|
| YOLOv7 | 640 | 51.4% | 69.7% | 55.9% | 161 fps | 2.8 ms |
| YOLOv7-X | 640 | 53.1% | 71.2% | 57.8% | 114 fps | 4.3 ms |
| YOLOv7-W6 | 1280 | 54.9% | 72.6% | 60.1% | 84 fps | 7.6 ms |
| YOLOv7-E6 | 1280 | 56.0% | 73.5% | 61.2% | 56 fps | 12.3 ms |
| YOLOv7-D6 | 1280 | 56.6% | 74.0% | 61.8% | 44 fps | 15.0 ms |
| YOLOv7-E6E | 1280 | 56.8% | 74.4% | 62.1% | 36 fps | 18.7 ms |
Docker environment (recommended)
# create the docker container, you can change the share memory size if you have more.
nvidia-docker run --name yolov7 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolov7 --shm-size=64g nvcr.io/nvidia/pytorch:21.08-py3
# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx
# pip install required packages
pip install seaborn thop
# go to code folder
cd /yolov7
Testing
yolov7.pt yolov7x.pt yolov7-w6.pt yolov7-e6.pt yolov7-d6.pt yolov7-e6e.pt
python test.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val
You will get the results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.51206
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.69730
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.55521
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35247
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.55937
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66693
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.38453
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.63765
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.68772
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.53766
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.73549
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.83868
To measure accuracy, download the COCO annotations for pycocotools to ./coco/annotations/instances_val2017.json.
Data preparation
bash scripts/get_coco.sh
Download MS COCO dataset images (train, val, test) and labels. If you have previously used a different version of YOLO, we strongly recommend that you delete the train2017.cache and val2017.cache files, and redownload the labels.
Single GPU training
# train p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml
# train p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml
Multiple GPU training
# train p5 models
python -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --workers 8 --device 0,1,2,3 --sync-bn --batch-size 128 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml
# train p6 models
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train_aux.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch-size 128 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml
This section describes the changes added to the main code in this fork.
There are four main additions:
- Custom augmentations: This permits the addition of new augmentations not present in the yolov7 pipeline.
- Excluding bounding boxes: This enables the training process to remove specific classes from the images.
- Biasing data using labels: This allows the user to select which classes are going to appear more frequently in training.
- A Squeeze-and-Excitation block has been added, which can be used between the neck and head (a minimal sketch is shown below).
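The fourth addition, the Squeeze-and-Excitation block, is a standard channel-attention module; the sketch below shows the usual PyTorch formulation (class and argument names here are illustrative and may not match this fork's actual implementation).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: re-weights channels using globally pooled statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                     # squeeze: global average pool
        self.fc = nn.Sequential(                                # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                                      # channel re-weighting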
The first three additions can be used in the code as follows:
from vyn_yolov7 import Setting
from train import run_train
# relative ratio of the classes. For instance, imagine we have two classes: class1 and class2.
# If we want class2 to be picked twice as often as class1, then we do:
probabilities = {
'class1': 1.0,
'class2': 2.0,
}
options = Setting()
options.shuffle_class = True
options.probabilities = probabilities
options.custom_augment_fun = custom_augmentation
options.data = 'PATH/TO/YAML_FILE.yaml'
run_train(options)
The custom augmentation is a function that receives the image and bounding boxes as numpy arrays and returns the modified image and bounding boxes.
An illustration of an augmentation function is presented below:
def custom_augmentation(image, bounding_boxes):
    ...
    return augmented_image, augmented_bounding_boxes
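For example, a horizontal flip could be written as follows. This is only a minimal sketch: it assumes the bounding boxes arrive as an (N, 4) numpy array of pixel-coordinate [x1, y1, x2, y2] rows, so check the dataloader of this fork for the exact format before reusing it.
import numpy as np

def custom_augmentation(image, bounding_boxes):
    # Horizontal flip with the boxes mirrored accordingly.
    # Assumes boxes are an (N, 4) array of [x1, y1, x2, y2] pixel coordinates.
    height, width = image.shape[:2]
    augmented_image = image[:, ::-1].copy()
    augmented_bounding_boxes = bounding_boxes.astype(float)
    # new_x1 = width - old_x2, new_x2 = width - old_x1
    augmented_bounding_boxes[:, [0, 2]] = width - bounding_boxes[:, [2, 0]]
    return augmented_image, augmented_bounding_boxes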
In order to remove bounding boxes from images, the fields excluded_classes and mapping_classes from the yaml file are used.
The purpose of this addition is the following: imagine we have a small dataset of people in which each individual person is labelled, but the images also contain hands, heads and other body parts, with very few examples of each. To get an initial model it would be convenient not to train on these objects, while still keeping those images, because complete persons may appear in them as well.
To solve this issue, this code allows the user to select a set of classes that are going to be removed, meaning the objects in the images will be replaced with random noise.
The bounding boxes in training and validation carry the class indices they represent.
These classes are integers from 0 to nc_complete - 1, in contrast to nc as in the original code.
nc is the number of classes that are going to be detected, whereas nc_complete is the total number of classes in the dataset.
The nc_complete must be passed in the yaml file together with the other two parameters (excluded_classes and mapping_classes). An example of this yaml file is:
excluded_classes:
  - 3
mapping_classes:
  0: 0
  1: 1
  2: 2
names:
  - fire_extinguisher_shell
  - forklift
  - ladder
nc: 3
nc_complete: 4
train: PATH/TO/training.txt
val: PATH/TO/validation.txt
This means that the dataset contains 4 classes. The first three are normal ones and work in the same way as in the original yolov7 code, but the fourth class (index 3) is excluded: its bounding boxes are not used for training and the part of the image inside each such bounding box is replaced with noise.
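The mechanism can be pictured with the rough sketch below (illustrative only, not the fork's actual implementation): boxes of excluded classes are dropped from the targets and the corresponding image regions are overwritten with random noise. It assumes pixel-coordinate [class, x1, y1, x2, y2] rows.
import numpy as np

def exclude_boxes(image, boxes, excluded_classes):
    # boxes: (N, 5) array of [class, x1, y1, x2, y2] in pixel coordinates (assumed).
    kept = []
    for cls, x1, y1, x2, y2 in boxes:
        if int(cls) in excluded_classes:
            x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
            # Overwrite the excluded object with random noise so it cannot be learnt.
            region = image[y1:y2, x1:x2]
            image[y1:y2, x1:x2] = np.random.randint(0, 256, size=region.shape, dtype=np.uint8)
        else:
            kept.append([cls, x1, y1, x2, y2])
    return image, np.array(kept)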
Lastly, shuffle_class and probabilities are used to bias the dataset when needed.
probabilities passes the relative rate of each class.
For instance, if we have 3 classes (barrier, manhole and water_barrier) and we want manhole to be used twice as much as the other two, because the model seems to find it harder to learn, then:
probabilities = {'barrier': 1, 'manhole': 2, 'water_barrier': 1}
Notice that shuffle_class is used either to randomly select data by class or to use the whole dataset as is.
So, if shuffle_class is False (the default behaviour and the only one in the original code), the whole dataset is used; for instance, if there are 100 images (80 of class barrier and 20 of the rest), all 100 images are used per epoch.
When shuffle_class is True, the training selects each class with equal probability regardless of the number of images per class, so 'barrier', 'manhole' and 'water_barrier' are selected with the same probability even though 'barrier' is more common.
If probabilities is provided, the classes are selected following that rate instead of with equal probability.
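The behaviour can be pictured with a small standalone sketch (illustrative, not the fork's internals): a class is drawn first according to the relative rates, then an image containing that class is drawn.
import random

# Map each class name to the indices of the images that contain it (toy data).
images_per_class = {
    'barrier':       list(range(80)),        # 80 images
    'manhole':       list(range(80, 90)),    # 10 images
    'water_barrier': list(range(90, 100)),   # 10 images
}
probabilities = {'barrier': 1, 'manhole': 2, 'water_barrier': 1}

def sample_image(images_per_class, probabilities=None):
    classes = list(images_per_class)
    weights = [probabilities[c] for c in classes] if probabilities else None
    chosen = random.choices(classes, weights=weights, k=1)[0]   # pick a class first
    return random.choice(images_per_class[chosen])              # then an image of it

# 'manhole' images are now drawn about twice as often as the other classes,
# even though only 10 of the 100 images contain a manhole.
print(sample_image(images_per_class, probabilities))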
Transfer learning
yolov7_training.pt yolov7x_training.pt yolov7-w6_training.pt yolov7-e6_training.pt yolov7-d6_training.pt yolov7-e6e_training.pt
Single GPU finetuning for custom dataset
# finetune p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-custom.yaml --weights 'yolov7_training.pt' --name yolov7-custom --hyp data/hyp.scratch.custom.yaml
# finetune p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/custom.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6-custom.yaml --weights 'yolov7-w6_training.pt' --name yolov7-w6-custom --hyp data/hyp.scratch.custom.yaml
Inference
On video:
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source yourvideo.mp4
On image:
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg
Pytorch to CoreML (and inference on MacOS/iOS)
Pytorch to ONNX with NMS (and inference)
python export.py --weights yolov7-tiny.pt --grid --end2end --simplify \
--topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
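To sanity-check the exported model, a minimal onnxruntime run such as the sketch below can be used. It only resizes the image (the real pipeline uses letterbox padding, so boxes would be approximate) and prints the output shapes, since the exact output layout depends on the export flags; the file names are those produced by the command above.
import cv2
import numpy as np
import onnxruntime as ort

# Load the end-to-end model exported above (NMS embedded in the graph).
session = ort.InferenceSession("yolov7-tiny.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Quick-and-dirty preprocessing: BGR -> RGB, resize to 640x640, scale to [0, 1], HWC -> NCHW float32.
image = cv2.imread("inference/images/horses.jpg")
blob = cv2.resize(image, (640, 640))[:, :, ::-1].transpose(2, 0, 1)
blob = np.ascontiguousarray(blob, dtype=np.float32)[None] / 255.0

outputs = session.run(None, {input_name: blob})
for out in outputs:
    print(out.shape)  # detections come back post-NMS with the --end2end export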
Pytorch to TensorRT with NMS (and inference)
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights ./yolov7-tiny.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16
Pytorch to TensorRT another way
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights yolov7-tiny.pt --grid --include-nms
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16
# Or use trtexec to convert ONNX to TensorRT engine
/usr/src/tensorrt/bin/trtexec --onnx=yolov7-tiny.onnx --saveEngine=yolov7-tiny-nms.trt --fp16
Tested with: Python 3.7.13, Pytorch 1.12.0+cu113
Pose estimation: see keypoint.ipynb.
Instance segmentation: see instance.ipynb.
YOLOv7 for instance segmentation (YOLOR + YOLOv5 + YOLACT)
| Model | Test Size | APbox | AP50box | AP75box | APmask | AP50mask | AP75mask |
|---|---|---|---|---|---|---|---|
| YOLOv7-seg | 640 | 51.4% | 69.4% | 55.8% | 41.5% | 65.5% | 43.7% |
YOLOv7 with decoupled TAL head (YOLOR + YOLOv5 + YOLOv6)
| Model | Test Size | APval | AP50val | AP75val |
|---|---|---|---|---|
| YOLOv7-u6 | 640 | 52.6% | 69.7% | 57.3% |
@inproceedings{wang2023yolov7,
title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2023}
}
@article{wang2023designing,
title={Designing Network Design Strategies Through Gradient Path Analysis},
author={Wang, Chien-Yao and Liao, Hong-Yuan Mark and Yeh, I-Hau},
journal={Journal of Information Science and Engineering},
year={2023}
}
YOLOv7-semantic & YOLOv7-panoptic & YOLOv7-caption
YOLOv7-semantic & YOLOv7-detection & YOLOv7-depth (with NTUT)
YOLOv7-3d-detection & YOLOv7-lidar & YOLOv7-road (with NTUT)
Acknowledgements
- https://github.com/AlexeyAB/darknet
- https://github.com/WongKinYiu/yolor
- https://github.com/WongKinYiu/PyTorch_YOLOv4
- https://github.com/WongKinYiu/ScaledYOLOv4
- https://github.com/Megvii-BaseDetection/YOLOX
- https://github.com/ultralytics/yolov3
- https://github.com/ultralytics/yolov5
- https://github.com/DingXiaoH/RepVGG
- https://github.com/JUGGHM/OREPA_CVPR2022
- https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose