Lightweight and efficient deep learning models for fruit detection in orchards.
Attention mechanism
Deep learning
Lightweight network
Object detection
Apple recognition
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
ID NLM: 101563288
Publication information
Publication date: 30 Oct 2024
History:
received: 31 Jul 2024
accepted: 15 Oct 2024
medline: 31 Oct 2024
pubmed: 31 Oct 2024
entrez: 31 Oct 2024
Status: epublish
Abstract
The accurate recognition of apples in complex orchard environments is fundamental to the operation of automated picking equipment. This paper investigates the influence of dense targets, occlusion, and the natural environment in practical application scenarios. To this end, it constructs a fruit dataset covering these scenarios and proposes a real-time lightweight detection network, ELD (Efficient Lightweight object Detector). The EGSS (Efficient Ghost-shuffle Slim) module and MCAttention (Mix Channel Attention) are proposed as innovative solutions to the problems of feature extraction and classification. The attention mechanism is employed to construct a novel feature extraction network, which effectively exploits low-level feature information, significantly enhances the model's fine-grained features and gradient flow, and improves its resistance to interference. SlimPAN is used to eliminate redundant channels, further compressing the network and optimizing performance. The network as a whole employs the Shape-IoU loss function, which accounts for the shape and scale of the bounding box itself, thereby enhancing the robustness of the model. Finally, detection accuracy is further improved by transferring knowledge from a teacher network through knowledge distillation, while keeping the overall network sufficiently lightweight. The experimental results demonstrate that the ELD network, designed for fruit detection, achieves an accuracy of 87.4%. It has a relatively low number of parameters (
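To make the distillation step described in the abstract concrete, below is a minimal PyTorch sketch of Hinton-style soft-target knowledge distillation (in the spirit of the Hinton et al. reference listed below). It is an illustration only, not the paper's exact ELD distillation scheme; the temperature T, weight alpha, and function name are assumed for the example.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soften both distributions with temperature T; the KL term is
    # scaled by T^2 so its gradient magnitude stays comparable to the
    # hard-label cross-entropy term (Hinton et al., 2015).
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

In training, teacher_logits would come from the frozen teacher network (evaluated under torch.no_grad()) and student_logits from the lightweight student; only the student's parameters receive gradients.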
Identifiers
pubmed: 39477987
doi: 10.1038/s41598-024-76662-w
pii: 10.1038/s41598-024-76662-w
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
26086
Copyright information
© 2024. The Author(s).
References
Lehnert, C., Sa, I., McCool, C., Upcroft, B. & Perez, T. Sweet pepper pose detection and grasping for automated crop harvesting. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 2428–2434 (IEEE, 2016).
Wu, L., Ma, J., Zhao, Y. & Liu, H. Apple detection in complex scene using the improved yolov4 model. Agronomy 11, 476 (2021).
doi: 10.3390/agronomy11030476
Liu, X., Zhao, D., Jia, W., Ji, W. & Sun, Y. A detection method for apple fruits based on color and shape features. IEEE Access 7, 67923–67933 (2019).
doi: 10.1109/ACCESS.2019.2918313
Tian, Y. et al. Apple detection during different growth stages in orchards using the improved yolo-v3 model. Comput. Electron. Agric. 157, 417–426 (2019).
doi: 10.1016/j.compag.2019.01.012
Xiao, B., Nguyen, M. & Yan, W. Q. Apple ripeness identification from digital images using transformers. Multimed. Tools Appl. 83, 7811–7825 (2024).
doi: 10.1007/s11042-023-15938-1
Xue, Y. et al. Handling occlusion in UAV visual tracking with query-guided redetection. IEEE Trans. Instrum. Meas. 73, 5030217 (2024).
doi: 10.1109/TIM.2024.3440378
Xue, Y. et al. Consistent representation mining for multi-drone single object tracking. IEEE Trans. Circ. Syst. Video Technol. https://doi.org/10.1109/TCSVT.2024.3411301 (2024).
doi: 10.1109/TCSVT.2024.3411301
Xue, Y. et al. Mobiletrack: Siamese efficient mobile network for high-speed UAV tracking. IET Image Proc. 16, 3300–3313 (2022).
doi: 10.1049/ipr2.12565
Xue, Y., Jin, G., Shen, T., Tan, L. & Wang, L. Template-guided frequency attention and adaptive cross-entropy loss for UAV visual tracking. Chin. J. Aeronaut. 36, 299–312 (2023).
doi: 10.1016/j.cja.2023.03.048
Xue, Y. et al. Smalltrack: Wavelet pooling and graph enhanced classification for UAV small object tracking. IEEE Trans. Geosci. Remote Sens. 61, 5618815 (2023).
doi: 10.1109/TGRS.2023.3305728
Xu, R. et al. Instance segmentation of biological images using graph convolutional network. Eng. Appl. Artif. Intell. 110, 104739 (2022).
doi: 10.1016/j.engappai.2022.104739
Zhang, J., Qian, S. & Tan, C. Automated bridge surface crack detection and segmentation using computer vision-based deep learning model. Eng. Appl. Artif. Intell. 115, 105225 (2022).
doi: 10.1016/j.engappai.2022.105225
Jaderberg, M., Simonyan, K., Zisserman, A. et al. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 28 (2015).
Yu, J., Jiang, Y., Wang, Z., Cao, Z. & Huang, T. Unitbox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016).
Rezatofighi, H. et al. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019).
Zheng, Z. et al. Distance-IoU loss: Faster and better learning for bounding box regression. Proc. AAAI Conf. Artif. Intell. 34, 12993–13000 (2020).
Zhang, Y.-F. et al. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 506, 146–157 (2022).
doi: 10.1016/j.neucom.2022.07.042
Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740 (2022).
Han, K. et al. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020).
Xiong, Y. et al. Mobiledets: Searching for object detection architectures for mobile accelerators. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3825–3834 (2021).
Yu, G. et al. Pp-picodet: A better real-time object detector on mobile devices. arXiv preprint arXiv:2111.00902 (2021).
Maaz, M. et al. Edgenext: Efficiently amalgamated CNN-transformer architecture for mobile vision applications. In European Conference on Computer Vision, pp. 3–20 (Springer, Berlin, 2022).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
Lan, Q. & Tian, Q. Instance, scale, and teacher adaptive knowledge distillation for visual detection in autonomous driving. IEEE Trans. Intell. Veh. 8, 2358–2370 (2022).
doi: 10.1109/TIV.2022.3217261
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016).
Redmon, J. & Farhadi, A. Yolo9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017).
Redmon, J. & Farhadi, A. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).
Bochkovskiy, A., Wang, C.-Y. & Liao, H.-Y. M. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020).
Patnaik, S. K., Babu, C. N. & Bhave, M. Intelligent and adaptive web data extraction system using convolutional and long short-term memory deep learning networks. Big Data Min. Anal. 4, 279–297 (2021).
doi: 10.26599/BDMA.2021.9020012
Choi, J. I. & Tian, Q. Visual-saliency-guided channel pruning for deep visual detectors in autonomous driving. In 2023 IEEE Intelligent Vehicles Symposium (IV), pp. 1–6 (IEEE, 2023).
Park, S., Kang, D. & Paik, J. Cosine similarity-guided knowledge distillation for robust object detectors. Sci. Rep. 14, 18888 (2024).
doi: 10.1038/s41598-024-69813-6
pubmed: 39143179
pmcid: 11324720
Wang, J. et al. Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning. Nat. Mach. Intell. 3, 914–922 (2021).
doi: 10.1038/s42256-021-00403-1
Kim, M.-J. et al. Screening obstructive sleep apnea patients via deep learning of knowledge distillation in the lateral cephalogram. Sci. Rep. 13, 17788 (2023).
doi: 10.1038/s41598-023-42880-x
pubmed: 37853030
pmcid: 10584979
Zhao, J. et al. Yolo-granada: A lightweight attentioned yolo for pomegranates fruit detection. Sci. Rep. 14, 16848 (2024).
doi: 10.1038/s41598-024-67526-4
pubmed: 39039263
pmcid: 11263582
Wang, J. et al. Toward surface defect detection in electronics manufacturing by an accurate and lightweight yolo-style object detector. Sci. Rep. 13, 7062 (2023).
doi: 10.1038/s41598-023-33804-w
pubmed: 37127646
pmcid: 10151317
Guo, H., Wu, T., Gao, G., Qiu, Z. & Chen, H. Lightweight safflower cluster detection based on yolov5. Sci. Rep. 14, 18579 (2024).
doi: 10.1038/s41598-024-69584-0
pubmed: 39127852
pmcid: 11316755
Lin, H., Cheng, X., Wu, X. & Shen, D. Cat: Cross attention in vision transformer. In 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6 (IEEE, 2022).
Fu, J. et al. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3146–3154 (2019).
Wan, D. et al. Mixed local channel attention for object detection. Eng. Appl. Artif. Intell. 123, 106442 (2023).
doi: 10.1016/j.engappai.2023.106442
Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017).
Zhang, H. & Zhang, S. Shape-IoU: More accurate metric considering bounding box shape and scale. arXiv preprint arXiv:2312.17663 (2023).
Wang, Q. et al. Eca-net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542 (2020).
Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018).
Hou, Q., Zhou, D. & Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722 (2021).
Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018).
Yang, L., Zhang, R.-Y., Li, L. & Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In International Conference on Machine Learning, pp. 11863–11874 (PMLR, 2021).
Bochkovskiy, A., Wang, C.-Y. & Liao, H.-Y. M. Yolov5. https://github.com/ultralytics/yolov5 (2021).
Wang, C.-Y., Bochkovskiy, A. & Liao, H.-Y. M. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475 (2023).
Li, C. et al. Yolov6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022).
Bochkovskiy, A., Wang, C.-Y. & Liao, H.-Y. M. Yolov8. https://github.com/ultralytics/ultralytics (2023).
Wang, C. et al. Gold-yolo: Efficient object detector via gather-and-distribute mechanism. Adv. Neural Inf. Process. Syst. 36 (2024).
Wang, C.-Y., Yeh, I.-H. & Liao, H.-Y. M. Yolov9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616 (2024).
Wang, A. et al. Yolov10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458 (2024).
Ma, N., Zhang, X., Zheng, H.-T. & Sun, J. Shufflenet v2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018).
Tang, Y. et al. Ghostnetv2: Enhance cheap operation with long-range attention. Adv. Neural Inf. Process. Syst. 35, 9969–9982 (2022).
Howard, A. et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019).
Chen, J. et al. Run, don't walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12021–12031 (2023).
Ma, X., Li, Y., Yang, Z., Li, S. & Li, Y. Lightweight network for millimeter-level concrete crack detection with dense feature connection and dual attention. J. Build. Eng. 94, 109821 (2024).
doi: 10.1016/j.jobe.2024.109821
Xiao, Q., Li, Q. & Zhao, L. Lightweight sea cucumber recognition network using improved yolov5. IEEE Access 11, 44787–44797 (2023).
doi: 10.1109/ACCESS.2023.3272558
Chen, X. & Gong, Z. Yolov5-lite: Lighter, faster and easier to deploy. Accessed: Sep 22 (2021).
Carion, N. et al. End-to-end object detection with transformers. In European Conference on Computer Vision, pp. 213–229 (Springer, 2020).