
Javier Borau Bernad
Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid
Spain
https://orcid.org/0009-0009-5623-1688
Álvaro Ramajo Ballester
Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid
Spain
https://orcid.org/0000-0001-9425-9408
José María Armingol Moreno
Laboratorio de Sistemas Inteligentes, Universidad Carlos III de Madrid
Spain
https://orcid.org/0000-0002-3353-9956
No. 45 (2024), Computer Vision
DOI: https://doi.org/10.17979/ja-cea.2024.45.10737
Received: May 13, 2024; Accepted: July 1, 2024; Published: July 12, 2024

Abstract

In recent years, advances in Deep Learning and Computer Vision have driven the development of monocular detection algorithms for urban traffic management and safety, with the goal of optimizing data collection in urban environments for the smart cities of the future. However, these efforts have focused predominantly on extracting data from the vehicle's perspective, overlooking the advantages offered by cameras mounted on the infrastructure. This article studies how to obtain three-dimensional traffic data from this alternative perspective, exploiting an elevated viewpoint to avoid occlusions and obtain more accurate information about the size and position of vehicles. This research thus proposes a new methodological approach for integrating infrastructure-based computer vision systems into Intelligent Transportation Systems.
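The geometric operation underlying this kind of monocular 3D traffic perception is projecting an estimated 3D bounding box (position, dimensions, orientation) into the image plane through the camera intrinsics. The following Python sketch illustrates that projection only; it is not the method proposed in the article, it assumes a KITTI-style camera convention (x right, y down, z forward, yaw about the vertical axis), and the intrinsic matrix and box parameters are illustrative values.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the 8 corners (3, 8) of a 3D box in camera coordinates.

    Assumed KITTI-style convention: x right, y down, z forward;
    `center` is the bottom-face center, `dims` = (height, width, length),
    `yaw` is the rotation around the vertical (y) axis.
    """
    h, w, l = dims
    # Corners in the box's local frame, origin at the bottom-face center.
    x = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
    z = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    corners = np.vstack([x, y, z])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[ c, 0, s],
                  [ 0, 1, 0],
                  [-s, 0, c]])          # rotation about the y (vertical) axis
    return R @ corners + np.asarray(center).reshape(3, 1)

def project_to_image(points_cam, K):
    """Project camera-frame points (3, N) to pixel coordinates (N, 2)."""
    uvw = K @ points_cam
    return (uvw[:2] / uvw[2]).T

# Illustrative values only: intrinsics and box are not taken from the article.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
corners = box3d_corners(center=(2.0, 5.0, 25.0),   # metres, camera frame
                        dims=(1.5, 1.8, 4.2),      # height, width, length
                        yaw=0.3)
print(project_to_image(corners, K))                # 8 pixel coordinates
```

An elevated infrastructure camera changes the box-to-camera geometry (larger pitch, fewer occlusions) but not this projection step, which is why the same pinhole model applies to both viewpoints.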
