Text Detection and Recognition for Robot Localization
Journal of Electrical and Computer Engineering Innovations (JECEI)
Article 11, Volume 12, Issue 1, Farvardin 2024, Pages 163-174
Article type: Original Research Paper
DOI: 10.22061/jecei.2023.9857.658
Authors
Z. Raisi*1; J. Zelek2
1University of Waterloo, Waterloo, Canada and Chabahar Maritime University, Chabahar, Iran.
2Systems Design Engineering Department, University of Waterloo, Canada.
Received: 05 Tir 1402; Revised: 14 Shahrivar 1402; Accepted: 17 Shahrivar 1402
Abstract
Background and Objectives: Signage is everywhere, and a robot should be able to exploit signs to help it localize (including through Visual Place Recognition (VPR)) and build maps. Robust text detection and recognition in the wild is challenging because of pose, irregularly shaped text instances, illumination variations, viewpoint changes, and occlusion.

Methods: This paper proposes an end-to-end scene text spotting model that simultaneously outputs text strings and their bounding boxes. The model combines a pre-trained Vision Transformer (ViT) based backbone with a multi-task transformer-based text detector that is better suited to the VPR task. Our central contribution is an end-to-end scene text spotting framework that adequately captures irregular and occluded text regions in diverse, challenging places. To address occlusion, we first equip the ViT backbone with a masked autoencoder (MAE) so that it can capture partially occluded characters. We then use a multi-task prediction head that handles arbitrarily shaped text instances with polygon bounding boxes.

Results: The proposed architecture was evaluated for VPR through several experiments on the challenging Self-Collected Text Place (SCTP) benchmark dataset, using the well-known precision and recall metrics. On this benchmark, the final model achieved a recall of 0.93 and a precision of 0.8.

Conclusion: These initial experimental results show that the proposed model outperforms state-of-the-art (SOTA) methods on the SCTP dataset, confirming the robustness of the proposed end-to-end scene text detection and recognition model.
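The abstract reports precision and recall separately. As a hedged illustration that is not taken from the paper, the sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) and the usual F1 harmonic mean. How true and false positives are counted for place matches on SCTP is an assumption here, and the F1 value of about 0.86 is derived from the reported figures, not reported by the authors. The helper names `precision_recall` and `f1_score` are hypothetical.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard precision and recall from raw match counts (hypothetical helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Figures reported in the abstract; the F1 below is derived, not reported.
    reported_precision, reported_recall = 0.8, 0.93
    print(round(f1_score(reported_precision, reported_recall), 3))  # 0.86
```

Because F1 is a harmonic mean, it stays closer to the lower of the two values, so the reported precision of 0.8 dominates the combined score.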
Keywords
Text Detection; Text Recognition; Robotics Localization; Deep Learning; Visual Place Recognition