CYGERT, S.; CZYŻEWSKI, A. Robust Object Detection with Multi-input Multi-output Faster R-CNN
Conference paper, Lecture Notes in Computer Science, 15 May 2022. DOI: 10.1007/978-3-031-06427-2_48

Recent years have seen impressive progress in visual recognition on many benchmarks; however, generalization to the out-of-distribution setting remains a significant challenge. A state-of-the-art method for robust visual recognition is model ensembling. However, it was recently shown that similarly competitive results can be achieved at a much lower cost by using a multi-input multi-output (MIMO) architecture.

In this work, a generalization of the MIMO approach is applied to the task of object detection using the general-purpose Faster R-CNN model. It was shown that the MIMO framework allows building a strong feature representation and obtains very competitive accuracy when using just two input/output pairs. Furthermore, it adds just 0.5% additional model parameters and increases the inference time by 15.9% compared to the standard Faster R-CNN. It also performs comparably to or better than the Deep Ensemble approach in terms of model accuracy, robustness in the out-of-distribution setting, and uncertainty calibration when the same number of predictions is used. This work opens up avenues for applying the MIMO approach to other high-level tasks such as semantic segmentation and depth estimation.

SZWOCH, G.; KOTUS, J. Acoustic Detector of Road Vehicles Based on Sound Intensity.
Sensors 2021, 21, 7781. https://doi.org/10.3390/s21237781

A method of detecting and counting road vehicles using an acoustic sensor placed by the road is presented. The sensor measures sound intensity in two directions: parallel and perpendicular to the road. The sound intensity analysis performs acoustic event detection. A normalized position of the sound source is tracked and used to determine if the detected event is related to a moving vehicle and to establish the direction of movement. The algorithm was tested on a continuous 24-h recording made in real-world conditions. The overall results were: recall 0.95, precision 0.95, F-score 0.95. In the analysis of one-hour slots, the worst results obtained in dense traffic were: recall 0.9, precision 0.93, F-score 0.91. The proposed method is intended for application in a network of traffic monitoring sensors, such as a smart city system. Its advantages include using a small, low-cost and passive sensor, low algorithm complexity, and satisfactory detection accuracy.
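The reported F-scores can be checked against the reported precision and recall. The sketch below assumes that the F-score in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state this explicitly, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; other weightings exist.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract above.
overall = f_score(precision=0.95, recall=0.95)
dense_traffic = f_score(precision=0.93, recall=0.90)

print(round(overall, 2), round(dense_traffic, 2))  # prints: 0.95 0.91
```

Under this assumption, both the overall result and the worst one-hour slot are consistent with the figures quoted in the abstract.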

KOTUS J, SZWOCH G, Estimation of Average Speed of Road Vehicles by Sound Intensity Analysis, Sensors 2021, 21(16), 5337,
doi:10.3390/s21165337

Constant monitoring of road traffic is an important part of modern smart city systems. The proposed method estimates the average speed of road vehicles in the observation period, using a passive acoustic vector sensor. Speed estimation based on sound intensity analysis is a novel approach to the described problem. Sound intensity in two orthogonal axes is measured with a sensor placed alongside the road. The position of the apparent sound source when a vehicle passes by the sensor is estimated by means of sound intensity analysis in three frequency bands: 1 kHz, 2 kHz and 4 kHz. The position signals calculated for each vehicle are averaged in the analysis time frames, and the average speed estimate is calculated using linear regression. The proposed method was validated in two experiments, one with controlled vehicle speed and another with real, unrestricted traffic. The calculated speed estimates were compared with the reference lidar and radar sensors. The average estimation error over all experiments was 1.4% and the maximum error was 3.2%. The results confirm that the proposed method allows for estimation of time-averaged road traffic speed with accuracy sufficient for gathering traffic statistics, e.g., in a smart city monitoring station.
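The regression step can be illustrated with a minimal sketch. It assumes that the source position has already been converted to metres along the road and sampled at known times, so that the slope of a least-squares line is the speed. That conversion, the sample values, and the function name `slope_least_squares` are assumptions made for illustration, not details taken from the paper.

```python
def slope_least_squares(times, positions):
    """Slope of the ordinary least-squares line through (time, position) points."""
    n = len(times)
    if n < 2 or n != len(positions):
        raise ValueError("need at least two samples and equal-length inputs")
    t_mean = sum(times) / n
    x_mean = sum(positions) / n
    numerator = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, positions))
    denominator = sum((t - t_mean) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("all time stamps are identical")
    return numerator / denominator


# Hypothetical averaged position track: seconds and metres along the road.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
positions = [-2.8, -1.4, 0.0, 1.4, 2.8]

speed_ms = slope_least_squares(times, positions)  # metres per second
speed_kmh = speed_ms * 3.6                        # kilometres per hour
print(f"{speed_ms:.1f} m/s = {speed_kmh:.1f} km/h")
```

Fitting a line to the whole track, rather than differencing neighbouring samples, makes the estimate less sensitive to noise in individual position values.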

CYGERT S, CZYŻEWSKI A, Robustness in Compressed Neural Networks for Object Detection, IEEE Access, Proc. IJCNN 2021. https://doi.org/10.1109/IJCNN52387.2021.9533773

Model compression techniques can significantly reduce the computational cost of data processing by deep neural networks with only a minor decrease in average accuracy. At the same time, reducing the model size may have a large effect on noisy cases or on objects belonging to less frequent classes. This is a crucial problem from the perspective of the models' safety, especially for object detection in the autonomous driving setting, which is considered in this work.
It was shown in the paper that the sensitivity of compressed models to different distortion types is nuanced: some corruptions (e.g., additive noise) are heavily impacted by the compression methods, while others (e.g., the blur effect) are only slightly affected. A common way to improve the robustness of models is to use data augmentation, which was confirmed to positively affect models' robustness, also for highly compressed models. It was further shown that while methods for handling data imbalance brought only a slight increase in accuracy for the baseline model (without compression), the impact was more striking at higher compression rates for structured pruning. Finally, methods for handling data imbalance brought a significant improvement in the pruned models' worst-detected class accuracy.

CYGERT S, WRÓBLEWSKI B, SŁOWIŃSKI R, WOŹNIAK K, CZYŻEWSKI A, Closer Look at the Uncertainty Estimation in Semantic Segmentation under Distributional Shift, IEEE Access, Proc. IJCNN 2021.
https://doi.org/10.1109/IJCNN52387.2021.9533330

While recent computer vision algorithms achieve impressive performance on many benchmarks, they lack robustness: presented with an image from a different distribution (e.g., weather or lighting conditions not considered during training), they may produce an erroneous prediction. Therefore, it is desirable that such a model be able to reliably predict its confidence. In this work, uncertainty estimation for the task of semantic segmentation is evaluated under a varying level of domain shift: in a cross-dataset setting and when adapting a model trained on simulated data. It was shown that simple color transformations already provide a strong baseline, comparable to using more sophisticated style-transfer data augmentation. Further, by constructing an ensemble consisting of models using different backbones and/or augmentation methods, it was possible to significantly improve model performance in terms of overall accuracy and uncertainty estimation under the domain shift setting. The Expected Calibration Error (ECE) on the challenging GTA to Cityscapes adaptation was reduced from 4.05 to the competitive value of 1.1. Further, an ensemble of models was utilized in the self-training setting to improve pseudo-label generation, which resulted in a significant gain in the final model accuracy compared to standard fine-tuning (without an ensemble).
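For readers unfamiliar with the metric, the sketch below shows the commonly used binned form of the Expected Calibration Error: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged with weights proportional to bin size. The number of bins, the bin-edge convention, and the toy inputs are assumptions; the abstract does not say how the paper computes ECE, and its values (4.05, 1.1) are presumably percentages.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|.

    Assumptions: equal-width bins [k/n_bins, (k+1)/n_bins), with a
    confidence of exactly 1.0 placed in the last bin.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for confidence, is_correct in zip(confidences, correct):
        index = min(int(confidence * n_bins), n_bins - 1)
        conf_sum[index] += confidence
        hit_sum[index] += 1.0 if is_correct else 0.0
    # (count/n) * |hits/count - conf/count| simplifies to |hits - conf| / n.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / n


# Toy per-pixel predictions: confidence of the predicted class and whether it was right.
confidences = [0.95, 0.95, 0.65, 0.65]
correct = [True, True, True, False]
ece = expected_calibration_error(confidences, correct)
print(f"ECE = {100 * ece:.1f}%")
```

A perfectly calibrated model has an ECE of zero, because in every bin the fraction of correct predictions equals the mean confidence.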

CYGERT S, CZYŻEWSKI A, Toward Robust Pedestrian Detection With Data Augmentation. IEEE Access, Vol. 8, pp. 136674-136683, 2020.
doi:10.1109/ACCESS.2020.3011356

In this article, the problem of creating a safe pedestrian detection model that can operate in the real world is tackled. While recent advances have led to significantly improved detection accuracy on various benchmarks, existing deep learning models are vulnerable to changes in the input image that are invisible to the human eye, which raises concerns about their safety. A popular and simple technique for improving robustness is data augmentation. In this work, the robustness of existing data augmentation techniques is evaluated, and a new, simple augmentation scheme is proposed in which, during training, an image is combined with a patch of a stylized version of that image. Pedestrian detection models' robustness and uncertainty calibration are evaluated under naturally occurring corruptions and in a realistic cross-dataset setting, showing that the proposed solution improves upon previous work. The paper emphasizes the importance of testing the robustness of recognition models and shows a simple way to improve it, which is a step towards creating robust pedestrian and object detection models.
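The augmentation idea can be sketched as follows, assuming the stylized version of the image has already been produced by some style-transfer model and has the same size as the original. The rectangular patch, its uniformly random placement, the nested-list image representation, and the name `paste_stylized_patch` are all illustrative assumptions; the abstract does not specify how the patch is chosen.

```python
import random


def paste_stylized_patch(image, stylized, patch_h, patch_w, rng=None):
    """Return a copy of `image` with one rectangular patch taken from `stylized`.

    Both inputs are row-major nested lists of pixels with identical shape.
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    if len(stylized) != height or len(stylized[0]) != width:
        raise ValueError("image and stylized image must have the same size")
    if not (0 < patch_h <= height and 0 < patch_w <= width):
        raise ValueError("patch must fit inside the image")
    # Top-left corner chosen uniformly so the patch stays inside the image.
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)
    augmented = [row[:] for row in image]  # leave the input untouched
    for r in range(top, top + patch_h):
        augmented[r][left:left + patch_w] = stylized[r][left:left + patch_w]
    return augmented


# Toy 4x6 single-channel example: zeros for the original, ones for the stylized copy.
original = [[0] * 6 for _ in range(4)]
stylized_copy = [[1] * 6 for _ in range(4)]
mixed = paste_stylized_patch(original, stylized_copy, patch_h=2, patch_w=3,
                             rng=random.Random(0))
for row in mixed:
    print(row)
```

Because the patch is copied from the same location in the stylized image, object positions are unchanged and the bounding-box annotations of the original image remain valid.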
