Research by the Qatar Road Safety Studies Center (QRSSC) found that there were 290,829 traffic accidents in Qatar in 2013, at an economic cost amounting to 2.7 percent of the country's gross domestic product (GDP). There is a growing research effort to improve road safety and to develop driver assistance systems or even self-driving systems such as the Google self-driving car project, which are widely expected to revolutionize the automotive industry. Vision sensors will play a prominent role in such applications because they provide intuitive and rich information about road conditions. However, vision-based vehicle detection and tracking is a challenging task because of the large variability in vehicle appearance, interference from strong light and occasionally severe weather, and complex interactions among drivers.


While previous work usually treats vehicle detection and tracking as separate tasks [1, 2], we propose a unified framework for both. For detection, recent work has mainly focused on building systems based on robust feature sets such as histograms of oriented gradients (HOG) [3] and Haar-like features [4] rather than simple features such as symmetry or edges. However, these robust features are computationally expensive. In this work, we propose an algorithmic framework that targets both high efficiency and robustness while keeping the computational requirement at an acceptable level.


In the detection phase, to reduce processing latency, we propose to use a hardware-friendly corner detection method based on features from accelerated segment test (FAST) [5], which determines interest corners by simply comparing each pixel with its 9 neighbors. If there is a run of contiguous neighboring pixels that are all brighter or all darker than the center pixel, the center pixel is marked as a corner point. Fig. 1 shows the result of the FAST corner detector on a real road image. We use the recent Learned Arrangements of Three Patch Codes (LATCH) [6] as the corner point descriptor. LATCH falls into the binary descriptor category, yet maintains performance comparable to histogram-based descriptors such as HOG. The descriptors created by LATCH are binary strings computed by comparing image patch triplets rather than individual pixels; as a result, they are less sensitive to noise and minor changes in local appearance. To detect vehicles, corners in each new image are matched to those in the previous image, and the optical flow at each corner point is derived from its displacement. Because vehicles approaching in the opposite direction produce a diverging flow, they can be distinguished from the flow caused by ego-motion. Fig. 2 illustrates the flow estimated from corner point matching. The sparse optical flow proposed here is quite robust because of the characteristics of LATCH, and it requires far fewer computational resources than traditional optical flow methods, which need to solve a time-consuming optimization problem.
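The matching step described above can be illustrated with a minimal sketch. It assumes that each binary descriptor is stored as a Python integer, that matching is brute-force nearest neighbor under the Hamming distance, and that a match is accepted only below a threshold `max_dist`. The function names, the threshold, and the brute-force search are illustrative assumptions, not details given in this work.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_corners(prev_desc: List[int], curr_desc: List[int],
                  max_dist: int = 8) -> List[Tuple[int, int]]:
    """Match each current corner to its nearest previous corner.

    Brute-force search; max_dist is an assumed acceptance threshold.
    Returns (previous index, current index) pairs.
    """
    matches = []
    for j, d_curr in enumerate(curr_desc):
        best_i, best_d = -1, max_dist + 1
        for i, d_prev in enumerate(prev_desc):
            d = hamming(d_prev, d_curr)
            if d < best_d:
                best_i, best_d = i, d
        if best_i >= 0:
            matches.append((best_i, j))
    return matches


def sparse_flow(prev_pts: List[Point], curr_pts: List[Point],
                matches: List[Tuple[int, int]]) -> List[Tuple[float, float, float, float]]:
    """Return (x, y, dx, dy) for each match, anchored at the previous position."""
    flow = []
    for i, j in matches:
        (x0, y0), (x1, y1) = prev_pts[i], curr_pts[j]
        flow.append((x0, y0, x1 - x0, y1 - y0))
    return flow
```

Because the Hamming distance reduces to an XOR followed by a bit count, this kind of matching maps naturally onto simple hardware logic, which is consistent with the low computational cost claimed for the sparse flow.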

Once vehicles are detected, tracking is achieved by matching the corner points. With a Kalman filter used for prediction, the matching is fast because candidate corner points are searched only near the predicted location. Using corner points to compute sparse optical flow enables vehicle detection and tracking to be carried out simultaneously within this unified framework (Fig. 3). In addition, the framework allows us to detect cars that newly enter the scene during tracking. Since most image sensors today are based on a rolling shutter integration approach, the image data can be transmitted serially to the FPGA-based hardware, and hence the FAST detector and LATCH descriptor can operate in a pipelined manner for efficient computation.
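The prediction-guided search can be sketched as follows. This is a minimal illustration under stated assumptions: a constant-velocity Kalman filter per image axis, a circular search gate of fixed `radius` around the predicted position, and descriptor comparison by Hamming distance. The class and function names, the noise parameters `q` and `r`, and the gate shape are all hypothetical choices; the motion model actually used is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (assumed model)."""
    pos: float
    vel: float = 0.0
    p00: float = 10.0  # state covariance entries
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 10.0
    q: float = 0.01    # assumed process noise (added to the diagonal)
    r: float = 1.0     # assumed measurement noise

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state one step: P <- F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.pos += dt * self.vel
        a = self.p00 + dt * self.p10
        b = self.p01 + dt * self.p11
        self.p00 = a + dt * b + self.q
        self.p01 = b
        self.p10 = self.p10 + dt * self.p11
        self.p11 = self.p11 + self.q
        return self.pos

    def update(self, z: float) -> None:
        """Correct the state with a measured position z (H = [1, 0])."""
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p10 / s
        y = z - self.pos
        self.pos += k0 * y
        self.vel += k1 * y
        p00, p01 = self.p00, self.p01
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p10 -= k1 * p00
        self.p11 -= k1 * p01


def gated_match(pred: Tuple[float, float], track_desc: int,
                pts: List[Tuple[float, float]], descs: List[int],
                radius: float = 5.0, max_dist: int = 8) -> Optional[int]:
    """Return the index of the best corner inside the gate, or None."""
    best, best_d = None, max_dist + 1
    for j, ((x, y), d) in enumerate(zip(pts, descs)):
        # Skip corners outside the search gate around the prediction.
        if (x - pred[0]) ** 2 + (y - pred[1]) ** 2 > radius ** 2:
            continue
        dist = bin(track_desc ^ d).count("1")
        if dist < best_d:
            best, best_d = j, dist
    return best
```

A tracked corner would call `predict()` on both axes each frame, run `gated_match` on the newly detected corners, and call `update()` with the matched position. Corners that match no existing track remain available to the detection step, which is how newly entering vehicles can be picked up during tracking.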


In this work, we propose a framework for detecting and tracking vehicles in driving assistance applications. Vehicles are detected from the sparse flow estimated by corner point matching, and vehicle tracking is likewise performed by corner point matching, assisted by a Kalman filter. The proposed framework is robust and efficient and has a low computational requirement, making it a viable solution for embedded vehicle detection and tracking systems.


[1] S. Sivaraman and M. M. Trivedi, “Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis,” IEEE Trans. Intell. Transp. Syst., vol. 14, no. 4, pp. 1773–1795, 2013.

[2] Z. Sun, G. Bebis, and R. Miller, “On-road vehicle detection: A review,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 694–711, 2006.

[3] Z. Sun, G. Bebis, and R. Miller, “Monocular precrash vehicle detection: Features and classifiers,” IEEE Trans. Image Process., vol. 15, no. 7, pp. 2019–2034, 2006.

[4] W. C. Chang and C. W. Cho, “Online boosting for vehicle detection,” IEEE Trans. Syst. Man, Cybern. Part B Cybern., vol. 40, no. 3, pp. 892–902, 2010.

[5] E. Rosten and T. Drummond, “Fusing points and lines for high performance tracking,” in Proc. Tenth IEEE Int. Conf. Comput. Vis., vol. 2, pp. 1508–1515, 2005.

[6] G. Levi and T. Hassner, “LATCH: Learned Arrangements of Three Patch Codes,” arXiv, 2015.

