Video Demo of LiveAR: Real-Time Human Action Recognition over Live Video Streams

Yin Yang

doi:10.5339/qfarc.2016.ICTPP2471

Abstract

We propose to present a video demonstration of LiveAR at the ARC'16 conference. For this purpose, we have prepared three demo videos, which can be found in the submission files. These videos show the effectiveness and efficiency of LiveAR running on video streams containing a diverse set of human actions. The demo also reports important system performance parameters such as latency and resource usage.

LiveAR is a novel system for recognizing human actions, such as running and fighting, in a video stream in real time, backed by a massively parallel processing (MPP) platform. Although action recognition is a well-studied topic in computer vision, so far most attention has been devoted to improving accuracy rather than efficiency. To our knowledge, LiveAR is the first system to achieve real-time efficiency in action recognition, which can be a key enabler in many important applications, e.g., video surveillance and monitoring of critical infrastructure such as water reservoirs. LiveAR is based on a state-of-the-art method for offline action recognition that obtains high accuracy; its main innovation is to adapt this base solution to run on an elastic MPP platform to achieve real-time speed at an affordable cost.

The main objectives in the design of LiveAR are to (i) minimize redundant computations, (ii) reduce communication costs between nodes in the cloud, (iii) allow a high degree of parallelism and (iv) enable dynamic node additions and removals to match the current workload. LiveAR is based on an enhanced version of Apache Storm. Each video manipulation operation is implemented as a bolt (i.e., logical operator) executed by multiple nodes, while the input frames arrive at the system via a spout (i.e., streaming source). The output of the system is presented on screen using FFmpeg.

Next we briefly explain the main operations in LiveAR. The dense point extraction bolt is the first step of video processing. It has two input streams: the input video frame and the current trajectories. The output of this operator consists of dense points sampled in the video frame that are not already on any of the current trajectories. In particular, LiveAR partitions the frame into different regions and assigns each region to a dense point evaluator, each running in a separate thread. Then, the sampled coordinates are grouped according to the partitioning and routed to the corresponding dense point evaluator. Meanwhile, coordinates on current trajectories are similarly grouped by a point dispatcher and routed accordingly. Such partitioning and routing minimize network transmissions, as each node is only fed the pixels and trajectory points it needs.
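
The partitioning and routing described above can be illustrated with a minimal Python sketch. It is not LiveAR's implementation: the uniform grid of regions, the fixed sampling step, the exact-match test for "already on a trajectory", and all function names (region_of, route_points, evaluate_region) are assumptions made for this example only.

```python
from collections import defaultdict


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def region_of(x, y, width, height, cols, rows):
    """Return the (col, row) grid cell that contains pixel (x, y)."""
    return x * cols // width, y * rows // height


def route_points(points, width, height, cols, rows):
    """Point dispatcher: group coordinates by the region that owns them."""
    groups = defaultdict(list)
    for x, y in points:
        groups[region_of(x, y, width, height, cols, rows)].append((x, y))
    return dict(groups)


def region_bounds(col, row, width, height, cols, rows):
    """Half-open pixel bounds (x0, y0, x1, y1), consistent with region_of."""
    return (ceil_div(col * width, cols), ceil_div(row * height, rows),
            ceil_div((col + 1) * width, cols), ceil_div((row + 1) * height, rows))


def evaluate_region(col, row, width, height, cols, rows, step, trajectory_points):
    """Dense point evaluator for one region (hypothetical sampling rule).

    Samples every `step`-th pixel inside the region and drops samples that
    coincide with a trajectory point routed to this region.
    """
    occupied = set(trajectory_points)
    x0, y0, x1, y1 = region_bounds(col, row, width, height, cols, rows)
    return [(x, y)
            for y in range(y0, y1) for x in range(x0, x1)
            if x % step == 0 and y % step == 0 and (x, y) not in occupied]


# An 8x8 frame split into a 2x2 grid; three points lie on current trajectories.
routed = route_points([(2, 2), (6, 6), (5, 1)], 8, 8, 2, 2)
new_points = evaluate_region(0, 0, 8, 8, 2, 2, 2, routed.get((0, 0), []))
```

Each evaluator receives only the trajectory points that fall in its own region, which is the property the text credits with keeping network traffic low.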

The optic flow generation operator is executed by multiple nodes in parallel, similarly to the dense point extractor. An additional challenge here is that the generation of optic flows involves (i) comparing two frames at consecutive time instances and (ii) using multiple pixels to determine the value of the flow at each coordinate. Point (i) means that the operator is stateful, i.e., each node must store the previous frame and compare it with the current one. Hence, node additions and removals (necessary for elasticity) become non-trivial, as a new node does not immediately possess the state it needs (i.e., pixels of the previous frame) to work on its inputs. Regarding (ii), each node cannot simply handle a region in the frame, as is the case in the dense point extractor, because the computation at one coordinate relies on the surrounding pixels. Our solution in LiveAR is to split the frame into overlapping patches; each patch contains a partition of the frame as well as the pixels surrounding the partition. This design effectively reduces the amount of network transmissions, thus improving system scalability.
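
The overlapping-patch idea and the per-node state can be sketched as follows. This is an illustration under stated assumptions, not LiveAR's code: the fixed halo width, the representation of a frame as a list of pixel rows, and the names overlapping_patches, crop and FlowNode are all hypothetical, and the flow computation itself is left out.

```python
def split_bounds(length, parts):
    """Split [0, length) into `parts` contiguous half-open intervals."""
    return [(i * length // parts, (i + 1) * length // parts) for i in range(parts)]


def overlapping_patches(width, height, cols, rows, halo):
    """Map each grid cell to (core_box, padded_box).

    core_box is the partition the node is responsible for; padded_box adds
    `halo` surrounding pixels on each side, clamped to the frame borders.
    Boxes are half-open (x0, y0, x1, y1).
    """
    patches = {}
    for r, (y0, y1) in enumerate(split_bounds(height, rows)):
        for c, (x0, x1) in enumerate(split_bounds(width, cols)):
            padded = (max(x0 - halo, 0), max(y0 - halo, 0),
                      min(x1 + halo, width), min(y1 + halo, height))
            patches[(c, r)] = ((x0, y0, x1, y1), padded)
    return patches


def crop(frame, box):
    """Cut a box out of a frame stored as a list of pixel rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


class FlowNode:
    """Stateful node: keeps the previous patch so it can be compared with the next."""

    def __init__(self, padded_box):
        self.padded_box = padded_box
        self.previous = None  # a freshly added node has no state yet

    def feed(self, frame):
        """Return (previous_patch, current_patch), or None on the first frame."""
        current = crop(frame, self.padded_box)
        pair = None if self.previous is None else (self.previous, current)
        self.previous = current
        return pair


# An 8x6 frame whose pixel value encodes its position, split into two patches.
frame = [[y * 8 + x for x in range(8)] for y in range(6)]
patches = overlapping_patches(8, 6, 2, 1, 1)
node = FlowNode(patches[(1, 0)][1])
first = node.feed(frame)   # no previous patch yet
second = node.feed(frame)  # now a pair is available
```

The None returned on the first frame mirrors the elasticity problem noted in the text: a newly added node cannot produce flows until it has received the previous frame's pixels.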

Lastly, the trajectory tracking operator involves three inputs: the current trajectories, the dense points detected from the input frame, and the optic flows of the input frame. The main idea of this operator is to “grow” a trajectory, either an existing one or a new one starting at a dense point, by adding one more coordinate computed from the optic flow. Note that the optic flow may indicate that there is no further coordinate on this trajectory in the input frame, in which case the trajectory ends. The parallelization of this operator is similar to that of the dense point extractor, except that each node is assigned trajectories rather than pixels and coordinates. Grouping of the trajectories is performed according to their last coordinates (or the newly identified dense points for new trajectories).
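
A minimal sketch of the growing step and of grouping by last coordinate is given below. It rests on several assumptions that are not taken from LiveAR: the flow is modelled as a dictionary from a pixel to an integer displacement, a trajectory is taken to end when no flow is available at its last point or when the next point leaves the frame, and the names grow_trajectories and group_by_last_point are hypothetical.

```python
from collections import defaultdict


def grow_trajectories(trajectories, dense_points, flow, width, height):
    """Extend each trajectory by one coordinate derived from the flow.

    trajectories: list of lists of (x, y); dense_points: list of (x, y) that
    start new trajectories; flow: dict mapping (x, y) -> (dx, dy).
    Returns (grown, ended).
    """
    candidates = [list(t) for t in trajectories] + [[p] for p in dense_points]
    grown, ended = [], []
    for traj in candidates:
        x, y = traj[-1]
        displacement = flow.get((x, y))
        if displacement is None:
            ended.append(traj)  # assumed ending rule: no flow at this point
            continue
        nx, ny = x + displacement[0], y + displacement[1]
        if not (0 <= nx < width and 0 <= ny < height):
            ended.append(traj)  # assumed ending rule: point left the frame
            continue
        traj.append((nx, ny))
        grown.append(traj)
    return grown, ended


def group_by_last_point(trajectories, width, height, cols, rows):
    """Assign each trajectory to the grid cell that contains its last coordinate."""
    groups = defaultdict(list)
    for traj in trajectories:
        x, y = traj[-1]
        groups[(x * cols // width, y * rows // height)].append(traj)
    return dict(groups)


# One existing trajectory and one new dense point on an 8x8 frame.
flow = {(2, 1): (1, 0), (5, 5): (3, 0)}
grown, ended = grow_trajectories([[(1, 1), (2, 1)]], [(5, 5)], flow, 8, 8)
groups = group_by_last_point(grown, 8, 8, 2, 2)
```

Here the existing trajectory gains the point (3, 1), while the trajectory started at (5, 5) ends immediately because its displacement would carry it outside the frame.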
