Understanding population-level food consumption, which has considerable influence on population health, is a major challenge. For instance, obesity, which is particularly pressing in the Gulf region, is largely driven by changes in food consumption. Many other widespread diseases, such as diabetes, heart disease, and high blood pressure, are directly or indirectly affected by eating habits. Studying food consumption generally involves surveys and self-reporting, which introduce biases, latency, and substantial cost. Social media offers new possibilities to passively monitor and study eating behaviors, and to track them in real time across regions.

Population-level statistics (e.g. for tracking seasonal epidemics such as flu) can be predicted from social media content (e.g. shared tags and texts), which provides large-scale, non-intrusive, location-aware (regional) data in real time. Instagram is a hugely popular image-sharing application, particularly in the Gulf region. Although users often annotate their social media posts with hashtags, much more information remains “hidden” in the actual image, requiring novel processing methods. Noting that “a picture is worth a thousand words”, we make use of this visual information through state-of-the-art deep learning models, concentrating on food-related images.

We propose a method for tracking regional and temporal food habits through social media images. On the technical side, we make two major contributions: (a) learning visual concepts from social media images with extremely noisy labels, and (b) predicting regional statistics through visual information analysis.

Recent developments in scene and image categorization, particularly the advances in deep learning, show that with large quantities of data (i.e. big data) we can achieve performance approaching the level of human annotations. Here we show that even with extremely incomplete labels (i.e. only a limited set of tags exists for images taken from Instagram), a robust auto-tagging system can be learned using deep convolutional neural networks, particularly with the help of millions of images. We mainly focus on food images and food-related tags collected from Instagram. We explored two major deep learning architectures for food label prediction: AlexNet [1] (see Fig. 1) and the VGG network [2].
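As a minimal sketch of what incomplete labels mean in practice, the snippet below maps the hashtags of one post to a multi-hot training target. The vocabulary FOOD_TAGS and the function name are hypothetical and chosen only for illustration; the actual tag set and preprocessing are not described here.

```python
# Hypothetical food vocabulary; the actual tag set is not given in the text.
FOOD_TAGS = ["pizza", "sushi", "salad", "burger", "coffee"]
TAG_INDEX = {tag: i for i, tag in enumerate(FOOD_TAGS)}


def hashtags_to_multi_hot(hashtags):
    """Map a post's hashtags to a multi-hot vector over FOOD_TAGS.

    A 0 means only that the user did not write the tag, not that the
    food is absent from the image, so the targets are incomplete.
    """
    target = [0] * len(FOOD_TAGS)
    for raw in hashtags:
        tag = raw.lstrip("#").lower()
        if tag in TAG_INDEX:
            target[TAG_INDEX[tag]] = 1
    return target


# Non-food hashtags such as "#friends" are simply ignored.
example = hashtags_to_multi_hot(["#Pizza", "#friends", "#coffee"])
```

Under this encoding, an image of pizza and salad tagged only with "#pizza" still carries a 0 for "salad", which is the kind of label noise the network has to tolerate.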

We explored both training from scratch (i.e. learning the system using only Instagram images) and fine-tuning an existing convolutional network (i.e. building upon a system already trained for object categorization on the large-scale ImageNet database). Although both models have very strong prediction capabilities, the VGG network gives better predictions overall. Figure 1 shows some example images auto-tagged with food labels using our system.

We employ our food label predictor for tracking regional and temporal food habits. These population-level statistics not only inform us about healthy eating habits but also provide cues for predicting the consequences of these behaviors, such as conditions and diseases. We show that certain public health statistics (e.g. alcohol consumption) can be reliably predicted by analyzing social media images through our deep learning models.
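One simple way to check whether predicted tags carry a regional signal is to compute, per region, the share of images with a given predicted tag and correlate it with a reported statistic. The sketch below does this with a plain Pearson correlation; it is an illustrative first check rather than the prediction model used in this work, and the function names, region codes, and numbers are all hypothetical.

```python
import math
from collections import defaultdict


def tag_share_by_region(predictions, tag):
    """Fraction of images per region whose predicted tags include `tag`.

    `predictions` is an iterable of (region, predicted_tags) pairs.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for region, tags in predictions:
        total[region] += 1
        if tag in tags:
            hits[region] += 1
    return {region: hits[region] / total[region] for region in total}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy inputs for illustration only; these are not results from the study.
toy_predictions = [
    ("A", {"beer", "pizza"}), ("A", {"salad"}),
    ("B", {"beer"}), ("B", {"beer", "burger"}),
    ("C", {"salad"}), ("C", {"sushi"}),
]
toy_statistic = {"A": 5.0, "B": 9.0, "C": 1.0}  # hypothetical survey values

shares = tag_share_by_region(toy_predictions, "beer")
regions = sorted(shares)
r = pearson([shares[k] for k in regions], [toy_statistic[k] for k in regions])
```

A high correlation on held-out regions would suggest the visual signal is informative; a regression model on several tag shares would be the natural next step.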

Data collection and fusion is another major issue, particularly for real-time analysis. We developed a system that can reliably identify all the geo-location tags (e.g. the places used when tagging the location of images, such as houses and cafes) by performing a large-scale grid search using geographic tessellation models. We also developed a distributed system that keeps track of the images shared at the identified locations in real time. In-depth analyses are performed over these collected images.
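The grid search over a region can be sketched as follows, assuming the simplest possible tessellation: a regular latitude/longitude grid over a bounding box. The function name, bounding box, and step size are hypothetical, and the tessellation model actually used may differ.

```python
import math


def grid_cell_centres(south, west, north, east, step_deg):
    """Return (lat, lon) centres of a regular grid covering a bounding box.

    Cells are step_deg wide in both latitude and longitude; the last row or
    column may extend past the box so that the whole area is covered.
    """
    rows = max(1, math.ceil((north - south) / step_deg))
    cols = max(1, math.ceil((east - west) / step_deg))
    return [
        (south + (i + 0.5) * step_deg, west + (j + 0.5) * step_deg)
        for i in range(rows)
        for j in range(cols)
    ]


# Hypothetical bounding box and step, chosen only for illustration.
centres = grid_cell_centres(south=24.0, west=50.0, north=25.0, east=52.0, step_deg=0.5)
```

Each centre would then be used as the query point of a location search. Cells of fixed size in degrees become narrower in the east-west direction away from the equator, so a production system would likely adapt the step or use an equal-area tessellation.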

One of the potential outcomes of our project is a system that can visualize regional and temporal food habits, which can be used as a tool for predicting future habits and understanding the main dynamics that affect food consumption. The system can track food habits across regions, cultures, and time (e.g. daily, seasonal, yearly). For instance, it will be possible to see how fast food is gaining or losing prominence compared to healthier food, and in which regions this happens more drastically. We can track how Christmas dinners change over time, or what different regions of the world eat for Christmas. We can display which types of dishes are most consumed in the month of Ramadan, and whether food image sharing differs between day and night. Many other potential applications can be built upon the proposed system.


[1] Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In NIPS, pp. 1106–1114, 2012.

[2] Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.

