Accurate and up-to-date census data is vital for informed policy decisions ranging from healthcare to infrastructure planning. However, collecting such data takes considerable effort and cost; the United States, for example, performs its census every 10 years. Though different approaches exist (see e.g. https://unstats.un.org/UNSD/demographic/sources/census/alternativeCensusDesigns.htm), they all have their shortcomings. Recent developments in digitization and the increased popularity of social media present new sources of information for compiling demographic information on large populations. Such a “digital census” may be compiled within hours or minutes by collecting user information from major websites, and it would be dynamic over time. Further, such data may provide statistics on populations which prefer to stay hidden from governmental questionnaires, or are simply too recent to register in the official reports.

In our current research we use the Facebook Advertising audience estimates to collect demographic data within geographic regions. In particular, before ordering an advertising campaign, the potential advertiser is free to query Facebook about the potential reach, or estimated audience, of a particular selection of location, gender, age, language, interests, and a variety of other attributes. For example, an advertiser or a researcher could ask how many Bangladeshi expat Facebook users live in Qatar; as of October 2017, Facebook estimated this number at 260,000, which is reasonably close to the recent estimate of 280,000 in 2016 (http://priyadsouza.com/population-of-qatar-by-nationality-in-2017/). As such reach estimates do not divulge information on any particular user, this information provides an aggregate view of the particular population: precisely the information necessary for a demographic study.

As a case study, we explore the neighborhoods of Doha, querying each as an area of 1 km radius (the minimum location area allowed by Facebook).
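The text does not say how the 1 km areas were laid out over the city. As an illustration only, the sketch below shows one possible layout: a square grid of circle centers spaced so that circles of the given radius approximately cover a bounding box without gaps. The function name, the grid scheme, and the bounding-box coordinates are all assumptions made for this example, not details of the study.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude, in km


def circle_centers(south, west, north, east, radius_km=1.0):
    """Return (lat, lon) centers of circles that approximately cover the box.

    Hypothetical helper: the square-grid layout is an assumption,
    not the method described in the text.
    """
    # A square of side r*sqrt(2) fits inside a circle of radius r, so centers
    # spaced r*sqrt(2) apart leave (approximately) no gaps between circles.
    step_km = radius_km * math.sqrt(2)
    mid_lat = (south + north) / 2
    # One degree of longitude shrinks with latitude; use the box's mid-latitude.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(mid_lat))
    dlat = step_km / KM_PER_DEG_LAT
    dlon = step_km / km_per_deg_lon
    rows = math.ceil((north - south) / dlat)
    cols = math.ceil((east - west) / dlon)
    return [
        (south + (i + 0.5) * dlat, west + (j + 0.5) * dlon)
        for i in range(rows)
        for j in range(cols)
    ]


# Illustrative bounding box roughly around Doha (assumed values).
centers = circle_centers(south=25.20, west=51.45, north=25.40, east=51.60)
print(len(centers), "query circles")
```

Each center would then correspond to one audience-estimate query with a 1 km radius. A hexagonal layout would need fewer circles for the same coverage, at the cost of slightly more involved arithmetic.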
A visualization of this data is available online at http://fb-doha.qcri.org/. We designed a visual analytic interface to enable exploration of the data. Demographic slices of the data are presented in tree maps (Fig. 1, left), colored according to the audience coverage of the respective slices. A set of colored discs shows the density of the matching audience in specific geographic areas (Fig. 1, right). The user can select a particular demographic (such as male gender or graduate education), which automatically updates all other demographic segments and the geographic density to correspond with the current selection.

For example, one may be interested in finding families with women who use iOS, and would find that most are concentrated in West Bay, the Pearl, and south Doha (see Fig. 1). Alternatively, one may be interested in young males using Android, who can be found at the bottom left of the map, around the Industrial Area of the city (see Fig. 2). The selection tool itself provides a view of the demographic slice, as all demographic selectors adjust with each selection. For instance, if we select “Westerner” (and nothing else), the Gender selector adjusts to show a rather even proportion of 12.5K Female to 15.9K Male, with the vast majority being 25 years and older, and more than half using iOS (15.9K) compared to Android (12.5K). If we select “Nepali” instead, the demographics in the selectors change strikingly and are now dominated by younger males using Android phones. To save a particular “view” of the data, a custom URL is available in the browser's address bar, as well as an export of the statistics in an Excel file.

Fig. 1. Visualization of Facebook Advertising audience estimates, where women using iOS are selected, with the most populated regions colored in dark purple on the map.

Fig. 2.
Visualization wherein males of age 18-24 who use Android cellphones are selected, with the most populated areas highlighted in the south-west of the map.

Although Facebook as a data source suffers from potential sampling biases and a “black box” nature, wherein we as users are not shown the inner workings of the data processing pipeline, the fact that the data captures the work force at the Industrial Area, for instance, provides some validation of its coverage. Furthermore, Facebook claims to have 2,800,000 monthly active users in Qatar, which is slightly more than the country's population (as of October 2017); this may be due to bots or fake accounts.

Our focus is to develop a methodology for the collection, and above all the validation, of social media data, with the aim of providing a reliable, dynamic, and cheap source of demographic knowledge. This knowledge can then be used to inform policy decisions and the distribution of resources, and to better target campaigns.
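As a small illustration of what validation against reference statistics can look like, the sketch below computes the relative difference between an audience estimate and an external figure, using the Bangladeshi expat numbers quoted earlier. The function name is ours, and relative difference is only one simple choice of comparison measure, not necessarily the one used in this work.

```python
def relative_difference(estimate, reference):
    """Absolute difference between estimate and reference, as a fraction of the reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return abs(estimate - reference) / reference


# Figures quoted in the text: Bangladeshi expats in Qatar.
facebook_estimate = 260_000   # Facebook audience estimate, October 2017
reference_estimate = 280_000  # external estimate for 2016

diff = relative_difference(facebook_estimate, reference_estimate)
print(f"{diff:.1%}")  # prints "7.1%"
```

Applied per demographic slice or per neighborhood, the same measure would indicate where the audience estimates agree with official figures and where they diverge.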


