The Web is barely 25 years old, but in that time it has changed every aspect of our lives. Because its nature is sociotechnical rather than purely engineered, the Web is not only changing society: we in turn shape the way the technology evolves. The whole process is inherently co-constituted, and as such its evolution is unlike that of any other system. In order to understand how the Web might evolve in the future - for good or bad - we need to study how it has evolved since its inception, together with the associated emergent behaviours. We call this new research discipline Web Science [1,2], and it is important for all our futures that we urgently address its major research challenges.

We are fast becoming part of a world of digital inter-connectivity, where devices such as smartphones, watches, fitness trackers, and household goods are part of a growing network, capable of sharing data and information. Increasingly, the Web has become the ubiquitous interface to this network of devices. Sensors, mobile applications, and fitness devices transmit their data to - often centralised - pools of data, which then become available via Web services. The sheer scale of this data leads to a rich set of high-volume, real-time streams of human activity, which are often made publicly consumable (potentially at a cost) via some API. For academia, the combination of these sources is providing social scientists and digital ethnographers with a far richer understanding of society, and of how we as individuals operate.

These streams represent a global network of human and machine communication, interaction, and transaction and, with the right analytical methods, may yield valuable research and commercial insights. In domains such as health and fitness, for example, the aggregation of data from mobile devices is supporting the transition towards the quantified self, and offers rich insight into the health and well-being of individuals, with the potential of diagnosing or reducing disease.

Why the need for Web Observatories?

Studying the Web provides us with critical insights into how we, as individuals and as a society, operate in the digital world. The actions, communications, interactions, and transactions produced by humans and machines have the potential to offer rich insight into our world, allowing us to better understand how we operate at both micro and macro scale. However, there are a number of barriers that prevent researchers from making the most of those data resources.

Herein lies a challenge, and a great opportunity. We are now in a position where the technologies used within the big data processing pipeline are maturing, as are the methods we use to analyse data to provide valuable insights. Yet overshadowing these benefits are issues of data access, control, and ownership. Whilst the volume of data being produced continues to grow, its limited availability beyond the walled gardens of the data holder - whether commercial or institutional - reduces the full potential of analysis envisaged in the big data era.

The specific challenges are as follows:
(a) Datasets are distributed across different domains.
(b) Metadata about those datasets are not available, or are in different vocabularies or formats.
(c) Searching for or inside datasets is not possible.
(d) Applying analytics on one or more datasets requires copying them into a central location.
(e) Datasets are often provided in the context of specific disciplines, lacking the metadata and enrichment that could make them usable in other disciplines.
(f) The nature of some datasets requires access control in the interest of privacy.
(g) There is a need for engines that lower the barrier of engagement with analytics for individuals, organisations and interdisciplinary research communities, by supporting the easy application of analytics across datasets without requiring them to be copied into a central location.
(h) There is a need for services enabling the publication and sharing of analytical tools within and across interdisciplinary communities.

To address the challenges described above, we have introduced the Web Observatory [3], a globally distributed infrastructure that enables users to share data with each other whilst retaining control over who can view, access, query, and download their data. At its core, a Web Observatory comprises a set of architectural principles that describe a scalable solution for controlled access to heterogeneous forms of historical and real-time data, visualisations, and analytics. In order to handle these new forms of big and small data, significant effort has gone into developing technologies capable of storing, querying, and analysing high-volume datasets - or streams - in a timely fashion, returning useful insights into social activity and behaviour.
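
The idea of sharing data while retaining control over who can view, access, query, and download it can be illustrated with a small sketch. This is not the Web Observatory implementation: the class and function names (Action, Dataset, visible_datasets), the user names, and the rule that an owner holds every permission are all assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """The four kinds of control named in the text (hypothetical encoding)."""
    VIEW = "view"
    ACCESS = "access"
    QUERY = "query"
    DOWNLOAD = "download"


@dataclass
class Dataset:
    """A shared dataset whose owner decides what each other user may do."""
    name: str
    owner: str
    # Maps a user name to the set of actions the owner has granted.
    grants: dict[str, set[Action]] = field(default_factory=dict)

    def grant(self, user: str, *actions: Action) -> None:
        self.grants.setdefault(user, set()).update(actions)

    def revoke(self, user: str, *actions: Action) -> None:
        self.grants.get(user, set()).difference_update(actions)

    def allows(self, user: str, action: Action) -> bool:
        # Assumption: the owner implicitly holds every permission.
        if user == self.owner:
            return True
        return action in self.grants.get(user, set())


def visible_datasets(catalogue: list[Dataset], user: str) -> list[str]:
    """Names of the datasets a user is allowed to see in a catalogue listing."""
    return [d.name for d in catalogue if d.allows(user, Action.VIEW)]


# Hypothetical catalogue: two datasets held by different owners.
catalogue = [
    Dataset("tweets-stream", owner="alice"),
    Dataset("health-survey", owner="bob"),
]
# alice lets carol see and query her stream, but not download it.
catalogue[0].grant("carol", Action.VIEW, Action.QUERY)
```

Under these assumptions, carol can discover and query alice's stream but cannot download it, and bob's dataset stays invisible to her until bob grants access. Keeping the grants with the dataset, rather than in a central authority, mirrors the principle that data holders retain control.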

A Global Network of Web Observatories

The Web Observatory (WO) project, developed under the auspices of the Web Science Trust, aims to develop the standards and services that will interlink a number of existing or emergent Web Observatories to enable the sharing, discoverability and use of public or private datasets and analytics across Web Observatories, on a large, distributed scale (http://online.liebertpub.com/doi/abs/10.1089/big.2014.0035). It involves the publication or sharing of both datasets and analytic or visualisation tools (http://webscience.org/web-observatory/). At the same time, it involves the development of appropriate standards to enable the discovery, use, combination and persistence of those resources; effort in the direction of standards is already underway in the W3C Web Observatory community group (http://www.w3.org/community/webobservatory/).

International research collaboration is one of the primary goals of creating a network of Web Observatories. There has already been significant effort in creating a number of Web Observatory nodes globally [4,5].

In this paper we describe an instance of the Web Observatory, the Southampton Web Observatory (SUWO), and how it is being applied both at Southampton and at other institutions in areas such as integrated health management, in particular in support of an aging population.

We believe that the true potential of the Web Observatory vision will be realised when the different observatories become part of a global network, a Wide Web of Observatories, allowing cross-observatory querying and analysis. Only by working through a set of initial application areas will we show the immediate value that the Web Observatory platform provides, from the sharing of datasets and resources to improved international collaboration and research opportunities as a result of raised awareness of institutional resources.


[1] Berners-Lee, Tim, Hall, Wendy, Hendler, James, Shadbolt, Nigel and Weitzner, Danny, Creating a Science of the Web. Science, 313, (5788), 769-771.

[2] Hendler, James, Shadbolt, Nigel, Hall, Wendy, Berners-Lee, Tim and Weitzner, Daniel, Web science: an interdisciplinary approach to understanding the Web. Communications of the ACM, 51, (7), 60-69.

[3] Tiropanis, Thanassis, Hall, Wendy, Shadbolt, Nigel, De Roure, David, Contractor, Noshir and Hendler, Jim, The Web Science Observatory. IEEE Intelligent Systems, 28, (2), 100-104.

[4] Tinati, Ramine, Wang, Xin, Tiropanis, Thanassis and Hall, Wendy, Building a real-time web observatory. IEEE Internet Computing (In Press).

[5] Wang, Xin, Tinati, Ramine, Mayer, Wolfgang, Rowland-Campbell, Anni, Tiropanis, Thanassis, Brown, Ian, Hall, Wendy and O'Hara, Kieron, Building a web observatory for South Australian government: supporting an age friendly population. In, 3rd International workshop on Building Web Observatories (BWOW), 10pp.

