Program - Big Data and Cloud Computing

9:00 - 9:25 Doors open and coffee
9:25 - 9:30 Antonio Fernández Anta (IMDEA Networks Institute)

Introduction and Welcome

9:30 - 10:20

Esteban Moro (Universidad Carlos III de Madrid, Spain)

Social media fingerprints of unemployment

The recent widespread adoption of electronic and pervasive technologies has enabled the study of human behavior at an unprecedented level, uncovering universal patterns underlying human activity, mobility, and interpersonal communication. In the present work, we investigate whether deviations from these universal patterns may reveal information about the socioeconomic status of geographical regions. We quantify the extent to which deviations in diurnal rhythm, mobility patterns, and communication styles across regions relate to their unemployment incidence. To this end, we examine a country-scale, publicly articulated social media dataset, quantifying individual behavioral features from over 19 million geolocated messages distributed among more than 340 Spanish economic regions, which are inferred by computing communities of cohesive mobility fluxes. We find that regions exhibiting more diverse mobility fluxes, earlier diurnal rhythms, and more grammatically correct writing styles display lower unemployment rates. As a result, we provide a simple model that produces an accurate, easily interpretable reconstruction of regional unemployment incidence from social media digital fingerprints alone. Our results show that cost-effective economic indicators can be built from publicly available social media datasets.

Rosa Badia (Barcelona Supercomputing Center - Centro Nacional de Supercomputación, Spain) 

Programming the Cloud with PyCOMPSs: a task-based approach

Cloud computing has become a dominant computing paradigm, offering a competitive advantage with regard to the cost of ownership. From the programming point of view, however, the distributed nature of cloud platforms and the heterogeneity of cloud providers' APIs represent an overhead for developers.

The importance of programming models that enable applications to run efficiently on such platforms has been recognized in recent years by the computer science community and by the scientific community in general. Ideally, programming models should provide an interface through which the application developer expresses algorithms and ideas in a platform-agnostic way. Aspects such as programmability, portability and interoperability have been taken into account when proposing such models.

The talk will review the challenges of programming cloud computing platforms and how they have been addressed at BSC with PyCOMPSs, a task-based programming model that offers a platform-unaware interface. A sequential PyCOMPSs program can run in parallel on a distributed cloud, and even on federated clouds. Additionally, PyCOMPSs is currently being integrated with new methodologies to store and access data beyond traditional file systems, which will enable it to support Big Data applications.
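
The idea that ordinary sequential code becomes parallel once its functions are declared as tasks can be illustrated with a small stand-in built only on Python's standard library. This is not the PyCOMPSs runtime: PyCOMPSs marks functions with a `@task` decorator and synchronizes with `compss_wait_on`, while the `task` and `wait_on` helpers below are hypothetical look-alikes, and a local thread pool stands in for the cloud resources the real runtime would schedule on. Each call to a decorated function returns a future immediately; arguments that are futures from earlier tasks are resolved before the task body runs, which is how data dependencies are respected.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical runtime: a local thread pool standing in for remote cloud workers.
_executor = ThreadPoolExecutor(max_workers=4)


def _resolve(value):
    """Block until a task result is available; pass plain values through."""
    return value.result() if isinstance(value, Future) else value


def task(func):
    """Hypothetical decorator: calling the function submits it asynchronously.

    Future arguments (results of earlier tasks) are resolved inside the worker,
    so a task only runs its body once its input data is ready.
    """
    @functools.wraps(func)
    def submit(*args, **kwargs):
        def run():
            real_args = [_resolve(a) for a in args]
            real_kwargs = {k: _resolve(v) for k, v in kwargs.items()}
            return func(*real_args, **real_kwargs)
        return _executor.submit(run)
    return submit


def wait_on(value):
    """Synchronization point: return the concrete value behind a task result."""
    return _resolve(value)


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


if __name__ == "__main__":
    # Written as plain sequential code; the four squares may run concurrently.
    partials = [square(i) for i in range(1, 5)]
    total = partials[0]
    for p in partials[1:]:
        total = add(total, p)
    print(wait_on(total))  # 1 + 4 + 9 + 16 = 30
```

Because tasks are submitted in program order and the pool serves them first-in, first-out, every dependency is picked up by a worker before the task that waits on it, so this simple scheme does not deadlock.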

Seif Haridi (KTH/SICS, Sweden)

Apache Flink Streaming

Flink Streaming is an extension of the core Flink API for high-throughput, low-latency, stateful stream processing. The system can connect to and process data streams from many data sources, such as Flume, Twitter and ZeroMQ, as well as from any user-defined data source. Data streams can be transformed using high-level functions similar to those provided by the batch processing API. Flink Streaming provides native support for iterative stream processing. The processed data can be pushed to different types of outputs.

In this talk we will give an introduction to the Flink Streaming programming model and outline its implementation and fault tolerance. Flink Streaming is part of Apache Flink, a top-level Apache project.
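
The dataflow style described above (a source, a chain of high-level transformations, per-key state, and a sink) can be sketched conceptually in plain Python with generators, which process one element at a time as it arrives. This is not Flink's API and has none of its distribution or fault tolerance; the functions `source`, `flat_map`, `running_count_by_key` and `sink` are hypothetical names chosen to mirror the concepts.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def source(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical source; in Flink this could be Flume, Twitter, ZeroMQ or a user-defined source."""
    for line in lines:
        yield line


def flat_map(stream: Iterator[str], fn: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Stateless transformation: each input element yields zero or more outputs."""
    for element in stream:
        yield from fn(element)


def running_count_by_key(stream: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Stateful transformation: keep one counter per key and emit each update."""
    state: Dict[str, int] = defaultdict(int)
    for key in stream:
        state[key] += 1
        yield key, state[key]


def sink(stream: Iterator[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical sink: collect the results (a real system would push them to an output)."""
    return list(stream)


if __name__ == "__main__":
    events = ["#flink streams", "#flink state", "low latency"]
    words = flat_map(source(events), lambda line: line.lower().split())
    hashtags = (w for w in words if w.startswith("#"))  # a filter step
    print(sink(running_count_by_key(hashtags)))  # [('#flink', 1), ('#flink', 2)]
```

Emitting an updated count for every incoming element, rather than waiting for the input to end, is what distinguishes this streaming version from an equivalent batch word count.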

12:30 - 14:00 Lunch / Scientific Poster Exhibit

José Manuel Bernabeu-Aubán (Kumori Systems & ITI, Spain)

Managing Elasticity in a Cloud PaaS

A major innovation brought by cloud computing was the concept of Infrastructure as a Service (IaaS), and the associated business model of pay-per-use, or utility computing. An IaaS provider offers its customers (typically SaaS providers) the convenience of paying only for the resources they actually use, for the time they use them. IaaS customers can, additionally, count on an apparently infinite amount of resources onto which their services can grow when needed.

One of the driving forces behind the development of cloud infrastructures is the optimization of the operational costs of running a service. This means that services must be horizontally scalable, using at each point in time only the resources they need, which brings elasticity to the fore as a desirable property of cloud solutions. IaaS offerings, by their nature, cannot focus on elasticity. Within the NIST cloud computing characterization, PaaS (Platform as a Service) is the layer in charge of helping software services reach their elasticity goals.

In this talk we present the requirements that elasticity imposes on PaaS providers, and show some of the approaches that have been proposed to meet them.
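
The requirement that a horizontally scalable service use, at each point in time, only the resources it needs can be made concrete with a simple reactive rule: given the current load and the load one instance can serve at a target utilization, the desired number of instances is ceil(load / (capacity x target)), clamped to configured bounds. This rule and the `ScalingPolicy` parameters are illustrative assumptions, not the approaches presented in the talk; practical autoscalers typically also need damping, such as cooldown periods, to avoid oscillating.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy parameters for one service component."""
    capacity_per_instance: float  # requests/s one instance can serve at full utilization
    target_utilization: float     # e.g. 0.5 keeps 50% headroom per instance
    min_instances: int = 1
    max_instances: int = 100


def desired_instances(load: float, policy: ScalingPolicy) -> int:
    """Smallest instance count that keeps utilization at or below the target."""
    usable = policy.capacity_per_instance * policy.target_utilization
    needed = math.ceil(load / usable) if load > 0 else 0
    # Clamp so the service never drops below its floor or exceeds its budget.
    return max(policy.min_instances, min(policy.max_instances, needed))


if __name__ == "__main__":
    policy = ScalingPolicy(capacity_per_instance=100.0, target_utilization=0.5)
    for load in (50.0, 400.0, 1000.0, 20000.0):
        print(load, desired_instances(load, policy))
    # desired counts: 1, 8, 20, 100 (the last one capped by max_instances)
```

The same rule scales in as well as out: when load falls, the desired count drops and the surplus instances can be released, which is what makes pay-per-use pay off.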

14:50 - 15:20 Break

Israel Herraiz (Amadeus Travel Intelligence, Amadeus IT Group SA)

Big data for the travel industry: hype or hope?

About 2.5 million TB of data is created each day by roughly 3 billion people, and much of this data is produced in the context of travel. In this talk, we will show how Amadeus Travel Intelligence is shaping the future of travel by helping the travel industry leverage big data to gain an in-depth understanding of its markets, make better and faster decisions, predict trends and customers' intentions, and offer personalized experiences across the entire travel cycle.

Marco Mellia (Politecnico di Torino, Italy)

The Web: Source of Big Data with a Measurement Perspective

Every day, the Web serves trillions of pages and pieces of content to users. Data moves back and forth through Internet Service Provider networks at an ever-increasing pace. Extracting information from this data has always been a hard problem, typically addressed with active crawlers that explore the Web. In this talk, I will instead present some results obtained from passive observation of traffic, where a real-time system extracts web pages from Gb/s streams of network packets generated by real Internet users.

We then run algorithms over this raw data to look for the interesting information that can be extracted from it.

Applications range from content curation systems to the automatic discovery of online trackers, i.e., hidden websites that collect information about unaware users.
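
To give a flavor of what passive extraction involves, the sketch below parses the request line and headers of plaintext HTTP/1.x requests, assumed to be already reassembled from TCP segments, to recover the requested URLs. It then applies a deliberately naive, hypothetical tracker heuristic: flag third-party hosts that are requested with a `Referer` from at least `min_sites` distinct first-party sites. Both functions are illustrative assumptions, not the system presented in the talk; real traffic also requires TCP reassembly, handling of encrypted HTTPS, and far more robust classification.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set
from urllib.parse import urlsplit


def parse_request(payload: bytes) -> Optional[Dict[str, str]]:
    """Parse one plaintext HTTP/1.x request; return None if it does not look like one."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    method, target, _version = parts
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    host = headers.get("host")
    if host is None:
        return None
    # Assumes origin-form targets ("/path") and ignores ports in the Host header.
    host = host.split(":", 1)[0].lower()
    return {"method": method, "url": f"http://{host}{target}",
            "host": host, "referer": headers.get("referer", "")}


def candidate_trackers(payloads: Iterable[bytes], min_sites: int = 3) -> Set[str]:
    """Hypothetical heuristic: hosts embedded by at least `min_sites` distinct sites."""
    embedding_sites: Dict[str, Set[str]] = defaultdict(set)
    for payload in payloads:
        req = parse_request(payload)
        if req is None or not req["referer"]:
            continue
        site = (urlsplit(req["referer"]).hostname or "").lower()
        if site and site != req["host"]:  # only count third-party requests
            embedding_sites[req["host"]].add(site)
    return {h for h, sites in embedding_sites.items() if len(sites) >= min_sites}


if __name__ == "__main__":
    def make_request(host: str, path: str, referer: str) -> bytes:
        return (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
                f"Referer: {referer}\r\n\r\n").encode()

    sniffed = [make_request("tracker.example", "/pixel.gif", f"http://site{i}.example/")
               for i in range(3)]
    sniffed.append(make_request("cdn.example", "/app.js", "http://site0.example/"))
    print(candidate_trackers(sniffed))  # {'tracker.example'}
```

A host contacted from many unrelated sites is only a candidate: content delivery networks can match the same pattern, which is why a real system would need further features before labeling anything a tracker.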

Wrap-up