Fundació La Caixa: data collection via open data and scraping

Problem

Understand how the weather and the exhibitions and activities of the main competing museums affect the number of visitors to the museum.

Solution

Automate data collection by scraping the websites of the country's main museums and retrieving meteorological data from AEMET.

Results

The data needed for the analysis is now available, so the proposals can be improved.

Context

In any sector, understanding why foot traffic at a point of sale rises or falls is not an easy task. We normally assume that it depends on the products offered, the campaigns carried out, the price, the promotion of the day… but other variables are outside our control, such as the calendar, seasonality, the weather or competitors' actions. Fundació La Caixa set out to incorporate into its analytical model the data on exhibitions and activities offered by competing museums and the open weather data offered by AEMET.

Open data

AEMET OpenData is a REST API developed by AEMET that allows the dissemination and reuse of the Agency’s meteorological and climatological information.
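
As a rough illustration of what a call to this API might look like, the Python sketch below builds a request for daily climatological values at one station. It rests on several assumptions that are not stated in this case study and should be checked against the current AEMET OpenData documentation: the base URL and endpoint path, the `api_key` request header, and the two-step pattern in which the first response is a small JSON envelope whose `datos` field holds the URL of the actual data. The station identifier is a placeholder.

```python
import json
import urllib.request
from datetime import date

# Assumed base URL of the AEMET OpenData REST API; verify before use.
BASE_URL = "https://opendata.aemet.es/opendata/api"


def daily_climate_url(station_id: str, start: date, end: date) -> str:
    """Build the (assumed) endpoint URL for daily values at one station."""
    start_str = f"{start.isoformat()}T00:00:00UTC"
    end_str = f"{end.isoformat()}T23:59:59UTC"
    return (
        f"{BASE_URL}/valores/climatologicos/diarios/datos"
        f"/fechaini/{start_str}/fechafin/{end_str}/estacion/{station_id}"
    )


def fetch_json(url: str, api_key: str = ""):
    """GET a URL and decode the JSON body, optionally sending an API key."""
    request = urllib.request.Request(url)
    if api_key:
        # Assumption: the key is accepted as an "api_key" request header.
        request.add_header("api_key", api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        # The response encoding is an assumption; undecodable bytes are replaced.
        return json.loads(response.read().decode("utf-8", errors="replace"))


def fetch_daily_climate(station_id: str, start: date, end: date, api_key: str):
    """Two-step fetch: read the envelope, then follow its 'datos' URL."""
    envelope = fetch_json(daily_climate_url(station_id, start, end), api_key)
    return fetch_json(envelope["datos"])
```

Calling `fetch_daily_climate("0000X", date(2023, 1, 1), date(2023, 1, 31), api_key)` with a real station identifier and a personal API key would, under these assumptions, return a list of daily records that can be joined to visitor counts by date.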

Scraping

Web scraping is the process of extracting content and data from websites using software. We identified a set of fields common to all the museum websites and relevant to the analytical model: the title and subtitle of the exhibition or activity, its start and end dates, and its web URL.
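
To make the common record concrete, here is a minimal sketch in Python that extracts those fields from a listing page using only the standard library. The markup it expects (an `article` element with class `event`, and child elements with classes `title`, `subtitle`, `start` and `end`) is entirely hypothetical, as are the names `Event`, `EventParser` and `parse_events`. Each real museum site would need its own selectors, and this is not the parser used in the project.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


@dataclass
class Event:
    """One exhibition or activity, with the fields common to all sites."""
    title: str = ""
    subtitle: str = ""
    start_date: str = ""
    end_date: str = ""
    url: str = ""


class EventParser(HTMLParser):
    # Hypothetical mapping from CSS class names to Event fields.
    FIELDS = {"title": "title", "subtitle": "subtitle",
              "start": "start_date", "end": "end_date"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.events: List[Event] = []
        self._current: Optional[Event] = None
        self._field: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "event" in classes:
            self._current = Event()  # a new record starts here
        elif self._current is not None:
            if tag == "a" and attrs.get("href") and not self._current.url:
                # Resolve relative links against the page URL.
                self._current.url = urljoin(self.base_url, attrs["href"])
            for css_class in classes:
                if css_class in self.FIELDS:
                    self._field = self.FIELDS[css_class]

    def handle_data(self, data):
        if self._current is not None and self._field:
            previous = getattr(self._current, self._field)
            setattr(self._current, self._field, previous + data)

    def handle_endtag(self, tag):
        self._field = None  # stop collecting text at any closing tag
        if tag == "article" and self._current is not None:
            for name in self.FIELDS.values():
                # Collapse runs of whitespace left over from the markup.
                value = getattr(self._current, name)
                setattr(self._current, name, " ".join(value.split()))
            self.events.append(self._current)
            self._current = None


def parse_events(html: str, base_url: str) -> List[Event]:
    """Return every event record found in one listing page."""
    parser = EventParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.events
```

Keeping one small parser per site that emits the same `Event` record means the downstream analysis never needs to know which museum a row came from.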

The process runs automatically every month, and its output is made available to the analysis team.