The covid19-eu-zh/covid19-eu-data repository is an experiment in data scraping and aggregation using GitHub Actions.

covid19-eu-zh is a dynamic and energetic team. Please consider following their Telegram channel, which covers COVID-19 in Europe in Chinese.

The Dataset

The dataset is updated regularly [1].

The following is a table showing the update status and data sources.

Country    Status                                                        Data Source
AT         CI Download AT Data
BE         CI Download BE PDF
CH         CI Download CH Data
CZ         CI Download CZ Data
DE         CI Download DE SARS-COV-2 Cases from RKI
DK         CI Download DK PDF
ES         CI Download ES PDF Files
FR         CI Download FR Data
IT         CI Download IT Data
NL         CI Download NL SARS-COV-2 Cases from volksgezondheidenzorg
NO         CI Download NO Data
PL         CI Download PL Data
SE         CI Download SE
UK         CI Download England Data
           CI Download Scotland Data
           CI Download Wales Data
EU(ECDC)   CI Download All EU from ECDC

A Demo: Confirmed Cases in Germany

The data can be directly loaded into your applications. Here is a simple demo using the data file for Germany. Please refer to /flora/covid19_eu_data/ (link to the dataset) for the data files of all available countries.
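
As a rough sketch of what loading the data can look like, the snippet below parses a Germany-style CSV with Python's standard library and builds a time series of cases per region. The column names (`nuts_1`, `cases`, `datetime`) and the sample rows are assumptions made for illustration; check the header of the actual file before relying on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the shape we assume for the Germany data file.
# The column names and numbers are illustrative, not real records.
SAMPLE_CSV = """country,nuts_1,cases,datetime
DE,Bayern,100,2020-03-10T10:00:00
DE,Berlin,40,2020-03-10T10:00:00
DE,Bayern,150,2020-03-11T10:00:00
DE,Berlin,55,2020-03-11T10:00:00
"""


def cases_by_region(csv_file, region_col="nuts_1", cases_col="cases", time_col="datetime"):
    """Return {region: [(timestamp, cases), ...]} sorted by timestamp."""
    series = defaultdict(list)
    for row in csv.DictReader(csv_file):
        series[row[region_col]].append((row[time_col], int(row[cases_col])))
    # ISO 8601 timestamps sort correctly as plain strings.
    return {region: sorted(points) for region, points in series.items()}


series = cases_by_region(io.StringIO(SAMPLE_CSV))
for region, points in series.items():
    # Print the most recent data point for each region.
    print(region, points[-1])
```

To read a real file instead of the sample, pass an open file handle (opened with `newline=""`) to `cases_by_region` and adjust the column arguments to match the file's header.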

Data Collection

The structure of the project is as follows.

.
├── README.md
├── dataset     #where the data files live
├── documents   #where the raw data and files live
├── now.json    #Zeit Now setup for a FaaS service
├── scripts     #scripts to download and aggregate data
└── .dataherb   #where the metadata lives

Scripts

We have one Python script per country, so that each country can be updated on its own schedule. The scripts use classes from utils.py so that they share a similar structure.

scripts
├── download_at.py
├── ...
├── requirements.txt
└── utils.py
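
To illustrate how shared classes can keep the per-country scripts alike, here is a minimal sketch. The names `CountryDownloader`, `fetch`, `parse`, and `run` are hypothetical and are not taken from the repository's utils.py; the point is only that each country script overrides the country-specific parts while the common flow lives in one place.

```python
from abc import ABC, abstractmethod


class CountryDownloader(ABC):
    """Hypothetical base class; the real utils.py may look different."""

    country = None  # two-letter code, set by each subclass

    @abstractmethod
    def fetch(self):
        """Return the raw source content for this country."""

    @abstractmethod
    def parse(self, raw):
        """Turn raw content into a list of row dicts."""

    def run(self):
        # Shared flow: fetch, parse, then tag each row with the country code.
        rows = self.parse(self.fetch())
        return [{"country": self.country, **row} for row in rows]


class DemoDownloader(CountryDownloader):
    """Stand-in for a script such as download_at.py, using canned input."""

    country = "AT"

    def fetch(self):
        # A real script would request a web page or a PDF here.
        return "Wien:10\nTirol:7"

    def parse(self, raw):
        rows = []
        for line in raw.splitlines():
            region, cases = line.split(":")
            rows.append({"region": region, "cases": int(cases)})
        return rows


rows = DemoDownloader().run()
print(rows)
```

With this layout, adding a country means writing one small subclass, and a change to the shared flow in `run` applies to every script at once.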

Dataset

The dataset folder contains the full dataset for each country, along with the daily updates for each country in the daily subfolder.

dataset
├── covid-19-at.csv
├── ...
└── daily
    ├── at
    ├── ...
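
The relation between the two levels can be sketched as follows: each run writes a small daily file, and the full per-country file can be seen as the concatenation of those files. This is only an illustration under assumptions. The daily file names, the columns, and the helper `aggregate_daily` are hypothetical, and the repository's scripts may build the full file differently.

```python
import csv
import tempfile
from pathlib import Path


def aggregate_daily(dataset_dir, country):
    """Merge <dataset_dir>/daily/<country>/*.csv into covid-19-<country>.csv."""
    dataset_dir = Path(dataset_dir)
    daily_files = sorted((dataset_dir / "daily" / country).glob("*.csv"))
    if not daily_files:
        raise FileNotFoundError(f"no daily files found for {country}")
    rows, fieldnames = [], None
    for path in daily_files:
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            # Take the header from the first file that has one.
            fieldnames = fieldnames or reader.fieldnames
            rows.extend(reader)
    target = dataset_dir / f"covid-19-{country}.csv"
    with target.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return target


# Demo in a temporary directory with two made-up daily files.
with tempfile.TemporaryDirectory() as tmp:
    daily = Path(tmp) / "daily" / "at"
    daily.mkdir(parents=True)
    (daily / "at_2020-03-10.csv").write_text("region,cases\nWien,10\n")
    (daily / "at_2020-03-11.csv").write_text("region,cases\nWien,12\n")
    merged = aggregate_daily(tmp, "at").read_text()
print(merged)
```

Sorting the file names keeps the merged rows in chronological order as long as the names embed an ISO-style date, which is itself an assumption of this sketch.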

GitHub Actions

We manage the pipelines using GitHub Actions. The full set of workflows is found in the original repository.

We use Germany as an example. The workflow for Germany has two triggers: a push to the master branch and a schedule. The job steps are:

  1. Check out the repository;
  2. Set up Python and install the Python requirements;
  3. Run the Python script to download and aggregate data;
  4. Push the data to the repository.

name: CI Download DE SARS-COV-2 Cases from RKI

on:
  push:
    branches:
      - master
  schedule:
    - cron:  '0 7/1 * * *' # at minute 0 of every hour from 07:00 through 23:00 (UTC)

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
      - name: Checkout current repo
        uses: actions/checkout@v2
      - name: Get current directory and files
        run: |
          pwd
          ls
      - uses: actions/setup-python@v1
        with:
          python-version: '3.7' # Version range or exact version of a Python version to use, using SemVer's version range syntax
          architecture: 'x64' # optional x64 or x86. Defaults to x64 if not specified
      - name: Install Python Requirements
        run: |
          python --version
          pip install -r scripts/requirements.txt
      - name: Download Records
        run: |
          python scripts/download_de.py
          ls dataset/daily/de
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git pull
          git status
          git add .
          git commit -m "Update DE Dataset" || echo "Nothing to update"
          git status
      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          repository: covid19-eu-zh/covid19-eu-data
          github_token: ${{ secrets.GITHUB_TOKEN }}
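
The schedule trigger is the least obvious part of this workflow. GitHub Actions evaluates cron expressions in UTC, and under the usual reading of its syntax a field of the form start/step means "begin at start and repeat every step until the field's maximum". The small helper below (`expand_field` is a hypothetical function written for this illustration, not part of any library) expands such a field so that the hours covered by `0 7/1 * * *` can be listed explicitly under that reading.

```python
def expand_field(field, minimum, maximum):
    """Expand one cron field of the form '*', 'a', 'a-b', or '<base>/<step>'.

    Comma-separated lists are deliberately not handled in this sketch.
    """
    base, _, step = field.partition("/")
    step = int(step) if step else 1
    if base == "*":
        start, end = minimum, maximum
    elif "-" in base:
        start, end = (int(part) for part in base.split("-"))
    else:
        start = int(base)
        # A bare number is a single value; with a step it runs to the maximum.
        end = maximum if "/" in field else start
    return list(range(start, end + 1, step))


# Hour field of the Germany workflow: '7/1' within the range 0..23.
hours = expand_field("7/1", 0, 23)
print(hours)
```

Changing the hour field to something like `*/6` would reduce the number of runs per day; which cadence is appropriate depends on how often the upstream source publishes.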

And Much More

We run a Telegram channel (in Chinese): 新冠肺炎欧洲中文臺 (roughly, "COVID-19 Europe Chinese Channel").

Chat

If you would like to help or track the progress of this project, check out our roadmap.

  1. For some countries, such as Spain, we only download the PDF records. For others, such as Italy, the government provides open and well-organized data.