Build Database¶
This section describes how to re-build the Protar PostGIS database from scratch. This presumes that the app and all its dependencies are installed and that the database settings are configured to use a PostGIS backend as specified below.
Building this database takes a substantial amount of resources, and the result is publicly available. In most cases it is therefore not necessary to rebuild this dataset. The description here is for documentation purposes and will be a guidance for potential future updates.
Configure Database¶
The Protar app works only PostGIS as database backends. To point the app to a specific database, specify the following environmental variables.
- Database name
DB_NAME
defaults toprotar
- Database user
DB_USER
defaults topostgres
- Database host
DB_HOST
defaults tolocalhost
- Database port
DB_PORT
defaults to5432
- Database password
DB_PASSWORD
defaults to an empty string
Download and decompress¶
The first step is to download the data from the EEA. The sources are described in the Data Management Plan. After download, decompress all files for Corine and Natura into separate folders. The corine data should have one subfolder with the Legend information.
For the Natura data, download the shapefiles and the tabular data as csv. Decompress both the spatial data and the tabluar data into a single folder. For the Corine data, the sqlite versions are required. Download the Corine land cover and change data for the four landcover periods as spatialite and decompress all of those into one folder. Keep one of the Legend subfolders contained in the zip files of the sqlite verion of the corine data. The legend folder will be used to build the Corine data legend.
For Corine data, the version required is v18.5. For the Natura data, the required version is 7.
Parse Corine Data¶
The next step is to parse the corine vector data. For this, build the
nomenclature and the legend objects first, then load the data into
the app using scripts built into protar. First, set an environmental
variable telling protar where the corine data sits (separate folders
for the legend and the landcover data). The legend folder should contain
a clc_legend.xls
file which comes with the landcover sqlite files. The
data folder should contain spatialite files for all land cover and land cover
change steps. Then scripts can be called as follows:
export CORINE_DATA_DIRECTORY=/path/to/corine/data/Legend
./manage.py runscript corine.scripts.nomenclature
./manage.py runscript corine.scripts.rasterlegend
export CORINE_DATA_DIRECTORY=/path/to/corine/data
./manage.py runscript corine.scripts.load
The data volume of the vector format of the Corine land cover is quite substantial. There are 8191080 polygons if counting all years and including change data. The size of this table in PostGIS is about 30GB, and it requires another 30GB for the index.
A part of the Corine landcover geometries are not valid geometries. Before computing the intersection, it is therefore necessary to clean the Coine data. The script to clean the data calls ST_MakeValid on all geometries of the dataset. Run the script using the following command:
./manage.py runscript corine.scripts.clean
Parse Natura Data¶
To load the Natura 2000 protected areas into the database, specify the Natura data directory through an environment variable. The Natura data folder should contain one shapefile with the Natura data and a series of CSV files with the Natura tabular data. Then the spatial and tabular data can be loaded using the following command:
export NATURA_DATA_DIRECTORY=/path/to/natura/data
./manage.py runscript natura.scripts.load
The natura data consists of 27372 protected areas, the size of the table and index is around 1GB.
Compute Intersection¶
Once all load scrips have completed successfully, the intersection data can be built with a script as well.
The intersection script computes the landcover statistics for all protected areas. This geoprocessing step takes many hours of computations. Therefore, the asynchronous task manager Celery is used to do the geoprocessing of the data. The computations are split into small batches of Natura sites, each of which is a separate Celery task.
To learn how to setup Celery, consult its documentation. Protar assumes a local RabbitMQ instance as broker and a Redis instance setup for the result backend. Both are expected to be running in the default locations. In that case, celery should work automatically out of the box. To start Celery use:
celery worker -A protar --loglevel=INFO
The environment variables to specify a custom broker backend is BROKER_URL
,
and CELERY_RESULT_BACKEND
for the result backend. The concurrency of the
Celery workers defaults to the number of available CPUs, but can be manually
specified using the CELERYD_CONCURRENCY
environment variable. A more
detailed description of how to use Celery goes beyond the scope of this
documentation, consult the Celery documentation for more details.
With Celery up and running, execute the following script to add tasks to the queue that will build the intersection data squentially:
./manage.py runscript natura.scripts.intersect
Due to the data volume of both the Corine and the Natura data, this intersection is a substantial task. On a server with 4 CPUs and SSD disks the intersection took roughly 20 hours to complete.
Dump the data¶
The protar frontend does not make any use of the Corine landcover geometries
after computing the intersection. To use the data for running the app, it is
therefore sufficient to use a database without the corine_patch
table. To
dump the data without the patches, use the following command:
pg_dump protar --exclude-table-data=corine_patch -F c -v -f protar.dump
Load raster data¶
The parsing of the raster version of the corine data is for visualization
purposes only. It is a more manual process that is done through the admin
utilities of the django-raster package. To load the rasters, create one
RasterLayer
object for each raster through the admin, and link it to one
CorineLayer
object in the corine
app. The raster layers span all of
europe, and hence the parsing takes about 4 hours per layer and significant
amounts of disk space are required during parsing.
If you want to use the Django shell to create those rasters, use something like the following command, this will create the raster objects and trigger the parsing through celery:
from raster.models import RasterLayer
from django.core.files import File
rst = File(open('/path/to/corine/data/g100_clc90_V18_5.tif', 'rb'))
lyr = RasterLayer.objects.create(
name="CLC90 V18.5", datatype='ca', srid=3035, rasterfile=rst
)
With that, reference the raster layer as a corine layer in the corine app.
The frontend interface expects one CorineLayer
object for each available
year (1990, 2000, 2006, and 2012):
from corine.models import CorineLayer
CorineLayer.objects.create(rasterlayer=lyr, year=1990)