Data harmonization

To allow diverse data sets to be re-used together, Edaphobase harmonizes them to maximize their comparability. A first harmonization step takes place during data upload, when a data provider’s original data are mapped to Edaphobase’s data structures and, in particular, to its controlled and standardized vocabularies (including vocabularies for the content of categorical variables).

Further harmonization steps take place automatically within Edaphobase after data import. Examples are:

  • Species nomenclatures: if older species names are used in the original data, Edaphobase’s taxonomic backbone automatically links these to the valid species names.
  • Units: information entered by a user is converted into a uniform reference unit for the respective information field, for example sampling depth in cm (from mm or dm) or Soil Organic Material in g/kg (from mg/g or ppm).
  • Geo-coordinates: conversion of the entered coordinates into a uniform reference system (WGS84).
  • Quantities: to allow data comparability, all absolute numbers of species individuals are transformed, where possible, to abundances (densities) in individuals/m² or to “activity densities” in individuals/trap/time period. This is only possible if the necessary metadata are given (i.e., number of samples/traps, surface area of the sample or exposure time of the traps).
  • Habitat information: entered habitat information is transformed into alternative concepts, where feasible, e.g., between German habitat types and European habitat or land-use types (i.e., CORINE, EUNIS). However, this is only done if a unique (1 : 1) assignment is possible. Consequently, not all habitat information can be transformed into other concepts, and information loss is possible. Furthermore, habitat-descriptive quantities are transformed to alternative concepts (e.g., if the percentages of sand, silt and clay are given, the soil texture can be identified), and missing information can be reconstructed (e.g., if one of Corg, Ntot or C/N ratio is missing, the missing variable can be calculated from the other two).
  • Climate data: location data can be enriched with climate data from external raster data.
  • Elevation data: location data can be enriched with elevation data from external raster data.
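
Three of the steps listed above (unit conversion, transformation of counts to densities, and reconstruction of a missing Corg, Ntot or C/N value) are simple arithmetic and can be sketched as follows. This is a minimal illustration, not Edaphobase's actual implementation: all function names, the factor tables, the interpretation of ppm as mass-based (mg/kg), and the use of days as the trapping time period are assumptions made for the example.

```python
# Illustrative sketch only; all names and factors are assumptions,
# not Edaphobase's actual implementation.

# Factors to the reference units named in the list above.
DEPTH_TO_CM = {"mm": 0.1, "cm": 1.0, "dm": 10.0}
# mg/g is numerically equal to g/kg; ppm is assumed to be mass-based (mg/kg).
SOM_TO_G_PER_KG = {"g/kg": 1.0, "mg/g": 1.0, "ppm": 0.001}


def to_reference_unit(value, unit, factors):
    """Convert value from unit to the reference unit of a factor table."""
    if unit not in factors:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * factors[unit]


def density_per_m2(individuals, n_samples, sample_area_m2):
    """Individuals per m², or None if the required metadata is missing."""
    if not n_samples or not sample_area_m2:
        return None
    return individuals / (n_samples * sample_area_m2)


def activity_density(individuals, n_traps, exposure_days):
    """Individuals per trap per day, or None if metadata is missing."""
    if not n_traps or not exposure_days:
        return None
    return individuals / (n_traps * exposure_days)


def complete_c_n(corg=None, ntot=None, cn_ratio=None):
    """Fill in one missing value of Corg, Ntot and C/N from the other two.

    Corg and Ntot must share the same unit. Values stay None when fewer
    than two are known or when a division by zero would occur.
    """
    if cn_ratio is None and corg is not None and ntot:
        cn_ratio = corg / ntot
    elif corg is None and ntot is not None and cn_ratio is not None:
        corg = cn_ratio * ntot
    elif ntot is None and corg is not None and cn_ratio:
        ntot = corg / cn_ratio
    return corg, ntot, cn_ratio
```

For example, to_reference_unit(2, "dm", DEPTH_TO_CM) yields a sampling depth of 20.0 cm. The density functions return None instead of a number when the number of samples or traps, the sample area or the exposure time is unknown, mirroring the condition above that the transformation is only possible if the necessary metadata are given.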