Astrophysical Virtual Observatory

FITS keyword mapping documentation

Rationale

Ingest the metadata present in FITS keywords into existing data models.

FITS keywords vary a lot across sources: files from different observatories, missions or instruments commonly have different metadata structures. Later processing, whether by standard pipelines or by astronomers analysing the data, can increase the differences further, even as it adds useful metadata.

These differences range from different key names and different units to split data and missing metadata. The FITS keyword mapping tool is intended to normalise the metadata, so that it can be used to describe the data in a uniform way.

Architecture

To achieve the proposed goal in a generic way, usable by any data centre, the tool is split into two components, executed in sequence: keyword mapping and persistence.

The keyword mapping component processes the FITS files, applies the mapping definition to them, and produces an in-memory list of all files and normalised keyword values.

The persistence component persists this list in whatever format/database/etc a specific application might require; a particular data centre will customise this component to meet its data model. Possible uses are:

  • persisting to a VOTable
  • persisting to an existing database structure
  • persisting into the original FITS files

[Figure: mapping tool architecture (mapp_arch.gif)]
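
The two-component split described above can be sketched as follows. This is only an illustration in current Python 3, not the tool's code: the function names (map_keywords, persist_as_lines, run), the simplified "model item = keyword" mapping and the in-memory layout are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory shape: filename -> {model item -> value}.
MappedValues = Dict[str, Dict[str, object]]


def map_keywords(headers: Dict[str, Dict[str, object]],
                 mapping: Dict[str, str]) -> MappedValues:
    """Stage 1: apply a (simplified) mapping definition to every header."""
    return {
        filename: {item: header[keyword]
                   for item, keyword in mapping.items() if keyword in header}
        for filename, header in headers.items()
    }


def persist_as_lines(mapped: MappedValues) -> List[str]:
    """Stage 2: one possible persistence back end (plain text lines)."""
    return [f"{filename}\t{item}\t{value}"
            for filename, items in sorted(mapped.items())
            for item, value in sorted(items.items())]


def run(headers, mapping,
        persist: Callable[[MappedValues], List[str]]) -> List[str]:
    """Run the two components in sequence."""
    return persist(map_keywords(headers, mapping))


# Headers are plain dicts here; a real run would read them from FITS files.
headers = {"a.fits": {"EXPTIME": 300.0, "INSTRUME": "UVES"}}
mapping = {"TotalExposureTime": "EXPTIME"}
lines = run(headers, mapping, persist_as_lines)
```

Because the persistence step is passed in as a function, a data centre could swap in a different back end without touching the mapping step.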

Key concepts

package
Set of FITS files to be ingested in one go.
FITS file datatype
Describes the kind of data present in the file. It is defined by the target model, and reflects the need for different metadata to describe different kinds of data (e.g., Spectral Resolution won't apply to image data).
model item
A concept that exists in the target data model. A model item is a 'bucket' that is filled with the values extracted from the FITS files by the mapping tool. Different sets of model items will be required for different datatypes.
mapping rule
Definition of how a single value for a given model item is extracted from the FITS files.
target model
The (data) model, as defined by the data centre, into which the mapped values are ingested.

Keyword mapping

Package

For a given run of the mapping tool, a package must be provided.

A package is made of two components:

  • the package file list; for each entry:
    • path to file
    • file datatype
    • association
  • the FITS files

The package file list contains a list of the FITS files, with additional metadata that makes it possible to extract values from the FITS headers.

The file datatype relates to the target data model. It is expected that the target data model will differentiate types of data, and use different metadata to describe them. As such, this datatype is used to determine the required model items.

The association field defines things that go together. For instance, an image present in a FITS file might be accompanied by another file containing a weight map.
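
As an illustration of the three fields per entry, the sketch below parses a package file list. The on-disk layout is not specified in this document, so the tab-separated columns, the '#' comment line, the datatype names and the PackageEntry class are assumptions made for the example only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PackageEntry:
    path: str
    datatype: str
    association: Optional[str]  # None when the file stands alone


def parse_file_list(text: str) -> List[PackageEntry]:
    """Parse an assumed layout: path, datatype, association (tab-separated)."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        path, datatype = fields[0], fields[1]
        association = fields[2] if len(fields) > 2 and fields[2] else None
        entries.append(PackageEntry(path, datatype, association))
    return entries


# Hypothetical package: an image with its weight map, plus one spectrum.
sample = (
    "# path\tdatatype\tassociation\n"
    "img_001.fits\tIMAGE\tobs1\n"
    "img_001.weight.fits\tWEIGHT\tobs1\n"
    "spec_007.fits\tSPECTRUM\t\n"
)
entries = parse_file_list(sample)
```

Here the image and its weight map share the association value "obs1", which is how the two files would be recognised as belonging together.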

Mapping definition

A mapping definition is a list of mapping rules (see below for syntax discussion), which maps FITS keywords to model items. These mapping rules are related to the file datatype: they must provide a rule for every required model item. A distinct mapping definition file must be defined for each FITS file with a different structure, i.e., when different FITS keywords are needed to retrieve the metadata required by the target model.

Note: Depending on a data centre's requirements, the mapping rules might remain constant (if the FITS files' structure is constant). The general case does not require this, but mapping rules may nonetheless be reused across packages.

Configuration

The mapping tool also needs some configuration that remains constant across runs of the tool, as it reflects a data centre's requirements. It consists of the model items definition.

The model items definition lists all existing model items. The model items are characterised in terms of:

  • model item name (utype)
  • ucd
  • unit
  • description

The characterisation also includes "selector" attributes, that define when the model item applies:

  • data type (specifies to which FITS file datatype the model item applies)
  • required (indicates if the model item is required)

The required attribute is used to validate that the mapping definition files contain enough rules to meet the data centre requirements. If a model item is required but there is no mapping rule to extract it, the mapping run will fail with an error. Non-required (optional) model items are extra metadata that the data centre supports but doesn't require.
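
The validation step can be pictured with the sketch below. The dictionary layout, the datatype names and the functions missing_required and validate are hypothetical; only the behaviour (fail when a required model item has no rule) comes from the description above.

```python
from typing import Dict, List, Set

# Hypothetical model items definition: name -> selector attributes.
ITEM_DEFS: Dict[str, dict] = {
    "TotalExposureTime": {"datatype": "SPECTRUM", "required": True},
    "AverageResolution": {"datatype": "SPECTRUM", "required": True},
    "OpticalElements.FILTER": {"datatype": "IMAGE", "required": True},
    "Spatial.Seeing": {"datatype": "SPECTRUM", "required": False},
}


def missing_required(datatype: str, mapped_items: Set[str]) -> List[str]:
    """Required model items of this datatype that have no mapping rule."""
    return sorted(
        name for name, attrs in ITEM_DEFS.items()
        if attrs["datatype"] == datatype
        and attrs["required"]
        and name not in mapped_items
    )


def validate(datatype: str, mapped_items: Set[str]) -> None:
    """Fail the run when a required model item cannot be extracted."""
    missing = missing_required(datatype, mapped_items)
    if missing:
        raise ValueError(f"no mapping rule for required items: {missing}")


# Spatial.Seeing is optional, so leaving it unmapped is not an error.
validate("SPECTRUM", {"TotalExposureTime", "AverageResolution"})
```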

Persistence

Persistence of the mapped values is out of the scope of the mapping tool itself. Persistence mappers must be developed by data centres to meet their requirements.

User's Guide

The mappings

This section describes the types of mapping rules needed to ingest the metadata from the test datasets. Some rule types are not needed for the test data, so their examples are artificial.

The syntax itself is provided as illustration; it will be formally defined at a later time, and is subject to change. One of the goals is that it should be easily readable.

Keyword/value mapping

What: The value is the content of a known FITS keyword.
How: Specify FITS keyword
Example: TotalExposureTime = EXPTIME

Constant

What: A value is fixed across all the data to ingest
How: Specify constant value
Example: TypeOfObservation = "SPECTROSCOPY"

Conversion

What: A value is present, but in a different unit or format than the one expected
How: Add unit/formatting info along with the value; a list of supported conversions must be defined (see below)
Example 1 (expected unit: degrees): SpatialInfo.ERR_SLIT = ERR_SLIT,arcsec
Example 2 (expected format: ISO8601): StartDateTime = {MJD-OBS,mjd}
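
The two conversions in the examples can be written out by hand as below. This is a stand-alone sketch, not the CDS unit conversion library the tool relies on; the function names and the CONVERSIONS lookup table are assumptions, and time-scale subtleties (UTC versus other scales) are ignored.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17T00:00:00.
MJD_EPOCH = datetime(1858, 11, 17)


def arcsec_to_degrees(value: float) -> float:
    """One degree is 3600 arcseconds."""
    return value / 3600.0


def mjd_to_iso8601(mjd: float) -> str:
    """Convert a Modified Julian Date to an ISO 8601 date-time string."""
    return (MJD_EPOCH + timedelta(days=mjd)).isoformat()


# Hypothetical registry of supported conversions: (from, to) -> function.
CONVERSIONS = {
    ("arcsec", "deg"): arcsec_to_degrees,
    ("mjd", "iso8601"): mjd_to_iso8601,
}

slit_error_deg = CONVERSIONS[("arcsec", "deg")](1800.0)
start = CONVERSIONS[("mjd", "iso8601")](51544.5)
```

A lookup table keyed on the (source, target) pair makes the list of supported conversions explicit, so an unsupported pair fails early with a KeyError.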

Arithmetic expressions, string concatenation

What: A value must be calculated
How: Evaluate expressions with simple arithmetic operators (+, -, /, *), and string concatenation (&)
Example 1: Wavelength Min = CRVAL1 - CRPIX1 * CDELT1
Example 2: ProcessingType = "ESO UVES pipeline" & HIERARCH ESO PRO REC1 PIPE ID
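
Example 1 can be evaluated with a small expression walker such as the one sketched below. It is an assumption about how such a rule could be evaluated, not the tool's parser: it covers only the four arithmetic operators over keyword names that are valid identifiers, and leaves out string concatenation and HIERARCH keywords. The header values are made up.

```python
import ast
import operator

# Only the four arithmetic operators listed for this rule type.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str, header: dict) -> float:
    """Evaluate an arithmetic expression whose names are FITS keywords."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return header[node.id]  # look the keyword up in the header
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval").body)


# Made-up header values for illustration.
header = {"CRVAL1": 4000.0, "CRPIX1": 1.0, "CDELT1": 0.5}
wavelength_min = evaluate("CRVAL1 - CRPIX1 * CDELT1", header)
```

Walking the syntax tree, rather than calling eval on the rule text, keeps the rule language limited to what is listed above.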

Choice

What: A value exists in one of several keywords
How: A list of candidate keywords is explicitly defined; the first one to be found is used.
Example: OpticalElements.FILTER = FILTER2|FILTER3 (either the FILTER2 or the FILTER3 keyword exists in a file)
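
A minimal sketch of the "first one found" behaviour, with the header represented as a plain dictionary. The function name resolve_choice and the sample header values are hypothetical.

```python
def resolve_choice(rule: str, header: dict):
    """Return the value of the first candidate keyword present in the header."""
    candidates = [name.strip() for name in rule.split("|")]
    for keyword in candidates:
        if keyword in header:
            return header[keyword]
    raise KeyError(f"none of {candidates} found in header")


# FILTER2 is absent from this made-up header, so FILTER3 is used.
header = {"FILTER3": "R_SPECIAL", "EXPTIME": 300.0}
filter_name = resolve_choice("FILTER2|FILTER3", header)
```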

Indirection

What: A keyword contains a keyword name to lookup
Example: AverageResolution = >RESOLAVG
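
One reading of this rule is a two-step lookup: the keyword named after '>' holds the name of a second keyword, and the value of that second keyword is the one wanted. The sketch below assumes that reading; the function name resolve_rule and the header contents (including the keyword QCRES01) are made up.

```python
def resolve_rule(rule: str, header: dict):
    """Resolve a plain keyword rule or an indirect ('>') keyword rule."""
    rule = rule.strip()
    if rule.startswith(">"):
        pointer = rule[1:].strip()
        target = header[pointer]  # the value is itself a keyword name
        return header[target]
    return header[rule]


# Hypothetical header: RESOLAVG points at the keyword holding the value.
header = {"RESOLAVG": "QCRES01", "QCRES01": 50465.089}
resolution = resolve_rule(">RESOLAVG", header)
```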

Addressing other FITS extensions

What: A keyword resides on an extension other than the primary
How: Specify the extension along with the keyword. Extension 0 will be the primary FITS unit; 1 will be the first extension.
Example: Instrument = 1:INSTRUME

Addressing other FITS files

What: A keyword resides on an external FITS file
How: Specify the file name along with the keyword
Example: AverageResolution = {QC-FILE}:0:RESOLAVG
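
The extension and file prefixes shown in the last two rule types could be split apart as sketched below. The KeywordRef tuple and parse_ref function are hypothetical, the file part is kept verbatim because its meaning is not defined here, and treating a bare keyword as extension 0 is an assumption.

```python
from typing import NamedTuple, Optional


class KeywordRef(NamedTuple):
    file_ref: Optional[str]  # None means "the file being mapped"
    extension: int           # 0 is the primary unit, 1 the first extension
    keyword: str


def parse_ref(text: str) -> KeywordRef:
    """Split '[file:][extension:]keyword' into its parts."""
    parts = [part.strip() for part in text.split(":")]
    if len(parts) == 1:
        return KeywordRef(None, 0, parts[0])
    if len(parts) == 2:
        return KeywordRef(None, int(parts[0]), parts[1])
    if len(parts) == 3:
        return KeywordRef(parts[0], int(parts[1]), parts[2])
    raise ValueError(f"cannot parse keyword reference: {text!r}")


instrument_ref = parse_ref("1:INSTRUME")
resolution_ref = parse_ref("{QC-FILE}:0:RESOLAVG")
```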

Standard computations

What: Some rather important values are systematically absent from the header, but can be computed from the data itself.
How: A set of standard computations is included in the mapping tool; they are applied based on the model item name.
Example: Spatial.Seeing = %COMPUTE%

Precomputed table

What: Some values are extraordinarily hard to get, or even missing.

The precomputed table is a fallback solution when a value is too hard to get using mapping rules and is not amenable to a standard computation either.

An example from UVES data: look up the keyword name which has a value of 'LINE_TABLE_*2' (where * = BLUE, REDL or REDU), replace the last 4 characters 'CATG' with 'NAME', get the value of that keyword, use the value as a filename to retrieve from the ESO archive, and look up in that file the value of the keyword HIERARCH ESO QC RESOLAVG
How: Such complicated mappings will not be automated, as they are not likely to occur more than once. They will involve user intervention, in the form of an external file, in TSV format, provided with the values for each file (this effectively corresponds to hard-coding the values in this file). This scheme also allows for inclusion of metadata that is not present in any form in the FITS files.
Example: AverageResolution = %EXTERNAL%
The external file (the first line contains the concepts to map to):

filename                                            AverageResolution          other....
r.UVES.2002-02-03T04:20:57.989_0011.fits            50465.089
r.UVES.2002-02-03T04:20:57.989_0012.fits            79456.827
r.UVES.2002-02-03T04:20:57.989_0013.fits            67034.8444
.....
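
Reading such an external file takes only a few lines with the standard csv module. The sketch below assumes real tab separators and a first column named "filename", as in the sample above; the function name load_external_table is hypothetical, and values are left as strings.

```python
import csv
import io
from typing import Dict


def load_external_table(text: str) -> Dict[str, Dict[str, str]]:
    """Map each filename to its {model item: value} row from a TSV table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["filename"]: {k: v for k, v in row.items() if k != "filename"}
        for row in reader
    }


# First two data rows of the sample table above, tab-separated.
sample = (
    "filename\tAverageResolution\n"
    "r.UVES.2002-02-03T04:20:57.989_0011.fits\t50465.089\n"
    "r.UVES.2002-02-03T04:20:57.989_0012.fits\t79456.827\n"
)
table = load_external_table(sample)
value = table["r.UVES.2002-02-03T04:20:57.989_0011.fits"]["AverageResolution"]
```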

Unit Conversion

For unit conversion we rely on CDS's Unit conversion library.

TBD

Usage

The keyword mapper component (extractmetadata.py) defines the following command-line parameters:

--itemdef-file model item definitions file
-l file list
-d (optional) path to prepend to the filenames on the file list
-m (optional) mappings file
-e (optional) external keys file
--verbosity (optional) verbosity level
--help (optional) print usage

Persistence components might define extra parameters:

persist_debug.py
-o output file
persist_votable.py
-o output file
--split-output (optional) whether to create separate votable files for each datatype
persist_mysql.py
--host MySQL hostname
--user MySQL username
--password MySQL password
--database MySQL database
--pkgFile global package metadata

Programmer's Guide

Standard computations

Standard computations are rules that apply to most of the data, in a standard way. These are defined by code, as a separate file (compute.py).

Edit the computeMethodDict dictionary. Each entry defines the model items that are to be computed, and the function that will do the computation (a DoCompute function). Since values from other model items are likely to be needed, and might not yet be present (if they are themselves to be computed), a DoCompute function that fails on that account will be re-executed after all the other DoCompute functions have run.
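
The re-execution scheme can be illustrated as follows. This is a sketch of the idea only, not the contents of compute.py: the MissingInput exception, the helper need, the two compute functions, the model item names other than TotalExposureTime and the layout of the dictionary are all assumptions.

```python
from typing import Callable, Dict, List


class MissingInput(Exception):
    """Raised when a compute function needs a model item not yet available."""


def need(values: dict, name: str):
    if name not in values:
        raise MissingInput(name)
    return values[name]


def compute_total(values: dict) -> float:
    return need(values, "ExposureTime") * need(values, "NumberOfExposures")


def compute_rate(values: dict) -> float:
    return need(values, "Counts") / need(values, "TotalExposureTime")


# CountRate is listed first on purpose: it depends on TotalExposureTime.
compute_methods: Dict[str, Callable[[dict], float]] = {
    "CountRate": compute_rate,
    "TotalExposureTime": compute_total,
}


def run_computations(values: dict, methods: dict) -> List[str]:
    """Run every function once, then retry those that lacked an input."""
    deferred = []
    for item, func in methods.items():
        try:
            values[item] = func(values)
        except MissingInput:
            deferred.append(item)
    unresolved = []
    for item in deferred:  # second pass, after all the others have run
        try:
            values[item] = methods[item](values)
        except MissingInput:
            unresolved.append(item)
    return unresolved


values = {"ExposureTime": 100.0, "NumberOfExposures": 3, "Counts": 600.0}
unresolved = run_computations(values, compute_methods)
```

In this run CountRate fails on the first pass, TotalExposureTime is computed, and CountRate then succeeds on the retry.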

Persistence

TBD The keyword mapping component itself is a stand-alone program that simply outputs its results to stdout.

For real use of the keyword mapping tool, one needs to write a persistence component. Persistence components directly call the keyword mapping component, read its in-memory lists of metadata, and output them as they see fit.

persist_debug.py is a simple persistence component that dumps the relevant metadata into an HTML file: the model items definition, the file list, and the mapped values. It is a good starting point both for defining a new persistence component and for inspecting the structure and contents of the keyword mapping lists.

A typical persistence component will:

  • define extra command-line parameters
  • run the keyword mapping component
  • persist the new values

A persistence component might require configuration of its own. The keyword mapping component defines command-line parameters that all persistence components must abide by, and provides a mechanism to easily define extra command-line parameters using the ParseArguments function. persist_debug adds one argument: outputFile, which specifies where the HTML file will be saved.

The ExtractMetadata function runs the keyword mapping component. The returned object contains:

  • itemDefDict: the model item definition
  • fileDesc: the file list
  • fileMD: the mapped values

Although the last dictionary is the most interesting one, the other two contain information that a data centre might wish to persist.
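
To show how the three members fit together, the sketch below writes them out as tab-separated text. The real object returned by ExtractMetadata is not reproduced here: a stand-in is built by hand, and the inner layout of each dictionary (keys, the "unit" and "datatype" entries) is an assumption, as is the function name persist_text.

```python
import io
from types import SimpleNamespace

# Hand-built stand-in for the returned object; the inner layout is assumed.
result = SimpleNamespace(
    itemDefDict={"TotalExposureTime": {"unit": "s"}},
    fileDesc={"a.fits": {"datatype": "SPECTRUM"}},
    fileMD={"a.fits": {"TotalExposureTime": 300.0}},
)


def persist_text(result, out) -> None:
    """Write one line per mapped value: file, datatype, item, value, unit."""
    for filename in sorted(result.fileMD):
        datatype = result.fileDesc[filename]["datatype"]
        for item, value in sorted(result.fileMD[filename].items()):
            unit = result.itemDefDict[item].get("unit", "")
            out.write(f"{filename}\t{datatype}\t{item}\t{value}\t{unit}\n")


buffer = io.StringIO()
persist_text(result, buffer)
output = buffer.getvalue()
```

The mapped values drive the loop, while the file list and the model item definition only add context (datatype and unit) to each line.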

Installation

TBD Fedora Core 3 Linux, Python 2.4
  • cds units
  • jpypecode

MySQL, MySQL python

Sample

TBD

The download includes three persistence components:

debug
this module outputs the important in-memory lists produced by the keyword mapping component
votable
output the data as VOTable(s)
mysql
outputs into a MySQL database (DDL file is provided to create the tables)

Two sample packages are provided: uves and gabods. FITS files are stored under the data folder.

Test case

A few datasets were chosen to serve as test cases. The examples provided in this document apply to UVES data. The metadata will be ingested into ESO's Science Archive Facility, which will become the backend repository for ESO's registry.

Advanced features, known issues and next developments

Advanced features

The tool supports some features not discussed in this document:
  • Using FITS files as a transport mechanism. If a FITS file contains more than one product, each with its own set of metadata, the HDUs that need to be processed must be specified in the file list.
  • Using different mapping files for different files of the dataset. Most commonly a package will contain only one "main" datatype (one that will have its FITS keywords processed), and as such specifying the mapping rules file as a command line parameter is sufficient. But if that is not the case, a mapping file may be specified in the file list (for each file) rather than as a command line argument. You can even apply different mappings to files with the same datatype!
  • The model item definition is extensible; you can add new columns to it if your persistence component needs more description about model items. The first (comment) line is used as key in the memory representation.

Known issues

  • The unit conversion library doesn't currently support all the conversions needed by the test cases; this should be fixed soon. At present some workarounds are in place. If unit conversion problems are encountered, you might want to disable the unit conversion library (with the --without-cds parameter) and use the legacy conversion code. Note however that this legacy mode is tuned to work with the test data.

Next developments

The FITS Keyword Mapper is still a work in progress; new features, namely new mapping rule types, might be added. Feedback is appreciated.

References/Acknowledgments

The requirements for this tool were presented and defined at DS5PlanningStage03.

The mapping tool is developed at ESO's Advanced Data Products group by Remco Slijkhuis.

Rules requirements and model item definitions set up by Nausicaa Delmotte, Markus Dolensky, Jörg Retzlaff, Bruno Rino, Remco Slijkhuis, Andreas Wicenec.

The unit conversion is powered by CDS: "This software uses source code created at the Centre de Données astronomiques de Strasbourg, France."

Presented at: DS5PlanningStage04.

-- BrunoRino - date TBD

Topic revision r1.2 - 03 Sep 2006 - 08:20 - BrunoRino
Copyright © 2003 by the contributing authors. All material on this collaboration tool is the property of the contributing authors.