AstroPhysical Virtual Observatory

First Annual Report

Report from Work Area 3 (WA3)

November 13th 2002

Andy Lawrence
WA3 Manager
University of Edinburgh

E-mail: al@roe.ac.uk

(1) Overview

The overall purpose of WA3 is to assess, develop, and deploy new technologies needed to construct a Virtual Observatory. These technologies are in three main areas - storage and compute technology, Grid technology, and database/datamining technology. In all these areas the technological background is evolving very rapidly, and application projects throughout science, and indeed business requirements, are facing very similar challenges. For this reason, we need to remain agile in our approach.

WA3 has been a success to date, with significant progress in assessment and testing of key technologies, and in establishing relationships with other grid projects Europe-wide. New recruitment into WA3 has been slow, but the existing staff have been very active from the start of the project, and AVO has benefitted indirectly from a larger pool of effort in the AstroGrid and NGAST projects. The existing staff are mostly scientists, whereas newly recruited staff are mostly professional software developers. Along with the rapidly evolving nature of the technological background, this has led to a slight change of emphasis in the work plan: we have taken a simplified attitude to benchmark design, have accelerated the technology assessment programme, and are now moving towards trial deployments as rapidly as possible.

Assessment of technology in the area of databases and storage/compute technology has been progressing well, with some very interesting results. Perhaps the most striking success however has been in assessment and testing of grid middleware, both from the academic computer science side (OGSA etc) and from the commercial/W3C side (web services etc), and a subsequent involvement in interdisciplinary efforts to forge a merger between these approaches.

(2) Structure of Work and Responsibilities

WA3 is under the overall management of A.Lawrence, at the University of Edinburgh, who is also the AstroGrid Project Leader. The AstroGrid consortium takes principal responsibility overall for AVO technologies. There are three sub-packages.

WP3.1 (Grid Technology) is undertaken primarily by AstroGrid staff, and is managed by G.Rixon at the University of Cambridge. However much of the work intersects with that in WA2, and the project as a whole is kept closely informed of progress.

WP3.2 (Storage/Computer Technology) is undertaken primarily by staff at ESO, and is managed by A.Wicenec.

WP3.3 (Database Technology) is undertaken primarily by AstroGrid staff, and is managed by C.Page at the University of Leicester. However there is also a significant programme of work at TeraPix under the direction of Y.Mellier. To date this has been largely independent, but we are now in the process of merging these programmes.

(3) First Year Activities

(3.1) WP3.1 Grid Technology

Middleware

We have undertaken a thorough investigation of grid middleware technologies, which has led to an evolution of our forward vision. The work is summarised in a document originally written as part of the AstroGrid Phase A report, which can be found at http://wiki.astrogrid.org/bin/view/Astrogrid/RbGridTechnologyReport. Here we summarise the main findings.

We investigated several grid middleware packages and related technologies, including Globus, Condor, Storage Resource Broker, and a commercial implementation from Sun called Grid Engine. Condor and Grid Engine were the most robust and reliable, but of limited usefulness to us, as they are aimed at computational grids, as opposed to the data grid structure we need. The most important package is Globus, as it was developing as the standard international toolkit for Grid applications. However it had clear limitations. The first is that it is a development project, not yet a real product, being hard to implement and use, and with many parts not working. The second major drawback is that data transfer is for flat files only, with no way to address structured databases.

We see the Virtual Observatory as primarily a service grid, i.e. a network of data centres which offer operations on datasets. This matches well onto the web services technology, which during the last year has emerged as a new industry standard. This involves data exchange in XML formats, sent in messages with SOAP wrappers, and with the data service specified in WSDL, and published to some kind of registry. This technology offers most of what we need for the VO, but there are problems. Firstly, standard web services are one-to-one whereas we need to compose services; secondly, XML is bulky, so we need a standardised way to link to binary data; and thirdly the standard emerging registry, UDDI, is for various reasons not suitable.
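As an illustration of the message format described above, the following minimal Python sketch wraps a request in a SOAP 1.1 envelope and unpacks it again. It is a sketch only: the operation name coneSearch, its parameters, and the namespace http://example.org/vo are hypothetical and do not correspond to any AVO or AstroGrid interface.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an illustrative data service.
VO_NS = "http://example.org/vo"


def build_request(operation, params):
    """Return a SOAP envelope (as a string) calling `operation` with `params`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{VO_NS}}}{operation}")
    for name, value in params.items():
        # Every parameter becomes a child element carrying its value as text.
        ET.SubElement(call, f"{{{VO_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(message):
    """Recover the operation name and parameters from a SOAP envelope string."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the first child of Body is the requested operation
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params
```

All values travel as text inside XML elements, which is one reason why bulk binary data needs to be linked to rather than embedded.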

The Globus and Web Services worlds are now merging into the concept of grid services, to be implemented through a project called OGSA (Open Grid Services Architecture). Of particular importance for the VO is the sub-project to establish a Database Access Infrastructure (OGSA-DAI), which is a collaboration between the UK e-Science core programme and IBM Hursley. We have established a working relationship with the OGSA-DAI team.

Since September the effort in WP3.1 has shifted focus, and is now concentrated on test deployments, firstly of the Globus toolkit and secondly of web service technology, in collaboration with WA1 and WA2 as part of the January 2003 demonstrations. This involves adapting existing tools to work remotely with a distributed data set using XML/SOAP standards. We expect this work both to produce a usable product, and so teach us about user requirements, and to yield technological lessons.

Network studies

Following initial research, it became clear that the network monitoring benchmark tasks we needed had already been written by the particle physics community for a variety of projects. We joined two key UK-based committees (CNAP and PPNCG) and requested the addition of key astronomical sites across Europe to the monitoring list. Specific results have not yet been analysed, but a general conclusion Europe-wide is that in practice working bandwidth is not limited by fibre capacity, but largely by security bottlenecks and other end-point and router CPU issues. Meanwhile, as well as analysing the physical network, we need to decide how to assess the speed performance of grid infrastructure components across the network. Guy Rixon in Cambridge has written a draft document setting out a proposal for how to take this analysis forward. This document can be found at http://www.euro-vo.org/twiki/bin/view/Avo/WorkPackageThreeOne.

(3.2) WP3.2 Storage/Compute Technology

At ESO, work has concentrated on continued development of the NGAS system. The concept is one of accumulating PCs and IDE storage as a cluster with associated software, so that storage and CPU are configurable and hot-swappable, while the system grows scalably, with CPU and storage maintained in balance. During this year we have established a data check plugin capability. This plugin is executed at configurable intervals on each server, computes a checksum for every file, and compares the result with the value stored in the database. Extended NGAS capabilities include transparent cloning and synchronization of data to another host elsewhere on the network. All NGAS data is now treated on a file basis, i.e. the concept of master and replication disks has been dropped in favour of a homogeneous management of multiple copies of the same file. Retrieval logic preferentially returns files from the location closest to the requester. NGAS disks can be removed from the repository and recycled using the NGAS server. In both cases the files registered on the disk are removed from the repository as well. Extended NGAS capabilities also include transparent processing requests. Registered processing requests sent to any NGAS node are always carried out close to the data, i.e. on the host which holds the data.
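The data check described above can be sketched as follows. This is a minimal illustration in Python and not the NGAS implementation: the function names, the choice of CRC-32 as the checksum, and the dictionary standing in for the database table are all assumptions made for the example.

```python
import zlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Compute a CRC-32 checksum of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def check_files(root, expected):
    """Compare every file under `root` with its recorded checksum.

    `expected` maps a path relative to `root` to the stored checksum; it
    stands in for the database table (an assumption of this sketch).
    Returns the sorted relative paths of files that have no record or
    whose current checksum differs from the recorded one.
    """
    root = Path(root)
    failures = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if expected.get(rel) != file_checksum(path):
            failures.append(rel)
    return sorted(failures)
```

In a deployment of this kind the check would be scheduled at a configurable interval on each server, and a non-empty result would trigger whatever repair or alert policy the archive uses.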

At Jodrell Bank, work in WP3.2 has focussed on testing the performance of compute-intensive data reduction and analysis on the MERLIN archive with AIPS++ using a large Beowulf cluster (with 182 nodes). A key aim in federating radio data with other wavebands is to be able to derive images on demand and on the fly from fringe data, but this is computationally challenging. This system now seems to be working well, and the next step is to make it work across the network within Globus.

A variety of other machines are being installed and tested at various sites. For example, in Edinburgh duplicate versions of the SuperCOSMOS archive are being set up on a monolithic RAID array, on a Windows cluster running SQL Server, and on a 16 processor SMP machine running DB2 kindly donated by IBM.

(3.3) WP3.3 Database Technology

A thorough investigation of the issues involved in data mining, together with performance tests of specific DBMS, has been carried out by AstroGrid staff and is described in the relevant chapter of the AstroGrid Phase A report, which can be found at http://wiki.astrogrid.org/bin/view/Astrogrid/RbDataminingTechnologyReport. We have evaluated DB2, MySQL, Oracle, Postgres, and SQL Server. All had fairly similar speed performance to within the uncertainties of different platforms, tunings, and so on. There were some clear worries that hold lessons for how to implement networked astronomical archives. All DBMS supported indexed operations efficiently, but were several times slower than expected on sequential scans. None of the DBMS had a good way of importing bulk data from, or exporting it to, binary files such as images. Quite a few of the kinds of queries that astronomers want to run are not easily expressible in standard SQL. Moreover, none of the DBMS implements full standard SQL92, so that many of the benchmark tests had to be adjusted for each separate DBMS. These last two points together (non-standard queries, and non-compliant implementations) suggest that we need to develop some kind of standard Astronomical Query Language (AQL), with a "driver" for each specific DBMS.
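To illustrate the idea of a common query representation with a driver per DBMS, here is a minimal Python sketch. It is not a proposal for AQL itself: the BoxQuery class, the driver class names, and the table and column names (sources, ra_deg, dec_deg) are hypothetical. The one dialect difference shown, a trailing LIMIT clause in MySQL and Postgres versus SELECT TOP n in SQL Server, is a real example of the kind of adjustment the benchmarks required.

```python
from dataclasses import dataclass


@dataclass
class BoxQuery:
    """Abstract request: rows inside an RA/Dec box, at most `limit` of them."""
    table: str
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float
    limit: int


def _where(q):
    # Values are interpolated directly for readability only; production
    # code should use bound parameters instead of string formatting.
    return (f"ra_deg BETWEEN {q.ra_min} AND {q.ra_max} "
            f"AND dec_deg BETWEEN {q.dec_min} AND {q.dec_max}")


class LimitDriver:
    """Driver for dialects with a trailing LIMIT clause (e.g. MySQL, Postgres)."""

    def to_sql(self, q):
        return f"SELECT * FROM {q.table} WHERE {_where(q)} LIMIT {q.limit}"


class TopDriver:
    """Driver for dialects with SELECT TOP n (e.g. SQL Server)."""

    def to_sql(self, q):
        return f"SELECT TOP {q.limit} * FROM {q.table} WHERE {_where(q)}"
```

The same BoxQuery object can be handed to either driver, so the astronomer's request is written once and only the translation layer differs per DBMS.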

Further performance tests were run by AVO staff at TeraPix, concentrating on comparing an object oriented DBMS (Objectivity) with a relational one (MySQL). Unlike the AstroGrid tests which were very general, the TeraPix staff made more detailed tests of the sensitivity of the DBMS to various tuneable parameters. These results have been presented at the recent SPIE conference.

(3.4) Connections with other projects.

The technology issues are of course much the same for any VO project world-wide. AstroGrid and AVO are working closely together. We have also been in close correspondence with our US colleagues in the US-NVO project, and have had several joint meetings on a variety of technical topics. Now that the International Virtual Observatory Alliance (IVOA) is building up, this technical coherence is even more important. Some issues (e.g. data formats) require us to implement the same technology. Other issues do not, but we still prefer to keep an open exchange of ideas.

Within the UK, the AstroGrid component of AVO has worked closely with the interdisciplinary core e-Science programme, especially with the OGSA-DAI project, and also with other application projects, especially MyGrid (bio-informatics) and GridPP (particle physics). Across Europe, AVO is part of an umbrella project called GridStart linking grid projects in all subjects funded by the EU. These links look set to become increasingly important.

(4) Progress against milestones

(4.1) Work Package 3.1 : Grid Technology

M3.1.1 : Benchmark Network Tasks (+12 months)

The original task is essentially complete, by virtue of joining in the PP monitoring network. However we have expanded it to design grid performance tasks. These are in draft form.

M3.1.2 : Middleware assessment report and recommendations (+24 months)

This task is well ahead of schedule, with an initial report already complete. It will however be revisited and revised.

M3.1.3 : Network Analysis report and recommendation on strategic network investments (+24 months)

This task is on schedule.

M3.1.4 : Demonstration of working datagrid (+24 months)

This task is on schedule.

(4.2) Work Package 3.2 : Storage/Compute Technology

M3.2.1 : Deploy Trial Hardware (+12 months)

Due to decreased funds, this has not been undertaken. Rather, each partner organisation is using its own resources internally to deploy and test hardware.

M3.2.2 : Storage Assessment report and recommendations (+24 months)

This task is on schedule.

M3.2.3 : Architecture Assessment report and recommendations (+30 months)

This task is somewhat downstream, but the target date is unchanged.

(4.3) Work Package 3.3 : Database Technology

M3.3.1 : Benchmark Datamining Tasks (+12 months)

This was completed ahead of schedule and we have already been making tests.

M3.3.2 : DBMS Assessment report and recommendations (+18 months)

An initial report is complete already. We are still improving this work however and expect to issue a complete report on time.

M3.3.3 : Working datamining system for external user testing (+24 months)

This task is on schedule.

(5) Staff Effort Expended

The current effort deployed in WA3 is 5.6 FTEs (Full Time Equivalents) spread over seventeen individuals. This includes effort from four newly recruited staff, two of whom work wholly in this area and two partly, all of whom were recruited part way through the year. The integrated effort in WA3 over the year was approximately 4.5 sy (staff years).

(5.1) Newly recruited staff using EU funds

Two new full time staff were recruited at Edinburgh University, dedicated entirely to WA3. In addition, two full time staff members recruited at CDS and at TeraPix worked partly in WA3.

The original plan was that the senior developer would concentrate on WP3.3 (Database Technology) and the junior developer on WP3.1 (Grid Technology). In fact, given Hill's wide experience, we have decided to deploy them as a team on a range of problems, and currently they both work 50% in each work package.

(5.2) Summary of WA3 staff effort

Note that effort is not fixed between the various sub-packages, but deployed as necessary. The FTEs below reflect the main effort this year.

Name        Place                   Partner/EU funding  Main WP        FTE
Rixon       AstroGrid (Cambridge)   Partner             3.1 (manager)  0.25
Leoni       ESO                     Partner             3.1            0.5
Fernique    CDS                     Partner             3.1            0.1
Bonnarel    CDS                     Partner             3.1            0.1
Wenger      CDS                     Partner             3.1            0.1
Holloway    JBO                     Partner             3.1            0.1
Hill        Edinburgh               EU                  3.1/3.3        1.0
Maxwell     Edinburgh               EU                  3.1/3.3        1.0
Wicenec     ESO                     Partner             3.2 (manager)  0.2
Knudstrup   ESO                     Partner             3.2            0.2
Devillard   ESO                     Partner             3.2            0.1
Suchar      ESO                     Partner             3.2            0.2
Noble       JBO                     Partner             3.2            0.1
Garrington  JBO                     Partner             3.2            0.1
Richards    JBO                     EU                  3.2            0.25
Page        AstroGrid (Leicester)   Partner             3.3 (manager)  0.25
Davenhall   AstroGrid (Edinburgh)   Partner             3.3            0.25
Didelon     TeraPix                 Partner             3.3            0.8
Tissier     TeraPix                 EU                  3.3            0.3

(5.3) Indirect support

Work in this area has also benefitted indirectly from the overall work of the AstroGrid project. Only a small proportion of AstroGrid staff effort is formally devoted to AVO work, but a large fraction of the AstroGrid work is applicable to AVO. Likewise, at ESO, AVO has benefitted from the NGAST project aimed at solving ESO's own storage and archiving problems, although only a fraction of this effort is formally devoted to AVO.

(6) Dissemination of Results

Staff associated with WA3 have given many seminars and presentations on this work over the last year, and have also given talks at several conferences, the most important being the SPIE conference in Hawaii in August 2002 and a dedicated VO conference, "Towards the International Virtual Observatory", in Garching in June 2002. Both of these conferences have resulted in published papers. Much of the technology work done by AstroGrid staff has also been effectively published as part of the AstroGrid Phase A report.