head 1.1; access; symbols; locks http:1.1; strict; comment @# @; expand @b@; 1.1 date 2004.02.06.11.48.29; author AndyLawrence; state Exp; branches; next ; desc @none @ 1.1 log @none @ text @
Andy Lawrence
WA3 Manager
University of Edinburgh
E-mail: al@@roe.ac.uk
The overall purpose of WA3 is to assess, develop, and deploy new technologies needed to construct a Virtual Observatory. These technologies are in three main areas - storage and compute technology, Grid technology, and database/datamining technology. This is a challenging task, as the background technology is evolving rapidly. This year we have completed several major milestones - completion of our studies on grid middleware and datamining technology, culminating in public reports with recommendations, and deployment of sufficient working implementations of key technology advances to help the AVO overall deliver a very successful demonstration of genuine new capacity to our international Science Working Group. The major remaining task before us is to complete design of a recommended architecture for the coming Euro-VO. Some more detail on the WA3 programme, and key reports, can be found at http://www.euro-vo.org/twiki/bin/view/Avo/WorkAreaThree
WA3 is under the overall management of A.Lawrence, at the University of Edinburgh, who is also the AstroGrid Project Leader. The AstroGrid consortium takes principal responsibility overall for AVO technologies. There are three sub-packages.
WP3.1 (Grid Technology) is undertaken primarily by AstroGrid staff, and is managed by G.Rixon at the University of Cambridge. Progress in this area is however central to the future design of the European Virtual Observatory (Euro-VO), so staff in other work areas are kept informed, and several are closely involved.
WP3.2 (Storage/Computer Technology) is undertaken primarily by staff at ESO, and is managed by A.Wicenec. Experience at other data centres, including those outside the formal partnership, is however also collated.
WP3.3 (Database Technology) is undertaken primarily by AstroGrid staff, and is managed by C.Page at the University of Leicester. There is considerable intersection with staff in other work areas. There is also a programme of WP3.3 work at TeraPix under the direction of Y.Mellier. To date this has been largely independent.
Middleware
This segment of our work has achieved significantly more than our objectives for this stage. The last year has seen very significant progress in two strands - deployment of web services, and trials of OGSA-compliant grid middleware. This has culminated in the completion of one of our major milestones - a report and recommendations on grid middleware.
During year one, various protocols and standards for web services stabilised and real-world deployments began to proliferate. In the meantime, it was clear that OGSA-compliant grid services were not yet sufficiently reliable or stable. We therefore began a working deployment with industry standard web-services, with the intention of subsequent re-engineering with grid services. Web-service deployment mostly centred round the building of the AstroGrid component infrastructure (ref) but also within the AVO demonstration suites (ref). The technologies used were XML, SOAP, and WSDL, operating over HTTP. We did not use UDDI, but, together with other VO projects worldwide, constructed our own registries - this has been a major focus of the international standards effort during the last year.
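As an illustration of the XML/SOAP layer described above, the following minimal sketch builds a SOAP 1.1 request envelope of the kind that would be sent over HTTP. The service namespace, the operation name `searchCone`, and its parameters are hypothetical, chosen only for this example; they are not taken from the AstroGrid or AVO interfaces.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used here for illustration only.
SERVICE_NS = "urn:example:catalogue"


def build_request(ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Return a SOAP 1.1 envelope for a hypothetical 'searchCone' operation."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}searchCone")
    # Each argument becomes a child element of the operation element.
    for name, value in (("ra", ra_deg), ("dec", dec_deg), ("radius", radius_deg)):
        ET.SubElement(operation, f"{{{SERVICE_NS}}}{name}").text = repr(value)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_request(10.5, -3.25, 0.1)
print(request_xml)
```

In a real deployment the message structure would be dictated by the service's WSDL description, and the envelope would be posted to the service endpoint over HTTP.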
Web services alone, however, will not meet all the VO's requirements, as they are synchronous, stateless, and do not carry authentication information. Potentially, OGSI-compliant grid services solve this problem. We have therefore spent much of the year running experimental OGSI deployments. These have been successful, but OGSI is still not stable or reliable. Meanwhile, the IT industry is developing separate methods of overcoming the limitations of web services, known as Web Service Context, or WS-CTX. The protocols are not yet complete, but it seems likely that for the basic VO infrastructure we should build round WS-CTX rather than OGSI. We have also been testing other elements of Grid technology, many of which look much more likely to be deployed in Euro-VO - for example OGSA-DAI for querying, and GIS/CAS for authentication.
Within AstroGrid, we have also been developing other key components of the VO infrastructure - for example, a workflow system, the MySpace virtual grid-storage system, and a Cocoon-based portal "harness" through which a variety of components can present their user interfaces.
Network studies
During most of year two, this task was put on the back burner because of lack of staff effort, and other tasks taking higher priority. However, from October onwards, with the hire of C.Davenhall, we have re-started this task. There are three main strands to current work. The first is an analysis of background issues and work done by other projects. The second is a programme of new measurements amongst the AVO partners, involving both analysis of existing network data, to identify bottlenecks and make recommendations, and also a programme of improvements and tuning between partners - adjusting TCP parameters, installing GridFTP and tuning again, and so on. In early experiments this has already improved real bandwidth by an order of magnitude. Much of this is a familiar story from other Grid projects such as Dante Perfmonit or GridMon, but in the case of astronomical data centres, we are additionally limited in practice by exit from and ingest to DBM systems and related software, such that disc I/O rather than internet bandwidth soon becomes the limiting factor. We have however also done experiments that show how to improve disc I/O by a factor of thirty, and will include such recommendations in our final report in this area, expected in Spring 2004.
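The effect of adjusting TCP parameters can be sketched with the standard bandwidth-delay product calculation: a single TCP stream cannot exceed one window of data per round trip, so a small default window caps throughput on a long, fast path. The link speed, round-trip time, and window size below are illustrative assumptions, not measurements from the AVO partner sites.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bytes that must be in flight to keep a path of this bandwidth and RTT full."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


def window_limited_throughput_bits_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on single-stream TCP throughput for a given window size."""
    return window_bytes * 8 / rtt_s


# Hypothetical path: 1 Gbit/s capacity, 30 ms round-trip time.
link_bits_per_s = 1e9
rtt_s = 0.030

# Window needed to fill the path.
needed_window = bandwidth_delay_product_bytes(link_bits_per_s, rtt_s)

# Throughput ceiling with an assumed untuned 64 KiB window.
untuned_window = 64 * 1024
untuned_limit = window_limited_throughput_bits_per_s(untuned_window, rtt_s)

print(f"window needed to fill the path: {needed_window / 1e6:.2f} MB")
print(f"ceiling with a 64 KiB window:   {untuned_limit / 1e6:.1f} Mbit/s")
```

Under these assumptions the untuned window limits a stream to a small fraction of the link capacity, which is why raising buffer sizes (or using parallel streams, as GridFTP can) may give large gains before disc I/O becomes the bottleneck.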
The original plan to deploy hardware to the various AVO sites was abandoned early in the project due to lack of resources. Instead it has been decided to produce a report and recommendations for archiving and data centre hardware. Consortium activities around hardware and system evaluation in this area have been undertaken as part of key projects that belong to each partner's own mission - especially the ESO Next Generation Archive System (NGAST), the VISTA Data Flow System (VDFS), and the MERLIN Archive. Yet more activities in this area are carried out outside the consortium and even outside the astronomical community at large, for example the Microsoft Research Report carried out in collaboration with the Sloan Digital Sky Survey team. In 2003 we have been collating the work of all these internal and external teams, both through published reports and private discussions. This is leading towards a final report expected during February 2004.
Since the area of storage solutions is evolving quite rapidly, reports are by definition outdated at the moment when they are published. Yet such reports are still valuable, because the collected information is worth more than its separate parts. Moreover some of the parts are evolving more rapidly than others. In particular the parts of the final report related to file systems and basic storage technologies are more stable, but also more complex to assess, since many of the possible solutions are highly dependent on the actual problems to be solved. Even in the fairly narrow domain of astronomical archives the boundary conditions may force the different groups to adopt different solutions. The main goal thus seems to be the ability to inter-connect the different solutions chosen by the different archives. For archives which will be established in the future the various solutions existing in the AVO community should be carefully assessed and, if at all possible, one of those should be chosen. This requires that the existing solutions are documented in a way which makes it easy to carry out such an assessment.
Extensive studies of the performance of database management systems and related technology have continued this year, mostly carried out within AstroGrid. This has culminated in the completion of one of our major milestones - a report and recommendations on datamining technology.
Part of the work this year has involved quantitative evaluation of a set of specific popular DBM systems. There is no single clear winner, the advantages depending somewhat on the circumstances and requirements of a particular data centre. There are however some general problems with standard relational DBM systems and SQL interfaces for astronomical datamining, such as not naturally accommodating the necessary metadata, coping poorly with extremely large datasets, and offering inadequate mathematical, statistical, and graphical facilities. Many of these drawbacks are now being addressed by the international VO community, for example through the development of the "Astronomical Data Query Language (ADQL)". It was also noted that column-oriented storage, available in a minority of products, has strong advantages for data-intensive querying.
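The advantage of column-oriented storage can be illustrated with a toy example: a query that needs one attribute of every object touches far fewer stored values when each column is held separately. The catalogue below and its field names are invented for illustration, and the "values read" counts are a simplified stand-in for disc I/O, not a model of any particular DBMS.

```python
# Toy catalogue in row-oriented form: one record per object (hypothetical values).
rows = [
    {"id": 1, "ra": 10.50, "dec": -3.20, "mag": 18.1},
    {"id": 2, "ra": 10.52, "dec": -3.18, "mag": 19.4},
    {"id": 3, "ra": 10.61, "dec": -3.25, "mag": 17.6},
    {"id": 4, "ra": 10.70, "dec": -3.30, "mag": 20.9},
]

# The same data in column-oriented form: one list per attribute.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def mean_mag_row_store(records):
    """Mean magnitude from a row store; every field of every record is fetched."""
    values_read = sum(len(record) for record in records)
    mean = sum(record["mag"] for record in records) / len(records)
    return mean, values_read


def mean_mag_column_store(cols):
    """Mean magnitude from a column store; only the 'mag' column is fetched."""
    mags = cols["mag"]
    return sum(mags) / len(mags), len(mags)


row_mean, row_values_read = mean_mag_row_store(rows)
col_mean, col_values_read = mean_mag_column_store(columns)
print(row_values_read, col_values_read)
```

Both layouts give the same answer, but the column layout reads only the attribute the query needs; the saving grows with the number of columns in the table, which is typically large for astronomical catalogues.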
Further work in the DBMS area has been carried out at the TeraPix centre in Paris, which this year has mostly been centred on WA2 topics. This work is therefore described in the WA2 report, although TeraPix staff effort is still formally accounted here within WA3.
Often, there is no clear distinction between work in WA2 and WA3, and the two areas collaborate closely. WA3 also works closely with WA1. The overall design for demonstration events and demonstration software suites is the responsibility of WA1, but in order to make these a reality, it relies on taking the R&D done in WA3 and WA2, and converting it into concrete deployments. Meanwhile, WA3 in general and AstroGrid in particular are building towards a robust general purpose component architecture which gives a framework in which WA1 can achieve its goals.
The global nature of the VO has become ever more obvious, and this has now become formalised through the creation of the International Virtual Observatory Alliance. The IVOA has set up a series of working groups, and holds twice-yearly interoperability workshops. Some of these interactions are at political or scientific levels, but mostly they are on very detailed technical issues. WA3 work is closely aligned with the IVOA working group agenda.
WA3 also has close interactions with other grid and e-science projects, especially within the UK e-science programme. The grid technology environment is evolving fast, and the most exciting new technology is not always the most reliable. We therefore have to keep two streams going - a conservative deployment stream, and a more high risk R&D stream. Astronomical recognition within this larger scene is improving. We have for example set up a "Birds of a Feather" group within the Global Grid Forum.
M3.1.1 : Benchmark Network Tasks (+12 months)
This milestone was passed during year one.
M3.1.2 : Middleware assessment report and recommendations (+24 months)
A preliminary version of this milestone was completed ahead of time, during year one. However, the background technology has continued to change considerably, so this work continued very actively. We have now completed a final report, including recommendations, as of January 2004.
M3.1.3 : Network Analysis report and recommendation on strategic network investments (+24 months)
Due to staff effort problems (slow recruitment, and a key staff member absent on military action) this task was temporarily put on the back burner. However it re-started successfully in October 2003 and we anticipate a report and recommendations in April 2004.
M3.1.4 : Demonstration of working datagrid (+24 months)
This is an overall AVO task, with key contributions from WA3, which has been essentially completed through the software suite demonstrated to the Science Working Group in January 2004.
M3.2.1 : Deploy Trial Hardware (+12 months)
Due to decreased funds, this was not undertaken. Rather, each partner organisation is using its own resources internally to deploy and test hardware.
M3.2.2 : Storage Assessment report and recommendations (+24 months)
Initially this task was centred purely around the ESO NGAST team. We decided however to expand its remit to include experiences and experiments of other partners. The revised completion date is Spring 2004.
M3.2.3 : Architecture Assessment report and recommendations (+30 months)
This is now the key task in front of us during the next year, gathering what we have learned so far to conclude a top-level design for the Euro-VO that will follow on from the AVO project. We are aiming to complete this task in July 2004.
M3.3.1.1 : Benchmark Datamining Tasks (+12 months)
This was completed in year one.
M3.3.2 : DBMS Assessment report and recommendations (+18 months)
A preliminary version of this milestone was completed ahead of time, during year one. However, substantial work has continued, and the final report was delayed somewhat. We have completed the final report, including recommendations, as of January 2004.
M3.3.3 : Working datamining system for external user testing (+24 months)
This is an overall AVO task, with key contributions from WA3, which has been essentially completed through the software suite demonstrated to the Science Working Group in January 2004.
As of January 2004, the effort deployed in WA3 is 6.6 FTEs (Full Time Equivalents) spread over eighteen individuals. (During the first half of the year however it was only 4.6 FTEs). This includes effort from both EC-funded staff, and staff effort contributed from partner funds. The integrated effort in WA3 over the year was approximately 5.1 sy (staff years).
There were several significant staff changes during year two, leading eventually to two new recruitments.
Note that Davenhall was recruited on completion of a PPARC funded contract working for AstroGrid, 0.25FTE of which effort was already devoted to the AVO project. During the period of this report (Nov 2002 - Oct 2003) his contribution was therefore entirely partner funded. Next year his work will be EC funded. Other EC-funded staff contributions were the same as last year :
Staff effort contributed by partner organisations was unchanged from last year.
Note that effort is not fixed between the various sub-packages, but deployed as necessary. The FTE figure is the maximum during the year. The sy figure is the number of staff-years during year two, i.e. November 2002 to October 2003 inclusive.
| Name | Place | Partner/EU funding | Main WP | max FTE | sy in Y2 |
| Rixon | AstroGrid(Cambridge) | Partner | 3.1 (manager) | 0.25 | 0.25 |
| Leoni | ESO | Partner | 3.1 | 0.5 | 0.5 |
| Fernique | CDS | Partner | 3.1 | 0.1 | 0.1 |
| Bonnarel | CDS | Partner | 3.1 | 0.1 | 0.1 |
| Wenger | CDS | Partner | 3.1 | 0.1 | 0.1 |
| Holloway | JBO | Partner | 3.1 | 0.1 | 0.1 |
| Hill | Edinburgh | EU | 3.1/3.3 | 1.0 | 0.7 |
| Maxwell | Edinburgh | EU | 3.1/3.3 | 1.0 | 0.5 |
| Taylor | Edinburgh | EU | 3.1/3.3 | 1.0 | 0.1 |
| Wicenec | ESO | Partner | 3.2 (manager) | 0.2 | 0.2 |
| Knudstrup | ESO | Partner | 3.2 | 0.2 | 0.2 |
| Devillard | ESO | Partner | 3.2 | 0.1 | 0.1 |
| Suchar | ESO | Partner | 3.2 | 0.2 | 0.2 |
| Noble | JBO | Partner | 3.2 | 0.1 | 0.1 |
| Garrington | JBO | Partner | 3.2 | 0.1 | 0.1 |
| Richards | JBO | EU | 3.2 | 0.25 | 0.25 |
| Page | AstroGrid(Leicester) | Partner | 3.3 (manager) | 0.25 | 0.25 |
| Davenhall | AstroGrid(Edinburgh) | Partner | 3.3 | 0.25 | 0.25 |
| Didelon | TeraPix | Partner | 3.3 | 0.8 | 0.8 |
| Tissier | TeraPix | EU | 3.3 | 0.3 | 0.3 |
Work in this area has also benefitted indirectly from the overall work of the AstroGrid project. Only a small proportion of AstroGrid staff effort is formally devoted to AVO work, but a large fraction of the AstroGrid work is applicable to AVO. Likewise, at ESO, AVO has benefitted from the NGAST project aimed at solving ESO's own storage and archiving problems, although only a fraction of this effort is formally devoted to AVO.
Staff associated with WA3 have given many seminars and presentations on the work concerned over the last year, and have also given talks at several conferences, the most important being the IAU Symposium in Sydney in July 2003. Within the worldwide VO community, results have been widely disseminated via our own web pages, and also through contributions to the IVOA document store.
@