Andy Lawrence
WA3 Manager
University of Edinburgh
E-mail: al@roe.ac.uk
The overall purpose of WA3 is to assess, develop, and deploy new technologies needed to construct a Virtual Observatory. These technologies are in three main areas - storage and compute technology, Grid technology, and database/datamining technology. This is a challenging task, as the background technology is evolving rapidly.
This year we have completed two further major milestones: a report on network issues for the VO, and a prototype technical architecture to serve as the basis for future Euro-VO full design studies. Overall we have completed ten of the twelve milestones allocated to WA3. Of the remaining two, one was formally abandoned at the start of the project due to lack of funds; the other relates to an area that remains of interest to the partners, and so may be completed informally in due course. In addition to these key milestones, we have delivered software prototypes with improved functionality for the demonstration planned for January 2005, and have continued our assessment of grid middleware and datamining technology.
The other development of note is the successful planning, by a consortium closely connected to the current partners, of a VO technology project that follows on directly from AVO-WA3 - the VOTECH project.
A summary of WA3 work and the key reports produced can be found at http://www.euro-vo.org/twiki/bin/view/Avo/WorkAreaThree
WA3 is under the overall management of A.Lawrence, at the University of Edinburgh, who is also the AstroGrid Project Leader. The AstroGrid consortium takes principal responsibility overall for AVO technologies. There are three sub-packages.
WP3.1 (Grid Technology) is undertaken primarily by AstroGrid staff, and is managed by G.Rixon at the University of Cambridge. Progress in this area is however central to the future design of the European Virtual Observatory (Euro-VO), so staff in other work areas are kept informed, and several are closely involved.
WP3.2 (Storage/Computer Technology) is undertaken primarily by staff at ESO, and is managed by A.Wicenec. However, experiences at other data centres, including those outside the formal partnership, are also collated.
WP3.3 (Database Technology) is undertaken primarily by AstroGrid staff, and is managed by C.Page at the University of Leicester. There is considerable intersection with staff in other work areas. There is also a programme of WP3.3 work at TeraPix under the direction of Y.Mellier. This has acted largely independently.
Middleware
Last year saw the completion of a major deliverable in this sub-package: a report and recommendations on Grid middleware (this can be found here). The main conclusions and recommendations of that report remain valid, but the external technology background has continued to evolve rapidly, so we have continued study and analysis in this area, recording the results in many internal notes on our own web pages. However, most of the work in WP3.1 has shifted to (i) architecture development and (ii) deployment of new technologies in working systems.
(i) AstroGrid has developed a complete technical architecture. (This is a rather complicated set of interlinked web pages and UML diagrams; an overview can be found here and an entry point to the current draft of the detailed architecture can be found here.) This technical architecture has two purposes. The first is to document the system actually deployed within the working AstroGrid pilot and relevant parts of the AVO demonstration suites, and to act as a guide for developers constructing new code. The second is to serve as a preliminary study for the technical architecture of the future Euro-VO system. A full design study for the actual Euro-VO architecture will be undertaken as part of the VOTECH project, taking the current architecture study as its starting point. Note that architecture assessment is a top-level milestone for AVO overall; the work described here constitutes the detailed technical component of that assessment.
(ii) We have deployed most of the technologies discussed in the middleware report, as well as several others, in two contexts: firstly in the AVO software demonstration suites, and secondly in the AstroGrid working prototype. These working systems are built around SOAP services, with some additional experimental grid services. We have established two separate resource registries employing IVOA metadata definitions, one through AstroGrid and one through CDS. We have deployed a standardised data access layer and data exchange formats. We have working examples of distributed virtual storage (MySpace) and of a workflow engine for composing complex jobs. We have developed a standardised interface definition for integrating applications within the datagrid infrastructure. Through the AstroGrid system we have deployed a web-based portal running on Cocoon. Through the AVO demonstration suite we have a variety of applications, such as Aladin and SExtractor, which can communicate with the same infrastructure.
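A workflow engine composes a complex job from smaller steps that depend on one another. The AstroGrid workflow language itself is not described in this report, so the following is only a minimal sketch, assuming a job can be modelled as a directed acyclic graph of named steps. The step names and the `run_workflow` helper are hypothetical; Python's standard `graphlib` module (Python 3.9+) is used to find an execution order that respects the dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each step maps to the set of steps that must finish first.
job = {
    "query_registry": set(),
    "fetch_catalogue_a": {"query_registry"},
    "fetch_catalogue_b": {"query_registry"},
    "cross_match": {"fetch_catalogue_a", "fetch_catalogue_b"},
    "save_to_myspace": {"cross_match"},
}


def run_workflow(graph, actions):
    """Run each step only after all of its prerequisites have run.

    `actions` maps step names to zero-argument callables (illustrative only).
    Returns the order in which the steps were executed.
    """
    order = []
    for step in TopologicalSorter(graph).static_order():
        actions[step]()
        order.append(step)
    return order


if __name__ == "__main__":
    log = []
    actions = {name: (lambda n=name: log.append(n)) for name in job}
    print(run_workflow(job, actions))
```

A real engine would also run independent steps (here the two catalogue fetches) concurrently and handle failures; the sketch shows only the dependency ordering.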
Network studies
During this year we completed a major study on network issues for the Virtual Observatory, culminating in a thorough report and set of recommendations which can be found here. This work includes a summary of current physical networks, an estimate of likely VO traffic requirements, measurements of network performance and routing between partners, and tests of a variety of protocols and accelerators aimed at improving bandwidth. It also serves as a primer on networking. We expect that this report will be of great interest to data centres as well as to other VO projects, and it will therefore be given a wide international circulation.
The clearest conclusion is that bandwidth bottlenecks are nearly always at end-stations, not in the national and European backbones. This has a variety of hardware and software causes, but overall it means that we will be wasting the investment in GEANT unless we make infrastructure investment at European data centres. This means fast modern hardware, disk striping, I/O parallelism and so on, as well as carefully chosen firewalls.
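The end-station argument can be made concrete with a simple model in which a transfer runs no faster than its slowest stage. The sketch below is illustrative only: the stage names and all throughput figures are hypothetical values chosen for the example, not measurements from the network report, and disk striping is idealised as scaling throughput linearly with the number of disks.

```python
# Illustrative throughputs in MByte/s (hypothetical values, not measurements).


def end_to_end_rate(stages):
    """A transfer can go no faster than its slowest stage; return (name, rate)."""
    name = min(stages, key=stages.get)
    return name, stages[name]


def striped_disk_rate(single_disk_rate, n_disks):
    """Idealised striping: aggregate rate grows linearly with the number of disks."""
    return single_disk_rate * n_disks


if __name__ == "__main__":
    path = {
        "source disk": 15.0,
        "source host network stack": 40.0,
        "backbone share": 100.0,
        "destination disk": 15.0,
    }
    print(end_to_end_rate(path))  # the bottleneck is a disk, not the backbone
    path["source disk"] = striped_disk_rate(15.0, 4)
    path["destination disk"] = striped_disk_rate(15.0, 4)
    print(end_to_end_rate(path))  # after striping, the limit moves to the host
```

With these example figures, faster backbones would change nothing: only improvements at the end-stations (disks, host network stack) raise the end-to-end rate.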
The second key conclusion is that significant improvements can also be made in choice of protocol - this can mean TCP tuning, and use of accelerators such as Axel, but we also recommend that the VO community consider adopting the bbFTP protocol developed in particle physics, as this combines tuning, encryption, and multi-streaming. We also recommend looking seriously at use of circuit-switched networks between data centres. (This is already being tested by European VLBI.)
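One reason TCP tuning and multi-streaming help is that a single TCP stream can carry at most about one window of data per round trip, so its throughput is bounded by window size divided by round-trip time (RTT). The sketch below applies that rule. The 64 KByte window, 20 ms RTT and 30 MByte/s path are hypothetical example values, and the model ignores packet loss and congestion control, so it gives an upper bound rather than a prediction of what bbFTP or any other tool would achieve.

```python
def max_stream_rate(window_bytes, rtt_s):
    """Upper bound on one TCP stream's throughput in bytes/s: window / RTT."""
    return window_bytes / rtt_s


def bandwidth_delay_product(link_rate_bytes_per_s, rtt_s):
    """Window size (bytes) a single stream needs to fill a link of this rate."""
    return link_rate_bytes_per_s * rtt_s


def parallel_rate(window_bytes, rtt_s, n_streams, link_rate_bytes_per_s):
    """Aggregate bound for n parallel streams, capped by the link itself."""
    return min(n_streams * max_stream_rate(window_bytes, rtt_s), link_rate_bytes_per_s)


if __name__ == "__main__":
    MB = 1_000_000
    window = 64 * 1024   # hypothetical untuned 64 KByte window
    rtt = 0.020          # hypothetical 20 ms round trip
    link = 30 * MB       # hypothetical 30 MByte/s available path
    print(max_stream_rate(window, rtt) / MB)          # one stream, MByte/s
    print(bandwidth_delay_product(link, rtt) / 1024)  # KByte window to fill link
    print(parallel_rate(window, rtt, 8, link) / MB)   # eight streams, MByte/s
```

Tuning attacks the bound by enlarging the window towards the bandwidth-delay product; multi-streaming attacks it by running several windows in parallel. A protocol that combines both, as bbFTP does, benefits twice.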
Putting together hardware, software, and protocol improvements can make a large difference: experiments between Edinburgh and Cambridge improved bandwidth by a factor of fifteen. However, this cannot be achieved for every end-user, for whom realistic bandwidth will be 1 MByte/s or less. The basic philosophy behind the VO (offer query and analysis services at specialised centres, and shift the results, not the data) is therefore strongly reinforced. Between data centres, we might hope to achieve 20-30 MByte/s. This is good enough to offer distributed storage services and for most joint queries to multiple archives, but some trawl-intensive queries will still benefit from local mirrors of large databases in order to minimise network traffic.
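The arithmetic behind "shift the results, not the data" is simply transfer time = volume / bandwidth. The sketch below uses the rates quoted in the paragraph above (about 1 MByte/s for a typical end-user, and 20-30 MByte/s between data centres, taken here as 25 MByte/s). The 1 TByte archive and 10 MByte query result are hypothetical sizes chosen only for illustration, and protocol overheads are ignored.

```python
def transfer_time_s(volume_bytes, rate_bytes_per_s):
    """Idealised transfer time in seconds, ignoring protocol overheads."""
    return volume_bytes / rate_bytes_per_s


MB = 10**6
TB = 10**12
END_USER_RATE = 1 * MB        # "1 MByte/s or less" for an end-user
INTER_CENTRE_RATE = 25 * MB   # midpoint of the 20-30 MByte/s between centres

if __name__ == "__main__":
    archive = 1 * TB          # hypothetical archive size
    result = 10 * MB          # hypothetical query result size
    print("archive to end-user: %.1f days"
          % (transfer_time_s(archive, END_USER_RATE) / 86400))
    print("archive between centres: %.1f hours"
          % (transfer_time_s(archive, INTER_CENTRE_RATE) / 3600))
    print("result to end-user: %.0f s" % transfer_time_s(result, END_USER_RATE))
```

On these assumptions, moving the archive to an end-user takes roughly twelve days and moving it between centres about eleven hours, while the query result reaches the user in seconds: the trade-off that favours remote query services and, for the heaviest queries, local mirrors.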
Words from Andreas at ESO.
Additional words from Jodrell if desired.
Last year we issued a formal report completing our deliverables in this area. (This can be found here.)
However, considerable work has continued in this area: further testing of specific DBMS systems, general research and analysis of techniques, and deployment of some of the ideas, for example the cross-matching tool used both in the AstroGrid prototype and in the demonstration software suite for the event planned for January 2005. Much of this work has been directed towards planning for one of the key areas of the VOTECH project, "Data Exploration". There, the focus will shift away from database management systems and towards datamining technologies and algorithms, and how to deploy them within the emerging VO infrastructure.
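The algorithm used by the cross-matching tool is not described in this report. As a minimal sketch of what positional cross-matching means, assuming each catalogue is a list of (RA, Dec) positions in degrees, the code below pairs each source in one catalogue with its nearest neighbour in the other if their angular separation is within a chosen radius. The function names, the example positions and the 1 arcsecond default radius are hypothetical, and this brute-force O(N x M) version is for illustration only; a production tool would use spatial indexing.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """For each source i in cat_a, return (i, j, sep_arcsec) for its nearest
    neighbour j in cat_b, if that neighbour lies within radius_arcsec."""
    matches = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best_j, best_sep = None, None
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if best_sep is None or sep < best_sep:
                best_j, best_sep = j, sep
        if best_sep is not None and best_sep <= radius_arcsec:
            matches.append((i, best_j, best_sep))
    return matches


if __name__ == "__main__":
    optical = [(150.0000, 2.2000), (150.0100, 2.2100)]   # hypothetical
    infrared = [(150.0001, 2.2000), (151.0000, 2.0000)]  # hypothetical
    print(cross_match(optical, infrared))
```

Here only the first optical source finds a counterpart (about 0.36 arcsec away); the second source's nearest infrared neighbour is tens of arcseconds away and is rejected.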
Further work in the DBMS area has been carried out at the TeraPix centre in Paris.
Often, there is no clear distinction between work in WA2 and WA3, and the two areas collaborate closely. WA3 also works closely with WA1. The overall design for demonstration events and demonstration software suites is the responsibility of WA1, but in order to make these a reality, it relies on taking the R&D done in WA3 and WA2, and converting it into concrete deployments. Meanwhile, WA3 in general and AstroGrid in particular are building towards a robust general purpose component architecture which gives a framework in which WA1 can achieve its goals.
The International Virtual Observatory Alliance (IVOA) has become even more central to our work than before. A substantial fraction of our development work is aimed directly at developing new standards and protocols, which we move through the agreed stages of standardisation. We develop these standards hand-in-hand with our own software implementing them. The IVOA is organised through a set of technical working groups, which communicate through email fora on a daily basis. We also hold twice-yearly interoperability workshops, which a large fraction of WA3 staff attend.
WA3 also has close interactions with other grid and e-science projects, especially within the UK e-science programme. Initially our closest contacts were with particle physics programmes, but in fact our requirements are much closer to those of biology, and we have had fruitful interactions with bio-informatics projects. On the international Grid scene, astronomy is recognised as a significant activity. Last year we collaborated with international colleagues in setting up a "Birds of a Feather" group within the Global Grid Forum. This group met again during year three, as part of the Global Grid Forum.
M3.1.1 : Benchmark Network Tasks (+12 months)
This milestone was passed during year one. A report was issued, which can be found here.
M3.1.2 : Middleware assessment report and recommendations (+24 months)
Web, Grid and e-Science middleware continues to evolve rapidly, and it is crucial for Euro-VO to keep on top of these changes. As a consequence, this is very much a rolling task. Formally however, we met this milestone by producing a report on grid middleware, with recommendations, at the end of year two.
M3.1.3 : Network Analysis report and recommendation on strategic network investments (+24 months)
This work was delayed but eventually completed very successfully, producing a very full network analysis report in January 2005.
M3.1.4 : Demonstration of working datagrid (+24 months)
This is an overall AVO task, with key contributions from WA3, which was essentially completed through the software suite demonstrated to the Science Working Group in January 2004. An improved system with added functionality will be demonstrated publicly in January 2005.
M3.2.1 : Deploy Trial Hardware (+12 months)
Due to reduced funds, this was not undertaken. Instead, each partner organisation is using its own resources internally to deploy and test hardware.
M3.2.2 : Storage Assessment report and recommendations (+24 months)
There has been considerable work in this area, and sharing of ideas and experience between partners. However, a formal report has not yet been produced. Even though the AVO project has formally completed, this topic is still of interest, and we are likely to produce an informal report for the purposes of the new Data Centre Alliance (DCA) within the VOTECH project.
M3.2.3 : Architecture Assessment report and recommendations (+30 months)
A very thorough technical architecture for a VO system has been produced by the AstroGrid team. This is documented in a large complicated set of web pages linked here. This technical architecture was intended both as a working architecture for the AstroGrid pilot system, and as an architecture study for the final Euro-VO system. It will be developed into a full architecture design by the VOTECH project.
M3.3.1 : Benchmark Datamining Tasks (+12 months)
This was completed in year one.
M3.3.2 : DBMS Assessment report and recommendations (+18 months)
A preliminary version of this milestone was completed ahead of time, during year one. In January 2004 we completed a final database technology report.
M3.3.3 : Working datamining system for external user testing (+24 months)
This is an overall AVO task, with key contributions from WA3, which was essentially completed through the software suite demonstrated to the Science Working Group in January 2004. An improved system with added functionality will be demonstrated publicly in January 2005.
As of January 2004, the effort deployed in WA3 is 6.6 FTEs (Full Time Equivalents) spread over eighteen individuals. (During the first half of the year, however, it was only 4.6 FTEs.) This includes effort from both EC-funded staff and staff effort contributed from partner funds. The integrated effort in WA3 over the year was approximately 5.1 sy (staff years).
In Edinburgh, Davenhall started work with the project from November 1st, working full time in WA3. (He had previously been partner-supplied staff). Hill and Taylor worked full time for AVO in WA3 throughout the year. Other EC-funded staff contributions were the same as last year :
Note that both Tissier and Richards also contribute in other AVO work areas. Staff effort contributed by partner organisations was unchanged from last year apart from the subtraction of Davenhall, who had become an EC-funded employee, and an increase of effort by Page to 0.5 FTE.
Note that effort is not allocated to the sub-packages in fixed proportions, but deployed as necessary. The sy figure is the number of staff-years during year three, i.e. November 2002 to October 2003 inclusive.
| Name | Place | Partner/EU funding | Main WP | sy in Y2 |
| Rixon | AstroGrid(Cambridge) | Partner | 3.1 (manager) | 0.25 |
| Leoni | ESO | Partner | 3.1 | 0.5 |
| Fernique | CDS | Partner | 3.1 | 0.1 |
| Bonnarel | CDS | Partner | 3.1 | 0.1 |
| Wenger | CDS | Partner | 3.1 | 0.1 |
| Holloway | JBO | Partner | 3.1 | 0.1 |
| Hill | Edinburgh | EU | 3.1/3.3 | 1.0 |
| Taylor | Edinburgh | EU | 3.1/3.3 | 1.0 |
| Davenhall | Edinburgh | EU | 3.1 | 1.0 |
| Wicenec | ESO | Partner | 3.2(manager) | 0.2 |
| Knudstrup | ESO | Partner | 3.2 | 0.2 |
| Devillard | ESO | Partner | 3.2 | 0.1 |
| Suchar | ESO | Partner | 3.2 | 0.2 |
| Noble | JBO | Partner | 3.2 | 0.1 |
| Garrington | JBO | Partner | 3.2 | 0.1 |
| Richards | JBO | EU | 3.2 | 0.25 |
| Page | AstroGrid(Leicester) | Partner | 3.3 (manager) | 0.5 |
| Didelon | TeraPix | Partner | 3.3 | 0.8 |
| Tissier | TeraPix | EU | 3.3 | 0.3 |
Work in this area has also benefited indirectly from the overall work of the AstroGrid project. Only a small proportion of AstroGrid staff effort is formally devoted to AVO work, but a large fraction of the AstroGrid work is applicable to AVO - most obviously this year in the production of a technical architecture. Likewise, at ESO, AVO has benefitted from the NGAST project aimed at solving ESO's own storage and archiving problems, although only a fraction of this effort is formally devoted to AVO.
Staff associated with WA3 have given many seminars and presentations on the work concerned over the last year, and have also given talks at several conferences, with the most important being the SPIE meeting in Glasgow in July 2004. Within the worldwide VO community, results have been widely disseminated via our own web pages, by contributions to the IVOA document store, and by participation in twice yearly international IVOA interoperability workshops.
WA3 has achieved its basic goal of assessing datagrid technologies of relevance to the Virtual Observatory, making test deployments of those technologies, and, in collaboration with WA2, developing working examples of datagrids to test the concepts against the science drivers set by WA1. As well as achieving these basic aims, WA3 has accomplished ten out of twelve of the original milestones proposed in the Project Plan at contract negotiation.
Now that the AVO project has completed, it has become clear what else needs to be accomplished to achieve the long-term goal of a Euro-VO. The overall vision is seen as needing a VO Facility Centre (VOFC), a Data Centre Alliance (DCA), and a VO Technology Centre (VOTC). This last part is the successor of the WA3 programme, together with some elements of the WA2 programme. Some of what we have developed in WA3 is ready for implementation by the VOFC; but other parts need continued work, and new concepts and technologies have emerged which need testing followed by design work. As well as continued basic infrastructure, the key needs are for developments in new user tools, in automated resource discovery, and in data exploration (data mining and visualisation).