GSC(02)03 8 March 2002
Grid Steering Committee
VISTA Pipeline/Archive proposal (Jim Emerson)
Note by Guy Rickett
1. At its first meeting the Committee considered the VISTA e-Science proposal (e-VPAS) and sought a resubmission addressing a range of issues.
2. Attached is a revised proposal from Prof Jim Emerson, VISTA Consortium Director.
3. The Committee is invited to note the proposal and recommend an appropriate award.
GMWR
VISTA Data Flow System & Test Bed
Principal Applicant
Prof JP Emerson (VISTA Consortium Director & VISTA Principal Investigator) on behalf of VISTA
Co-Applicants
Dr RG McMahon (VISTA co-I, AstroGrid co-I and Cambridge Astronomical Survey Unit – CASU) on behalf of CASU
Prof A Lawrence (VISTA co-I, AstroGrid Project Leader and Wide Field Astronomy Unit Edinburgh – WFAU) on behalf of WFAU
March 01, 2002
Contact Details
PPARC PIN 4880
Jim Emerson (VISTA Consortium Director and Principal Investigator VISTA)
Email: j.p.emerson@qmw.ac.uk
Queen Mary: Office +44 (0)207 882 5040 | FAX:+44 (0)208 981 9465
Mobile: +44 (0)794 127 1548
Edinburgh: Office +44 (0)131 668 8296 | FAX:+44 (0)131 662 1668
Address: Prof J.P.Emerson,
Department of Physics,
Queen Mary, University of London
Mile End Road,
London E1 4NS, UK
Home: London +44 (0)208 858 3076 | Edinburgh:+44 (0)131 667 2155
TABLE OF CONTENTS
2.3 CASU & WFAU -- the UK’s Wide Field Astronomy Units
4.1 WP-0 Project Infrastructure
4.1.2 Project Collaboration Panel
4.2 WP-1 Survey Planning & Progress Tools
4.3 WP-2 Quality Control and Calibration Pipeline Modules
4.3.1 Quality Control Pipeline
4.4 WP-3 Calibration Pipeline (UK)
4.5 WP-4 General Survey Product Pipeline
4.6 WP-5 Advanced Survey Products --Developments
4.7 WP-6 Survey Products Archive Access & Curation
4.7.3 Likely Technical solutions
4.8 WP-7 Liaison & Dissemination of Results
4.8.1 Interaction with VISTA Project Office
4.8.2 Interaction with VISTA Project Management Committee
4.8.4 Interaction with VISTA Consortium
4.8.5 Interaction with WFCAM users (UKIDSS)
4.8.6 Interaction with Joint Astronomy Centre
4.8.7 Interaction with AstroGrid
4.8.8 Interaction with ASTRO-WISE
4.8.9 Interaction with Sloan Digital Sky Survey (SDSS)
4.8.10 Interaction with NeSC, GridPP and other e-science
4.8.12 Dissemination of Results
5 Phase A (Apr 2002-Sep 2004) Requirements
5.1 Effort/Cost summary: Staff
5.2.1 WFCAM VISTA Test Bed Pipeline
5.2.2 WFCAM Test Bed Database/archive
5.3 Phase A Costs Sought Summary (£k)
Appendix B. Deliverables to ESO
Data Flow System User Requirements Document
Data Reduction Specifications Document
Instrument Description and Calibration Database
This bid seeks Phase A (to Oct 2004) funding for the VISTA Data Flow System & Test bed. We want to build a system which will handle massive amounts of data, in order to
a) Prepare to service VISTA’s needs
b) Handle actual WFCAM data as a test bed for the even more challenging VISTA data volumes.
In outline we need to:
1) design and start to construct a system for survey planning and progress evaluation, to support observing efficiency and variability studies requiring a sequence of windowed observation times, and to enable both timely completion of surveys and early releases of partial survey products;
2) design and start to construct the components needed for pipelines to provide quality control measures and process camera data to the stage of photometrically and astrometrically calibrated single frames, with instrumental artefacts removed (and deliver these to ESO for VISTA);
3) design and start to construct a pipeline to extract general survey products for defined surveys from single or combined calibrated frames;
4) design and start to construct the archive for the calibrated frames and general survey products, ensuring the related database contains the necessary parameters and capabilities (in close collaboration with AstroGrid) to enable their ready use in future Virtual Observatories;
5) carry out conceptual design of more advanced processing pipeline structure(s), for example to enable on the fly access and processing of data from the archive with user driven parameters or even algorithms;
6) test the concepts and prove the scalability of all of the above by test-bedding the VISTA DFS components on data from the UKIRT Wide Field Camera (WFCAM);
7) Prepare a Phase B proposal for further funds to implement the designs or complete the work started in Phase A, informed by the experience with WFCAM data.
Work done during Phase A will strongly affect the precise contents of Phase B (from Oct 2004) which will consist of testing pipeline development on WFCAM data and the completion and delivery of the VISTA pipelines well prior to the start of VISTA operations in the last quarter of 2006.
Funding for the VISTA operational phase from 2007 is not considered here, nor is it part of Phase B; it will need to be sought separately.
The long term aim is to provide the data in the form suitable for the tools developed by ourselves and by AstroGrid (and other related projects) to be efficiently and seamlessly employed to allow astronomers to work with these VISTA product databases both alone and in conjunction with other surveys. It is anticipated that the VISTA database will be the largest such database forming a component of the Virtual Observatory for a number of years.
Before describing the work that needs to be done it is appropriate to give some background to orientate the reader. This consists of brief descriptions of: the VISTA Project; the WFCAM project, whose data will be processed with a prototype VISTA Data Flow System, providing a test bed for the scalability of the concepts and implementations being adopted for VISTA; and the UK’s Wide Field Astronomy Units, both of which are expected to play major roles in delivering the Phase A work.
The Visible and Infrared Survey Telescope for Astronomy (VISTA) is a UK Universities initiative, funded with a grant from the Joint Infrastructure Fund (JIF) and subsequently supplemented with a substantial contribution from PPARC, to produce the world’s leading survey machine for near-IR (and eventually optical) astronomy. VISTA consists of a 4-m diameter telescope initially with a wide field Infrared imager with a 1.65 degree diameter field of view (16 detectors each 2048x2048 pixels (total 67 million pixels), operating in any of 3 or 4 main wavelength bands). [The telescope is also capable of taking a wide field optical camera with a 2.13 degree diameter field of view (total 209 million pixels, operating in any of 5 main wavelength bands) but currently insufficient funds are available to construct this.] VISTA will be sited at ESO’s Cerro Paranal Observatory in Chile for completion by August 2006. When completed the European Southern Observatory will operate it as an ESO telescope mainly for large scale surveys. ESO will credit a timely delivery of VISTA as a UK contribution in kind to the UK’s joining fee for ESO membership, so VISTA’s success holds an even wider importance to UK astronomy than that well merited by its importance in survey astronomy.
The complete end-to-end VISTA project in broad outline consists of:
The two items above are already funded by the VISTA Consortium together with PPARC. The remaining items are not fundable from VISTA Consortium monies.
The next three costs are borne by ESO (included to give a complete picture)
The remaining items need costs covered by the UK
and
The design and prototyping of the pipelines will include sensitivity to minimizing the running costs (which are not themselves sought at this stage).
It is envisioned that complementary/partner projects, such as AstroGrid, will strongly influence the final design of the database archive systems, and the ways of using them, leading to likely enhancements to the basic specification presented here. For example the calibrated frames and survey products will be placed in a database-archive the design, format, and use of which will be determined by the work on this generic issue being done in AstroGrid. AstroGrid tools will be used wherever possible to enable astronomers to work with these databases.
We have estimated, based on previous UK experience in constructing the pipelines for optical and IR image data, and supplying digital Schmidt survey archives, the resources that will be required to build VDFST to cater for the needs of VISTA and its user community.
The UKIRT Wide Field Camera (WFCAM) is scheduled to be operational by the end of 2003 (2.5 years before VISTA) and shares many science aims with VISTA, but is suited for the Northern sky.
WFCAM will use similar detectors to VISTA’s, but has 4 detectors instead of VISTA’s 16, and although it has a much smaller field of view it will provide a good sample of problems arising from sky variations (e.g. OH emission) across the field. In terms of number of pixels WFCAM data will be ¼ the volume of VISTA data, and the instrument will not be on the telescope all the time. Taking into account usage on each telescope WFCAM’s data volume will be 1/10th VISTA’s. Other than these differences WFCAM’s requirements are almost identical to VISTA’s, although it is the Joint Astronomy Centre Hawaii, rather than ESO, who will operate the instrument. It is for this reason that WFCAM data provides the almost perfect test bed for VISTA.
The developments for the VISTA pipeline will be prototyped as the pipeline for processing WFCAM data. This will allow us to test the real behaviour of the algorithms and system two years before the VISTA system has to be complete allowing us to tune, test and optimize the system in good time to deal with the actual VISTA data. This will allow us to demonstrate the scalability of the solutions adopted and substantially reduce the risks to the VISTA pipeline and database/archive.
The UK has two Wide Field Astronomy Units, the Cambridge Astronomical Survey Unit (CASU) and the Edinburgh Wide Field Astronomy Unit (WFAU), which have been involved in processing and archiving digital wide-field surveys for many years. We will use the experience they gained from these surveys (e.g. APM/SuperCOSMOS, INT WFS and CIRSI surveys) to help design pipeline processing schemes and algorithms, define data products and produce a survey products archive for VISTA data. [APM Catalog website http://www.ast.cam.ac.uk/~mike/apmcat, SuperCOSMOS Sky Surveys website at http://www-wfau.roe.ac.uk/sss]
In addition to generic survey projects CASU have also developed pipeline processing tools to deal with user data from INGRID and the wide field mosaic optical imagers at the AAO and on the CFHT. We envisage that development of the tools available within these pipelines will form an excellent basis for building the data processing system for VISTA data, through the intermediate test bed of WFCAM data.
Likewise WFAU’s current science archiving and provision of user access to a multiplicity of different astronomical datasets will form the basis for the development of the VISTA survey products archive, through the intermediate test bed of WFCAM data.
The good links that will be necessary between CASU, WFAU and AstroGrid already exist. For example the AstroGrid Project Scientist is co-located with CASU at Cambridge and has worked closely with the CASU group on the INT Wide Field Surveys, and McMahon is one of the Work Package Managers within AstroGrid. The AstroGrid Principal Investigator, Lawrence, is also in charge of the WFAU. These links will be thoroughly exploited during this project.
The full science case for VISTA was stated in the VISTA JIF bid which was very highly rated by the international assessment panel. The strength of this case was accepted by PPARC when it supported the bid, as was the need for processing and archiving of VISTA data. The decision of ESO to accept VISTA as a contribution in kind to the UK joining fee for ESO membership further underlines the international scientific importance of VISTA. We do not therefore elaborate on the science, but instead address the rationale for the sort of processing system required.
The strength, and efficiency, of survey telescopes such as VISTA is that the data obtained can be used/mined for many different science programmes. VISTA’s observations will have already been made encompassing everything (to the survey limits) on the (large) portions of sky observed at each wavelength band. Astronomers have then to use/mine the data -- they do not have to go to a telescope. This sort of database will form an important part of the Virtual Observatory towards which many astronomical e-science programs are pushing.
The IR Camera accepts a 1.65 degree diameter unvignetted field of view. The focal plane will be populated with 16 2kx2k non-buttable IR science detectors with ~0.31” pixels. With the anticipated detector arrangement four integrations, separated by dithers of ~9.3’, fully cover an area of ~1.75 deg². Dithering (small offsets) will be used to fill the interpixel gaps and deal with rogue pixels.
At each sky position imaged the 16 detectors, each 2048x2048 pixels and producing 4 bytes per pixel, will produce a 270MB file leading (taking into account integration times and dead times) to an overall data rate of 11.4MB/sec, or ~0.41 TB per 10 hour night. The IR camera is the only instrument on the telescope (even if the optical camera is added later it has almost the same data rate) and will operate every night possible. The raw data stream will come in at 150TB/yr for at least 15 years producing a total of 2.25 PB of raw data. Processed data and extracted products will increase this by a factor of order 2. These are extremely large data rates and volumes in the context of astronomy, and they are made more demanding by the diverse types of processing that will be performed on the data.
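The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, assuming decimal units (1 MB = 10^6 bytes) and, as a simplification not stated in the text, 365 observing nights per year. Under those assumptions it gives roughly 268 MB per file, 0.41 TB per night and about 150 TB per year.

```python
# Consistency check of the VISTA IR camera data volumes quoted in the text.
# Assumptions (not from the proposal): decimal units and 365 nights per year.
MB, TB, PB = 1e6, 1e12, 1e15

N_DETECTORS = 16
PIXELS_PER_SIDE = 2048
BYTES_PER_PIXEL = 4

# One file holds one readout of all 16 detectors.
file_bytes = N_DETECTORS * PIXELS_PER_SIDE ** 2 * BYTES_PER_PIXEL

DATA_RATE_MB_PER_S = 11.4    # quoted overall rate, dead time included
NIGHT_HOURS = 10
night_bytes = DATA_RATE_MB_PER_S * MB * NIGHT_HOURS * 3600

NIGHTS_PER_YEAR = 365        # assumption: no nights lost to weather or faults
year_bytes = night_bytes * NIGHTS_PER_YEAR

RAW_TB_PER_YEAR = 150        # quoted raw data rate
LIFETIME_YEARS = 15          # quoted minimum lifetime
total_raw_pb = RAW_TB_PER_YEAR * LIFETIME_YEARS * TB / PB

print(f"file:  {file_bytes / MB:.1f} MB")
print(f"night: {night_bytes / TB:.2f} TB")
print(f"year:  {year_bytes / TB:.1f} TB")
print(f"total: {total_raw_pb:.2f} PB raw")
```

The per-year figure comes out slightly under 150 TB, which is consistent with the rounded value used in the text.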
The earlier WFCAM will be the largest IR camera in existence when commissioned, and will remain so until VISTA is commissioned 2.5 years later. WFCAM’s expected data rate is 0.1TB per night for ~150 useable nights per year, giving 15TB/yr. We propose to build the WFCAM pipeline system as a VISTA test bed and to use this system for production processing of WFCAM data and initial scalability tests for the VISTA system.
To exploit this data a processing system capable of keeping up with the incoming data rate is crucial. An observation planning system is needed first, to factor in the sky and wavelength coverage and the quality of any data already available (the sensitivity achieved in each exposure depends on the (variable) atmospheric transparency and stability) in a timely manner before deriving short and medium term schedules. Pipelines are needed to check the quality of all the science, technical and calibration data, to remove instrumental effects, and to combine the output from each of the slightly offset dither positions, used to fill gaps between detectors and dead pixels, into single photometrically and astrometrically calibrated frames for each detector. (The means to do the above must be delivered to ESO as part of the instrument.) The calibrated frames must then be used to produce survey products for each defined survey, which at a minimum will: extract catalogues of the sources contained in each frame, or collection of frames; coadd multiple frames of the same field to reach fainter limits; and stitch together many adjacent frames into larger images [we will always need to stitch a minimum of 4 such frames together to make a filled 1.75 sq deg image]. Dealing with these large data volumes/rates (and a variable sky background, transparency, seeing and other factors) in the fixed period before the next deluge of data arrives requires very careful quality control, traceability and processing if we are not to be quickly overwhelmed by the large volumes of data. [Pipeline processing of IR data is much more challenging than the processing of optical data since the sky emission is 10-50 times brighter and varies both spatially and temporally in a more complex manner than at optical wavelengths.]
We intend to provide standard processing that will be suitable for the science goals of most users (including non VISTA/WFCAM specialists), but also to make readily available the necessary innovative (science goal driven) mining tools for processing VISTA data (& WFCAM data with the test bed system) in more specialized ways by users whose science goals are better served by different processing of their data. To pick just one example, variability and proper motion studies formed a strong part of the JIF bid science case, and cannot be readily catered for in standard pipelines. The large data volumes make this a very interesting challenge.
Data will be taken according to a finite set of pre-defined parameterised observing protocols, in order to guarantee pipeline processing both at the telescope and elsewhere. These will include single shot, dither sequences, tile sequences and so on (see glossary in Appendix A for definition of terms) – the details of the protocols will be defined in discussion with the instrument hardware team early in Phase A.
To minimise data volumes the data acquisition system will carry out basic operations such as reset-correction and stacking multiple sequential exposures (an 'integration') at the same sky position in situ before forwarding data to the rest of the system.
Frames will be produced in internationally agreed standard formats (currently FITS, the Flexible Image Transport System, is the astronomical standard, but there are international efforts to define an astrophysical XML schema; FITS ‘header’ information may be thought of as metadata defining the context and format of the image). As the current standard used by ESO is FITS we will, for brevity, refer to the format as FITS, but we do not imply we are wedded to it if a new international standard emerges. Stored within the FITS files will be sufficient ancillary information to fully describe the data taken and to invoke the appropriate pipeline processing actions.
The general philosophy is that all fundamental data products are FITS, including the generated catalogue binary tables, and that all derived information (quality control information, photometric and astrometric calibration) and processing steps are also incorporated within the FITS headers. The FITS headers provide the basis for ingest into databases for archiving and databases for monitoring of survey progress and survey planning, and the processing is driven by the content of the FITS headers.
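As an illustration of header-driven processing, the sketch below parses a handful of simplified FITS-style header cards and uses one keyword to choose a processing recipe. It is a minimal sketch under stated assumptions: the keyword OBSPROT, the protocol names and the recipe step names are all hypothetical, and the parser handles only simple "KEYWORD = value / comment" cards, not the full FITS standard.

```python
def make_card(keyword, value_text):
    """Build an 80-character header card from a keyword and its value text."""
    return f"{keyword:<8}= {value_text}".ljust(80)


def parse_cards(cards):
    """Parse simplified 'KEYWORD = value / comment' cards into a dict."""
    header = {}
    for card in cards:
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        raw = card[10:].lstrip()
        if raw.startswith("'"):
            # String value: the text between the first pair of single quotes.
            value = raw[1:].split("'", 1)[0].rstrip()
        else:
            token = raw.split("/", 1)[0].strip()
            if token in ("T", "F"):
                value = token == "T"
            else:
                try:
                    value = int(token)
                except ValueError:
                    value = float(token)
        header[keyword] = value
    return header


# Hypothetical mapping from observing protocol to processing steps.
RECIPES = {
    "SINGLE": ["remove_signature", "calibrate"],
    "DITHER": ["remove_signature", "stack_dithers", "calibrate"],
    "TILE": ["remove_signature", "stack_dithers", "mosaic_tile", "calibrate"],
}


def select_recipe(header):
    """Choose the processing steps from the (hypothetical) OBSPROT keyword."""
    protocol = header.get("OBSPROT")
    if protocol not in RECIPES:
        raise ValueError(f"no recipe for observing protocol {protocol!r}")
    return RECIPES[protocol]


cards = [
    make_card("SIMPLE", "T"),
    make_card("NAXIS", "2"),
    make_card("OBSPROT", "'DITHER  ' / hypothetical protocol keyword"),
    make_card("EXPTIME", "10.0"),
    "END".ljust(80),
]
header = parse_cards(cards)
print(select_recipe(header))
```

The point of the design is that a frame needs no side information to be processed: everything the pipeline must know travels in the header of the file itself.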
There are operational and practical requirements for several distinct (but related) data processing pipelines for VISTA (and similarly for WFCAM, except for geography):
1) a Quality Control Pipeline at the telescope site;
The Quality Control (QC) pipeline will consist of a subset of the operations carried out in the full Calibration Pipeline for data quality control.
An on-line database describing the observations, and the Quality Control information will be maintained (in Cambridge). This database will form the knowledge interface that helps define and plan the overall observing strategy.
2) for VISTA a Calibration Pipeline (ESO) will run at ESO HQ (in Garching, Germany) to calibrate frames for individual users. ESO’s data processing responsibility for the ESO community stops at this point. [This step, an ESO requirement, does not exist for WFCAM data, which goes straight from the telescope to the UK]
To extract science data for the UK a generic science survey product pipeline is needed containing:
3) Calibration Pipeline (UK) to produce calibrated frames in the UK (it is not clear that ESO can routinely output calibrated frames for VISTA survey data, and send them to the UK) and pass them on to the database/archive and to the general survey product pipeline
and a
4) General Survey Product Pipeline (GSPP) to combine appropriate individual frames and generate survey products from the frames for each defined survey.
However for full exploitation of the data a subset of users will inevitably find the general survey pipeline products are inadequate for their science aims. [To try and make the GSPP all things to all people would be a poor strategy as it would become overcomplicated for what most users will need]
5) Advanced Survey Product Development to deal with the science programs for which the GSPP will not provide enough, or the right sort of, information. e.g.: surveys needing colours (or colour limits) on all bands will compare individual frames taken in different bands to identify the many objects in common and calculate their colours, finding upper limits at predefined positions in bands with no detections; searching for variable or moving objects by comparing multiple frames of the same area or by searching across all data taken at a given sky coordinate; detecting low surface brightness objects; searching for parsec scale jets from young stellar objects; etc. etc.
There are also very taxing requirements for archiving the enormous data volumes of
1) Raw data frames
2) calibrated data frames
3) survey products generated from calibrated frames or combinations of calibrated frames
and requirements for ready access to and manipulation of these data and products with the aim of forming part of the Virtual Observatory/AstroGrid.
The Work packages defined for Phase-A are:
| WP-0 | Project Infrastructure & Management |
| WP-1 | Survey Planning & Progress Tools |
| WP-2 | Quality Control and Calibration Pipeline Modules |
| WP-3 | Calibration Pipeline (UK) |
| WP-4 | General Survey Products Pipeline |
| WP-5 | Advanced Survey Product Development |
| WP-6 | Survey Products Archive Access & Curation |
| WP-7 | Liaison & Dissemination of Results |
The detailed work package descriptions follow. (The Work packages for Phase-B will be developed as part of Phase-A). Formal User Requirements documents for each package will be developed early in Phase A and agreed by the Project Collaboration Panel.
The Project has several sorts of end users to satisfy: ESO, as the operators of VISTA and of the calibration pipeline at ESO HQ; AstroGrid/Virtual Observatory (VO), interested in the interoperability of VISTA’s enormous database of products with other materials in the VO; and, primarily, the science users of VISTA and (as a test bed) WFCAM data. Liaison between all of these will be crucial to ensure that the system developed is fit for purpose, and to ensure that work is neither duplicated nor left undone in areas where VISTA project specific and VO general methods of working are involved.
The coordination and oversight of this undertaking requires major “diplomatic”, management and advisory functions that will require the attention of the PI as Project Leader at least half-time. It is therefore proposed that he be 50% bought out by PPARC from October 2002. [The PI is already available to work on this full time until end Sep 02.]
Within each of CASU & WFAU a local manager is needed to oversee the work locally at the level of 0.2 FTE/yr. A further 0.1 FTE/yr would be needed for oversight from the VISTA Project Office, particularly of WP-1 and WP-2. Together with the Project Leader’s 0.5 FTE/yr, a total of 1 FTE/yr would thus be associated with the management of the project.
For Phase A we envisage a formal Project Collaboration Panel consisting of representatives of many of the major collaborators involved to critically scrutinize and advise the Project on solving problems, issues, progress, and opportunities.
Chair
Project leader (Emerson – Queen Mary)
Members
VISTA Project Office Software Work Package Manager (e.g. Stewart - UKATC Edinburgh)
VISTA Project Scientist (Sutherland - Cambridge)
WFCAM Project Scientist (Warren- Imperial)
AstroGrid Project Rep (e.g. Project Scientist Walton – Cambridge)
PPARC Director of e-science (Geddes - PPARC) (or someone from the expected astro e-science oversight body)
In attendance
Work Package Managers as appropriate
WFAU representative (e.g. Williams – IfA Edinburgh)
CASU representative (e.g. Irwin – Cambridge)
The PCP will itself report on progress, through its Chair, to PPARC’s VISTA Project Management Committee, to the Grid Steering Committee, to the VISTA Consortium Science Committee, and to the UKIDSS (WFCAM) Consortium.
Each of Work Packages 1 to 6 will have a designated Work Package Manager who is responsible for seeing that the work is carried out to schedule and specification, and for reporting on plans and progress to the PCP. These Work Packages will each produce a User Requirements Document and a Design Document and schedules and spend profiles for the work. These plans will include, where appropriate, specification of the necessary (Monte Carlo) simulations to test and understand the processing tools, their completeness, accuracy, etc. [Although random errors are predictable, subtle systematic effects including incompleteness of catalogues etc., aren't and have to be assessed by suitable detailed simulations.]
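The kind of Monte Carlo test meant here can be sketched with a toy model: artificial sources of known magnitude are given Gaussian sky noise, and the fraction recovered above a detection threshold estimates the catalogue completeness at that magnitude. The zero point, noise level, threshold and function name below are illustrative assumptions, not values from the proposal; a real simulation would inject sources into actual frames and run the full extraction software.

```python
import random


def completeness(mag, zero_point=25.0, sky_sigma=10.0, threshold=5.0,
                 n_trials=2000, rng=None):
    """Fraction of simulated sources of magnitude `mag` that are detected.

    A source counts as detected when its noisy measured flux exceeds
    `threshold` times the sky noise. All default values are illustrative.
    """
    if rng is None:
        rng = random.Random(0)
    true_flux = 10 ** (-0.4 * (mag - zero_point))  # counts, from the zero point
    detected = 0
    for _ in range(n_trials):
        measured = true_flux + rng.gauss(0.0, sky_sigma)
        if measured > threshold * sky_sigma:
            detected += 1
    return detected / n_trials


rng = random.Random(42)  # fixed seed so that the run is repeatable
for mag in (18.0, 20.75, 23.0):
    print(f"mag {mag:5.2f}: completeness {completeness(mag, rng=rng):.3f}")
```

With these assumed values the bright sources are always recovered, those near magnitude 20.75 (where the true flux equals the threshold) are recovered about half the time, and the faint ones essentially never.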
Each Work Package’s User Requirement Document and the Design Document will undergo peer review by a panel including external experts at “Preliminary” and “Final” Design reviews.
Performance against specifications, schedules and cost will be closely monitored by the Project Leader and PCP to ensure the goals of the Project are realized.
The manpower needed for the work in each of the workpackages has been identified and is given in the table at Section 5.1.
| Apr 02 | Start Phase A work for January PDR |
| Jan 16 03 | DFS for ESO: Preliminary Design review |
| Jul 17 03 | DFS for ESO: Final design review |
| Aug 03 | WFCAM commissioning data begins to be available |
| Dec 03 | Real Test-bed data available (WFCAM operational) |
| Mar 04 | Submit Phase B plan |
| Sep 04 | Phase A complete |
| Oct 04 | Phase B starts |
Phase A lasts until end Sep 2004 (when it is assumed the current tranche of e-science monies ends). In due course we shall seek funds for a further 27 months of work from Oct 2004 (Phase B), together with the necessary hardware to complete the developments and commission the VISTA system, and to implement those advanced processing requirements for which Phase A studies show effective solutions. Phase B will end a few months after VISTA becomes operational, and will be followed by a period when funding will be needed for routine operation of the pipelines and archives. The planning for Phase B will form part of the Phase A work.
| Start Recruitment of Staff | |
| Oct 04 | Phase B starts |
| Jan 03 05 | Test images from VISTA focal plane available |
| Sep 22 05 | Instrument test (Europe) starts |
| Feb 09 06 | European Instrument Integration for Preliminary acceptance complete |
| Mar 23 06 | Instrument Commissioning Pt 1 begins |
| May 04 06 | Instrument Commissioning Pt 2 begins |
| June 15 06 | Final Instrument Delivery review |
| July 27 06 | ESO Commissioning & Acceptance complete |
| Aug 16 06 | VISTA scientifically operational |
| Dec 31 06 | Debug phase with real VISTA data complete |
| Phase B ends | |
| Jan 01 07 | Operational Phase begins |
A survey planning & progress system will be necessary, and will need to interact with, and preferably be integrated into, ESO’s scheduling system.
Automatic monitoring of survey progress, with minimal user input, via an online database of observations done and awaited is required in order to schedule observations intelligently from the vast number of survey fields that VISTA could potentially observe on any night. We will study whether to use an RDBMS or OORDMS (such as Oracle 9i, as Objectivity may be in financial difficulty).
This database could also provide the driver for many of the more advanced processing options that require predefined (asynchronous) observations to have occurred in for example, multiple passbands or multiple visits to the same field for deeper stacking.
Finding the best survey strategy, given Quality Control feedback and weather predictions, is an NP-hard combinatorial problem and will involve innovative exploration of (pseudo) optimal techniques such as genetic algorithms.
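To make the genetic-algorithm idea concrete, the sketch below evolves an ordering of survey fields over a night's observing slots. It is a toy model under stated assumptions: each field has a priority and a worst acceptable seeing, each slot has a predicted seeing, and the fitness is the total priority of fields observed in acceptable conditions. The data model, parameter values and function names are hypothetical, and nothing here reflects a scheduler design actually adopted.

```python
import random


def fitness(order, fields, slot_seeing):
    """Total priority of fields whose slot has good enough predicted seeing."""
    total = 0.0
    for slot, field_index in enumerate(order[:len(slot_seeing)]):
        priority, max_seeing = fields[field_index]
        if slot_seeing[slot] <= max_seeing:
            total += priority
    return total


def order_crossover(parent_a, parent_b, rng):
    """Keep a slice of parent_a and fill the rest in parent_b's order."""
    i, j = sorted(rng.sample(range(len(parent_a)), 2))
    middle = parent_a[i:j]
    rest = [gene for gene in parent_b if gene not in middle]
    return rest[:i] + middle + rest[i:]  # still a valid permutation


def evolve_schedule(fields, slot_seeing, generations=200, pop_size=40,
                    mutation_rate=0.2, seed=0):
    """Return (best ordering found, its fitness) after a fixed number of generations."""
    rng = random.Random(seed)
    n = len(fields)

    def score(order):
        return fitness(order, fields, slot_seeing)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: never lose the best ordering
        while len(next_population) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = max(rng.sample(population, 3), key=score)
            parent_b = max(rng.sample(population, 3), key=score)
            child = order_crossover(parent_a, parent_b, rng)
            if rng.random() < mutation_rate:
                a, b = rng.sample(range(n), 2)
                child[a], child[b] = child[b], child[a]  # swap mutation
            next_population.append(child)
        population = next_population
        best = max(population, key=score)
    return best, score(best)


# (priority, worst acceptable seeing in arcsec) per field: illustrative values.
fields = [(5.0, 0.6), (3.0, 1.0), (4.0, 0.8), (1.0, 1.5), (2.0, 1.2), (6.0, 0.5)]
slot_seeing = [0.5, 0.7, 0.9, 1.3]  # predicted seeing for four slots
best_order, best_score = evolve_schedule(fields, slot_seeing)
print(best_order[:len(slot_seeing)], best_score)
```

For this small example the best achievable score is 14, which can be found by inspection; the genetic algorithm is not guaranteed to reach the optimum, only to return the best ordering it has seen. A real scheduler would have far more fields, time-dependent constraints and weather uncertainty, which is where such heuristic searches become worthwhile.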
The provision of simple visualisation of overall survey progress is also necessary. This should include interactive GUIs and a command-line interface for querying survey data quality, state of processing and progress of survey, and should also be capable of feeding back into the observing strategy.
These tools should be integrated into ESO’s observation planning system (which is not currently adapted to the planning of large surveys) to enhance it for use with survey telescopes. In view of the need for the existing structures at ESO to continue to support non-survey observations this may not be possible at ESO in time for VISTA first light, and so we baseline a system run in the UK.
The main purpose of the Quality Control (QC) pipeline is to generate near real-time quality control information by preliminary processing data on-the-fly using library master calibration frames. It will also produce the export (FITS) files and instigate a header and data verification program.
The QC pipeline will have to maintain and update a “best guess” series of master calibration frames (dark, flats, skies ...) to provide basic 2D instrumental signature removal. The QC feedback will require some form of catalogue generation software which can be used to produce measures for QC monitoring, including: pointing accuracy; photometric throughput; image shape measures; sky brightness levels; and sky noise levels. This software should be a subset of the Calibration pipeline modules.
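Two of the QC measures listed, sky brightness level and sky noise level, can be illustrated with a robust estimator. The sketch below uses the median for the sky level and the scaled median absolute deviation (MAD) for the noise, so that a few bright source pixels do not bias the result. The choice of estimator is an assumption made for illustration; the text does not specify one.

```python
import statistics


def sky_statistics(pixels):
    """Return (sky level, sky noise) for a sequence of pixel values.

    The level is the median; the noise is 1.4826 * MAD, which equals the
    standard deviation for Gaussian noise but is insensitive to outliers.
    """
    sky_level = statistics.median(pixels)
    mad = statistics.median(abs(p - sky_level) for p in pixels)
    return sky_level, 1.4826 * mad


# Mostly sky near 100 counts, plus one pixel on a bright star.
pixels = [100, 102, 98, 101, 99, 100, 5000]
level, noise = sky_statistics(pixels)
print(level, noise)
```

A plain mean and standard deviation of the same pixels would be dominated by the single bright pixel, which is why a robust estimator is the natural choice for monitoring sky conditions on crowded frames.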
Data should already come with the following QC measures that will greatly enhance the usefulness of data products for the end user:
Individual frames will have the above information written into their headers or available from logs; QC pipeline will then derive further QC measures and insert into headers:
As it is crucial that accurate and complete FITS headers are attached to all files for automatic pipeline processing, a FITS header (and data) verification program should be run as part of QC on all telescope-produced FITS files before export to the Calibration Pipeline. Any faults should be fixed in situ before shipping.
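A header and data verification step of this kind might look like the following sketch. It checks that required keywords are present with sensible types, and that the size of the data block agrees with what the header implies. SIMPLE, BITPIX, NAXIS and NAXISn are mandatory in the FITS standard; requiring DATE-OBS, EXPTIME and FILTER is a hypothetical project rule, and the function names are illustrative.

```python
# Keyword -> expected Python type. The first five are mandatory FITS
# keywords for a 2D image; the last three are hypothetical project rules.
REQUIRED_KEYWORDS = {
    "SIMPLE": bool, "BITPIX": int, "NAXIS": int, "NAXIS1": int, "NAXIS2": int,
    "DATE-OBS": str, "EXPTIME": float, "FILTER": str,
}


def type_matches(value, expected_type):
    """True if value is acceptable for expected_type (ints may stand in for floats)."""
    if isinstance(value, bool):
        return expected_type is bool  # bool is an int subclass: handle it first
    if expected_type is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected_type)


def verify_frame(header, data_bytes, required=REQUIRED_KEYWORDS):
    """Return a list of problems found; an empty list means the frame passes."""
    problems = []
    for keyword, expected_type in required.items():
        if keyword not in header:
            problems.append(f"{keyword}: missing")
        elif not type_matches(header[keyword], expected_type):
            problems.append(f"{keyword}: expected {expected_type.__name__}")
    if not problems:
        # Data check: the header must account for every byte of pixel data.
        expected = abs(header["BITPIX"]) // 8 * header["NAXIS1"] * header["NAXIS2"]
        if data_bytes != expected:
            problems.append(f"data: {data_bytes} bytes, header implies {expected}")
    return problems


good = {"SIMPLE": True, "BITPIX": -32, "NAXIS": 2, "NAXIS1": 2048,
        "NAXIS2": 2048, "DATE-OBS": "2006-08-16", "EXPTIME": 10, "FILTER": "Ks"}
print(verify_frame(good, 4 * 2048 * 2048))
```

Returning a list of problems, rather than stopping at the first one, suits the stated aim of fixing all faults in situ before the data are shipped.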
On rare occasions there will be science requirements for real-time data processing e.g. for transient event detection such as moving objects or variables. A quick system for doing this at the telescope is highly desirable, but such a system is not needed for delivery for VISTA first light, so we do not seek funding for it now.
For data transport from Paranal to Garching, ESO plan to ship all data on removable magnetic disk. These will be plugged into RAID arrays in Garching and kept on-line for use in the (ESO) Calibration pipeline. WFCAM has not yet decided how its data will be shipped to Cambridge.
The Calibration Pipeline will operate on single pointings or observing sequences involving dithers (and for WFCAM also micro-steps) taken during a single night to produce instrument signature-free images. The astrometric and photometric calibration will use object catalogues derived from the frames. The two-dimensional image data products at this stage will be instrumental signature-corrected: single frames; lossless interleaved super frames; stacked dither (super)frames; and confidence maps for all pipeline output products.
In more detail the requirements are:
The steps in the Calibration Pipeline are as follows:
The data products from the Calibration Pipeline will be:
The single frame catalogue data product will be lists of detected images with a set of parameters summarising useful astronomical information and providing the necessary DQC information.
The Deliverables to ESO are defined in detail in Appendix B. They consist of recipes, procedures and software necessary to run the Data Quality Control Pipeline at the telescope and the Calibration Pipeline at ESO HQ.
The WFCAM Calibration Pipeline is run in Cambridge in the UK, and some Quality Control modules may be delivered to the Joint Astronomy Centre Hawaii. Apart from this geographical detail the requirements are very similar, and the modules in the WFCAM pipeline will be excellent test bed modules for the VISTA pipeline.
The raw data frames received from ESO will be passed through the Calibration Pipeline (UK), to produce calibrated frames, using the same modules as delivered to ESO for their Calibration Pipeline.
This is necessary, despite the existence of the ESO Calibration Pipeline, because ESO do not have a system to produce survey products from large collections of individual frames. Their system is apparently well suited to individual users wanting data from a few individual observations, but it will not routinely output large volumes of calibrated data frames. This is not a problem for the UK: in anticipation of the difficulty, the UK has ensured that it will hold its own copy of the raw data, and it will have just such a pipeline, built from the components delivered to ESO.
Even if the ESO Calibration pipeline were routinely run and Calibrated frames exported to the UK, that Pipeline can only use prior knowledge, and there will inevitably be occasions when the Calibration performed at ESO HQ becomes outdated, for example when previously unrecognized problems are found in the data used, the effects catered for, or the software used. It is not foreseen that the ESO HQ CP will be able to routinely reprocess all data when this occurs. This is another reason it is necessary to also run a Calibration Pipeline in the UK.
Depending on the details (to be agreed) of the raw frames received from ESO, CASU will augment the FITS headers of the raw data frames with extra housekeeping data from logs, fix FITS headers and do any other operations necessary to make its output Calibrated frames compatible with the specification (to be derived with AstroGrid) of the database in which the Calibrated frames will reside. [Ideally agreement with ESO will make this unnecessary, but the different constraints (financial, managerial and standards) that might apply to what is produced at ESO make designing such a pipeline necessary for the UK].
All processing done by this pipeline will record progress in image FITS headers and derive further QC measures.
As all the modules for such a pipeline will be available, the pipeline will be constructed for use in the UK, and will also serve as a test-bed for any modifications needed to the QCP and CP run by ESO.
After the calibrated frames have been produced and run through the General Survey Products Pipeline in Cambridge they will be sent to Edinburgh where they will be ingested into the database archive which will form the basic information on which much of the advanced pipeline will operate.
The primary aim of the General Survey Product Pipeline is to follow a well defined set of generic processing operations designed to extract maximal general purpose astronomical information. These products will be science goal driven by the main astronomical surveys that will be undertaken and will include: detection, parameterisation and cataloguing of objects over given regions; coadding stacked frames for deep surveys; mosaicing frames to produce contiguous tiled images; mosaicing tiled images to produce seamless large area images; extracting catalogues from same; assessment of the accuracy, reliability and completeness of derived catalogue products.
All processing done by the pipeline will record progress in the image (and catalogue) FITS headers and form the basis for deriving the final QC measures.
More advanced processing and other non-standard (and more open ended) options will form part of the challenges for the Advanced Survey products work package.
CASU analysis produces 32 4-byte parameters per detected object. It is proposed that the parameters derived from an image in a single passband will be similar to the INT-WFC set in Appendix A but enhanced to include error estimates for many of the parameters. Input from the user community will be used to help finalise the details of the parameter set. We are also likely to include a much larger set of parameters per object: as well as error estimates, this would include different flux measures (e.g. Petrosian magnitudes, and list-driven flux measures expressed in asinh magnitudes, whereby flux estimates at the same position in all other wavebands are obtained given a significant detection in any one passband) and other parameters adopted in the Sloan Digital Sky Survey (SDSS) project (e.g. Stoughton et al AJ 123, 485, 2002).
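As an illustration of why asinh magnitudes suit list-driven photometry, the sketch below uses the standard asinh magnitude form introduced for the SDSS by Lupton, Gunn & Szalay. It is not the CASU parameter definition; the softening value b and the sample flux ratios are assumptions chosen only for the example. The point is that the measure stays defined for zero or negative measured flux, which is what a forced measurement at the position of an object detected in another passband can return.

```python
import math

POGSON = 2.5 / math.log(10.0)  # about 1.0857


def asinh_mag(flux_ratio: float, b: float) -> float:
    """Asinh magnitude for flux_ratio = f/f0 with softening parameter b."""
    return -POGSON * (math.asinh(flux_ratio / (2.0 * b)) + math.log(b))


def pogson_mag(flux_ratio: float) -> float:
    """Conventional magnitude; only defined for flux_ratio > 0."""
    return -2.5 * math.log10(flux_ratio)


if __name__ == "__main__":
    b = 1.0e-10  # hypothetical softening value, for illustration only
    for ratio in (1.0e-4, 1.0e-9, 0.0, -1.0e-11):
        print(f"f/f0 = {ratio:9.1e}  asinh mag = {asinh_mag(ratio, b):7.3f}")
```

For bright sources the two definitions agree closely; for faint, zero or negative fluxes only the asinh form gives a finite value.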
In addition to the independent archive of housekeeping data, information pertinent to subsets of image/catalogue data will be stored with those data (in the form of header information). Housekeeping data will include generic QC measures including site information (conditions/weather), software versions, 'release' versions (for calibrations etc.)
The most pressing requirement is to provide a robust system that can be guaranteed to be available on schedule for VISTA first light. However to deal with the vast quantities of data in a timely way, and to produce products allowing a greater range of science purposes, and to control running costs (minimize need for human intervention) there are a number of developments that are highly desirable requiring more novel methods of data processing. Whilst our (prudently modest) baseline plan for first light does not include use of these techniques this is solely because the implementation of some is insufficiently well defined to allow us to guarantee to offer them at VISTA first light. However we also propose to work on these advanced pipeline issues, and to update our pipeline plans where appropriate in the light of clear progress to effective solutions. Building a remotely accessible and configurable pipeline, accessing a vast database, and with large accessible compute power is what is needed.
Without going into great detail two generic reasons for wanting such advanced pipelines are
1) The specification and implementation of the CP and GSPP will inevitably not satisfy the purposes of all users (though we expect it to do so for the majority), and the ability to reprocess the data using alternative techniques will be required by a significant subset of users. Traditionally, an end-user with specialised requirements (e.g. searching for variable objects in many frames covering the same bit of sky) would set up a processing system at his/her home site and copy the data en bloc to local discs for processing. For any program that is using the real power of VISTA or WFCAM (their data volumes) this traditional approach is nigh impossible. The end-user needs remote access to the data, the tools to work with it, and the processing power to handle it, and to receive the results. A remotely accessible and configurable pipeline, accessing a vast database, and with large compute power is what is needed.
2) Traditionally sky surveys have processed all their data in a single (set of) pipelines and released the products to their user communities for scientific use, with the hope (but not expectation) that no reprocessing would ever be needed. However, despite all plans to the contrary, most surveys have had to modify and rerun their own pipelines several times in response to their evolving knowledge of the data and the effects of the processing (e.g. as a result of upgrades/bug fixes). Such survey products have generally been extremely productive when carefully designed and implemented (as most have been), but the manpower involved in re-processing has often required calling a halt for financial rather than scientific reasons. Also later reprocessing by users has been effectively limited by lack of easy access to the data, programs and computing power.
We need to ensure that, on the timescale that the vast amounts of VISTA data will be received, there are science and cost effective ways of overcoming the difficulties and limitations formerly associated with use of standardised pipeline products. (Similar ideas are described by Annis et al in “SDSS Virtual Data Requirements: A weak lensing map as a prototype data set” at www.griphyn.org/documents).
The likely solution is to design and implement the VISTA pipelines and all intermediate products and archives with the idea of enabling advanced user-driven remote reprocessing on the fly of all levels of processing (even of raw data), in addition to the access that will anyway be needed to prepare reduced image archives and their associated large catalogue databases. Designing the system this way from the first will make the inevitable reprocessing needed for standard products relatively straightforward, and allow much fuller scientific exploitation of all the data, limited by the imagination of scientists rather than the practicalities of getting at and working with the data. Possible routes that will, inter alia, be explored are
a) Driving the pipelines from a database e.g. stacking images taken at different times and producing contiguous tiled regions could be driven automatically from a DataBase driven list of survey fields and survey progress e.g. is it worth (re)stacking this field? [includes assessing various stacking techniques, i.e. how to deal with different PSFs, and producing object catalogues from stacked images]. This could enable on the fly reprocessing from raw data frames on demand.
b) Enabling users to operate their own modifications to the standard pipelines, by allowing external users to upload or stage their own code to process selected regions, e.g. user-provided code for aperture photometry (optimal for their purpose) from a pre-defined target list on each frame, to ensure that all images, present or otherwise, are analysed in the same special way.
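Route (a) can be sketched as a query against a survey-progress database that answers "is it worth (re)stacking this field?". The example below is a minimal illustration only: it uses SQLite from the Python standard library as a stand-in for whatever database is eventually chosen, and the table names, column names and the "new frames since last stack" criterion are all hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory progress database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE frames (frame_id INTEGER PRIMARY KEY, field TEXT, seeing REAL);
        CREATE TABLE stacks (field TEXT PRIMARY KEY, n_frames INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO frames (field, seeing) VALUES (?, ?)",
        [("F001", 0.8), ("F001", 0.9), ("F001", 0.7),
         ("F002", 1.1), ("F002", 1.0), ("F003", 0.6)],
    )
    # Existing stacks and the number of frames that went into each.
    conn.executemany("INSERT INTO stacks VALUES (?, ?)", [("F001", 2), ("F002", 2)])
    return conn


def fields_needing_restack(conn: sqlite3.Connection, min_new_frames: int = 1):
    """Return (field, n_new) for fields with enough frames not yet stacked."""
    return conn.execute(
        """
        SELECT f.field, COUNT(*) - COALESCE(MAX(s.n_frames), 0) AS n_new
        FROM frames AS f
        LEFT JOIN stacks AS s ON s.field = f.field
        GROUP BY f.field
        HAVING COUNT(*) - COALESCE(MAX(s.n_frames), 0) >= ?
        ORDER BY f.field
        """,
        (min_new_frames,),
    ).fetchall()


if __name__ == "__main__":
    print(fields_needing_restack(build_demo_db()))
```

A pipeline driver could poll such a query and queue stacking jobs only for the fields it returns; a real criterion would presumably also weigh seeing, depth and PSF differences.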
Other Advanced features that are desirable as parts of a Toolkit include:
It is premature to discuss technical solutions that may emerge before we have even started our Phase A study. However the problems of advanced processing at the ~1TB/day data volumes associated with VISTA are significant. Because of the need to access vast amounts of data it seems likely, given current projections on network speeds, that it will often be most efficient for requests to process VISTA data to be processed on a machine closely related to the bulk data being processed. Thus we will be receiving requests from across the network but may be doing the processing locally, and might not normally need to seek extra computing power, or data, from elsewhere. However compute grids are relatively simple to superpose on the infrastructure of a data grid, and the computing load due to user-invoked pipelines and processing may be erratic and hard to predict; hence it may prove infeasible to always have sufficient computing resources for this function, and it may also become part of a compute grid. The best solution is not clear, but should be defined, in close liaison with AstroGrid, during our Phase A study.
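The argument for processing close to the data can be made concrete with simple arithmetic. The sketch below converts the ~1TB/day figure into the sustained network rate needed just to keep up with it, and estimates how long one day of data would take over a slower link. It assumes decimal terabytes (10^12 bytes) and no protocol overhead; the 10 Mbit/s link speed is a hypothetical value, not a projection from this proposal.

```python
SECONDS_PER_DAY = 86400.0


def sustained_rate_mbit_s(terabytes_per_day: float) -> float:
    """Sustained rate (Mbit/s) needed to move the given daily volume."""
    bits_per_day = terabytes_per_day * 1.0e12 * 8.0
    return bits_per_day / SECONDS_PER_DAY / 1.0e6


def transfer_hours(terabytes: float, link_mbit_s: float) -> float:
    """Hours needed to move a volume over a link of the given speed."""
    return terabytes * 1.0e12 * 8.0 / (link_mbit_s * 1.0e6) / 3600.0


if __name__ == "__main__":
    print(f"1 TB/day needs about {sustained_rate_mbit_s(1.0):.1f} Mbit/s sustained")
    print(f"1 TB over a 10 Mbit/s link takes about {transfer_hours(1.0, 10.0):.0f} hours")
```

Under these assumptions a single night of data would need a link sustaining over 90 Mbit/s around the clock, which is why sending the request to the data looks more attractive than sending the data to the user.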
Defining the best way to federate VISTA (& WFCAM) catalogues with other catalogues is a task that will be left to AstroGrid, but the intent is to implement such a system when work by AstroGrid (and other Virtual Observatory projects) has defined the best way to implement this.
The UK’s copy of the VISTA (& WFCAM) raw data will be kept in Cambridge. This will form a backup copy of the ESO archive.
The Survey Products Archive (SPA) will consist of a (eventually) publicly accessible database of astrometrically and photometrically calibrated images and object catalogues along with housekeeping data. Facilities will be in place to restrict data sets to appropriate individuals or groups whilst data awaits release during proprietary periods. The SPA will contain all data from VISTA (and analogously for the WFCAM test bed). The SPA will thus consist (in surveyed areas) of the following logically independent entities:
The intermediate basic survey products (providing they don't need reprocessing because of bugs) should, together with the archive of master calibration frames, contain all the information necessary for any subsequent processing, advanced or otherwise.
Data products extracted from the SPA will be in FITS format (images; small object catalogues attached to images; very large object catalogues) and ASCII format (general object catalogues and housekeeping data). Catalogues will additionally be available in a (Tab-separated List – TSL?) format compatible with AstroGrid compliant browsers and remote server communication.
The design of the database will be very important to ensure it can both handle the data volumes and deal with requests in a timely manner, and in ways that are compatible with AstroGrid/VO developments.
The end-user will access the archive via the internet and Graphical User Interfaces, or through more advanced grid interfaces (there is no requirement to distribute all or part of the archive on hard media). Pixel data will be supplied with astrometric WCS, flux calibrated pixels, and confidence maps which include bad pixel flags/masks; object catalogues will be merged across available passbands and shall include error estimates for subsets of the more important parameters.
The database/archive will be constructed in such a way so as to make data available to users as soon as is practicable - hence it must be able to cope with incomplete observations and regular data ingest without disrupting online access to the existing database.
The end-user will be offered ranges of the following parameters, for example:
Survey database federation with other source catalogues (primarily SDSS and Schmidt photographic sky surveys) is considered an AstroGrid responsibility.
The database/archive will be constructed in such a way as to absorb AstroGrid-type functionality/enhancements.
Over the last few years, provision for user access to large databases has grown from simple position and proximity-based querying capability to highly flexible systems employing 'structured query language' (SQL) syntax. Examples of the former are the UK photographic archives at Cambridge (APM) and at Edinburgh (SuperCOSMOS), while an example of the latter is the 2MASS archive (www.ipac.caltech.edu/2mass).
The most recent large-scale archive, that of the SDSS (www.sdss.jhu.edu), employs an extended SQL (SXQL) syntax in a flexible database system (SX) which sits on top of the commercial object-oriented database management system Objectivity; details are available at the SDSS web site. The object-oriented approach brings many advantages, not the least of which is enabling whole database searches for specific queries to be executed quickly provided the database is indexed (or 'tagged') in an appropriate way. A simple example of an SXQL procedure is:
// Select all stars in a given RA range that are brighter than g = 20
SELECT
RA(),DEC(),J,H,K
FROM sxStar
WHERE (RA() BETWEEN 15 AND 16 && g < 20)
where the select-from-where construct allows the user to choose the parameters to be output, the 'class' of object to be searched, and the search criteria to be applied to all images in the searched class.
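The same select-from-where pattern can be tried out in ordinary SQL. The sketch below is an analogue only: it uses SQLite from the Python standard library rather than SX or Objectivity, replaces the SXQL accessor functions RA() and DEC() with plain columns (here called ra_deg and dec_deg, hypothetical names), and runs against three made-up rows.

```python
import sqlite3


def select_stars(conn: sqlite3.Connection, ra_min: float, ra_max: float, g_limit: float):
    """Return (ra, dec, J, H, K) for stars in an RA range brighter than g_limit."""
    return conn.execute(
        "SELECT ra_deg, dec_deg, J, H, K FROM sxStar "
        "WHERE ra_deg BETWEEN ? AND ? AND g < ?",
        (ra_min, ra_max, g_limit),
    ).fetchall()


def build_demo_catalogue() -> sqlite3.Connection:
    """Create a three-row in-memory catalogue with invented values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sxStar (ra_deg REAL, dec_deg REAL, J REAL, H REAL, K REAL, g REAL)"
    )
    conn.executemany(
        "INSERT INTO sxStar VALUES (?, ?, ?, ?, ?, ?)",
        [
            (15.2, -1.0, 14.1, 13.8, 13.7, 18.5),  # passes both cuts
            (15.9, 2.5, 16.0, 15.5, 15.3, 21.0),   # too faint in g
            (17.0, 0.0, 12.0, 11.7, 11.6, 15.0),   # outside the RA range
        ],
    )
    return conn


if __name__ == "__main__":
    print(select_stars(build_demo_catalogue(), 15.0, 16.0, 20.0))
```

The SELECT list plays the role of the output parameters, the FROM table the role of the object class, and the WHERE clause the search criteria.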
A mirror site for the SDSS Early Data Release (Stoughton et al AJ 123, 485, 2002) is being established at WFAU. Given the manpower effort expended on the state-of-the-art SDSS system and its obvious flexibility, it looks a promising basis for the VISTA (& WFCAM) Survey Products Archive. The general feeling is that SX is likely to become the community-wide standard since much development has already been done on it. We propose that SX be test bedded for VISTA using WFCAM data, and experiments with SX are underway at WFAU including implementation on a Beowulf cluster for enhanced access speed.
Close liaison will be necessary with other bodies with an interest in the delivery of a well functioning end-to-end system and the exploitation of its data products, and with bodies whose expertise can be helpful. The involvement of both McMahon & Lawrence in this proposal and in AstroGrid will, along with other measures below, ensure the closeness of work with AstroGrid, and the use of WFCAM data to test-bed the VISTA processing system will ensure the closest of liaison with, and learning from, WFCAM’s experience.
The VISTA Project Office (VPO) at the UKATC manages the project to construct the VISTA telescope, camera and associated Facilities and deliver it to ESO. The camera workpackage is likely to be taken by a Consortium involving CLRC, ATC & Durham.
The Data Flow System deliverables to ESO must be available for VISTA to be able to complete various acceptance tests of VISTA. Therefore the VPO must authorise the schedules and interfaces and any other items that interact with the design of the DFS modules for delivery to ESO (WP-2).
Close interaction with the instrument team (through the VPO) will be required to ensure that appropriate information is provided for the DFS to operate on. As resource for this work is not in the VPO allocation we budget 0.1FTE/yr for this work.
PPARC’s VISTA Project Management Committee (VPMC) is charged with oversight of the delivery of VISTA to ESO, and will therefore also need to oversee the parts of the VDFST that are ESO DFS deliverables, through its oversight of the VPO.
ESO will integrate the ESO DFS deliverable modules for VISTA into their Data Flow Systems in Paranal (Quality Control) and Garching (Calibration). It will be necessary to ensure that the modules and files conform to ESO standards (or obtain waivers for those that cannot) and that the results meet the expectations of recipients of the calibrated frames. This will be achieved by involving ESO in all formal Reviews and in informal discussions.
As representative users of VISTA data, the VISTA Science Committee will review the proposed products and algorithms to ensure their acceptability to the end users. It is anticipated that the VISTA Science Committee will be augmented with members from other ESO member states, so that the views of a wide range of users will be available.
As representative users of WFCAM data, the UK Infrared Deep Sky Survey (UKIDSS) Consortium will review the proposed WFCAM test bed products and algorithms to ensure their acceptability to the end users.
The Joint Astronomy Centre in Hawaii manage the project to integrate the WFCAM camera onto the UKIRT telescope, and control the content and format of the data sent to the UK from UKIRT. Interaction with JAC will be required to ensure that appropriate information is provided for the DFS to operate on, and that appropriate QC information is recorded at the telescope.
AstroGrid was partly conceived with VISTA (&WFCAM) data in mind but is mainly starting work on static output i.e. existing large catalogues which it will federate with other large catalogues or data. AstroGrid will provide the tools to federate these databases with other large databases elsewhere. This will be very useful once we have catalogues, but requires the data to have already been processed and the catalogues and final images to be already available. This proposal concentrates on producing the databases on which the AstroGrid tools should eventually work.
There has already been VISTA (& WFCAM) representation at major AstroGrid meetings and we will liaise very closely with AstroGrid (the two co-applicants are leading participants in AstroGrid) and the wider European ‘Astrophysical Virtual Observatory’ to ensure that our design and systems draw maximum benefit from their work, and are designed with the flexibility needed to have maximum compatibility with it. Effort will therefore be earmarked for close regular liaison with AstroGrid work, and any other similar work going on elsewhere. A VDFST representative should attend AstroGrid meetings.
ASTRO-WISE, the Astronomical Wide-field Imaging System for Europe, is an RTD programme funded by the EC Action "Enhancing Access to Research Infrastructures". It is a partnership of NOVA/Kapteyn Institute - Groningen, Osservatorio Astronomico di Capodimonte - Naples, Terapix - IAP Paris, ESO Garching bei München, Universitäts-Sternwarte München and VISTA. The aim of the ASTRO-WISE programme is to provide an astronomical survey system for OmegaCam (the optical sky survey camera on ESO’s VST wide field telescope). The programme consolidates the common expertise of the partners and co-ordinates the development of software tools. In the successful application a liaison task between ASTRO-WISE and VISTA was specified to ensure coordination and exchange of expertise. VISTA’s initial prime tasks will be to provide advice on operating survey systems.
This interaction is already in place and should be fruitful to both parties, especially as users of southern Wide Field Surveys will turn to VISTA for IR data and VST for optical data. In particular the cross fertilisation of studies, designs and code should lead to maximum cost effectiveness both for VISTA, and for VST users in the UK.
We are well aware of the developments that have taken place for the SDSS and members of the WFAU have had extensive discussions with Sloan. The UK’s existing expertise in processing survey data is not superseded by the Sloan work, and remains applicable. The actual pipeline to reduce data is relatively unaffected by Sloan work as it is instrument specific. The ways of putting the Sloan data into the archive, indexing and querying it, are novel features which we expect to evaluate and likely use for the access and use of VISTA survey products. Notably we expect to baseline the system on the Sloan’s SX and will test bed it on WFCAM data. This will be immensely important in the exploitation of the products but as we have emphasized elsewhere the purpose of much of our work is to produce pipelines to produce data to go into such databases, or to operate on data extracted from such databases, not to try to design the databases themselves all by ourselves, so this aspect of dealing with survey products will lean heavily on the progress made and recommendations produced by AstroGrid and discussion with it.
Full use will be made of the potential advice that can be obtained from NeSC and GridPP. Both Cambridge and Edinburgh host regional e-science centers. Various meetings at NeSC have already been attended and useful contacts made. The Chair of the GridPP Collaboration Board (Steve Lloyd) is at Queen Mary (the Institution of this proposal’s PI) and there has already been much discussion at Queen Mary about possible synergies between GridPP and VISTA work. This led to a successful joint proposal by Emerson & Lloyd for a PPARC e-science studentship to work on VISTA/GridPP, and in particular on using computing power accessed through the Grid to deal with the computationally challenging problem of stacking (coadding)/tiling multiple astronomical images taken at different times (and hence under different conditions with different point source response functions) to produce deep images of contiguous tiled regions. e-Science resources provided to Queen Mary through SRIF have allowed a total of £1.2M to be allocated to refurbish and equip a machine room as an e-Science facility for the College, shared between Particle Physicists, Astronomers and members of the Chemistry Department (Prof Coveney) working on the Reality Grid, funded by EPSRC. Interactions with NeSC and other e-science centers (and with the other e-Science initiatives at QMUL) will be actively pursued, to ensure that we neither ignore nor duplicate work elsewhere, nor assume work elsewhere will solve our own problems when it won’t.
Whilst there is no Work Package associated with this item we point out the potential wider applicability in astronomy of the system we are proposing. Our system is specifically intended to handle the VISTA data problem, and to test possible solutions using WFCAM data as a test bed. However many aspects of such a system should lend themselves to dealing, without much modification, with data streams from other imaging survey instruments, for example ESO’s OmegaCam which will provide complementary optical data to VISTA’s IR surveys. The science benefits flowing to the UK from the accessibility of such a system to process other surveys should not be underestimated.
Presentations will be made at national and international meetings and to AstroGrid, the Astrophysical Virtual Observatory etc. Methods will also be disseminated through meetings of the SPIE, the regular ADASS (Astronomical Data Analysis Software & Systems) and other topical meetings, and in specialised peer reviewed journals.
VISTA data products will be released with Explanatory Supplements to explain functionally the ways the data was processed. Those techniques which are too detailed to go in printed Explanatory Supplements, will be documented and disseminated via the www (which will also hold the Explanatory Supplements).
Rather than costing each individual and their travel and overhead needs, we have assumed an average cost of £65k/FTEyr based on actual mean costs at CASU. This corresponds to a senior PDRA at £32.5k + 22% NIS + 46% overhead, plus £2.2k for 8% of a secretary with overhead, £3.9k travel and £1k workstation/laptop provision. The travel level takes account of the need for liaison/collaboration meetings in various locations in the UK and Europe (no travel to Chile or Hawaii is included). The total travel fund will also finance the travel of the external members of the Project Collaboration Panel to its meetings.
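The £65k/FTEyr figure and the staff costs derived from it can be checked with a few lines of arithmetic. The sketch below assumes that the 46% overhead is applied to salary plus NIS (i.e. the two percentages compound), which is the reading that reproduces the quoted figure; the function names are illustrative only.

```python
def cost_per_fte_k(salary_k: float = 32.5, nis: float = 0.22, overhead: float = 0.46,
                   secretary_k: float = 2.2, travel_k: float = 3.9,
                   workstation_k: float = 1.0) -> float:
    """Average cost per FTE-year in £k, with overhead applied to salary + NIS."""
    return salary_k * (1.0 + nis) * (1.0 + overhead) + secretary_k + travel_k + workstation_k


def staff_cost_k(ftes: float, rate_k: float = 65.0) -> float:
    """Staff cost in £k for a given number of FTE-years at the rounded rate."""
    return ftes * rate_k


if __name__ == "__main__":
    print(f"cost per FTE-year: £{cost_per_fte_k():.1f}k")
    # Grand total FTEs per period as given in the staff effort table.
    for period, ftes in (("Apr02-Sep02", 2.75), ("Oct02-Sep03", 7.50), ("Oct03-Sep04", 8.80)):
        print(f"{period}: {ftes:5.2f} FTE -> £{staff_cost_k(ftes):.2f}k")
```

With these inputs the per-FTE cost comes out just under £65k, and the three period totals multiplied by £65k agree with the COST row of the staff effort table to within rounding.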
| STAFF EFFORT (FTE) | Apr02-Sep02 | Oct02-Sep03 | Oct03-Sep04 | Total by task |
|---|---|---|---|---|
| WP-0 Project Infrastructure/Management | | | | |
| Management | | | | |
| Project Leader | 0.00 | 0.50 | 0.50 | 1.00 |
| UK ATC Management effort | 0.05 | 0.10 | 0.10 | 0.25 |
| CASU Local Management | 0.10 | 0.20 | 0.20 | 0.50 |
| WFAU Local Management | 0.10 | 0.20 | 0.20 | 0.50 |
| Hardware System Management | | | | |
| CASU hardware system Mgt | 0.10 | 0.20 | 0.20 | 0.50 |
| WFAU hardware system Mgt | 0.10 | 0.20 | 0.20 | 0.50 |
| WP total by yr | 0.45 | 1.40 | 1.40 | 3.25 |
| WP-1 Survey Planning & Progress Tools | | | | |
| | 0.00 | 0.20 | 0.50 | 0.70 |
| WP total by yr | 0.00 | 0.20 | 0.50 | 0.70 |
| WP-2 Quality Control and Calibration Pipeline Modules | | | | |
| Modules | 0.20 | 1.00 | 1.00 | 2.20 |
| Management in addition to WP0 | 0.05 | 0.20 | 0.20 | 0.45 |
| Liaison in addition to WP7 | 0.05 | 0.10 | 0.10 | 0.25 |
| WP total by yr | 0.30 | 1.30 | 1.30 | 2.90 |
| WP-3 Calibration Update Pipeline | | | | |
| Design Coding Testing Development | 0.35 | 0.70 | 0.70 | 1.75 |
| WP total by yr | 0.35 | 0.70 | 0.70 | 1.75 |
| WP-4 Survey Product Pipeline | | | | |
| Design Coding Testing Development | 0.25 | 0.50 | 0.50 | 1.25 |
| WP total by yr | 0.25 | 0.50 | 0.50 | 1.25 |
| WP-5 Advanced Pipeline Structure Developments | | | | |
| Design Coding Testing Development | 0.25 | 1.00 | 2.00 | 3.25 |
| WP total by yr | 0.25 | 1.00 | 2.00 | 3.25 |
| WP-6 Survey Products Archive Access & Curation | | | | |
| Design Coding Testing Development | 1.00 | 2.00 | 2.00 | 5.00 |
| WP total by yr | 1.00 | 2.00 | 2.00 | 5.00 |
| WP-7 Liaison & Dissemination of Results | | | | |
| | 0.15 | 0.40 | 0.40 | 0.95 |
| WP total by yr | 0.15 | 0.40 | 0.40 | 0.95 |
| GRAND total FTEs | 2.75 | 7.50 | 8.80 | 19.05 |
| COST (£k) | £178.8 | £487.5 | £572.0 | £1,238.3 |
Application of Moore's law makes it imperative to delay detailed configuration and purchase of the majority of the VISTA hardware for the actual running pipelines as long as possible, but hardware for test bedding with WFCAM data is required in Phase A.
Disks
During the first year of WFCAM operations it will be essential to store all the raw data online so that we can characterise the instrument. We will need to store both the raw and pipeline calibrated products. This will require 30TB of disk storage and 2 tape drives to read the raw data. We will need to start to buy some storage early in the programme but will postpone purchases as long as possible. 2TB (£10k) are needed immediately for test pipelines running on available CIRSI data (4x1024^2 chips instead of WFCAM's 4x2048^2) as a surrogate for WFCAM/VISTA data for development work. In June 2003 we will buy a further 3TB (£10k) to process WFCAM lab test data. In Jan 2004 we will buy a further 5TB, followed by 5TB every 2 months during 2004 (at £2.5k/TB).
CPU
We have estimated the processing requirements for the WFCAM test bed pipeline using our prototype pipeline, developed to process IR data from the IoA CIRSI camera and the WHT INGRID camera (this pipeline can be regarded as a prototype for the WFCAM pipeline, which is itself a prototype for VISTA). We find a pixel processing rate of 0.2-1.0 x 10^5 pixels per second, depending on the level of sophistication of the processing used. This was timed on a dual-processor 700MHz P3 system (using only one of the processors) configured with a SCSI-IDE RAID array. Such a system would take up to 400 hrs to process a single 8hr night of UKIRT WFCAM data, or 1,600 hrs to process an 8hr night of VISTA data. In the case of the WFCAM test bed we would like to be able to process 2 nights of UKIRT data per 24hr day, i.e. one night in 12 hrs. At today's processing rates WFCAM would require a 32-node Beowulf, but assuming Moore's law a 16-node system should be sufficient for each of WFCAM (2004-Q1) and VISTA (2006-Q1). It is vital that we show that these assumptions are correct, so we propose to purchase a 4-node Beowulf cluster immediately for the WFCAM test bed. We only need a 4-node machine (each node a 2.0GHz P4 plus 2GB of memory, at £3k) since each chip is effectively independent. In 2004-Q1 we will purchase a 16-node production Beowulf system for the WFCAM test bed (same cost per node, total £48k).
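The node-count reasoning for the WFCAM test bed can be reproduced with a short calculation. The sketch below assumes perfect scaling across nodes (reasonable if each chip is processed independently) and, for the Moore's law projection, a performance doubling time of 18 months; that doubling time is an assumption of this example, not a figure from the timing tests.

```python
import math


def moore_speedup(years: float, doubling_years: float = 1.5) -> float:
    """Per-node speed-up after a number of years, for an assumed doubling time."""
    return 2.0 ** (years / doubling_years)


def nodes_needed(single_node_hours: float, target_hours: float, speedup: float = 1.0) -> int:
    """Nodes required to finish within target_hours, assuming perfect scaling."""
    return math.ceil(single_node_hours / (speedup * target_hours))


if __name__ == "__main__":
    # Up to 400 hrs per WFCAM night on one node; one night to be processed in 12 hrs.
    print("nodes at today's rates:", nodes_needed(400.0, 12.0))
    # About two years between the timing tests and 2004-Q1.
    print("nodes by 2004-Q1:", nodes_needed(400.0, 12.0, moore_speedup(2.0)))
```

At today's rates this gives a requirement in the low thirties, in line with the 32-node figure, and with the assumed doubling time the 2004-Q1 requirement falls comfortably within a 16-node system.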
We also need 2 CPUs (2x3k) for the advanced pipeline development work to continue while WFCAM production pipeline work is being carried out.
Disks
WFAU needs to increase its disk capacity by a further 14TB for receipt of WFCAM data products and testing various data access solutions.
At the archive centre the big choice is disk-farm versus fibre-channel SAN plus RAID. At today's prices the former works out at about £7k/TB and the latter at about £30k/TB. However the former almost certainly implies quite a lot of nursing care and attention, so the "total cost of ownership" may not be so different. The SAN version is also very clearly scalable and configurable, and may have much better total I/O, both because of the fibres and because of the multi-hosting. For the archive, stability and I/O may be crucial, so SAN makes sense. We therefore want to test out the SAN route, despite its apparent expense, and request funds for 2x16 port SAN switches & fibres.
CPU
We will also trial a 50 node Beowulf cluster for speaking to different virtual disks in our RAID array.
| HARDWARE (£k) | Apr02 - Sep02 | Oct02 - Sep03 | Oct03 - Sep04 | over Phase A |
|---|---|---|---|---|
| Pipeline work at CASU | | | | |
| CPU | | | | |
| 4 nodes of 2GHz P4 +2GB memory | £12.0 | | | £12.0 |
| 16 nodes of >2GHz P4 +>2GB memory | | | £48.0 | £48.0 |
| Disks | | | | |
| 2TB test early pipelines | £10.0 | | | £10.0 |
| 3TB to process lab data from real detectors | | £12.0 | | £12.0 |
| 25TB over 10 months for incoming WFCAM data | | | £62.5 | £62.5 |
| 2 CPUs for tests of advanced pipelines | | £8.0 | | £8.0 |
| CASU total | £22.0 | £20.0 | £110.5 | £152.5 |
| Survey Products Archive Access & Curation | | | | |
| CPU | | | | |
| 50 node Beowulf /or multiprocessor SMP machine | | | £50.0 | £50.0 |
| Disks | | | | |
| Expand RAID to 14TB | £50.0 | £40.0 | | £90.0 |
| 2x16 port SAN switches & fibres | | | £15.0 | £15.0 |
| WFAU total | £50.0 | £40.0 | £65.0 | £155.0 |
| Total Hardware request | £72.0 | £60.0 | £175.5 | £307.5 |
| | Apr02-Sep02 | Oct02-Sep03 | Oct03-Sep04 | Total (£k) |
|---|---|---|---|---|
| Staff including indirect costs | £178.8k | £487.5k | £572.0k | £1,238.3k |
| Hardware | £72.0k | £60.0k | £175.5k | £307.5k |
| Total | £250.8k | £547.5k | £747.5k | £1,545.8k |
channel
data channel from each 2kx2k detector through to the backend of the pipeline
pointing
a telescope slew to position with new guide star acquisition. Note that a microstep sequence is done within one pointing
read
a read is the act of physically reading a detector. This generally means the act of digitising the data, and can be either destructive (i.e. followed by a clear) or non-destructive
exposure
an exposure is a sequence of one or more reads, that are used to produce a single output image
FITS
Flexible Image Transport System
integration
an integration is a sequence of one or more exposures. Whereas the exposure is the result of a sequence of reads, the integration is a summation (or similar) of a sequence of exposures
integration sequence
an integration sequence is a sequence of integrations as the name suggests. It is characterised by the types of operations that can take place during it; an integration sequence will contain oversampling operations etc.
microstep sequence
a sequence of 1, 2x2 or 3x3 integrations taken with sub-pixel shifts. A microstep sequence is one type of integration sequence
observation
an observation is a sequence of integration sequences that makes up a usable scientific observation. An observation is the level at which calibration frames (flats, skies, darks etc.) are collected
reduced frame
result of reducing an integration
superframe
the result from combining (interleaving) the reduced frames from a microstep sequence. Superframe is 2kx2k, 4kx4k or 6kx6k depending on microstep mode
loops
multiple exposures on one place to increase integration time
dithers
sequence of exposures or loops macrostepped in a "spiral" pattern for use in bad pixel rejection, deep stacking etc.
tile
any or all of four separate exposures, loops, dithers or pointings taken so as to cover a contiguous area of sky involving all 4 detectors
coadd
image resulting from combining loops taken sequentially
stack
image resulting from combining dithers including coadds if present
mosaic
complete contiguous image combined from tile exposures
confidence map
a normalised inverse variance weight map defining the "confidence" associated with the intensity value in each pixel. It also encodes hot pixels, bad pixels, dead pixels and so on. The raw confidence maps for each frame output by the Data Acquisition System (DAS) will be derived from regular (frequency TBD) analysis of the calibration flats and darks. All output processed frames (stacked, tiled, mosaiced etc.) will also have an associated derived confidence map.
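Two of the definitions above, the superframe and the confidence map, describe simple array operations, and a small sketch may help fix ideas. The code below is illustrative only: the function names, the ordering of frames within a microstep sequence, and the use of plain Python lists as images are all assumptions, and the final normalisation of the output confidence map is left out because it is not specified here.

```python
def interleave(frames, n):
    """Interleave n*n equally sized frames into one superframe.

    frames[j * n + i] is assumed to be the frame shifted by i/n pixel
    in x and j/n pixel in y (an assumed ordering, for illustration).
    """
    height, width = len(frames[0]), len(frames[0][0])
    superframe = [[0.0] * (width * n) for _ in range(height * n)]
    for j in range(n):
        for i in range(n):
            frame = frames[j * n + i]
            for y in range(height):
                for x in range(width):
                    superframe[y * n + j][x * n + i] = frame[y][x]
    return superframe


def weighted_stack(images, confidences):
    """Combine aligned images pixel by pixel, weighting by confidence.

    A pixel with zero confidence (e.g. a bad pixel) contributes nothing.
    Returns the stacked image and the summed (unnormalised) confidence.
    """
    height, width = len(images[0]), len(images[0][0])
    stacked = [[0.0] * width for _ in range(height)]
    total_conf = [[0.0] * width for _ in range(height)]
    for image, conf in zip(images, confidences):
        for y in range(height):
            for x in range(width):
                stacked[y][x] += conf[y][x] * image[y][x]
                total_conf[y][x] += conf[y][x]
    for y in range(height):
        for x in range(width):
            if total_conf[y][x] > 0:
                stacked[y][x] /= total_conf[y][x]
    return stacked, total_conf


# A 2x2 microstep sequence of 1x1 "frames" gives a 2x2 superframe.
print(interleave([[[1]], [[2]], [[3]], [[4]]], 2))  # [[1, 2], [3, 4]]
```

Applied to real data, a 2x2 microstep sequence of 2kx2k frames would give a 4kx4k superframe in the same way, matching the sizes given in the superframe definition.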
The Deliverables to ESO are defined in “Data Flow for VLT instruments Requirement Specification, VLT-SPE-ESO-19000-1618, Issue 1.00, Date 1999-04-21” as spelt out in more detail in “Data Flow System Operations Model for VLT/VLTI Instrumentation, VLT-PLA-ESO-19000-1183, Issue 1.0, Date 9/12/96”. They consist of recipes, procedures and software necessary to run the Data Quality Control Pipeline at the telescope and the Calibration Pipeline at ESO HQ.
The general scope of the Data Flow System for the instrument is described in the Data Flow System User Requirements document. It defines the user requirements including operational scenarios which must be supported and therefore may impact definition of templates, ETC or pipeline procedures. Other Data Flow System documents (e.g. Calibration Plan) are based on these requirements.
Usage by ESO:
This document defines the high level scope of the Data Flow System for the instrument.
The Calibration Plan is the prime document which describes the different instrument specific components of the Data Flow System. It defines the following items:
1. all templates available for the instrument,
2. formats of all raw data frames produced,
3. data and procedures required for calibration,
4. observing programs to acquire calibration data,
5. procedures required for verification of instrument performance.
Usage by ESO:
This document defines detailed scope of the Data Flow System implementation for the instrument.
This document defines all special algorithms required for the reduction of raw data from the instrument. The Data Flow System pipeline will provide a set of standard functions (e.g. arithmetic on images, manipulation of tables). Only algorithms which are not available and therefore must be developed should be specified in this document. The high level usage of these algorithms is defined in the Calibration Plan.
Usage by ESO:
This document defines the detailed data reduction procedures to be applied to raw data frames including input data and algorithms required.
The special algorithms, defined in the Data Reduction Specification, must be implemented and delivered to ESO. The set of procedures implementing these algorithms will be integrated into the Data Flow System Pipeline for the instrument by ESO. The configuration of the Pipeline will also be made by ESO based on the Calibration Plan.
Explicit deliveries:
1. one source code file for each data reduction procedure not yet available, including full documentation,
2. one set of test data for each of these procedures,
3. one verification procedure for each of these procedures.
Usage by ESO:
The procedures will be integrated into the DFS pipeline by ESO who also will configure it.
It is essential that all raw data produced by the instrument are fully documented as defined by the Data Interface Dictionary. Further, it must be possible to calibrate and reduce data from instrument modes offered in Service Mode. A full set of raw data frames from the instrument must be provided to verify the conformance of the data to the Data Interface Dictionary. These data will also be used for the testing of the Pipeline by ESO. Finally, calibration data must be generated for the configurations offered in Service Mode. They will be used to generate a set of master calibration frames in the Calibration Database.
Explicit deliveries:
1. one full set of raw data frames produced by each template,
2. one full data set required to generate each calibration frame defined by the Calibration Plan.
Usage by ESO:
The data sets define the reference data for the instrument. They will be used to check the DID/DICB compliance of raw data and to verify pipeline reduction procedures. Further, the calibration frames produced will initiate the first version of the Calibration Database. Estimates provided by the Exposure Time Calculator will also be verified using these data.
The parameters of a template are defined as its signature, which is specified in the Calibration Plan. This information is coded in a file for each template and used by the P2PP system for definition of OB's. The valid ranges of the individual parameters are also given so that P2PP can perform sanity checks. Parameters which specify the use of optical components must refer to items defined in the Instrument Description and Calibration Database (IDC).
Explicit deliveries:
1. one Template Signature file for each template defined in the Calibration Plan, including detailed descriptions of its function and parameters.
Usage by ESO:
The Template Signatures will be used by the P2PP tool to create OB's for the instrument.
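The kind of sanity check that valid parameter ranges make possible can be sketched as follows. This is not the P2PP implementation or the Template Signature file format, neither of which is described here. The parameter names, the ranges and the dictionary layout are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical signature: parameter name -> (minimum, maximum) allowed value.
# The names and ranges are invented for illustration only.
SIGNATURE = {
    "exposure_time_s": (1.0, 600.0),
    "n_microsteps": (1, 9),
}


def check_parameters(signature, values):
    """Return a list of readable problems; an empty list means all is well."""
    problems = []
    for name, (low, high) in signature.items():
        if name not in values:
            problems.append(f"{name}: missing")
        elif not low <= values[name] <= high:
            problems.append(f"{name}: {values[name]} outside [{low}, {high}]")
    for name in values:
        if name not in signature:
            problems.append(f"{name}: not in signature")
    return problems


print(check_parameters(SIGNATURE, {"exposure_time_s": 30.0, "n_microsteps": 4}))
# []
print(check_parameters(SIGNATURE, {"exposure_time_s": 0.5, "n_microsteps": 4}))
# ['exposure_time_s: 0.5 outside [1.0, 600.0]']
```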
All detector and optical components which can be used in the instrument must be defined in the Instrument Description and Calibration Database. Further, all physical data required for calibrations and quality control must be defined as specified in the Calibration Plan if they are specific to the instrument. This may include tables of wavelengths for spectral calibration lamps and lists of standard stars with their magnitudes and positions.
Explicit deliveries:
1. one file for each detector or optical component defining its properties,
2. one file for each physical data set defined in the Calibration Plan.
Usage by ESO:
The data will be integrated into the Instrument Description and Calibration Database by ESO. They will be used by a) P2PP to define available components, b) ETC to define optical characteristics of components and c) reduction pipeline to access physical data required.
An Exposure Time Calculator must be available for all instrument modes. It is used during the proposal preparation phases to estimate the exposure time required for a given instrument configuration. Calibrations are also checked against the values predicted by the ETC to verify that it reflects the current state of the instrument. Modules, not yet available, must be delivered together with verification procedures. Further, an ETC instrument setup template and associated dictionary must be provided.
Explicit deliveries:
1. one instrument setup template including the associated dictionary,
2. one source code file for each component, not yet available, including full documentation,
3. one source code file of the instrument model with documentation if not yet available,
4. one file for each instrument setup with a verification procedure.
Usage by ESO:
The additional modules will be included in the general ETC library whereas the verification procedures will be included in the ESO regression tests of ETC's. ESO will create the GUI's required for the instrument ETC either by adding to existing ones or creating a new one.
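For orientation, the core calculation in an exposure time calculator can be sketched with the textbook noise model, in which source photons, background photons and read noise add in quadrature. This is not the ESO ETC and makes no claim about its modules or interfaces. The function names, the inputs (rates in electrons per second summed over the measurement aperture, read noise variance in electrons squared) and the example numbers are all assumptions.

```python
import math


def signal_to_noise(source_rate, background_rate, read_noise_var, t):
    """Signal-to-noise ratio reached after t seconds (textbook model)."""
    signal = source_rate * t
    noise = math.sqrt(signal + background_rate * t + read_noise_var)
    return signal / noise


def exposure_time(source_rate, background_rate, read_noise_var, target_snr):
    """Exposure time in seconds needed to reach target_snr.

    Setting signal_to_noise(...) equal to target_snr and squaring gives
    a quadratic in t; its positive root is returned.
    """
    a = source_rate ** 2
    b = -target_snr ** 2 * (source_rate + background_rate)
    c = -target_snr ** 2 * read_noise_var
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


# Assumed example: 100 e-/s from the source, 300 e-/s of background,
# negligible read noise, and a target signal-to-noise ratio of 10.
t = exposure_time(100.0, 300.0, 0.0, 10.0)
print(t)  # 4.0
```

Feeding the returned time back into `signal_to_noise` should reproduce the target value, which is a simple form of the verification procedure called for above.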
All data generated by the instrument are described by a set of data definitions e.g. keywords for FITS files. These data definitions describe the instrument state at acquisition and are used by the pipeline and quality control. The Dictionary is reviewed by DICB to ensure that it conforms to the ESO standards.
Explicit deliveries:
1. one file defining the Data Interface Dictionary for the instrument conforming to the DICB standard.
Usage by ESO:
The DID will be reviewed and accepted by DICB. After the acceptance, it will serve as the reference for the data description of raw data frames when used by the Science Archive, Pipeline and Quality Control.
--------------------------------------- end of document -------------------------------------