Find it, Plot it, Grab it: Distributing Climate Data Via the Web

Julia Collins
Roland Schweitzer
NOAA-CIRES Climate Diagnostics Center
Boulder, Colorado 80309, USA
jac@cdc.noaa.gov
rhs@cdc.noaa.gov

Abstract

Scattered archive locations and the sheer volume of data make it difficult for climate researchers to locate desired information. The goal of the NOAA-CIRES Climate Diagnostics Center (CDC) data providers is to make the process of locating, acquiring, and sharing climate data one which enables and encourages researchers' individual and collaborative activities. This poster describes our approach to providing Web access to our climate data holdings. We describe our use of the Web -- in combination with file metadata and established visualization tools -- to make the process of finding, evaluating, and retrieving climate data a straightforward and user-friendly one.

Introduction

A vast amount of climate data exists at repositories around the globe, including the NOAA-CIRES Climate Diagnostics Center (CDC). The scattered archive locations and sheer volume of data make it difficult for researchers to locate desired information. This situation is compounded by the fact that many of these data sets are stored in unfamiliar formats, or are simply not well-described, and therefore difficult to access and use. Collaborative opportunities may be lost when a scientist generates a specialized data set, but does not have the means to advertise this resource to the rest of the research community.

As data providers at CDC, our goal is to make the process of locating, acquiring, and sharing climate data one which enables and encourages researchers' participation. The Web allows us to address many of the difficulties inherent in advertising and distributing data to a worldwide research community. The nature of hypertext, combined with the familiar Web browser interface, allows members of this research community (most of whom are not computer scientists by training) to navigate our site easily from any location on the Internet and retrieve information relevant to their work. The efficient data access and timely presentation of current information made possible by the Web also allow researchers to share information more readily and to strengthen their collaborative efforts.

This paper will describe our approach to providing Web access to our climate data. It builds upon and extends our previous work regarding search techniques (Collins and Schweitzer, 1995) into the areas of data browsing and retrieval mechanisms. In general, we will describe how the use of the Web as a data distribution platform allows us to...

Our current Web site is the result of decisions made about three components: our data storage mechanisms, the tools used to manipulate the data, and the CGI processes we developed to integrate our HTTP server, the stored data, and the data manipulation tools.

Data Storage

An important part of our data access scheme is the consistent structure of our climate data sets. Our formally maintained data sets are all stored in netCDF (network Common Data Form), a machine-independent format for scientific data. The metadata in these files conform to the conventions adopted by participants in the Cooperative Ocean-Atmosphere Research Data Service (COARDS). These metadata include the names and units of all of the data's physical coordinates, the geographic extents and time range covered by the data, and the minimum and maximum data values. While most of the data at CDC are maintained by software engineers in a data management role, local scientists frequently produce their own small data sets which they wish to share with collaborators. To encourage the use of netCDF in these efforts, we have developed a Web-based form and supporting software tools which allow researchers unfamiliar with the details of netCDF to produce netCDF files easily. This ensures that our standard advertising, searching, visualization, and retrieval mechanisms can also be applied to these customized data.
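The Web form for researcher-generated data can be illustrated with a short sketch. Assuming the form collects a variable name, units, a descriptive name, and values on a latitude/longitude grid, a CGI script could write them out as CDL, the text notation that Unidata's `ncgen` utility compiles into a binary netCDF file (for example `ncgen -o demo.nc demo.cdl`). The function names, the field set, and the use of an `actual_range` attribute to record the minimum and maximum values are assumptions for illustration, not a description of the CDC tools.

```python
# Hypothetical sketch: turn simple form input into CDL text for `ncgen`.
# Function names, fields, and the `actual_range` attribute are assumptions.


def format_values(values):
    """Format numbers as a comma-separated CDL data list."""
    return ", ".join(repr(float(v)) for v in values)


def build_cdl(name, var, units, long_name, lats, lons, grid):
    """Return CDL text for one 2-D field on a latitude/longitude grid.

    `grid` holds one row per latitude, each row one value per longitude.
    """
    if len(grid) != len(lats) or any(len(row) != len(lons) for row in grid):
        raise ValueError("grid shape does not match coordinate lengths")
    # Row-major order matches the (lat, lon) dimension order declared below
    flat = [v for row in grid for v in row]
    lo, hi = float(min(flat)), float(max(flat))
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"\tlat = {len(lats)} ;",
        f"\tlon = {len(lons)} ;",
        "variables:",
        "\tfloat lat(lat) ;",
        '\t\tlat:units = "degrees_north" ;',
        "\tfloat lon(lon) ;",
        '\t\tlon:units = "degrees_east" ;',
        f"\tfloat {var}(lat, lon) ;",
        f'\t\t{var}:units = "{units}" ;',
        f'\t\t{var}:long_name = "{long_name}" ;',
        # Record the data extremes so later tools need not scan the data
        f"\t\t{var}:actual_range = {lo!r}f, {hi!r}f ;",
        "data:",
        f"\tlat = {format_values(lats)} ;",
        f"\tlon = {format_values(lons)} ;",
        f"\t{var} = {format_values(flat)} ;",
        "}",
    ]
    return "\n".join(lines)
```

Writing the returned text to a `.cdl` file and running `ncgen -o` on it would produce the netCDF file. Free-text form fields would need escaping before being placed inside CDL quotes, which this sketch does not do.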

Data Manipulation

Many climate researchers are familiar with the Grid Analysis and Display System (GrADS), a toolkit for manipulating and plotting scientific data. Given that GrADS is already accepted by many researchers, and given recent enhancements enabling it to read netCDF files, we decided to use it as our mechanism for visualizing data via the Web. Because GrADS can be run in batch mode, it is well suited to being invoked from a CGI process.
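Batch mode is what makes GrADS usable from a CGI process: the CGI script starts GrADS, has it run a prepared command file, and collects the result without any terminal or display. A minimal Python sketch, assuming a `grads` executable on the server's PATH; `-b` (batch), `-l` (landscape) and `-c` (execute a command at startup) are standard GrADS options, but the helper names here are hypothetical.

```python
import subprocess


def grads_batch_argv(script_path):
    """Build the argument list for a non-interactive GrADS run.

    Using an argv list rather than a shell string means no user-supplied
    text is ever interpreted by a shell.
    """
    return ["grads", "-blc", f"run {script_path}"]


def run_grads(script_path, timeout_s=60):
    """Run GrADS in batch mode; return its exit status and console output."""
    result = subprocess.run(
        grads_batch_argv(script_path),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # keep a stuck plot from tying up the Web server
        check=False,
    )
    return result.returncode, result.stdout
```

The timeout is a design choice worth keeping in any CGI wrapper: a request for an unexpectedly large plot should fail cleanly rather than hold a server process indefinitely.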

Putting the Pieces Together in the Web Environment

Finding It

Our search implementation strategy for our data holdings has been described in previous publications (e.g., Collins and Schweitzer, 1995). We use a modified version of the Harvest Information Discovery System to index and search our data. The search indices are constructed from the consistent metadata inherent in the netCDF files (described above). This allows us to construct a search interface which guides users in their choice of keywords and helps to ensure the success of the search effort.
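To show why consistent metadata helps, here is a toy inverted index in Python. It is not Harvest and does not reproduce Harvest's indexing; it only illustrates the idea that keywords drawn from the same metadata fields in every file yield a fixed vocabulary that a search form can offer to users. The field names and file names are hypothetical.

```python
from collections import defaultdict

# Metadata fields assumed to be present (consistently) in every file
INDEXED_FIELDS = ("variable", "long_name", "units", "dataset")


def build_index(catalog):
    """Map each lower-cased metadata word to the set of files containing it.

    `catalog` maps a file name to a dict of metadata attributes.
    """
    index = defaultdict(set)
    for fname, meta in catalog.items():
        for field in INDEXED_FIELDS:
            for word in str(meta.get(field, "")).lower().split():
                index[word].add(fname)
    return index


def vocabulary(index):
    """Terms a search form could list so users pick keywords that exist."""
    return sorted(index)


def search(index, query):
    """Return files matching every word of the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return []
    # dict.get avoids inserting empty entries into the defaultdict
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```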

Each search result provides direct hyperlinks to the data visualization process, the netCDF file itself, the metadata associated with the file(s) identified by the search, and a description of the data set containing the individual data file(s). The user can then follow these hyperlinks to evaluate the data and, if desired, retrieve them. These hyperlinks are illustrated in the portion of a search results page shown in Figure 1.
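A sketch of how one search hit might be rendered as the four kinds of hyperlinks just listed. The base URLs, the CGI script names, and every link label except "Graphically browse file data" (the label associated with Figure 1) are invented for illustration.

```python
from html import escape
from urllib.parse import quote

# Hypothetical server locations; a real site would substitute its own
FTP_BASE = "ftp://ftp.example.org/Datasets"
CGI_BASE = "http://www.example.org/cgi-bin"
DOC_BASE = "http://www.example.org/datasets"


def result_links(dataset, fname):
    """Return an HTML list of links for one netCDF file in a data set."""
    path = quote(f"{dataset}/{fname}")  # '/' stays unescaped by default
    links = [
        (f"{CGI_BASE}/plot_form?file={path}", "Graphically browse file data"),
        (f"{FTP_BASE}/{path}", "Retrieve netCDF file"),
        (f"{CGI_BASE}/metadata?file={path}", "View file metadata"),
        (f"{DOC_BASE}/{quote(dataset)}.html", "Data set description"),
    ]
    items = [f'<li><a href="{escape(url)}">{escape(text)}</a></li>'
             for url, text in links]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```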

Figure 1: Hyperlinks associated with search results

Plotting It

The data visualization Web interface takes advantage of the netCDF metadata to build a series of Web documents customized to the data being examined. Figure 2 shows the Web page generated when the user follows the "Graphically browse file data" hyperlink shown in Figure 1. The interface in Figure 2 is constructed dynamically by a CGI process which populates the text fields and scrolling lists with values extracted directly from the netCDF file. The names and the minimum and maximum values of the geographic extents, time ranges, and level or height coordinates are presented by default in the Web form's text fields, scrolling lists, and menus.
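The dynamic form construction can be sketched as follows, assuming the CGI script has already read the coordinate ranges, time steps, and levels from the netCDF file into a dictionary (reading the file itself is omitted). The dictionary layout, the field names, and the `plot` form action are hypothetical.

```python
from html import escape


def text_field(name, value):
    """An HTML text input pre-filled with a value taken from the file."""
    return (f'<input type="text" name="{escape(name)}" '
            f'value="{escape(str(value))}" size="8">')


def build_form(meta):
    """Build the visualization form; defaults span the file's full extent."""
    rows = []
    # Geographic extents default to the minimum and maximum in the file
    for coord in ("lat", "lon"):
        lo, hi = meta[coord]
        rows.append(f"{coord}: {text_field(coord + '_min', lo)} "
                    f"to {text_field(coord + '_max', hi)}")
    # Time range defaults to the first and last time steps
    rows.append(f"time: {text_field('time_start', meta['time'][0])} "
                f"to {text_field('time_end', meta['time'][-1])}")
    # Levels become a scrolling list with the first level pre-selected
    options = [
        f'<option value="{escape(str(lev))}"{" selected" if i == 0 else ""}>'
        f"{escape(str(lev))}</option>"
        for i, lev in enumerate(meta["levels"])
    ]
    rows.append('level: <select name="level" size="4">'
                + "".join(options) + "</select>")
    return ('<form method="get" action="plot">\n' + "<br>\n".join(rows)
            + '\n<input type="submit" value="Plot">\n</form>')
```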

Figure 2: Visualization Parameters

After adjusting the coordinate ranges of interest, the user submits the form to generate a plot of the data (Figure 3). The CGI process in this case executes GrADS with the appropriate data parameters. GrADS generates a GIF image, which is then included in the resulting Web page.
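The step from submitted form to GrADS commands might look like the sketch below. It parses the query string, coerces numeric fields to floats (which also rejects malformed input before it reaches GrADS), and emits a GrADS script in which each command is a quoted string. The form field names are hypothetical, and the file, variable, and image paths are supplied by the server rather than by the user. `sdfopen`, `set`, `d` (display), the `ave` function, and `printim` are GrADS commands, but `printim` format options differ between GrADS releases and may not match the GIF-producing mechanism of the original system, so treat that line as an assumption.

```python
from urllib.parse import parse_qs


def to_float(qs, key):
    """Numeric coercion doubles as validation of user-supplied values."""
    return float(qs[key][0])


def grads_script(query, ncfile, var, gif_path):
    """Translate a submitted query string into a GrADS script (.gs text)."""
    qs = parse_qs(query)
    lat0, lat1 = to_float(qs, "lat_min"), to_float(qs, "lat_max")
    lon0, lon1 = to_float(qs, "lon_min"), to_float(qs, "lon_max")
    level = to_float(qs, "level")
    t0, t1 = qs["time_start"][0], qs["time_end"][0]
    # GrADS dates such as 1jan1990 are alphanumeric; reject anything else
    if not (t0.isalnum() and t1.isalnum()):
        raise ValueError("unexpected characters in time field")
    cmds = [
        f"sdfopen {ncfile}",
        f"set lat {lat0:g} {lat1:g}",
        f"set lon {lon0:g} {lon1:g}",
        f"set lev {level:g}",
        f"set time {t0}",
    ]
    if t0 == t1:
        cmds.append(f"d {var}")
    else:
        # A time range is averaged so the plot stays a 2-D map
        cmds.append(f"d ave({var},time={t0},time={t1})")
    cmds += [f"printim {gif_path} gif", "quit"]
    # In a GrADS script, a quoted string on its own line is run as a command
    return "\n".join(f"'{c}'" for c in cmds)
```

Averaging over a requested time range is our choice for the sketch, not necessarily what the original interface did; the script text would be written to a file and run with GrADS in batch mode.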

Figure 3: Visualization Output

This series of actions gives users direct, real-time access to the data they are interested in, and lets them examine the features of those data before retrieving the file and storing it on their local disk.

Grabbing It

The user has two options for retrieving data file(s) via the CDC Web interface: they may follow the search result hyperlink directly to the netCDF file (see Figure 1), or, after graphically browsing the data, they may follow the "FTP the data (in netCDF format) used to generate this image" hyperlink at the bottom of the visualization output (see Figure 3).

The first option, following the search result hyperlink directly to the netCDF file, relies on the Web browser's FTP support to retrieve the file. The second option, following the hyperlink associated with the visualization output, executes a CGI process that extracts the desired range of data from the netCDF file and creates a new netCDF file; it then again relies on the browser's FTP support to let the user retrieve the file subset. Both options shield the user from the details of the FTP site, and the second also handles the subsetting of the netCDF file transparently. The goal (and, we hope, the result) of this design is to make locating and retrieving climate data as easy as possible.
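Subsetting reduces to finding, for each dimension, the block of grid indices that falls inside the requested range; those (start, count) pairs then select the hyperslab that is written to the new netCDF file. A sketch of that index arithmetic in Python, handling both ascending and descending coordinates (latitude axes often run from north to south). The function name is hypothetical, longitude wraparound across the dateline or prime meridian is not handled, and writing the new file is omitted.

```python
from bisect import bisect_left, bisect_right


def hyperslab(coords, lo, hi):
    """Return (start, count) for the entries of `coords` inside [lo, hi].

    `coords` must be monotonic (ascending or descending).
    """
    if lo > hi:
        lo, hi = hi, lo
    descending = len(coords) > 1 and coords[0] > coords[-1]
    values = list(reversed(coords)) if descending else list(coords)
    i = bisect_left(values, lo)   # first value >= lo
    j = bisect_right(values, hi)  # one past the last value <= hi
    count = j - i
    if count <= 0:
        raise ValueError("requested range contains no grid points")
    if descending:
        # Positions i..j-1 in the reversed list are n-j..n-1-i originally
        i = len(coords) - j
    return i, count
```

For a latitude axis of `[90, 60, 30, 0, -30, -60, -90]`, a request for 20 to 65 degrees north selects start 1, count 2, that is, the points at 60 and 30.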

Summary

The Web interface was designed to be easily applied to any data conforming to the netCDF conventions described above. Originally developed using one particular data set, the Perl scripts which support the visualization interface have been integrated into the search results output for all of our formal data holdings. Thus, any data which are made available in our search indices may also be visually browsed by the user. These CGI processes may also be applied to any user-generated data which are stored according to our netCDF conventions.
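Because the interface depends on conventions rather than on any one data set, a simple check can decide whether a new file is ready for it. The sketch below tests a few COARDS-style rules (latitude and longitude units, time units of the form "<unit> since <date>", strictly monotonic coordinates) on metadata supplied as a dictionary. The rule selection, function names, and dictionary layout are assumptions; a real check would read the attributes from the netCDF file and cover more of the conventions.

```python
import re

# e.g. "hours since 1-1-1 00:00:0.0" or "days since 1950-01-01"
TIME_UNITS = re.compile(r"^\s*\w+\s+since\s+\S+")
EXPECTED_UNITS = {"lat": "degrees_north", "lon": "degrees_east"}


def is_monotonic(values):
    """True if values strictly increase or strictly decrease."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def conformance_problems(meta):
    """List the convention problems found; an empty list means usable."""
    problems = []
    coords = meta.get("coordinates", {})
    for name, units in EXPECTED_UNITS.items():
        c = coords.get(name)
        if c is None:
            problems.append(f"missing coordinate variable {name}")
        elif c.get("units") != units:
            problems.append(f"{name} units should be {units}")
    time = coords.get("time")
    if time is not None and not TIME_UNITS.match(time.get("units", "")):
        problems.append("time units should look like '<unit> since <date>'")
    for name, c in coords.items():
        if not is_monotonic(c.get("values", [])):
            problems.append(f"{name} values are not strictly monotonic")
    return problems
```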

The World Wide Web is an integral part of our efforts to give a large segment of the climate research community information about and access to our data holdings. The combination of Web protocols and well-defined data storage mechanisms allows us to give researchers an opportunity to access data graphically and textually regardless of the user's physical location: they need only an Internet connection and a Web browser to explore the CDC data holdings.

References

  1. Hankin, S.C., cited 1997: The COARDS netCDF profile. [Available on-line from http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html.]

  2. Collins, J.A., 1996: Communicating Distributed Search Results to Distributed Data Servers. Preprints, Twelfth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Atlanta, GA, AMS, 397-400.

  3. Collins, J.A. and R.H. Schweitzer, 1995: Applying Metadata to the Search Interface: Constructing Effective Local and Distributed Searches of Web-Based Scientific Data. Poster Proceedings, Fourth International World Wide Web Conference: The Web Revolution, Boston, MA, O'Reilly & Associates, Inc., 36-37.

  4. Collins, J.A., J.D. Scott, C.A. Smith, and M.A. Alexander, 1997: Climate Data Visualization on the Web: Implementation Details of the Climate Diagnostics Center Web Atlas Interface. Preprints, 13th International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography, and Hydrology, Long Beach, CA, AMS, 157-159.

  5. www@grads.iges.org, cited 1997: The Grid Analysis and Display System. [Available on-line from http://grads.iges.org/grads/head.html.]

  6. Rew, R.K., cited 1997: Unidata netCDF. [Available on-line from http://www.unidata.ucar.edu/packages/netcdf/.]


Author correspondence should be directed to Julia Collins at:
CIRES
Campus Box 449
University of Colorado
Boulder, Colorado, USA, 80309-0449
Voice: 303-492-0842
FAX: 303-492-2468
E-mail: jac@cdc.noaa.gov
