The DHARMA software architecture

Daniel Andresen, Mitchell Neilsen, Gurdip Singh,
Department of Computing and Information Sciences
234 Nichols Hall, Kansas State University
{dan, neilsen, singh}@cis.ksu.edu
Office: (785)532-6350, Fax: (785)532-7353

Michael C. Hirschi, Prasanta Kalita
Department of Agricultural Engineering
University of Illinois, Urbana-Champaign

Abstract:

The DHARMA domain-specific middleware system is intended to allow hydrologic field engineers to tackle water-management problems on a scale previously impossible without sophisticated computational management systems. DHARMA provides automatic data acquisition via the Internet; data fusion from online, local, and cached resources; smart caching of intermediate results; and interfaces with existing metacomputing systems. Our target watershed model, WEPP, is limited to very small watersheds under current computer technology. Applying WEPP to the 925 sq. mile Lake Decatur watershed will bring about a revolutionary change in hydrologic modeling on the watershed scale.

In this paper we discuss the evolving software architecture of DHARMA and its interaction with the Web and Grid infrastructures.

Keywords: hydrology, distributed computing, XML

1. Introduction

The DHARMA project attacks two fundamental problems in today's hydrological research environment, particularly in the biological and environmental sciences. First, researchers are often unable to perform even simple investigations because of the inaccessibility of the necessary data and the essential difficulty of acquiring and utilizing it. Often the data and simulation models are available via the Internet, but a combination of obscurity, incompatibility, and inefficiency renders them essentially unusable. Second, local computing resources are inadequate to support modeling systems at the level desired, yet interfacing to remote resources can require complex software installations and procedures. DHARMA addresses these problems by providing significantly easier access to the computational power and data acquisition capabilities of the Internet.

The objective of the DHARMA project is to expand the applicability of the WEPP (Water Erosion Prediction Project) model and the SITES (Water Resource Site Analysis) model to large watersheds, specifically applying the extended model to the 925 sq. mile Lake Decatur watershed in Illinois.

The DHARMA domain-specific middleware system is intended to allow hydrologic field engineers to tackle water-management problems on a scale previously impossible without sophisticated computational management systems (Figure 1). The system is intended for use not only by hydrologic researchers, but also by engineers in the field working on a day-to-day basis. These engineers typically have older local computing resources that are inadequate for the increasingly complex models required.

DHARMA has the following major points of functionality: automatic data acquisition via the Internet; data fusion from online, local, and cached resources; smart caching of intermediate results; and interfaces with existing metacomputing systems.

Figure 1: DHARMA block diagram.

Major technical problems to be solved include: How do we automatically acquire the necessary data? How do we know what items can and cannot be cached? How can results be passed from simulation to simulation with dissimilar data formats? What Document Type Definitions (DTDs) must be developed for information within this domain? What domain-specific task graph heuristics are needed for efficient computation?

To address these issues, we have developed the DHARMA architecture described in the remainder of this paper.

The paper is organized as follows: Section 2 gives the background needed on WEPP and XML. Section 3 discusses the DHARMA software architecture and various implementation issues. Section 4 outlines our current efforts towards automatically partitioning tasks for parallel execution, and Section 5 discusses related work and conclusions.

2. Background: WEPP and XML

WEPP. The WEPP watershed model is a continuous-simulation, process-based model which represents new soil erosion prediction technology based on the fundamentals of stochastic weather generation, infiltration theory, hydrology, soil physics, plant science, hydraulics, and erosion mechanics [9, 10]. The applicability of the model, however, is limited to very small watersheds (up to a few hundred acres) because currently available computer technology cannot handle the input data requirements of larger ones. The model is capable of estimating the spatial and temporal distribution of soil loss, runoff, and sediment yield, and can be extrapolated to a broad range of conditions that may not be practical or economical to field test. WEPP is the only currently available model for accurate prediction of soil erosion and sedimentation based on fundamental scientific theory and principles.

If the WEPP model can be advanced so that it applies to actual watersheds, which often span several thousand acres, it will bring a revolutionary change in hydrologic modeling on the watershed scale. Currently, all watershed-scale hydrologic models are based on empirical relationships and seldom include process interactions among soil, hydrologic, plant, and atmospheric processes. WEPP is the only model which accounts for soils, sediment transport, runoff, channel flow, plant growth, decomposition, snow melt, freeze-thaw effects, and climatic conditions. If this model can be enhanced for applicability over larger watersheds, it will be the only such model that provides accurate predictions and evaluates the effects of alternative watershed management practices on watershed water quality. Developing the sophisticated automated data acquisition tools to support this endeavor will require substantial advances in distributed algorithms, which will in turn spur advances in other areas.

XML. We have chosen an XML-based data format for several reasons. Using XML leverages a huge wave of support from industry in terms of tools and source code. The primary advantage, however, is the combination of device- and OS-independent data transfer with the ability to retain the data's semantic structure [19, 3]. This is a critical advantage over text files or HTML. Unlike native TCP/IP sockets, the XML utilities allow the easy transfer of structured data. Furthermore, since the semantic information is retained, a single information source can serve multiple applications. For example, our weather data server can easily be used by other weather-based simulations.
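
As an illustration, the fragment below sketches how a client application might consume such semantically tagged weather data with the standard Java (JAXP) DOM parser. The URL, the observation element, and its attribute names are hypothetical placeholders, not the actual DHARMA weather DTD.

    // A minimal sketch of consuming the weather server's XML output with
    // the standard JAXP parser; URL, element, and attribute names are
    // illustrative only.
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class WeatherClient {
        public static void main(String[] args) throws Exception {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Fetch and parse the server's response in one step.
            Document doc = builder.parse(
                "http://weather.example.edu/query?lat=40.1&lon=-88.9");
            NodeList obs = doc.getElementsByTagName("observation");
            for (int i = 0; i < obs.getLength(); i++) {
                Element e = (Element) obs.item(i);
                // Because the markup retains its semantics, any application
                // can pick out just the fields it needs.
                System.out.println(e.getAttribute("date")
                    + " tmax=" + e.getAttribute("tmax"));
            }
        }
    }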

Alternative technologies to XML include CORBA, sockets, NetCDF, Java RMI, and other custom networking protocols. CORBA typically is more complex for developers, but provides an object-oriented interface [16, 17]. NetCDF is designed for the exchange of scientific information, and offers an intrinsic Web interface, but has failed to achieve the same level of support across the entire Web industry as XML [13]. We hope to provide a gateway between our system and NetCDF data sources, due to the significant amount of scientific data available in that format. Java RMI provides an excellent mechanism for communicating between Java processes, but lacks the flexibility and genericity of XML [18]. Through the use of industry-standard information exchange mechanisms, we can provide for future expansion and compatibility with online information sources.

Figure 2: DHARMA layers diagram.

3. The DHARMA architecture

We now give an overview of the DHARMA architecture, as illustrated in Figure 2. Our architecture generally follows the standard n-tier model, with a user interface, task logic, and, departing from the standard three-tier model, a cluster-based computation/data layer. These layers comprise the multiple user interfaces, the job manager, automated data acquisition, and the execution engine.

3.1 The DHARMA user interfaces

There are three possible interfaces that can be used to run a simulation in our system: the WEPP Windows Interface, the Dharma Web Interface, and the Drag and Drop Interface. Each of these interfaces retrieves data from the user and interacts with the middleware in a different manner. When the project is fully functional, the primary interface will be the Drag and Drop Interface; the other two interfaces will still be supported at the end of the project.

Figure 3: The WEPP Windows Interface.

WEPP Windows Interface The first of the three interfaces that can be used to interact with our system is the pre-existing WEPP Windows Interface. The designers of the WEPP simulation created a Windows application that allows users to run soil loss simulations on hill slopes and channels. This interface allows the user to input some basic attributes of the slope or channel that the simulation will model. These inputs are then used to generate the files necessary to run the WEPP simulation.

Among the functions that the Windows interface performs is the generation of CLIGEN (.cli) climate files. The weather station closest to the hill slope or channel is entered as part of the initial data that the interface requests from the user; the interface then uses its own data files about the weather stations to generate a CLIGEN file.

The files that are created by this interface can be used to run a simulation on our system. One way to begin a simulation in either of the other two interfaces is to supply the files necessary to run it; these files are then fed directly to the middleware and the simulation is executed. If a user is unable to build the files himself, but still wants to use our system, he may use the Windows interface to create the files and feed them to our system.

Figure 4: Dharma Web Interface.

Dharma Web Interface The second of the three interfaces at the disposal of Dharma users is the Dharma Web Interface (DWI). The DWI uses HTML, XML, and Java Servlets. The web interface was designed in a manner that allows for easy integration of other simulations. The only simulation that the DWI currently supports is WEPP; however, adding new simulations to the interface is relatively painless.

The overall structure of this interface is fairly straightforward. A user enters the system and is presented with a dynamically generated list of the simulations that can be run. When the user has selected the appropriate simulation, the system dynamically generates a form from an XML document that describes the inputs necessary for that simulation. The user then enters the data and clicks submit. The submit button launches a Java Web Start application that checks whether the Dharma servers already have the data cached, then bundles up the job and sends it to the middleware.

We chose a Java servlet solution to implement the data-gathering part of the interface because it is a relatively easy and clean way to get the data via a form. The servlet also allowed us to create a clean, aesthetically pleasing interface using HTML and XML style sheets. Servlets are also easy to write and extend later on: anyone who needs to modify the web interface in the future can simply write a function or add an XML document.
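
A condensed sketch of such a servlet appears below. The input element vocabulary (name and label attributes) for the per-simulation XML documents is an assumption for illustration; the real documents are defined by the project's own DTDs.

    // A sketch of the form-generating servlet: one XML document per
    // simulation describes the inputs, and the servlet renders a form.
    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class SimulationFormServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            try {
                // e.g. "wepp" selects /forms/wepp.xml (path is illustrative).
                String sim = req.getParameter("simulation");
                Document desc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(getServletContext()
                        .getResourceAsStream("/forms/" + sim + ".xml"));
                resp.setContentType("text/html");
                PrintWriter out = resp.getWriter();
                out.println("<form method='post' action='submit'>");
                NodeList inputs = desc.getElementsByTagName("input");
                for (int i = 0; i < inputs.getLength(); i++) {
                    Element in = (Element) inputs.item(i);
                    out.println(in.getAttribute("label") + ": <input name='"
                        + in.getAttribute("name") + "'/><br/>");
                }
                out.println("<input type='submit' value='Submit'/></form>");
            } catch (Exception e) {
                throw new ServletException(e);
            }
        }
    }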

Figure 5: Dharma GUI process.

Java Web Start was used to gather information about the files on the client. The Dharma system caches files that it receives so that they do not have to be uploaded again later. In order to ensure that we were not uploading files that were already on the servers, it was necessary to gather information about the files while they were still on the client side. Java Web Start was selected because it let us launch a normal Java application on the client; this application can access the local file system on the client with the user's permission. In order for the program to run, users of the system must install the Java Web Start plugin and the Java plugin.

As shown in Figure 5, after a job has been submitted, the web start application forwards the user to a results page that allows the user to check the progress of their job. When they check for progress they receive one of three responses: a message that says the job is awaiting execution, a message that the job is being executed, or a list of hyperlinks to the results of the simulation. The user can click on the links to the files that they wish to download; they only have to download the files that they need and are not sent any files they do not ask for. This makes the system more efficient and user friendly.

When the user is inputting data, they are offered two options for each input file: they can submit a file, or they can ask to have the file automatically generated. Currently, only climate files can be generated automatically. The web interface will not be further updated to allow other types of files to be generated automatically; users who want automatic data generation will need to use the Drag and Drop Interface.

The DWI can be easily extended to support other types of simulations. All that must be added to the DWI for a new simulation is an XML document and a function in the servlet that handles input. The XML document describes the form for the simulation and the function will allow the data to be processed and handed to the Java Web Start application. The DWI will be maintained as a quick and easy way of starting a job by submitting the files necessary to run a simulation.

Drag and Drop User Interface The Drag and Drop Interface (DND) will be the main user interface for our system in the future (Figure 6). This interface is a work in progress: parts are currently functional, and the rest will be completed by the end of December. The DND is an interactive interface that allows the user to create a diagram that visually represents a simulation run. The interface is written in Java and will be deployed on client machines using Java Web Start.

Figure 6: Dharma Advanced Interface - Drag and Drop Interface with two simulations on it.

The user is allowed to place buttons on the screen that represent simulations. These buttons can be dragged around the screen; this functionality was implemented using the Java Drag and Drop API. When a user clicks on the button for a simulation, a dialog box pops up that allows the user to input all of the data necessary to run the simulation. These dialog boxes are dynamically created from XML documents that describe the necessary inputs for a simulation. After a user has filled in the information for a simulation, other icons appear on the screen: an input icon for each input into the simulation, connected to the simulation button by arrows, and a button representing the output, with an arrow from the simulation button to it.

When a user clicks on the output button they see a dialog box with one of three different responses. The first is a message that says the job is awaiting execution; this is the default until the job is run. The second says that the job is currently being executed. The third is a dialog box that lists all of the files that were output by the simulation, with a download button next to each file name. When the user clicks on this button the file is downloaded from the Dharma servers onto the client's machine.

The user will also be allowed to chain simulations together. Chaining of simulations is shown on the diagram by lines drawn between the simulations. The user draws the lines by clicking on the line button on the tool bar and then drawing the line as one would in a drawing application. When lines are drawn between simulations, dependencies are registered in the object that stores the data on a simulation.

Figure 7: Sample input dialog for WEPP simulation.

The dynamic generation of XML dialogs is accomplished using an XML parser written for this project. The parser takes in an XML file name and returns a JDialog; it is generic enough to be used in other projects needing the same type of functionality. The dynamic generation of dialog boxes allows the system to be easily extended: all that has to be added to the interface is an XML document detailing the necessary inputs for a simulation.
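
The sketch below shows the shape of such a parser, assuming a hypothetical field element vocabulary (name and label attributes) and text fields only; the real XML format and widget set are project-specific.

    // A condensed sketch of the XML-to-dialog generator.
    import java.awt.GridLayout;
    import java.io.File;
    import java.util.Map;
    import javax.swing.JDialog;
    import javax.swing.JLabel;
    import javax.swing.JTextField;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class DialogFactory {
        /** Builds a modal input dialog whose fields mirror the XML file. */
        public static JDialog build(String xmlFile, Map fields)
                throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(xmlFile));
            JDialog dialog = new JDialog((java.awt.Frame) null,
                doc.getDocumentElement().getAttribute("title"), true);
            NodeList list = doc.getElementsByTagName("field");
            dialog.getContentPane().setLayout(
                new GridLayout(list.getLength(), 2));
            for (int i = 0; i < list.getLength(); i++) {
                Element f = (Element) list.item(i);
                JTextField value = new JTextField(20);
                fields.put(f.getAttribute("name"), value); // read back later
                dialog.getContentPane().add(
                    new JLabel(f.getAttribute("label")));
                dialog.getContentPane().add(value);
            }
            dialog.pack();
            return dialog;
        }
    }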

The interface will also have a File menu that allows a user to open an existing diagram, save a diagram, and exit the interface, and an Edit menu that allows a user to clear the diagram, start a job, and add simulation buttons.

When a user starts a job, all simulations on the diagram will be executed. The interface grabs all of the data for the simulations on the screen and first checks with the Metadata Manager (described in Section 3.4) to see if the files already exist on the Dharma servers. The application then bundles all of the information up into SOAP objects representing a simulation run, which are sent to the middleware for processing and execution. If there are multiple simulations on the same diagram that depend on each other, the dependencies are sent to the middleware as well so that scheduling can be done.

The Drag and Drop interface will be a highly extensible user interface that will allow the Dharma project to support many different simulations. The only modification necessary to extend the interface is an XML document detailing the inputs needed to create a dialog box for a new simulation. Furthermore, because the interface is deployed with Java Web Start, users never need to acquire new releases of the software: new versions are automatically downloaded whenever the software is revised, completely transparently to the users. Because of this easy extensibility, this interface will be the primary interface for the Dharma project in the future.

3.2. Simulation Handling

Currently all handling of simulations is generic and ignorant of what a simulation needs for its inputs. The way the system currently runs, all that is required to add a simulation to the interface is an XML document specifying the inputs that are necessary for the new simulation.

When the user clicks on a button on the workspace, a dialog box is automatically generated from XML using Castor and Swing. If the button has already been clicked then the box that was previously created is made visible again. The buttons in the workspace are custom buttons that were created specifically for the Dharma project. They store all of the data that is entered in the dialog box as well as a vector that contains information about dependencies.

After a user has entered all of the necessary information for a simulation, they are ready to run the job. This is done by selecting the ``Run Job'' option from the file menu. When the user runs a job, the interface runs through each button on the workspace and prepares it for submission.

It is sometimes necessary for the metadata manager to know certain information contained in the files that are being sent to it. In order to maintain the generic nature of the interface, we implemented a class loader that allows the interface to get this information from the files without actually knowing anything about them. We store classes that can extract the data from the files on the Dharma servers. When the UI is getting ready to check with the metadata manager to see if it should send a file, it downloads the appropriate class and calls the method that extracts the data. Once the data is extracted from the file, it is added to the object that is sent to the metadata manager, and the job continues in the same manner as before.
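
The following sketch illustrates the download-and-extract step. The FieldExtractor interface, the server URL, and the naming convention for extractor classes are all assumptions made for illustration.

    // A sketch of fetching an extractor class from the servers at run time,
    // so the interface stays ignorant of individual file formats.
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.Map;

    interface FieldExtractor {
        /** Pulls the metadata fields the MDM needs out of an input file. */
        Map extract(java.io.File input) throws Exception;
    }

    public class ExtractorLoader {
        public static Map extractFields(String fileType, java.io.File input)
                throws Exception {
            // Classes live on the Dharma servers (URL is illustrative).
            URLClassLoader loader = new URLClassLoader(
                new URL[] { new URL("http://dharma.example.edu/extractors/") });
            // Naming convention assumed: e.g. "CliExtractor" for .cli files.
            Class c = loader.loadClass(fileType + "Extractor");
            FieldExtractor ex = (FieldExtractor) c.newInstance();
            return ex.extract(input);
        }
    }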

By using the class loader we are able to maintain the generic nature of the interface. Had we not used it, the interface itself would have had to extract the data, which would have compromised its generic nature: for each simulation we would have had to add code to extract the fields, making the work done to dynamically generate dialog boxes worthless.

3.3. Middleware

The purpose of the middleware is to provide a flexible way of scheduling large computing jobs. In the context of Dharma, a job is composed of many different subtasks. Each subtask consists of a task type and resources: the task type is used to determine what mechanism should be used to process the task, and the resources serve as the initial input data. The tasks are arranged into a directed acyclic graph (job graph) that describes the dependencies between them. If a task has children, those children cannot run until their parent is complete, and they have access to any of the resources that their parent creates.
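
A minimal sketch of this job-graph structure is shown below; the class and field names are illustrative rather than the actual middleware API.

    // One node of the directed acyclic job graph described above.
    import java.util.ArrayList;
    import java.util.List;

    public class Task {
        final String taskType;                  // selects the task executer
        final List resources = new ArrayList(); // initial input data
        final List parents = new ArrayList();   // must finish before we start
        final List children = new ArrayList();  // inherit our output resources
        boolean finished = false;

        Task(String taskType) { this.taskType = taskType; }

        void dependsOn(Task parent) {
            parents.add(parent);
            parent.children.add(this);
        }

        /** Mirrors the current scheduler: runnable once every parent is done. */
        boolean isRunnable() {
            for (int i = 0; i < parents.size(); i++)
                if (!((Task) parents.get(i)).finished) return false;
            return true;
        }
    }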

Figure 8: Dharma middleware overview.

The middleware provides an interface for adding jobs, checking on the status of jobs (and individual tasks within a job), and getting the results of jobs (and tasks). Status objects have two attributes, a status code and a status string; they work similarly to HTTP status codes in that the status codes have well-known predefined meanings, while the status string provides further, detailed information about the fault to the user. The resources that move around in the system are also not simply raw data. Attached to each resource is information that defines the type of the data, metadata that describes it, and either a URL indicating where the data is located or the raw data itself. The advantage of using URLs instead of raw data is that it reduces network overhead when passing resources from one component of the system to another.

This middleware interface is made available to remote clients via SOAP. Once the front-end has generated one of these job graphs it uses that interface to submit the graph. Once the middleware has received a graph, it is responsible for making sure that all of the tasks are executed in the proper order (i.e. no child is executed before its parent) and the execution of tasks proceeds in an optimal manner. In the current implementation, the middleware immediately executes any task that has no parent (or its parent has finished execution).

Figure 9: Middleware details.

Because this approach to scheduling may not be desirable in the future, the software has been designed to allow different scheduling algorithms to be used. There is a Java interface that defines all of the methods a scheduler needs to implement: methods for starting and stopping jobs, checking on the status of jobs, and providing results. Once a new scheduler has been created, it is only necessary to change a configuration file to point to the class name of the new scheduler; the middleware dynamically instantiates the proper type of scheduler and uses it. Adding the capability to handle new tasks is equally simple. There is an interface that defines the methods a task executer must implement, which allows the job executer to manage it. All that is needed to extend the middleware to handle a new task type is a class which implements the interface and an update to the configuration file associating that task type with the new task executer.
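
The sketch below shows how such configuration-driven loading can work. The Scheduler interface shown is an abbreviation of the real one, and the property name is an assumption.

    // Instantiating the scheduler named in a configuration file.
    import java.io.FileInputStream;
    import java.util.Properties;

    interface Scheduler {
        void startJob(Object jobGraph);
        void stopJob(String jobId);
        Object getStatus(String jobId);
        Object getResults(String jobId);
    }

    public class MiddlewareBoot {
        static Scheduler loadScheduler(String configFile) throws Exception {
            Properties conf = new Properties();
            conf.load(new FileInputStream(configFile));
            // Swapping schedulers means editing one configuration line,
            // e.g. scheduler.class=edu.ksu.dharma.GreedyScheduler
            String className = conf.getProperty("scheduler.class");
            return (Scheduler) Class.forName(className).newInstance();
        }
    }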

One of the future improvements planned for the middleware is to improve the scheduling algorithm. As previously stated, it immediately executes any runnable task on any machine that is available. Examining the graph and figuring out better times and places to run tasks could optimize the scheduling. For instance, if there is a series of tasks that create large amounts of data, it would be time-consuming and a waste of network bandwidth to move that data from machine to machine as the tasks execute; instead, that series of tasks should be run on one set of machines. Another improvement for the middleware is to allow for translation of resources from one type to another: sometimes the output format of one task will not match the input format of another task and will need to be changed. This should be done automatically in the middleware.

3.4. Metadata Manager

In order to make the system more efficient, we cache the files that users submit with a job, as well as the output files created by simulation runs. To manage this caching we created the Metadata Manager (MDM), which is responsible for maintaining information on all of the files that have been cached on the Dharma servers.

The MDM stores information on all of the files in the system. The primary attributes stored by the MDM are file name, file size, file checksum, file type, the lat-long box that the file stores data on, location on the servers, and many other specialized attributes. The manager was designed to be easily extensible: for each type of file in the system there is a Java class that represents the data stored about that file type. These classes also contain methods that tell a database how to create tables for the file type and how to create queries that retrieve information about files of that type. All that is required to add a new file type to the system is a class that tells the manager how to set up and access data on files of this type.
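
The sketch below illustrates the shape of such a per-file-type class; the abstract base, the CLIGEN subclass, and the column layout are illustrative assumptions, not the actual MDM classes.

    // One metadata class per file type; the base holds common attributes.
    public abstract class FileTypeMetadata {
        String fileName;
        long fileSize;
        String checksum;
        double north, south, east, west;   // lat-long bounding box
        String location;                   // where the file lives on the servers

        /** DDL telling the database how to lay out this type's table. */
        public abstract String createTableStatement();
    }

    class CligenMetadata extends FileTypeMetadata {
        int startYear, years;              // specialized .cli attributes

        public String createTableStatement() {
            return "CREATE TABLE cligen_files (name TEXT, size BIGINT, "
                 + "checksum TEXT, north REAL, south REAL, east REAL, "
                 + "west REAL, location TEXT, start_year INT, years INT)";
        }
    }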

The actual data on the files is stored in a Postgres database. We chose this database because it is easy to maintain and set up. The MDM can easily be adapted to work with any type of database and query language: the methods in each class that describe how to create a query produce abstract queries that can be translated into any query language.

The main action of the MDM is performing searches on the database. The MDM allows users to search on just about any of the data in the system: it is possible to search based on filename, checksum, file type, and lat-long box. This makes the manager versatile and extensible.

To access data in the MDM, a client creates a SOAP request; the MDM processes the request and returns the results of the query. When looking for a file matching some specific parameters, the user receives a list of URLs indicating where the files are located on the Dharma servers. Users of the system are not allowed to directly access the database that stores the information on the files, for security reasons: it was determined that it would be dangerous to allow anyone direct access to the database. The abstraction added by the MDM provides a security buffer between the metadata and the outside world. This will also allow us to later implement the idea of file ownership; in the future, users will only be allowed to access files that they have uploaded to the system.

The Metadata Manager is thus a necessary buffer between the metadata and the outside world, primarily for security. We also needed an efficient way to store information on cached files so that the system itself would run efficiently. In our test system, execution times typically drop over 70% compared to uncached execution (1178 ms uncached vs. 400 ms cached, from job receipt to WEPP startup), indicating that the overhead introduced by our system is small and the caching is effective.

3.5. Automatic weather data acquisition

DHARMA is a Domain Specific Metaware for Hydrologic Applications. It intends to build a middleware layer to provide the resources needed to revolutionize hydrologic modeling. The required resources range from local data to the supercomputing power on the national computational Grid. In hydrology, the project proposes to expand the applicability of the WEPP (Water Erosion Prediction Project) model and the SITES (Water Resource Site Analysis) model to large watersheds, specifically applying the extended model to the Lake Decatur watershed in Illinois [10, 7, 8, 15, 14].

Automatic data acquisition over the Internet is one of the major points of functionality of Dharma. The module described in this section is an initial step towards the automatic acquisition of geotemporal climatic data from online databases and digital libraries.

The data acquisition module has the following broad goals:

  1. Automatic acquisition, via the Internet, of geotemporal climatic data from online databases and digital libraries.
  2. Merging the data necessary for computation to occur, and storing it in an XML-annotated form, to facilitate easy conversion to different formats.
  3. Conversion of the XML-annotated data to different formats as needed. More specifically, this module performs the conversion for WEPP, using an XSLT stylesheet.

Currently, DHARMA uses CLIGEN (a climate generator program) to provide WEPP with climatic data over a given time span in a per-point-per-day format. The WEPP interface takes CLIGEN-formatted files as input. This approach has several disadvantages:

  1. CLIGEN can generate data only for weather stations for which past data is available.
  2. Since CLIGEN requires historical data, it cannot generate per-point-per-day data for an arbitrary point.
  3. CLIGEN cannot interpolate data from nearby stations to give an approximation of the weather at a particular point during a given time span.
  4. CLIGEN can generate data only for one weather station at a time; however, we often need data for a set of points.

This module addresses all of these issues. Thus, it can effectively be used as a replacement for CLIGEN, providing data gathered in real time to the WEPP interface.

Data acquisition architecture overview

Figure 10 depicts the architecture of this module and its environment.

Figure 10: Automatic weather acquisition overview.

The Online Weather Database/Data Library collects data from various weather stations and serves it out in the form of XML. The module described here is the Data Acquisition module, which accepts requests from users, interacts with the Online Weather Database/Data Library, and stores its results in CLIGEN format.

When requesting data, users specify the set of points needed as a rectangle or box, in terms of the latitudes/longitudes of the top-left and bottom-right corners of the box. More specifically, requests for data must include the corner coordinates of the box, the grid of points within it, and the time span of interest.

On receiving a request, the module performs the following actions:

  1. Break the input request into a set of requests for individual points. For this, it interpolates the individual latitude/longitude for each point in the specified grid.
  2. Fetch the data for each point, for each day in the specified time span, from the Online Weather Database/Data Library. The fetching is done in a multithreaded fashion, with one thread used for each point in the grid (see the sketch after this list).
  3. Convert the fetched data into an XML-annotated format.
  4. Convert the per-point climatic data into CLIGEN's output format using XSLT. The conversion to the CLIGEN format is again done in a multithreaded way.
  5. Merge the XML data for all points into a single file, and store the file to disk.
  6. Merge the CLIGEN-formatted data for all points into a single file, and store the file to disk.
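
The following sketch illustrates steps 1 and 2 above: interpolating the grid points inside the lat-long box and fetching each point on its own thread. The grid resolution parameters, the method names, and the elided fetchPoint() body are assumptions for illustration.

    // Interpolate grid points in the box and fetch each one concurrently.
    import java.util.ArrayList;
    import java.util.List;

    public class GridFetcher {
        public static void fetchBox(double topLat, double leftLon,
                double bottomLat, double rightLon,
                int rows, int cols) throws InterruptedException {
            double latStep = rows > 1 ? (bottomLat - topLat) / (rows - 1) : 0;
            double lonStep = cols > 1 ? (rightLon - leftLon) / (cols - 1) : 0;
            List threads = new ArrayList();
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    // Step 1: interpolate this point's latitude/longitude.
                    final double lat = topLat + r * latStep;
                    final double lon = leftLon + c * lonStep;
                    // Step 2: one thread per grid point.
                    Thread t = new Thread(new Runnable() {
                        public void run() { fetchPoint(lat, lon); }
                    });
                    t.start();
                    threads.add(t);
                }
            }
            // Steps 5 and 6 (merging) must wait for every fetch to finish.
            for (int i = 0; i < threads.size(); i++)
                ((Thread) threads.get(i)).join();
        }

        static void fetchPoint(double lat, double lon) {
            /* query the Online Weather Database for each day in the span */
        }
    }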

The user can use the files created above directly, or they may be used as input to the WEPP interfaces.

Performance is very acceptable, averaging approximately one minute to retrieve two years' worth of weather data for over 700 points from a local weather web server. This information is cached, so repeated invocations are very efficient.

3.6. Backend

The task of running a WEPP job is important and consists of several parts. First, a job must be passed to the ExecutionManager from the middleware. Next, a WEPP job must be formed; this process includes building the WEPP input control file and setting up the Condor [11] environment. Then, the process is run via Condor. Finally, the ExecutionManager returns after the job has completed with a designated WEPP output file. There are several reasons for using Condor, and several changes must be made in the way jobs are executed in order to run WEPP jobs with Condor or other toolkits like GLOBUS.

Jobs are passed via SOAP from the middleware to the backend ExecutionManager for job processing. ExecutionManager began as a Java servlet but has since become a normal Java object used for job execution. The parameters for executing a job include the unique job ID and the location of the input files for WEPP. ExecutionManager first checks to make sure all the files exist and copies any remote files to the local filesystem. Then it creates a directory for the unique job; this directory stores files, results, etc., for the job.

Once the necessary file handling has been taken care of, the WEPP input file needs to be created. This input file contains the answers to the various questions WEPP asks the user during execution, and is given as standard input to the WEPP executable. Currently, the file that is created is static; that is, it is the same for each WEPP job that is run. Obviously, this needs to be addressed, and several options for the best way to dynamically create the input files are being investigated.

Setting up the Condor environment is relatively easy. A Condor control file must be created, and then a job is submitted to the Condor job queue. The Condor control file consists of a few elements, including the starting path of the job, the executable, the standard input, output, and error streams, and any environment variables. ExecutionManager creates a DOM with these values and then uses an XSLT stylesheet to transform the DOM into the Condor control file. Creating an XML DOM may seem like overkill, but it is useful because in many cases the same DOM can be used to create the control file for other toolkit programs, like GLOBUS. All that is required to create either a Condor or a GLOBUS control file is the appropriate XSLT stylesheet.
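
A sketch of this DOM-plus-XSLT step, using the standard JAXP transformation API, is given below. The element and attribute names in the DOM and the condor.xsl stylesheet name are assumptions; transforming the same DOM with a globus.xsl stylesheet would yield a GLOBUS control file instead.

    // Build a DOM of job attributes, then let a stylesheet render it as a
    // Condor control file.
    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class ControlFileWriter {
        public static void write(String jobDir, String executable)
                throws Exception {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element job = dom.createElement("job");
            job.setAttribute("initialdir", jobDir);
            job.setAttribute("executable", executable);
            job.setAttribute("input", jobDir + "/wepp.stdin");
            dom.appendChild(job);
            // The same DOM, a different stylesheet: that is the whole point.
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("condor.xsl"));
            t.transform(new DOMSource(dom),
                new StreamResult(new File(jobDir, "job.cmd")));
        }
    }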

Next, the job is submitted to the Condor job queue. ExecutionManager creates a new process to run the Condor executable, condor_submit. The program condor_submit reads the control file for the job and then submits the job to the local condor_master for execution. The execution can take place on any machine running a Condor scheduler. Job assignment is handled by the Condor scheduler, which runs on one machine and has access to the resources on the available machines; Condor tries to match a job with a machine that is available.
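
A sketch of the submission step follows. Only the condor_submit invocation itself comes from the text above; the method names and the handling of the program's output are illustrative.

    // Hand the control file to condor_submit in a child process.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class CondorSubmitter {
        public static int submit(String controlFile) throws Exception {
            // condor_submit reads the control file and queues the job with
            // the local condor_master; machine matching is Condor's business.
            Process p = Runtime.getRuntime()
                .exec(new String[] { "condor_submit", controlFile });
            BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = out.readLine()) != null)
                System.out.println(line);  // echo whatever Condor reports
            return p.waitFor();            // zero on successful submission
        }
    }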

Once the job is submitted to the condor_master, ExecutionManager waits for the job to finish and then returns a file that is used for analysis of the WEPP results. This is done by simply waiting until the appropriate file exists in the correct directory. ExecutionManager then creates a byte array from the file and returns the byte array back up to the middleware software. In the future, the process for submitting the job will be forked off and ExecutionManager will simply return: it is impractical for ExecutionManager to wait for a file to appear, since this leaves open the possibility of deadlock. Instead, a new utility class will be created that polls the Condor system for job status and returns any necessary files once the job has finished.

Condor is currently used because it has several features that the GLOBUS toolkit does not come with by default. These features include built-in scheduling, resource brokering and monitoring, and an easy-to-use interface to GLOBUS gatekeepers. While only Condor is currently being used, future plans include configuring Condor to interface with GLOBUS gatekeepers and the national computational Grid. According to the available documentation, this should be relatively easy, and in fact easier than using GLOBUS directly for both network administrators and developers. Condor allows for easier development because much of the machinery of resource monitoring and scheduling is hidden from the application developer. Also, Condor can handle the keys for GLOBUS gatekeepers and make using those resources much easier.

The entire process of receiving a job and beginning execution takes approximately one second for simple jobs, which is more than reasonable given the levels of functionality offered. In general, the time required for data transfer and computation will significantly outweigh the scheduling/job creation overhead.

4. Future work

We are currently implementing a scheduling system for WEPP tasks. The monolithic style of watershed execution does not achieve any of the scalability goals of this project; we are only now beginning to add functionality for distributed execution.

The essential problem when considering a method to execute a WEPP job in a distributed environment is to convert a WEPP job for a large watershed into a sequence of individual tasks whose simulation dependencies are modeled as a tree structure (e.g., an upstream hillslope is represented as a child node of its downstream dependency). A portion of the tree corresponding to an amount of work with a desirable ratio of network communication to computation time will be processed on any one of the available hosts in Condor space. This newly generated WEPP job is shipped to Condor and executed, and its resulting directory structure is moved back into DHARMA space. The results from the sub-tree's individual hillslope and channel elements are accumulated and collectively represented as the net simulation data for their parent node.

After the aggregation of individual hillslope simulations has been performed for a branch of the tree, the collective data for this subtree may be used as input (masquerading as a single hillslope or channel) to other portions of the overall WEPP watershed closer to the root of the dependency tree.
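
The sketch below captures this dependency-tree idea: upstream elements are children, a subtree is simulated bottom-up, and its aggregate result stands in for a single hillslope at the next level. The class is illustrative; runWepp() stands for shipping a sub-watershed job to Condor.

    // One node of the watershed dependency tree.
    import java.util.ArrayList;
    import java.util.List;

    public class WatershedNode {
        final String elementId;                  // hillslope or channel id
        final List upstream = new ArrayList();   // children feeding this node
        Object aggregateResult;                  // net simulation data

        WatershedNode(String elementId) { this.elementId = elementId; }

        /** Post-order walk: simulate children first, then this element. */
        Object simulate() {
            if (aggregateResult != null)
                return aggregateResult;  // reuse unaffected subtrees on re-runs
            List inputs = new ArrayList();
            for (int i = 0; i < upstream.size(); i++)
                inputs.add(((WatershedNode) upstream.get(i)).simulate());
            aggregateResult = runWepp(elementId, inputs); // ship to Condor
            return aggregateResult;
        }

        Object runWepp(String id, List inputs) { /* elided */ return inputs; }
    }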

Figure 11: A typical WEPP watershed job divided into autonomous hillslopes, with dependencies shown.

As shown in Figure 11, a multiple-channel, multiple-hillslope simulation is called a watershed in WEPP. We have isolated each hillslope element of the watershed as an individual unit of computation. We consider a hillslope to be an ideal atomic unit of parallel computation because, by definition, runoff from one hillslope does not flow into another.

A branch of the dependency tree has been isolated in Figure 12. Note that there are only three hillslopes feeding the channel in the sub-watershed. This owes to two factors:

  1. WEPP itself only supports up to three hillslopes which empty into any one channel.
  2. Because we control the size of the sub-watersheds (i.e., they are defined to meet our purposes), we elect to maintain a relatively small size in order to maximize parallelization.
Figure 12: A sub-watershed is generated for computation on a node.

We will invoke the WEPP binary on the results of the hillslopes inside the ellipse of Figure 12. The cumulative water runoff from this ``watershed'' will be used as the displacement through the channel c when this channel is used as an input to a later watershed.

Because a large WEPP job will be represented as a dependency tree, we expect significant performance enhancements when a large simulation is re-executed with only key changes made to upstream elements. All the aggregate data along non-affected paths of the tree are already at hand, and may be reused without incurring the penalty of recomputing branches of the tree which are known to be the same as for previous runs.

5. Related work and conclusions

Presenting a simple interface to the researcher and educator, with seamless and transparent access to distributed databases, is a daunting technological challenge. Various other distributed computation systems have previously been developed, but almost invariably they are aimed at automating the distribution of existing scientific programming and data. Systems such as SWEB++, Ninf, AppLeS, Globus, and Legion all schedule computation quite competently [1, 12, 2, 4, 6]. However, they fail to address the most fundamental difficulty in conducting any sort of operation on distributed data sets: the acquisition and matching of data to the models being used. Furthermore, the interface presented to the user is often more suited to the professional programmer than to a researcher in the physical sciences. Systems devoted to universal data access, such as the WebHLA and DATORR projects, lack the domain-specific knowledge to acquire and optimally use the necessary hydrological data [5].

In this paper we have presented the architecture of the DHARMA system, which, through the use of standards from the computational Grid and the WWW, meets our goals of increasing user capabilities while maintaining simplicity of use. We show how we have implemented automatic weather data acquisition, multi-level caching, and an extensible set of user interfaces to bring hydrologists into a new era.

Acknowledgments

This work was supported in part by startup funds from Kansas State University, and NSF ITR-0082667. We also wish to thank Ryan Porter, Jesse Greenwald, Dan Lang, Matt Hoosier, Mahesh Kondwilka, and Doug Armknecht for their efforts and enthusiasm.

Bibliography

[1] D. Andresen and T. Yang.
Multiprocessor scheduling with client resources to improve the response time of WWW applications.
In Proceedings of the 11th ACM/SIGARCH Conference on Supercomputing (ICS'97), Vienna, Austria, July 1997.
[2] Fran Berman, Richard Wolski, Silvia Figueira, Jennifer Schopf, and Gary Shao.
Application-level scheduling on distributed heterogeneous networks.
In Proceedings of Supercomputing'96, Pittsburgh, PA, November 1996.
[3] Tim Bray, Jean Paoli, and C.M. Sperberg-McQueen.
Extensible Markup Language (XML) 1.0.
W3C, February 1998.
http://www.w3.org/TR/1998/REC-xml-19980210.
[4] I. Foster and C. Kesselman.
Globus: A metacomputing infrastructure toolkit.
The International Journal of Supercomputer Applications and High Performance Computing, 11(2):115-128, Summer 1997.
[5] G. C. Fox, W. Furmanski, G. Krishnamurthy, and H. T. Ozdemir.
Using WebHLA to integrate HPC FMS modules with Web commodity based distributed object technologies of CORBA, Java, COM and XML.
In A. Tentner, editor, Proceedings of the High Performance Computing Symposium (HPC '99), pages 273-278, San Diego, CA, USA, April 1999. Society for Computer Simulation.
[6] Andrew S. Grimshaw, William A. Wulf, James C. French, Alfred C. Weaver, and Paul F. Reynolds, Jr.
Legion: The next logical step toward a nationwide virtual computer.
Technical Report CS-94-21, Department of Computer Science, University of Virginia, June 08 1994.
[7] C.G. Henry, P.K. Kalita, and M.R. Keaton.
Using WEPP Watershed Model for Crosscreek Watershed, Kansas.
In Proceedings of the 1997 ASAE Mid-Central Annual Meeting, St. Joseph, MO, ASAE Paper No. MC97-133, 1997.
[8] J.K. Koelliker, M.R. Mankin, P.K. Kalita, and C.G. Henry.
Integrated approach for watershed and lake water quality assessment.
In Proceedings of the 1997 ASAE Annual International Meeting, Minneapolis, MN, ASAE Paper No. 972151, 1997.
[9] J.M. Laflen, L.J. Lane, and G.R. Foster.
WEPP: A New Generation of Erosion Prediction Technology.
Journal of Soil and Water Conservation, 1991.
[10] L.J. Lane and M.A. Nearing.
USDA-Water Erosion Prediction Project: Hillslope profile model documentation.
Technical report, NSERL Report No. 2, 1989.
[11] Michael J. Litzkow, Miron Livny, and Matt W. Mutka.
Condor - A Hunter of Idle Workstations.
In Proceedings of the 8th International Conference on Distributed Computer Systems, pages 104-111. IEEE, June 1988.
[12] Ninf, a network based information library for global world-wide computing infrastructure, 1996.
http://hpc.etl.go.jp/NinfDemo.html.
[13] Russ Rew and Glenn Davis.
NetCDF: An interface for scientific data access.
IEEE Computer Graphics and Applications, 10(4):76-82, July 1990.
[14] D.M. Temple and M.L. Neilsen.
SITES' New Face.
In Proceedings of the Assoc. of State Dam Safety Officials Western Regional Conference, May 1997.
[15] D.M. Temple, H. Richardson, H.F. Moody, H. Goon, M. Lobrecht, J. Brevard, G. Hanson, E. Putnam, D. Woodward, and N. Miller.
Water Resource Site Analysis Computer Program (SITES) - User's Guide.
Technical report, USDA, 1996.
[16] Steve Vinoski.
Distributed Object Computing with CORBA.
C++ Report, 5(6), August 1993.
[17] Michael Weiss, Andy Johnson, and Joe Kiniry.
Distributed Computing: Java, CORBA, and DCE.
Open Software Foundation Version 2.1, February 1996.
[18] Anne Wollrath and Jim Waldo.
Trail: RMI.
The Java Tutorial, July 1999.
http://java.sun.com/docs/books/tutorial/rmi/index.html.
[19] Extensible Markup Language (XML).
W3C, July 1999.
http://www.w3.org/XML/.
Copyright is held by the author/owner(s). WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA. ACM 1-58113-449-5/02/0005.