Abstract: |
Grid-enabling existing stand-alone data mining programs, data, and other resources such as computational servers is motivated by the possibility of sharing them over local and wide area networks. Expected benefits include improved effectiveness and efficiency, wider access, and better use of existing resources. This paper investigates how to grid-enable a variety of existing data mining programs. The presented solution is a simple procedure developed under the DataMiningGrid project. The data mining program itself, a batch-style executable, is uploaded to a grid server, and an XML document that describes the program is prepared and registered with the underlying grid information services. The XML document conforms to an Application Description Schema and is used to facilitate discovery and execution of the program in the grid environment. Over 20 stand-alone data mining programs have already been grid-enabled using the DataMiningGrid system. With Triana, a workflow editor and manager that serves as the end-user interface to the grid infrastructure, grid-enabled data mining programs and data can be combined into complex data mining applications. Grid-enabled resource sharing may facilitate novel, scalable, distributed data mining applications that were not possible before. |