October 10, 2007
Grid-Enabling Resource-Intensive Applications
From one computer to many
Timothy Hoehn and Bob Zeidman
Our authors take an application designed to run on a single PC and distribute its processing over a network grid.
Tim is a research engineer at Zeidman Consulting. He can be reached at tim@zeidmanconsulting.com. Bob is the president of Zeidman Consulting. He can be reached at bob@zeidmanconsulting.com.
In this article, we discuss how we took an application designed to run on a single PC and created a framework that lets it distribute its processing over a network grid. The original application is called "CodeMatch" and the grid version we created is "CodeGrid." CodeMatch detects software plagiarism by comparing two directories that may contain thousands of source-code files, then producing a report showing all matching patterns found (www.ddj.com/dept/architect/184405734). CodeMatch is used in court cases to help prove or disprove copyright infringement. Because its algorithms are rigorous, they are also time consuming, and comparing large numbers of files can take a long time. For this reason, we looked for a way to speed it up and arrived at the idea of a network grid for resource sharing.
The problem we were trying to solve can be defined as follows:
There are many ways to share resources among multiple computers over a network. In our research, we compared several of them by listing their respective advantages, disadvantages, and the trade-offs of choosing one over another (see Table 1). The main advantages of distributed computing are increased performance from sharing the load of resource-intensive applications, improved scalability, and fault tolerance. These advantages come at the cost of added complexity, potential security problems, and a system that is harder to manage. Developing a successful distributed system requires careful planning, because a remote call can be 1000 times slower than a local call. The overhead added by network management has to be balanced against the size of the job itself.
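To make that balance concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from CodeGrid: the function name `grid_speedup`, the assumption that work splits evenly across machines, and the single fixed overhead figure are all simplifications chosen for illustration.

```python
def grid_speedup(job_seconds: float, workers: int, overhead_seconds: float) -> float:
    """Estimate the speedup from spreading one job across `workers` machines.

    Assumptions (illustrative only):
      - the job divides evenly among the workers
      - `overhead_seconds` is one lump sum covering network transfer,
        scheduling, and result collection
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    distributed_seconds = job_seconds / workers + overhead_seconds
    return job_seconds / distributed_seconds


if __name__ == "__main__":
    # A long job amortizes the overhead; a short one is swamped by it.
    print(f"one-hour job:   {grid_speedup(3600, 10, 30):.2f}x")
    print(f"two-second job: {grid_speedup(2, 10, 30):.2f}x")
```

Under these assumed numbers, a one-hour job on 10 machines with 30 seconds of overhead runs roughly 9 times faster, while a two-second job with the same overhead comes out far slower than running it locally. A value below 1 means the grid is a net loss for that job size.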
Table 1: Architecture comparison.
In the search for the best approach to this problem, we explored several possible solutions, using different programming methodologies (such as procedural and object-oriented programming) and different frameworks (such as the .NET and Java frameworks). We looked at the various software tools available and at different communication methods, such as Remote Method Invocation (RMI) and Remote Procedure Calls (RPC). We also looked at different algorithms for distributing the work, such as the master-slave paradigm and the push-pull model. While our design choices are by no means the only ones, they represent what we felt were the best choices for this particular project.
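As a rough illustration of the master-slave paradigm combined with a pull model, the sketch below has a master fill a task queue and idle workers pull the next task when they are ready. It is a single-machine stand-in, not the CodeGrid design: threads play the role of networked machines, and the names `compare`, `worker`, and `run_master`, as well as the toy comparison itself, are hypothetical.

```python
import queue
import threading


def compare(pair):
    """Placeholder for a real comparison: count lines the two texts share."""
    text_a, text_b = pair
    return len(set(text_a.splitlines()) & set(text_b.splitlines()))


def worker(tasks, results):
    """Pull tasks until none remain, posting each result back to the master."""
    while True:
        try:
            task_id, pair = tasks.get_nowait()
        except queue.Empty:
            return  # every task was queued before start, so empty means done
        results.put((task_id, compare(pair)))


def run_master(pairs, num_workers=4):
    """Queue every task, start the workers, and collect results in task order."""
    tasks = queue.Queue()
    results = queue.Queue()
    for task_id, pair in enumerate(pairs):
        tasks.put((task_id, pair))

    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    collected = {}
    while not results.empty():
        task_id, score = results.get()
        collected[task_id] = score
    return [collected[i] for i in range(len(pairs))]


if __name__ == "__main__":
    file_pairs = [("a\nb", "b\nc"), ("x", "y"), ("p\nq", "p\nq")]
    print(run_master(file_pairs))
```

Because workers pull work rather than having it pushed to them, a faster worker naturally takes on more tasks. In a push model the master would instead have to decide up front how much work each machine receives.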