python rpcServices

Introduction
Examples

webCrawler
simple stack example

Documentation
Download
Installation
Author

Introduction

rpcServices is a Python library for managing scalable, distributed components via XML-RPC. A master server provides several services to local and remote client processes. The API is small and simple to use: each client uses a Stub object to contact the server, and the master process maps incoming requests to its ServerSideObjects.

                                         xml-rpc
   ServerSideObject  <-----  Master <--------------- Stub <----- Client-1 
                                               `---- Stub <----- Client-2
                                                 `-- Stub <----- Client-3
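The request flow in the diagram can be sketched with Python's standard xmlrpc modules. This is not the rpcServices implementation: it uses Python 3, the class ServerSideStack and the helpers start_master and demo are hypothetical names, and ServerProxy only plays the role of the Stub.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ServerSideStack:
    """Hypothetical server-side object; it lives only in the master process."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)
        return len(self._items)  # return a value so the response is never None

    def pop(self):
        return self._items.pop()


def start_master():
    """Start a master on a free local port and return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    stack = ServerSideStack()
    # Map incoming request names onto the methods of the server-side object.
    server.register_function(stack.push, "stack.push")
    server.register_function(stack.pop, "stack.pop")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, "http://%s:%d" % (host, port)


def demo():
    server, url = start_master()
    try:
        client_1 = ServerProxy(url)  # stands in for the Stub of Client-1
        client_2 = ServerProxy(url)  # stands in for the Stub of Client-2
        client_1.stack.push("hello")
        client_1.stack.push("world")
        # A different client sees the items pushed by the first one.
        return [client_2.stack.pop(), client_2.stack.pop()]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Both clients talk to the same server-side object, so the second client pops what the first one pushed, in last-in, first-out order.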

Features

The following remote services are available:

stack: Simple implementation of a remote stack (push, pop, ...).

storage: Simple implementation of a remote data storage. Parameters are saved as key-value pairs (setParameter, getParameter, ...).

provider: The provider can be used for interprocess communication and monitoring. It can report information about the connected clients and send messages to them.
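To make the semantics of the stack and storage services concrete, here is a purely local model of both. The method names push, pop, setParameter and getParameter come from the list above; the classes StackModel and StorageModel, the local StackIndexError stand-in, and the behaviour on an empty stack or a missing key are assumptions, not the library's code.

```python
class StackIndexError(IndexError):
    """Local stand-in for the error raised when popping an empty stack."""


class StackModel:
    """In-memory model of the stack service (last in, first out)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise StackIndexError("pop from empty stack")
        return self._items.pop()


class StorageModel:
    """In-memory model of the storage service (key-value pairs)."""

    def __init__(self):
        self._parameters = {}

    def setParameter(self, key, value):
        self._parameters[key] = value

    def getParameter(self, key):
        # Assumption: a missing key raises KeyError.
        return self._parameters[key]


stack = StackModel()
stack.push(1)
stack.push(2)
last_pushed = stack.pop()  # the most recently pushed item comes back first

storage = StorageModel()
storage.setParameter("depth", 3)
depth = storage.getParameter("depth")
```

In the real library these operations are called through a Stub and executed in the master process; the model only shows what each call is expected to do.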

Examples

All examples can be found in the examples package or in the CVS repository.

webCrawler

The webCrawler example is a web crawler implemented as a scalable application. Several crawler processes take URLs from a remote stack. When a crawler gets a URL, it loads the page. After storing the page in a file, the crawler pushes all link URLs found on it back onto the stack. Another crawler can then pop one of these URLs from the stack asynchronously and continue from there.

The master server manages the stack. All crawlers are clients connected to the master. The clients can be monitored by a special monitor client. A control script can add an initial URL to the stack.

                [crawler-1]              [master (stack)]  
                     |                         |
                     |                         | <------ push initial url
                     | <---- pop url --------- |   
 -- read website --> |                         |
                     | --- push link-urls ---> |
   save content <--- |                         |
                                               |
                [crawler-2]                    |
                     |                         |
                     | <---- pop url --------- |
                     :                         :
                     .                         .

The master, the crawlers and the monitor can run on different hosts, and several crawler instances can run at the same time.
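The crawl loop described above can be sketched without any network access. In this sketch a plain list stands in for the remote stack, the FAKE_WEB dictionary stands in for real web pages, and the saved dictionary stands in for files on disk; all of these names are hypothetical. Skipping URLs that were already saved is an assumption added here so that the loop terminates, and the example may handle this differently.

```python
# Hypothetical in-memory "web": URL -> links found on that page.
FAKE_WEB = {
    "http://myurl.net": ["http://myurl.net/a", "http://myurl.net/b"],
    "http://myurl.net/a": ["http://myurl.net/b"],
    "http://myurl.net/b": [],
}


def crawl(stack, fetch_links, saved):
    """Pop URLs until the stack is empty; return the URLs in crawl order."""
    order = []
    while stack:
        url = stack.pop()            # "pop url" in the diagram
        if url in saved:             # assumption: skip pages already saved
            continue
        links = fetch_links(url)     # "read website"
        saved[url] = links           # stands in for "save content"
        order.append(url)
        for link in links:           # "push link-urls"
            stack.append(link)
    return order


stack = ["http://myurl.net"]         # "push initial url"
saved = {}
order = crawl(stack, lambda url: FAKE_WEB.get(url, []), saved)
```

With several crawler processes, each one would run the same loop against the shared remote stack instead of a local list.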

Unpack the examples package, which you can find in the download section, and enter the folder examples/webCrawler. Run each process in a separate shell, in the following order:

./control master
./control crawler
./control monitor
./control addurl http://myurl.net

simple stack example

This example shows how to use a remote service from the rpcServices library. The first component is the master server. master.py:

from rpcServices import master

server = master.XmlRpcMaster(("localhost", 8000))
server.serveForever()

Start the master server:

python master.py

The following program creates a stack named "in" and pushes the numbers 0 to 9 onto it. producer.py:

from rpcServices import client

myClient = client.XmlRpcClient("http://localhost:8000")
inStack = myClient.stack.createStack("in")
for item in range(10):
    # print() with a single argument works in Python 2 and Python 3
    print("push item %s" % item)
    inStack.push(item)

Run the producer in a new shell:

python producer.py

The consumer pops all elements from the stack and prints them. consumer.py:

from rpcServices import client, services

myClient = client.XmlRpcClient("http://localhost:8000")
inStack = myClient.stack.createStack("in")
while True:
    try:
        item = inStack.pop()
    except services.stack.StackIndexError:
        # the stack is empty: stop consuming
        break
    print("pop item %s" % item)

Run the consumer:

python consumer.py

Any number of producers and consumers can use the same stack, and each process can run on a different host.
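Sharing one stack between several producers and a consumer can be tried out with the standard library alone. The sketch below uses Python 3 and the stdlib xmlrpc modules instead of rpcServices; start_master, produce, consume and demo are hypothetical helpers, and detecting the empty stack through xmlrpc.client.Fault is how the stdlib reports server-side errors, not necessarily how rpcServices raises StackIndexError.

```python
import threading
from xmlrpc.client import Fault, ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_master():
    """Start a master that holds one shared stack; return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    items = []

    def push(item):
        items.append(item)
        return True

    def pop():
        # IndexError on an empty list reaches the client as a Fault.
        return items.pop()

    server.register_function(push, "push")
    server.register_function(pop, "pop")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://%s:%d" % server.server_address


def produce(url, start):
    stub = ServerProxy(url)  # one proxy per thread
    for item in range(start, start + 5):
        stub.push(item)


def consume(url):
    stub = ServerProxy(url)
    popped = []
    while True:
        try:
            popped.append(stub.pop())
        except Fault:  # the stack is empty
            return popped


def demo():
    server, url = start_master()
    try:
        producers = [threading.Thread(target=produce, args=(url, start))
                     for start in (0, 5)]
        for thread in producers:
            thread.start()
        for thread in producers:
            thread.join()
        return consume(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(sorted(demo()))
```

The order in which the items come back depends on how the two producers interleave, so only the set of popped items is predictable.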