I'm writing a program that uses genetic techniques to evolve equations. I want to be able to submit the function 'mainfunc' to the Parallel Python 'submit' function. The function 'mainfunc' calls two or three methods defined in the Utility class. They instantiate other classes and call various methods. I think what I want is all of it in one NAMESPACE. So I've instantiated some (maybe it should be all) of the classes inside the function 'mainfunc'. I call the Utility method 'generate()'. If we were to follow it's chain of execution it would involve all of the classes and methods in the code.
Now, the equations are stored in a tree. Each time a tree is generated, mutated or cross bred, the nodes need to be given a new key so they can be accessed from a dictionary attribute of the tree. The class 'KeySeq' generates these keys.
In Parallel Python, I'm going to send multiple instances of 'mainfunc' to the 'submit' function of PP. Each has to be able to access 'KeySeq'. It would be nice if they all accessed the same instance of KeySeq so that none of the nodes on the returned trees had the same key, but I could get around that if necessary.
So: my question is about stuffing EVERYTHING into mainfunc. Thanks (Edit) If I don't include everything in mainfunc, I have to try to tell PP about dependent functions, etc by passing various arguements in various places. I'm trying to avoid that.
(late Edit) if ks.next() is called inside the 'generate() function, it returns the error 'NameError: global name 'ks' is not defined'
class KeySeq: "Iterator to produce sequential \ integers for keys in dict" def __init__(self, data = 0): self.data = data def __iter__(self): return self def next(self): self.data = self.data + 1 return self.data class One: 'some code' class Two: 'some code' class Three: 'some code' class Utilities: def generate(x): '___________' def obfiscate(y): '___________' def ruminate(z): '__________' def mainfunc(z): ks = KeySeq() one = One() two = Two() three = Three() utilities = Utilities() list_of_interest = utilities.generate(5) return list_of_interest result = mainfunc(params)
If you want all of the instances of
mainfunc to use the same
KeySeq object, you can use the default parameter value trick:
def mainfunc(ks=KeySeq()): key = ks.next()
As long as you don't actually pass in a value of
ks, all calls to
mainfunc will use the instance of
KeySeq that was created when the function was defined.
Here's why, in case you don't know: A function is an object. It has attributes. One of its attributes is named
func_defaults; it's a tuple containing the default values of all of the arguments in its signature that have defaults. When you call a function and don't provide a value for an argument that has a default, the function retrieves the value from
func_defaults. So when you call
mainfunc without providing a value for
ks, it gets the
KeySeq() instance out of the
func_defaults tuple. Which, for that instance of
mainfunc, is always the same
Now, you say that you're going to send "multiple instances of
mainfunc to the
submit function of PP." Do you really mean multiple instances? If so, the mechanism I'm describing won't work.
But it's tricky to create multiple instances of a function (and the code you've posted doesn't). For example, this function does return a new instance of
g every time it's called:
>>> def f(): def g(x=): return x return g >>> g1 = f() >>> g2 = f() >>> g1().append('a') >>> g2().append('b') >>> g1() ['a'] >>> g2() ['b']
If I call
g() with no argument, it returns the default value (initially an empty list) from its
func_defaults tuple. Since
g2 are different instances of the
g function, their default value for the
x argument is also a different instance, which the above demonstrates.
If you'd like to make this more explicit than using a tricky side-effect of default values, here's another way to do it:
def mainfunc(): if not hasattr(mainfunc, "ks"): setattr(mainfunc, "ks", KeySeq()) key = mainfunc.ks.next()
Finally, a super important point that the code you've posted overlooks: If you're going to be doing parallel processing on shared data, the code that touches that data needs to implement locking. Look at the
callback.py example in the Parallel Python documentation and see how locking is used in the
Sum class, and why.
It's fine to structure your program that way. A lot of command line utilities follow the same pattern:
#imports, utilities, other functions def main(arg): #... if __name__ == '__main__': import sys main(sys.argv)
That way you can call the
main function from another module by importing it, or you can run it from the command line.