"Once more into the breach..,"
I have 2 identical Xserves running OS X Server 10.6.8, each with 2 quad-core Intel Xeon processors, for a total of 16 kernels. I have configured both servers identically and use them exclusively as a computing grid for Mathematica.
I run the WolframLightWeightGrid manager on the servers and distribute processing jobs to them from a number of different notebooks.
An earlier question, "Wolfram Light Weight Grid and parallel computing", from when I set up this environment, gives more background.
This environment has pretty much run like a charm over the past four months.
As a check on what goes on I run the following few lines of code when launching the grid:
Needs["LightweightGridClient`"]
Column[{
ParallelEvaluate[$ProcessID],
ParallelEvaluate[$MachineName],
RemoteServicesAgents[] // ColumnForm}]
This will give me lists of the process IDs and machine names and remote service agents.
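A more compact version of the same check, offered only as a sketch: it tallies how many parallel kernels report each machine name, so a server that contributed no kernels is easy to spot. It assumes kernels are already configured for automatic launching, as in the setup described above.

```mathematica
(* Sketch: count running parallel kernels per machine. *)
Needs["LightweightGridClient`"]

(* ParallelEvaluate returns one machine name per running kernel; *)
(* Tally groups them into {machine name, kernel count} pairs. *)
Tally[ParallelEvaluate[$MachineName]]
```

If a server such as "abb-1" is missing from the result, none of its kernels launched, even if RemoteServicesAgents[] still lists it.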
But today something has gone awry:
I should see 8 additional process IDs and 8 additional machine names (specifically "abb-1").
The output correctly identifies the remote service agents, but then it gets strange as it also generates the following error messages:
I've tried calling Wolfram and while I have great respect for the support staff, not all of them have extensive experience with parallel processing and honestly, one can't reasonably expect all of them to have such experience.
If I get anything back from them I'll report it here. Until then...
The WolframLightWeightGrid manager should launch when the server boots. I have restarted "abb-1" several times, even shutting it down completely and powering it off, but still have the same problem.
I have current licenses for all the kernels in question.
Since RemoteServicesAgents[] identifies the server "abb-1", does this imply that the grid manager has launched?
It seems that something has caused the grid manager to lose its licensing information.
- What could have gone wrong?
- Where should I look to troubleshoot this?
I hope to get this working properly so that I can automate more of my daily processes and calculations. AND I need it to work reliably.
Suggestions, solutions, insights, and comments welcome.
Thx...
Answer
Well, further experiments have shown that altering my code to:
Needs["LightweightGridClient`"]
Column[{
CloseKernels[],
LaunchKernels[],
ParallelEvaluate[$ProcessID],
ParallelEvaluate[$MachineName],
RemoteServicesAgents[] // ColumnForm}]
Which gives me:
appears to resolve the problem.
This does not explain why the problem occurred or why it did not occur before, so I'll happily vote up and select any answer that does provide an explanation (or even a plausible speculation) of why this happened.
It does seem that with manual launching, and thereby controlling the entire process of launching kernels programmatically and bypassing the configuration mechanisms, the process runs more smoothly.
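As a rough sketch of how that manual restart could be wrapped up and verified, the helper names below (kernelsOn, restartAndCount) are hypothetical and not part of any package. The code assumes that $MachineName on the remote kernels returns "abb-1", as stated in the question.

```mathematica
Needs["LightweightGridClient`"]

(* Hypothetical helper: number of running parallel kernels on a given machine. *)
kernelsOn[machine_String] := Count[ParallelEvaluate[$MachineName], machine]

(* Hypothetical helper: close all parallel kernels, relaunch the configured *)
(* ones, then report how many are running on the given machine. *)
restartAndCount[machine_String] := (
  CloseKernels[];
  LaunchKernels[];
  kernelsOn[machine]
)

(* The question expects 8 kernels on this server. *)
restartAndCount["abb-1"]
```

A count of 0 after the restart would suggest that the server's kernels still fail to launch, which narrows the problem to that machine rather than to the notebook.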