parallelization - Server running WolframLightWeightGrid manager loses licensing information

"Once more into the breach..."

I have 2 identical Xserves running OS X Server 10.6.8, each with 2 quad-core Intel Xeon processors, for a total of 16 kernels. I have configured both servers identically and use them exclusively as a computing grid for Mathematica.

I run the WolframLightWeightGrid manager on the servers and distribute processing jobs to them from a number of different notebooks.

An earlier question, "Wolfram Light Weight Grid and parallel computing", from when I set up this environment, gives more background.

This environment has pretty much run like a charm over the past four months.

As a check on what goes on I run the following few lines of code when launching the grid:

Needs["LightweightGridClient`"]
Column[{
  ParallelEvaluate[$ProcessID],
  ParallelEvaluate[$MachineName],
  RemoteServicesAgents[] // ColumnForm}]

This will give me lists of the process IDs and machine names and remote service agents.
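A more compact version of the same check is to tally the machine names, which shows at a glance how many kernels each server contributed. This is only a sketch using standard parallel functions; the variable name `perMachine` is made up for illustration.

```wolfram
(* Count the running parallel kernels per machine, e.g. {{"name", n}, ...} *)
perMachine = Tally[ParallelEvaluate[$MachineName]]

(* Total number of parallel kernels currently connected *)
Length[Kernels[]]
```

If a server is missing from the tally, or shows fewer kernels than expected, its kernels did not launch even though the agent itself may still be listed by RemoteServicesAgents[].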

But today something has gone awry:

code & output

I should see 8 additional process IDs and 8 additional machine names (specifically "abb-1").

The output correctly identifies the remote service agents, but then it gets strange as it also generates the following error messages:

messages

I've tried calling Wolfram and while I have great respect for the support staff, not all of them have extensive experience with parallel processing and honestly, one can't reasonably expect all of them to have such experience.

If I get anything back from them I'll report it here. Until then...

The WolframLightWeightGrid manager should launch when the server boots. I have restarted "abb-1" several times, even shutting it down completely and powering it off, but still have the same problem.

I have current licenses for all the kernels in question.

Since RemoteServicesAgents[] identifies the server "abb-1" does this imply that the grid manager has launched?

It seems that something has caused the grid manager to lose its licensing information.

What could have gone wrong?

Where should I look to troubleshoot this?

I hope to get this working properly so that I can automate more of my daily processes and calculations. AND I need it to work reliably.

Suggestions, solutions, insights, and comments welcome.

Thx...

Answer

Well, further experiments have shown that altering my code to:

Needs["LightweightGridClient`"]
Column[{
  CloseKernels[],
  LaunchKernels[],
  ParallelEvaluate[$ProcessID],
  ParallelEvaluate[$MachineName],
  RemoteServicesAgents[] // ColumnForm}]

Which gives me:

output

appears to resolve the problem.

This does not explain why the problem occurred or why it did not occur before so, I'll happily vote up and select any answer that does provide an explanation (or even a plausible speculation) of why this happened.

It does seem that with Manual Launching (see: Launching and Connecting), and thereby controlling the entire process of launching kernels programmatically and bypassing the configuration mechanisms, the process runs more smoothly.
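For illustration, a fully programmatic launch might look something like the sketch below. It assumes that the agent name "abb-1" can be passed to `LightweightGrid[...]` from the LightweightGridClient package as a kernel description, and that 8 kernels per machine is the right count for this setup. Both are assumptions drawn from the question, not a verified configuration.

```wolfram
Needs["LightweightGridClient`"]

(* Drop any existing (possibly stale) parallel kernel connections *)
CloseKernels[];

(* Launch 8 kernels on the local machine *)
LaunchKernels[8];

(* Assumption: each call requests one kernel from the agent "abb-1" *)
Do[LaunchKernels[LightweightGrid["abb-1"]], {8}];

(* Verify where the kernels actually ended up *)
Tally[ParallelEvaluate[$MachineName]]
```

Launching this way leaves nothing to the stored parallel configuration, so any licensing or connection failure should surface as a message at the specific LaunchKernels call that caused it.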
