I have a new ATI GPU that I have been trying on three different distributed computing projects, and I get errors on every work unit.
The errors say "No Protocol", which could be caused by an environment variable that doesn't point to the correct path.
On Linux, Mathematica returns $Failed for this:
Environment["ATISTREAMSDKROOT"]
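To see whether any of the likely variables is set at all, a quick check along these lines may help. This is only a sketch: the list of names is my assumption (AMDAPPSDKROOT is, as far as I know, the name used by later AMD APP SDK releases, and LD_LIBRARY_PATH is the usual Linux library search path), so adjust it to whatever your installation documents.

```mathematica
(* Hypothetical list of variables to inspect; only ATISTREAMSDKROOT comes from the question *)
vars = {"ATISTREAMSDKROOT", "AMDAPPSDKROOT", "LD_LIBRARY_PATH"};

(* Environment returns $Failed for a variable that is not set in the kernel's environment *)
Table[v -> Environment[v], {v, vars}]
```

Note that this reports the environment of the Mathematica kernel process, which can differ from the environment of the shell you normally use.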
So, I was wondering if there is a test suite that would validate the calculations of such a card?
{1->{Version->OpenCL 1.2 AMD-APP (923.1),Name->AMD Accelerated Parallel Processing,Vendor->Advanced Micro Devices, Inc.,Extensions->{cl_khr_icd,cl_amd_event_callback,cl_amd_offline_devices},1->{Type->GPU,Name->Cypress,Version->OpenCL 1.2 AMD-APP (923.1),Extensions->{cl_khr_fp64,cl_amd_fp64,cl_khr_global_int32_base_atomics,cl_khr_global_int32_extended_atomics,cl_khr_local_int32_base_atomics,cl_khr_local_int32_extended_atomics,cl_khr_3d_image_writes,cl_khr_byte_addressable_store,cl_khr_gl_sharing,cl_ext_atomic_counters_32,cl_amd_device_attribute_query,cl_amd_vec3,cl_amd_printf,cl_amd_media_ops,cl_amd_popcnt},Driver Version->CAL 1.4.1741,Vendor->Advanced Micro Devices, Inc.,Profile->FULL_PROFILE,Vendor ID->4098,Compute Units->18,Core Count->1440,Maximum Work Item Dimensions->3,Maximum Work Item Sizes->{256,256,256},Maximum Work Group Size->256,Preferred Vector Width Character->16,Preferred Vector Width Short->16,Preferred Vector Width Integer->4,Preferred Vector Width Long->2,Preferred Vector Width Float->4,Preferred Vector Width Double->2,Maximum Clock Frequency->765,Address Bits->32,Maximum Memory Allocation Size->134217728,Image Support->True,Maximum Read Image Arguments->128,Maximum Write Image Arguments->8,Maximum Image2D Width->8192,Maximum Image2D Height->8192,Maximum Image3D Width->2048,Maximum Image3D Height->2048,Maximum Image3D Depth->2048,Maximum Samplers->16,Maximum Parameter Size->1024,Memory Base Address Align->2048,Memory Data Type Align Size->128,Floating Point Precision Configuration->{Infinity,NaNs,Round to Nearest,Round to Infinity,Round to Zero,IEEE754-2008 Fused MAD},Global Memory Cache Type->None,Global Memory Cache Line Size->0,Global Memory Cache Size->0,Global Memory Size->536870912,Maximum Constant Buffer Size->65536,Maximum Constant Arguments->8,Local Memory Type->Local,Local Memory Size->32768,Error Correction Support->False,Profiling Timer Resolution->1,Endian Little->True,Available->True,Compiler Available->True,Execution 
Capabilities->{Kernel Execution},Command Queue Properties->Profiling Enabled},2->{Type->CPU,Name->AMD Phenom(tm) II X6 1100T Processor,Version->OpenCL 1.2 AMD-APP (923.1),Extensions->{cl_khr_fp64,cl_amd_fp64,cl_khr_global_int32_base_atomics,cl_khr_global_int32_extended_atomics,cl_khr_local_int32_base_atomics,cl_khr_local_int32_extended_atomics,cl_khr_int64_base_atomics,cl_khr_int64_extended_atomics,cl_khr_byte_addressable_store,cl_khr_gl_sharing,cl_ext_device_fission,cl_amd_device_attribute_query,cl_amd_vec3,cl_amd_printf,cl_amd_media_ops,cl_amd_popcnt},Driver Version->2.0 (sse2),Vendor->AuthenticAMD,Profile->FULL_PROFILE,Vendor ID->4098,Compute Units->6,Core Count->6,Maximum Work Item Dimensions->3,Maximum Work Item Sizes->{1024,1024,1024},Maximum Work Group Size->1024,Preferred Vector Width Character->16,Preferred Vector Width Short->16,Preferred Vector Width Integer->4,Preferred Vector Width Long->2,Preferred Vector Width Float->4,Preferred Vector Width Double->0,Maximum Clock Frequency->3314,Address Bits->64,Maximum Memory Allocation Size->4215913470,Image Support->True,Maximum Read Image Arguments->128,Maximum Write Image Arguments->8,Maximum Image2D Width->8192,Maximum Image2D Height->8192,Maximum Image3D Width->2048,Maximum Image3D Height->2048,Maximum Image3D Depth->2048,Maximum Samplers->16,Maximum Parameter Size->4096,Memory Base Address Align->1024,Memory Data Type Align Size->128,Floating Point Precision Configuration->{Denorms,Infinity,NaNs,Round to Nearest,Round to Infinity,Round to Zero,IEEE754-2008 Fused MAD},Global Memory Cache Type->Read Write,Global Memory Cache Line Size->64,Global Memory Cache Size->65536,Global Memory Size->16863653880,Maximum Constant Buffer Size->65536,Maximum Constant Arguments->8,Local Memory Type->Global,Local Memory Size->32768,Error Correction Support->False,Profiling Timer Resolution->1,Endian Little->True,Available->True,Compiler Available->True,Execution Capabilities->{Kernel Execution,Native Kernel 
Execution},Command Queue Properties->Profiling Enabled}}}
EDIT: when I run an example I get: OpenCLMemoryAllocate::invdev: OpenCLLink device is invalid.
Since the device information in the scrollable output above seems to be complete, I have no idea what to fix.
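For reference, a minimal way to reproduce the failure outside of a full example might look like the sketch below. The platform and device numbers (1 and 1) are taken from the listing above, where device 1 is the Cypress GPU; the "Platform" and "Device" options and the "Name" property are used on the assumption that they behave here as in the OpenCLLink documentation.

```mathematica
Needs["OpenCLLink`"]

(* True if OpenCLLink considers the system usable *)
OpenCLQ[]

(* Platform and device that OpenCLLink picks by default *)
{$OpenCLPlatform, $OpenCLDevice}

(* Name of platform 1, device 1 (the GPU in the listing above) *)
OpenCLInformation[1, 1, "Name"]

(* Smallest allocation that exercises the call that failed; assumes the
   "Platform" and "Device" options are accepted here *)
mem = OpenCLMemoryAllocate[Integer, 16, "Platform" -> 1, "Device" -> 1]
```

If the allocation works with "Device" -> 2 (the CPU in the listing) but not with "Device" -> 1, the problem is probably specific to the GPU driver rather than to OpenCLLink.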
Coda: I solved this by running as root. It seems the AMD Linux driver has a slight permissions problem accessing the libraries.
Answer
EDIT: This only works for CUDA; I did not read the question carefully enough (feel free to comment if this answer should be deleted). Doing the same with OpenCL would be considerably more work.
Do you mean something like this? This compares the results for equivalent Map and CUDAMap expressions:
Needs["CUDALink`"]
vals = Table[i, {i, 0, 10, .1}];
m1 = CUDAMap[Cos, vals];
m2 = Map[Cos, vals];
m1 == m2
True
CUDAMap even balks at invalid operations:
CUDAMap[Erf, vals];
CUDAMap::op: Specified operation, Erf, is invalid. Valid operations are: Cos, Sin, Tan, ArcCos, ArcSin, ArcTan, Cosh, Sinh, Exp, Log, Log10, Sqrt, Ceiling, Floor, Abs >>
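For OpenCL, a rough equivalent of the Map comparison could be sketched as follows. There is no OpenCL counterpart of CUDAMap used here, so the kernel is written by hand; the kernel name cosKernel, the single-precision types, the block size of 64 and the tolerance 10^-5 are all my own choices, not anything prescribed by OpenCLLink. I have not run this on an AMD card.

```mathematica
Needs["OpenCLLink`"]

(* Hand-written kernel: out[i] = cos(in[i]); mint is OpenCLLink's integer type *)
src = "
__kernel void cosKernel(__global float * in, __global float * out, mint n) {
    int i = get_global_id(0);
    if (i < n)
        out[i] = cos(in[i]);
}";

(* Assumes platform 1, device 1 is the GPU, as in the question's listing *)
clCos = OpenCLFunctionLoad[src, "cosKernel",
   {{"Float", _, "Input"}, {"Float", _, "Output"}, _Integer}, 64,
   "Platform" -> 1, "Device" -> 1];

vals = Table[i, {i, 0, 10, .1}];
out = ConstantArray[0., Length[vals]];

(* The loaded function returns a list of its output arguments *)
gpu = First[clCos[vals, out, Length[vals]]];
cpu = Map[Cos, vals];

(* Single precision on the device, so compare with a tolerance instead of == *)
Max[Abs[gpu - cpu]] < 10^-5
```

Changing "Device" -> 1 to "Device" -> 2 runs the same kernel on the CPU device from the listing, which gives a second reference to compare the GPU results against.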