CSIRO Windows (HPC 2008 R2) GPU Cluster

Specialised Graphics Processing Unit (GPU) computing facility

Getting Started

Note: General Information for MS Windows HPC server applies to the CPU and GPU Clusters.

Accessing the Microsoft Windows Server 2008 R2 GPU cluster
Email clusterhelp@hpsc.csiro.au to request that your NEXUS account be added to the NEXUS group "HPC Cluster Users".
If you need to log in to the cluster for development (wintest), or you don't have the HPC Client toolkit, you can use Microsoft Remote Desktop Connection to connect to the login node. Connect to the HPC server FQDN or IP address.
MS Windows HPC Cluster Servers:
GPU login node: wintest.csiro.au
CPU login node: dellcpu.csiro.au


Job Submission
Submitting Jobs using the Microsoft (MS) HPC Client toolkit
The MS HPC Client toolkit can be installed on your own desktop PC running MS Windows XP or Windows 7. The HPC Pack 2008 R2 can be downloaded from Microsoft or installed from ASC IM&T.
Run the appropriate x86 or x64 .msi file from \\hnas1-cdc.nexus.csiro.au\tools\windows\HPC_Pack\HPCPack2008R2_Client_Utilities\
Jobs can be submitted either by running the HPC Job Manager in a Remote Desktop session on the login node (see Accessing above) or by running the Job Manager on your desktop and connecting to the cluster. Running the HPC Job Manager from your own PC is the preferred option, as it prevents a bottleneck on the login node. When starting the HPC Job Manager from your PC, point it at the cluster head node.
Once the HPC Client toolkit is installed, the command-line utilities can be used if you prefer. See http://technet.microsoft.com/en-us/library/cc972848%28WS.10%29.aspx
GPU Head node: wingpu.nexus.csiro.au
CPU Head node: dellcpu.csiro.au
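
As a rough illustration of command-line submission, the sketch below assembles a "job submit" command for the GPU head node listed above. It assumes the HPC Pack job utility is on your PATH and uses its /scheduler, /numcores, /workdir and /stdout options. The helper name build_job_submit, the directory your-ident and the program myapp.exe are hypothetical placeholders, so check the options against the Microsoft reference before relying on them.

```python
import subprocess


def build_job_submit(head_node, command, numcores=None, workdir=None, stdout=None):
    """Return the argument list for an HPC Pack 'job submit' call (sketch only)."""
    args = ["job", "submit", "/scheduler:" + head_node]
    if numcores is not None:
        args.append("/numcores:" + str(numcores))
    if workdir is not None:
        args.append("/workdir:" + workdir)
    if stdout is not None:
        args.append("/stdout:" + stdout)
    # The program and its arguments come last on the command line.
    return args + list(command)


def submit(args):
    """Run the command; only works on a Windows PC with the HPC Client toolkit installed."""
    return subprocess.run(args, check=True)


# Hypothetical job: myapp.exe and input.dat are placeholders, not real cluster software.
example = build_job_submit(
    "wingpu.nexus.csiro.au",
    ["myapp.exe", "input.dat"],
    numcores=4,
    workdir=r"\\hnas1-cdc.cluster\home\your-ident",
    stdout="myapp.log",
)
print(" ".join(example))
```

Printing the command first, and calling submit(example) only once it looks right, is a cheap way to catch a wrong head node or path before the job reaches the queue.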

Parametric Sweep Jobs
Rather than reinvent the wheel, see the Microsoft document that covers parametric sweeps:
http://technet.microsoft.com/en-us/library/cc972864%28WS.10%29.aspx
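
To give a feel for what a sweep does, the following sketch expands a sweep range into the individual command lines that would be run, on the understanding (from the Microsoft documentation) that an asterisk in the command line is replaced by the current sweep index. The function name expand_sweep and the program myapp.exe are hypothetical. The sketch handles only a single asterisk per position, mimics the expansion locally, and does not talk to the cluster.

```python
def expand_sweep(command, start, end, increment=1):
    """Return one command line per sweep index, replacing '*' with the index."""
    if increment < 1:
        raise ValueError("increment must be a positive integer")
    # The end index is inclusive, as in a sweep range such as 1-3.
    return [command.replace("*", str(i)) for i in range(start, end + 1, increment)]


# Hypothetical sweep over three input files.
for line in expand_sweep("myapp.exe input*.dat output*.dat", 1, 3):
    print(line)
```

Checking the expanded command lines against the files that actually exist in your data directory is a quick sanity test before submitting a large sweep.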

Specific to CPU Cluster: DellCPU
Accessing data and applications:
Disk space on the management node is limited, so you can request a home or data directory on hnas-cl (see Data Staging below). Nodes can also access data on your own shared drive across the network, but only at network speeds. Applications may run better from the management node; request application installs from clusterhelp@hpsc.csiro.au. A working directory has been set up on the head node for small log files and similar output. In your job setup, use \\192.168.80.221\working\your-own-dir to force the use of the application network. For data, use the hnas or your own share, \\Computer-name.domain-if-needed\your-share.

Running Jobs
A variety of jobs run on the cluster: short, long, single core, many core, single node and many nodes. Normally, when submitting a job with the default job template, you can leave the resource options un-ticked and the cluster manager will allocate as many nodes and cores as are needed and available. This allows jobs to run as resources become available.
For large, long-running jobs and jobs that need dedicated resources, these two items need to be addressed:
1) Test your job and make sure you are fully utilising the resources you request. If you select a number of nodes, your job should be using all the cores on the nodes and not just one core on each node.
2) If you are satisfied using nodes 02 - 13, select the Long_Job template. If you need more resources, ask the community using the cluster: email "HPC Cluster Users" informing them of your plans, including when the job will start, how long you estimate it will take and the number of nodes you will be requesting, and ask whether it is likely to clash with anyone else's work. Then use the default job template and request the resources. If there are clashes, the individuals involved should negotiate timing and number of nodes.
Short-running jobs can be submitted with the default template. If you see that nodes 02 - 13 are in use, a long job is most likely running. In this case, use the default job template and the job manager will run your job on the available resources.

Modifying Jobs
If a job does not run, select it and choose Modify.
Change one parameter (and change it back if you want it to stay the same) and click Modify.
This changes the job status to Configuring. You can then choose Modify again and edit your inputs before resubmitting the job.
To run another identical job, choose Copy Job.
Modifying or copying a job is the best point at which to allocate specific computing hardware resources.


Data Staging
WinGPU Cluster
Request through clusterhelp@hpsc.csiro.au an ident home directory on \\hnas1-cdc.nexus.csiro.au\home\your-ident. This will be accessible from any computer in CSIRO using your NEXUS account details. In addition to \home, the data directories \data and \flush are available. When submitting a job to the cluster, use \\hnas1-cdc.cluster\ or \\10.0.0.220\, which give access to the same directories in the hnas1-cdc.nexus.csiro.au file system but across a faster 10 Gb network.
The file system is the same whether you use the Windows or Linux side of the GPU cluster, so information on quotas can be found at http://intra.hpsc.csiro.au/userguides/GPU/localguide.php#resources. Linux information and user guides are available from http://intra.hpsc.csiro.au/userguides/GPU/. Choose between the home, flush and data directories according to your purpose and volume of data.
To check your usage and quotas on these file systems, Remote Desktop into the login node wintest.csiro.au. From the Windows Start button, choose Start - All Programs - Disk Quotas, log in as directed and answer "y" to the security key prompt the first time. Use exit to close the window once you have your information.

DellCPU Cluster
Request through clusterhelp@hpsc.csiro.au an ident home directory on \\hnas-cl.nexus.csiro.au\home\your-ident. This will be accessible from any computer in CSIRO using your NEXUS account details. When submitting a job to the cluster, use \\192.168.80.60\home\your-ident so that your job accesses the same data through the private LAN.
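
Since the same data is reached through a different server name inside a job, a small helper can rewrite paths before they go into a job definition. The sketch below is one way to do that, using only the server names and addresses given in this section; the function name to_cluster_path and the run1 subdirectory are hypothetical.

```python
# Public file server name -> address to use inside jobs (both taken from this guide).
CLUSTER_ADDRESS = {
    "hnas1-cdc.nexus.csiro.au": "hnas1-cdc.cluster",  # WinGPU, 10 Gb network
    "hnas-cl.nexus.csiro.au": "192.168.80.60",        # DellCPU, private LAN
}


def to_cluster_path(unc_path):
    """Rewrite a UNC path so that it uses the cluster-side server address."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("expected a UNC path starting with two backslashes")
    # Split off the server name; everything after the first backslash is kept as-is.
    server, sep, rest = unc_path[2:].partition("\\")
    cluster_server = CLUSTER_ADDRESS.get(server.lower(), server)
    return "\\\\" + cluster_server + sep + rest


print(to_cluster_path(r"\\hnas1-cdc.nexus.csiro.au\home\your-ident\run1"))
print(to_cluster_path(r"\\hnas-cl.nexus.csiro.au\home\your-ident"))
```

Paths on servers that are not in the table are returned unchanged, so the helper is safe to apply to every path in a job description.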



Development
GPU
To develop and test GPU code, there is a GPU development node with the same hardware as the GPU nodes. You are welcome to log in to wintest.csiro.au with MS Remote Desktop to develop your code.
The Development environment includes:
Windows Server 2008 R2
HPC Job submission R2
20 CALs for Client access
CUDA 3.2 and CUDA 4.0 including SDKs
Visual Studio 2008 including Fortran
MSDN Document Library for Visual Studio
Nvidia NSight 1.5

NVIDIA CUDA
CUDA can be downloaded to your PC to compile code in GPU emulation mode. On your Windows desktop you can install the CUDA Toolkit, available from
http://www.nvidia.com/object/cuda_get.html
Also install the HPC Tool Pack 2008 R2 as mentioned in Job submission above.
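Run the InstallShield wizard for the NVIDIA GPU Computing SDK, then install CUDA 4.0 and the toolkit.
To compile code with nvcc:
nvcc -deviceemu -I..\common\inc sourcefilename.cu
Note that the -deviceemu option exists only in older CUDA toolkits (device emulation was removed in CUDA 3.1). With the CUDA 3.2 and 4.0 toolkits, omit the option and test on real GPU hardware such as the development node.
A useful guide: http://developer.download.nvidia.com/compute/cuda/2_3/docs/CUDA_Getting_Started_2.3_Windows.pdf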


CMD Line
Jobs can also be submitted from the command line. A share needs to be set up on the head node for the programs and output.
Here are some examples:
C:\>clusrun /all /outputdir:\\machinename.domain\share dir
C:\>clusrun /all /outputdir:\\machinename.domain\share copy \\machinename.domain\share\txt.txt c:\


Options
As they become available.



Help and Resources
For assistance with the GPU cluster, email: gpuhelp@hpsc.csiro.au

Show GPU cluster status: linuxmanage status

Show GPU cluster utilisation: GPU Cluster monitoring