GPU Processing Cores

Modern NVIDIA GPUs come with three different types of processing cores:

NVIDIA CUDA Cores

Compute Unified Device Architecture (CUDA) is NVIDIA's parallel computing platform, combining specialized hardware with an application programming interface (API) for the NVIDIA instruction set. CUDA cores are discrete processors, usually numbering in the thousands on a single GPU chip, that allow data to be processed in parallel across those cores.

CUDA cores are the workhorse component of general-purpose GPU computing. They improve both the performance and cost-effectiveness of parallel processing for myriad scientific workloads. When they are complemented by other specialty GPU core types, workload performance is further accelerated.

NVIDIA Tensor Cores

A tensor is a data type that can represent nearly any type of ordered or unordered data. It can be thought of as a container in which multi-dimensional data sets can be stored. In the simplest terms, it can be considered an extension of a matrix: a matrix is a two-dimensional structure containing numbers, whereas a tensor is a set of numbers with any number of dimensions.
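As a minimal sketch of this idea, plain Python nested lists can stand in for tensors. The helper name `shape` is hypothetical and not part of any library; it simply reports how many dimensions a regularly nested list has and how large each one is.

```python
def shape(data):
    """Return the dimensions of a regularly nested list as a tuple."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        data = data[0] if data else None
    return tuple(dims)


scalar = 7.0                                  # 0 dimensions
vector = [1.0, 2.0, 3.0]                      # 1 dimension
matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]                    # 2 dimensions
tensor3 = [[[0.0] * 4 for _ in range(3)]
           for _ in range(2)]                 # 3 dimensions

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor3", tensor3)]:
    print(name, shape(value))
```

The matrix is the two-dimensional case; adding one more level of nesting gives a three-dimensional tensor, and so on.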

Tensor Cores enable mixed-precision computing, dynamically adapting calculations to accelerate throughput while preserving accuracy. The latest generation expands these speedups to a full range of workloads. For example, speedups of up to 10X in artificial intelligence (AI), machine learning (ML), and deep learning (DL) workloads, and boosts of 2.5X for general HPC workloads, are common.
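The numeric idea behind mixed precision can be illustrated without a GPU. The sketch below is not how Tensor Cores are programmed; it only emulates one common pattern, half-precision (FP16) inputs with a single-precision (FP32) running sum, using Python's standard `struct` module to round values. The helper names `to_fp16`, `to_fp32`, and `dot` are made up for this illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b, round_acc):
    """Dot product with FP16 inputs; round_acc sets the accumulator precision."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)   # inputs stored in half precision
        acc = round_acc(acc + product)      # running sum kept in chosen precision
    return acc


ones = [1.0] * 4096
fp16_only = dot(ones, ones, to_fp16)   # half-precision accumulator
mixed = dot(ones, ones, to_fp32)       # single-precision accumulator
print(fp16_only, mixed)
```

With a half-precision accumulator the sum stalls at 2048, because FP16 cannot represent 2049 and each further addition rounds back down. Keeping only the accumulator in FP32 recovers the exact total of 4096 while the inputs stay in the smaller, faster format.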

Tensor Cores can compute faster than CUDA cores, primarily because a CUDA core performs one operation per clock cycle, whereas a Tensor Core can perform multiple operations per clock cycle. For ML and DL models, CUDA cores are not as effective as Tensor Cores in terms of either cost or computation speed, but they still augment the Tensor Cores' productivity.
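The "multiple operations per clock cycle" point can be made concrete with the matrix multiply-accumulate operation, D = A x B + C, that Tensor Cores are designed around. The following plain-Python sketch counts the individual multiply-adds contained in one small tile. The 4x4 tile size is an assumption chosen for illustration (tile shapes differ between GPU generations), and the function name `mma` is hypothetical.

```python
def mma(a, b, c):
    """Return D = A x B + C for square tiles given as nested lists."""
    n = len(a)
    d = [row[:] for row in c]          # start from the accumulator tile C
    fma_count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                d[i][j] += a[i][k] * b[k][j]   # one multiply-add
                fma_count += 1
    return d, fma_count


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
a = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
c = [[1.0] * 4 for _ in range(4)]

d, fma_count = mma(a, identity, c)
print(fma_count)   # 4 * 4 * 4 multiply-adds for one 4x4 tile
```

A single scalar unit would work through these 64 multiply-adds one at a time, whereas hardware that treats the whole tile as one operation completes many of them together.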

NVIDIA Ray Tracing Cores

Ray tracing cores are exclusive to NVIDIA RTX graphics cards. RTX technology enables detailed and accurate 3D design and rendering, as well as photorealistic simulations of the physical world, including visual effects. The resulting simulation and visualization capabilities are not limited to how something looks; they also cover how it behaves. The marriage of CUDA cores and APIs with RTX cores enables accurate modeling of the behavior of real-world objects and granular data visualization capabilities.

Cluster storage

The HOME file system (/home) is used for storage of job submission scripts, small applications, databases, and other user files. All users will be presented with a home directory when logging into any cluster system. Contents of user home directories will be the same across cluster systems.

All user home directories have a 20GB quota.

The SCRATCH file system (/scratch) is a capacious and performant parallel file system intended to be used for compute jobs. The SCRATCH file system is considered ephemeral storage.

The LOCAL SCRATCH file system (/lscratch or /tmp) is a reasonably performant but less capacious shared file system local to each compute node, also intended to be used for compute jobs. LOCAL SCRATCH file systems are also considered ephemeral storage.
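Because scratch storage is ephemeral, a job typically works in a private scratch directory, copies anything worth keeping to a durable location, and cleans up after itself. The sketch below shows one way this pattern might look in Python. The `/lscratch` default comes from the description above; the names `scratch_workdir` and `run_job`, the `job-` prefix, and the file name are hypothetical. The demonstration uses temporary directories as stand-ins so it can run anywhere.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_workdir(scratch_root="/lscratch", prefix="job-"):
    """Create a private working directory under scratch_root and remove it on exit."""
    workdir = tempfile.mkdtemp(prefix=prefix, dir=scratch_root)
    try:
        yield workdir
    finally:
        shutil.rmtree(workdir, ignore_errors=True)   # scratch is ephemeral: clean up


def run_job(scratch_root, results_dir):
    """Write an intermediate file in scratch, then copy the result somewhere durable."""
    with scratch_workdir(scratch_root) as workdir:
        intermediate = os.path.join(workdir, "partial.txt")
        with open(intermediate, "w") as handle:
            handle.write("intermediate data\n")
        os.makedirs(results_dir, exist_ok=True)
        return shutil.copy(intermediate, results_dir)   # path of the saved copy


# Demonstration: temporary directories stand in for /lscratch and a durable location.
with tempfile.TemporaryDirectory() as fake_scratch, \
        tempfile.TemporaryDirectory() as durable:
    saved = run_job(fake_scratch, durable)
    print(os.path.basename(saved), os.listdir(fake_scratch))
```

The context manager guarantees the scratch directory is removed even if the job raises an error, so nothing is left behind on the node.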

The WORK file system (/work) is used to store shared data common across users or compute jobs. Compute nodes in the cluster can read the WORK file system but cannot write to it. As a result, it is referred to as “near-line storage”.

All group work directories have a 500GB quota.

The PROJECT file system (/project) is for longer-term user or shared group storage of active job data. Compute project data is considered active if it is needed for current or ongoing compute jobs. The PROJECT file system is not an archive for legacy job data and is referred to as “offline storage” because it is not accessible by compute nodes.

All group project directories have a 1TB quota.
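A rough way to see how close a directory is to one of the quotas above is to add up its file sizes. The sketch below is an approximation under stated assumptions: quotas are treated as decimal units (1GB = 10**9 bytes), the function names are hypothetical, and a file system's own quota accounting may count usage differently. Any quota-reporting tool the cluster itself provides should be preferred.

```python
import os
import tempfile

# Quotas stated above, assuming decimal units (1GB = 10**9 bytes).
QUOTA_BYTES = {
    "/home": 20 * 10**9,
    "/work": 500 * 10**9,
    "/project": 10**12,
}


def directory_usage(path):
    """Sum the sizes of all regular files below path, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def quota_report(path, quota_bytes):
    """Return (bytes used, fraction of quota used) for a directory."""
    used = directory_usage(path)
    return used, used / quota_bytes


# Demonstration on a throwaway directory holding one 1000-byte file.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "data.bin"), "wb") as handle:
        handle.write(b"\0" * 1000)
    print(quota_report(demo_dir, QUOTA_BYTES["/home"]))
```

On the cluster, the same call could be pointed at a home directory with `QUOTA_BYTES["/home"]` to estimate the fraction of the 20GB quota in use.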

Cluster Nodes

Job submission nodes

Job submission nodes allow users to authenticate to the cluster and are sometimes referred to as “login nodes”. They also provide applications required for scripting, submitting and managing batch compute jobs. Batch compute jobs are submitted to the cluster work queue. The user then waits for the job to be scheduled and run when the requested compute resources are available.
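The submit, wait, and run cycle described above can be pictured with a toy simulation. This is not the cluster's actual scheduler, which is not described here; it is a minimal first-in, first-out model with made-up job names, core counts, and runtimes, in which each job starts as soon as enough cores are free.

```python
import heapq
from collections import deque


def simulate_fifo(total_cores, jobs):
    """Return {job name: start time} for jobs given as (name, cores, runtime).

    Jobs start strictly in submission order, each as soon as enough cores are free.
    """
    queue = deque(jobs)
    running = []            # min-heap of (finish time, cores held)
    free = total_cores
    now = 0
    starts = {}
    while queue:
        name, cores, runtime = queue[0]
        if cores > total_cores:
            raise ValueError(f"{name} requests more cores than are available")
        if cores <= free:
            queue.popleft()
            free -= cores
            starts[name] = now
            heapq.heappush(running, (now + runtime, cores))
        else:
            # Head of the queue must wait: advance to the next job completion.
            finish, released = heapq.heappop(running)
            now = finish
            free += released
    return starts


jobs = [("A", 30, 10), ("B", 20, 5), ("C", 10, 2)]
print(simulate_fifo(40, jobs))
```

In this run, job B waits until job A releases its cores, and job C waits behind B even though it would have fit earlier. Production schedulers commonly use more sophisticated policies, such as priorities and backfilling, to reduce that kind of idle time.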

Users require an AU NetID that has been provisioned for AUHPCS, and a device configured for Duo two-factor authentication to access job submission nodes.

Data transfer nodes

Data transfer nodes provide access to user file systems in the cluster. Their role is to facilitate high speed transfer of data across those file systems within the cluster. These nodes can also be used for the transfer of data in and out of the cluster.

Users require an AU NetID that has been provisioned for AUHPCS, and a device configured for Duo two-factor authentication to access data transfer nodes.

Compute node profiles

General Intel compute nodes:

  • Model: Dell PowerEdge R440 Server
  • Processors: (2) Intel Xeon Silver 4210R (20 Cascade Lake cores) 2.4G, 13.75M Cache
  • Memory: 96GB DDR4-2400
  • Local scratch space: 960GB SSD
  • Number of nodes: 18

These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, and physics workloads with the most modest resource needs.

Middle memory Intel compute nodes:

  • Model: Dell PowerEdge R640 Server
  • Processors: (2) Intel Xeon Gold 5218R (40 Cascade Lake cores) 2.1G, 27.5M Cache
  • Memory: 768GB DDR4-2666
  • Local scratch space: 960GB SSD
  • Number of nodes: 8

These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, physics, and some modeling workloads with additional resource needs. These nodes are also likely to be suitable for pharmaceutical, molecular biology, and simulation workloads.

High memory Intel compute nodes:

  • Model: Dell PowerEdge R640 Server
  • Processors: (2) Intel Xeon Gold 5220R (48 Cascade Lake cores) 2.2G, 35.75M Cache
  • Memory: 1.53TB DDR4-2666
  • Local scratch space: 1.92TB SSD
  • Number of nodes: 2

These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, physics, modeling, pharmaceutical, molecular biology, and simulation workloads with the largest resource needs.
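The three CPU-only profiles above can be summarized as a small lookup that picks the smallest node type able to hold a single-node job. This is only a sketch: the function name is hypothetical, 1.53TB is written as 1530GB on the assumption of decimal units, and the memory actually available to a job will be somewhat less than the installed amount.

```python
# (name, total cores per node, memory in GB, node count) from the profiles above.
# 1.53TB is written as 1530GB here, assuming decimal units.
CPU_PROFILES = [
    ("general", 20, 96, 18),
    ("middle memory", 40, 768, 8),
    ("high memory", 48, 1530, 2),
]


def smallest_fitting_profile(cores_needed, memory_gb_needed):
    """Return the name of the first (smallest) profile that fits on one node, or None."""
    for name, cores, memory_gb, _count in CPU_PROFILES:
        if cores_needed <= cores and memory_gb_needed <= memory_gb:
            return name
    return None


print(smallest_fitting_profile(16, 64))
print(smallest_fitting_profile(16, 200))
print(smallest_fitting_profile(64, 10))
```

Requesting the smallest profile that fits leaves the few high memory nodes free for the workloads that genuinely need them.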

NVIDIA Quadro RTX - Intel compute nodes:

  • Model: Dell PowerEdge R740XD Server
  • Processors: (2) Intel Xeon Gold 6246R (32 Cascade Lake cores) 3.4G, 35.75M Cache
  • Processors: (3) NVIDIA Quadro RTX 6000 (Turing, CUDA/Tensor 27,648/3,456 cores)
  • Memory: 768GB DDR4-2933 (CPU), 72GB GDDR6 (GPU)
  • Local scratch space: 1.92TB SSD
  • Number of nodes: 2

These nodes are candidates for data sciences, physics and life science modeling, artificial intelligence, inference, and simulation workloads with modest resource needs. These systems also include hardware features that can be used to accelerate complex simulations of the physical world such as particle or fluid dynamics for scientific and data visualization. They could also be used for film, video, and graphic rendering, or even special effects workloads.

NVIDIA Tesla T4 - Intel compute nodes:

  • Model: Dell PowerEdge R740XD Server
  • Processors: (2) Intel Xeon Gold 6246R (32 Cascade Lake cores) 3.4G, 35.75M Cache
  • Processors: (2) NVIDIA Tesla T4 (Turing, CUDA/Tensor 10,240/1,280 cores)
  • Memory: 768GB DDR4-2933 (CPU), 32GB GDDR6 (GPU)
  • Local scratch space: 1.92TB SSD
  • Number of nodes: 2

These nodes are candidates for mathematics, data sciences, artificial intelligence, inference, machine learning, deep learning, and simulation workloads with modest resource needs.

NVIDIA A100 - AMD compute node:

  • Model: NVIDIA DGX A100 P3687
  • Processors: (2) AMD EPYC 7742 (128 Rome cores) 2.25G, 256M Cache
  • Processors: (8) NVIDIA A100 Tensor Core (Ampere, CUDA/Tensor 55,296/3,456 cores)
  • Memory: 1TB DDR4-3200 (CPU), 320GB HBM2e (GPU)
  • Local scratch space: 15TB NVMe
  • Number of nodes: 1

This node provides the greatest end-to-end HPC platform performance in the cluster. It offers many enhancements that deliver significant speedups for large-scale artificial intelligence, inference, deep learning, data analytics, and digital forensic workloads.
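The CUDA and Tensor core counts listed in the GPU node profiles above appear to be totals across every GPU of that model in the cluster rather than per-GPU or per-node figures. The arithmetic can be checked with the short sketch below. The per-GPU counts are taken from NVIDIA's published specifications, not from this page, and the dictionary and function names are hypothetical.

```python
# Per-GPU core counts from NVIDIA's published specifications (not from the list above).
PER_GPU = {
    "Quadro RTX 6000": {"cuda": 4608, "tensor": 576},
    "Tesla T4": {"cuda": 2560, "tensor": 320},
    "A100": {"cuda": 6912, "tensor": 432},
}

# (GPUs per node, number of nodes) from the profiles above.
DEPLOYED = {
    "Quadro RTX 6000": (3, 2),
    "Tesla T4": (2, 2),
    "A100": (8, 1),
}


def cluster_totals(model):
    """Return (CUDA cores, Tensor cores) summed over every GPU of this model."""
    gpus_per_node, nodes = DEPLOYED[model]
    gpu_count = gpus_per_node * nodes
    spec = PER_GPU[model]
    return spec["cuda"] * gpu_count, spec["tensor"] * gpu_count


for model in DEPLOYED:
    print(model, cluster_totals(model))
```

The results match the listed figures: 27,648/3,456 for the six Quadro RTX 6000 cards, 10,240/1,280 for the four Tesla T4 cards, and 55,296/3,456 for the eight A100 cards.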

Software

Red Hat system utility and application support

Research Technology systems administration staff can assist with most Red Hat Linux operating system, application, and system utility support. Enterprise support for Red Hat Linux can be extended to professional services if needed. In addition, because there is often commonality across many Linux distributions, Research Technology may be able to help support applications and utilities on other distributions as well.

Job submission script support

Research Technology HPC systems engineering and application staff can assist with many job submission script composition and troubleshooting tasks. However, it is important to understand that debugging job submission scripts can be complex, and support will often be a cooperative effort between users and staff.

Software delivery

Software will be made available to the cluster using the following methods:

  • Installation by users in their home directories
  • Installation into the cluster as an Lmod environment module
  • Installation in an end-user-managed Singularity container
  • Installation on an HPCS-specific network application server
  • Some externally hosted applications requiring GPU resources

Access

To access AUHPCS, researchers, faculty, or staff must meet the following requirements:

  • Have an active AU NetID and a device configured for Duo two-factor authentication.
  • Register as a Principal Investigator (PI) using the iLAB “High Performance Computing and Parallel Computing Services Core” page, or be added to an existing project by a registered PI.
  • Complete the required training course(s) for basic Linux competency (if required) and AUHPCS cluster concepts, use, and workflow.

Once access is granted for approved compute projects, adherence to AUHPCS governance policies is a continuing requirement.

 

Get Help

Support

Consultation and assistance with HPC Services from the ITSS Research Technology group can be requested using the standard AU enterprise support services. Requests can be directed to us using the “Research HPC Services” assignment group.

Software Support

HPCS will make every effort to provide some level of support for the scientific software installed in the cluster. However, because many of these software packages are open source, standard vendor support is not available in the majority of cases. In these cases, support will be a collaborative effort leveraging HPCS staff experience, HPCS staff research, and end user self-support. For purchased software that includes support, HPCS staff will liaise with end users and vendors. Scientific software that requires funding for licenses and support, and for which the university is not already licensed, will have to be purchased by the requesting department.