NCP-AIO: AI Operations
Free Practice Exam Questions (page: 8)
Updated On: 10-Jan-2026

You are managing a Slurm cluster with multiple GPU nodes, each equipped with different types of GPUs. Some jobs are being allocated GPUs that should be reserved for other purposes, such as display rendering.

How would you ensure that only the intended GPUs are allocated to jobs?

  A. Verify that the GPUs are correctly listed in both gres.conf and slurm.conf, and ensure that unconfigured GPUs are excluded.
  B. Use nvidia-smi to manually assign GPUs to each job before submission.
  C. Reinstall the NVIDIA drivers to ensure proper GPU detection by Slurm.
  D. Increase the number of GPUs requested in the job script to avoid using unconfigured GPUs.

Answer(s): A

Explanation:

In Slurm GPU resource management, the gres.conf file defines the available GPUs (generic resources) per node, while slurm.conf configures the cluster-wide GPU scheduling policies. To prevent jobs from using GPUs reserved for other purposes (e.g., display rendering GPUs), administrators must ensure that only the GPUs intended for compute workloads are listed in these configuration files.

Properly configuring gres.conf allows Slurm to recognize and expose only those GPUs meant for jobs.

slurm.conf must declare matching Gres entries for each node so that unconfigured GPUs are excluded from scheduling.

Manual GPU assignment using nvidia-smi is not scalable and is not integrated with Slurm scheduling; nvidia-smi is a monitoring and management tool, not an allocation mechanism.

Reinstalling drivers or increasing GPU requests does not restrict which GPUs Slurm allocates to jobs.

Thus, the correct approach is to verify and configure GPU listings accurately in gres.conf and slurm.conf to restrict job allocations to intended GPUs.
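As a rough illustration, the configuration might look like the following; the node name "node01", GPU type "a100", device paths, and the CPU/memory values are placeholders and must match the actual hardware:

    # gres.conf on node01: expose only the four compute GPUs to Slurm
    NodeName=node01 Name=gpu Type=a100 File=/dev/nvidia[0-3]

    # slurm.conf: the node's Gres entry must match what gres.conf exposes
    GresTypes=gpu
    NodeName=node01 Gres=gpu:a100:4 CPUs=64 RealMemory=512000

Because the display GPU's device file is never listed, Slurm does not offer it to jobs. Restart the Slurm daemons (slurmctld and slurmd) after editing so the new configuration is read.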



A data scientist is training a deep learning model and notices slower than expected training times. The data scientist asks a system administrator to investigate. The system administrator suspects disk IO is the cause.

What command should be used?

  A. tcpdump
  B. iostat
  C. nvidia-smi
  D. htop

Answer(s): B

Explanation:

To diagnose disk IO performance issues, the system administrator should use the iostat command, which reports CPU statistics and input/output statistics for devices and partitions. It helps identify bottlenecks in disk throughput or latency affecting application performance.

tcpdump is used for network traffic analysis, not disk IO.

nvidia-smi monitors NVIDIA GPU status but not disk IO.

htop shows CPU, memory, and process usage but provides limited disk IO details.

Therefore, iostat is the appropriate tool to assess disk IO performance and diagnose bottlenecks impacting training times.
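As a quick illustration, the administrator might run something like the following while the training job is active (the interval, sample count, and the device name nvme0n1 are placeholders):

    # Extended per-device statistics, refreshed every 2 seconds, 5 samples
    iostat -x 2 5

    # Restrict the report to the device backing the training data
    iostat -xd nvme0n1 2

Consistently high device utilization (%util) and long request wait times in this output point to a disk IO bottleneck rather than a GPU or network problem.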



You have noticed that users can access all GPUs on a node even when they request only one GPU in their job script using --gres=gpu:1. This is causing resource contention and inefficient GPU usage.

What configuration change would you make to restrict users' access to only their allocated GPUs?

  A. Increase the memory allocation per job to limit access to other resources on the node.
  B. Enable cgroup enforcement in cgroup.conf by setting ConstrainDevices=yes.
  C. Set a higher priority for jobs requesting fewer GPUs, so they finish faster and free up resources sooner.
  D. Modify the job script to include additional resource requests for CPU cores alongside GPUs.

Answer(s): B

Explanation:

To restrict users' access strictly to the GPUs allocated to their jobs, Slurm uses cgroups (control groups) for resource isolation. Setting ConstrainDevices=yes in cgroup.conf enables device-level cgroup enforcement, so a job can only open the GPU devices it was assigned and cannot access any others on the node.

Increasing memory allocation or setting job priorities does not restrict device access.

Modifying job scripts to request additional CPU cores does not limit GPU access.

Hence, enabling cgroup enforcement with ConstrainDevices=yes is the correct method to prevent users from accessing unallocated GPUs.
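A minimal sketch of the relevant settings follows; exact cgroup options vary somewhat between Slurm releases, so check the documentation for the installed version:

    # cgroup.conf
    ConstrainDevices=yes

    # slurm.conf: cgroup-based task management must be active
    # for the device constraint to be enforced
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

After the Slurm daemons are restarted, a job submitted with --gres=gpu:1 sees only its assigned GPU (for example, in nvidia-smi) instead of every GPU on the node.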



A new researcher needs access to GPU resources but should not have permission to modify cluster settings or manage other users.

What role should you assign them in Run:ai?

  A. L1 Researcher
  B. Department Administrator
  C. Application Administrator
  D. Research Manager

Answer(s): A

Explanation:

In Run:ai, roles are assigned based on levels of permissions. The L1 Researcher role is designed for users who need access to GPU resources for running jobs and experiments but should not have administrative rights over cluster settings or other users. This role ensures researchers can use resources without affecting cluster configurations or user management. Other roles like Department Administrator, Application Administrator, or Research Manager have broader privileges, including managing users and settings, which are not appropriate for the new researcher's requirements.



When troubleshooting Slurm job scheduling issues, a common source of problems is jobs getting stuck in a pending state indefinitely.

Which Slurm command can be used to view detailed information about all pending jobs and identify the cause of the delay?

  A. scontrol
  B. sacct
  C. sinfo

Answer(s): A

Explanation:

The Slurm command scontrol provides detailed job control and information capabilities. Using scontrol (e.g., scontrol show job <jobid>) can reveal comprehensive details about jobs, including pending jobs, and the specific reasons why they are delayed or blocked. It is the go-to command for in-depth troubleshooting of job states.
While sacct provides accounting information and sinfo displays node and partition status, neither provides as detailed or actionable information on pending job causes as scontrol.
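For example, with a placeholder job ID of 12345:

    # Full record for a pending job; the Reason field explains the delay
    scontrol show job 12345

    # Typical line to look for in the output (abridged):
    #   JobState=PENDING Reason=Resources Dependency=(null)

Common Reason values such as Resources, Priority, or Dependency indicate whether the job is waiting on free nodes, on higher-priority work, or on another job to finish.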


