GPUDirect RDMA (Remote Direct Memory Access) is a technology that enables a direct path for data exchange between the GPU and a third-party peer device using standard features of PCI Express.
The NVIDIA GPU driver package includes a kernel module,
nvidia-peermem.ko, which gives
Mellanox InfiniBand-based HCAs (Host Channel Adapters) direct
peer-to-peer read and write access to the NVIDIA GPU's video
memory. This lets GPUDirect RDMA-based applications combine GPU
computing power with the RDMA interconnect without copying
data to host memory.
This capability is supported with Mellanox ConnectX-3 VPI or newer adapters. It works with both InfiniBand and RoCE (RDMA over Converged Ethernet) technologies.
Mellanox OFED (OpenFabrics Enterprise Distribution), or MOFED, introduces an API called PeerDirect between the InfiniBand core and peer memory clients such as NVIDIA GPUs; see https://community.mellanox.com/s/article/howto-implement-peerdirect-client-using-mlnx-ofed.
The nvidia-peermem.ko module
registers the NVIDIA GPU with the InfiniBand subsystem by using
peer-to-peer APIs provided by the NVIDIA GPU driver.
This module, originally maintained by Mellanox on GitHub, is now included with the NVIDIA Linux GPU driver. The original GitHub project at https://github.com/Mellanox/nv_peer_memory should be considered deprecated; only critical bugs will be addressed for existing installations.
The kernel must support RDMA peer memory, either through
additional kernel patches or through the Mellanox OFED (MOFED)
package, before nvidia-peermem.ko can be loaded and used.
The nv_peer_mem module from the GitHub project may already be
installed and loaded on the system. Installing
nvidia-peermem.ko does not affect the
functionality of the existing nv_peer_mem module. However, to load and use
nvidia-peermem.ko, users must first disable
the nv_peer_mem
service. Uninstalling the
nv_peer_mem package
is also encouraged to avoid conflicts with nvidia-peermem.ko, since only one
of the two modules can be loaded at any time.
Stop the nv_peer_mem service:
    # service nv_peer_mem stop
Check if nv_peer_mem.ko is still
loaded after stopping the service:
    # lsmod | grep nv_peer_mem
If nv_peer_mem.ko is still loaded,
unload it with:
    # rmmod nv_peer_mem
Uninstall the nv_peer_mem package:
For DEB-based OS:
    # dpkg -P nvidia-peer-memory
    # dpkg -P nvidia-peer-memory-dkms
For RPM-based OS:
    # rpm -e nvidia_peer_memory
After ensuring kernel support and installing the GPU driver,
nvidia-peermem.ko can be loaded by
running the following command with root privileges:
    # modprobe nvidia-peermem
Note: If the NVIDIA GPU driver is installed before MOFED, the
GPU driver must be uninstalled and installed again to make sure
nvidia-peermem.ko is compiled with
the RDMA APIs that are provided by MOFED.
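After running modprobe, one way to confirm which peer memory module is actually loaded is to inspect /proc/modules. The sketch below is illustrative only: the function names module_loaded and peermem_status are hypothetical helpers, not part of the driver package. It relies on the kernel reporting module names with underscores, so nvidia-peermem appears as nvidia_peermem.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named kernel module appears in a
# /proc/modules-style listing.
# $1: module name (hyphens are normalized to underscores, as the kernel does)
# $2: listing file to read (defaults to /proc/modules)
module_loaded() {
    name=$(printf '%s\n' "$1" | sed 's/-/_/g')
    listing=${2:-/proc/modules}
    # The module name is the first whitespace-separated field of each line.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Hypothetical helper: report which of the two peer memory modules is loaded.
# $1: optional listing file, passed through to module_loaded.
peermem_status() {
    if module_loaded nv_peer_mem "$1"; then
        echo "legacy nv_peer_mem loaded"
    elif module_loaded nvidia-peermem "$1"; then
        echo "nvidia-peermem loaded"
    else
        echo "no peer memory module loaded"
    fi
}
```

Calling peermem_status with no argument reads the live /proc/modules and does not require root privileges.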
peerdirect_support: this
parameter takes the following integer values:
0, which is the default and is appropriate for a kernel whose PeerDirect APIs roughly correspond to those in MOFED 5.1 or newer.
1, which is required with the legacy PeerDirect APIs, as shipped in MOFED 5.0 and older releases, notably MOFED LTS.
As a reference, in the legacy PeerDirect APIs, the peer_memory_client structure declared in peer_mem.h has the two extra function pointers shown below:
    void* (*get_context_private_data)(u64 peer_id);
    void (*put_context_private_data)(void *context);
Note that MOFED LTS as well as MOFED 5.0 and previous releases
ship with legacy PeerDirect APIs. So for example, when using MOFED
LTS, GPUDirect RDMA support for the Mellanox HCAs will not work
correctly unless peerdirect_support is set to one.
For MOFED 5.1 or newer, by contrast, the default value of zero is appropriate, so no special action is needed.
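The version rule above can be sketched as a small shell function. This is a minimal illustration, not an official tool: peerdirect_support_for is a hypothetical name, and the input format (a string such as MLNX_OFED_LINUX-5.1-0.6.6.0:, as typically printed by ofed_info -s) is an assumption. It also assumes that MOFED LTS releases carry a version number below 5.1, so a plain numeric comparison is enough.

```shell
#!/bin/sh
# Hypothetical helper: print the peerdirect_support value suggested by a
# MOFED version string (1 for releases older than 5.1, 0 otherwise).
# $1: version string, e.g. "5.1-0.6.6.0" or "MLNX_OFED_LINUX-5.1-0.6.6.0:"
# Returns 2 if the string cannot be parsed.
peerdirect_support_for() {
    ver=${1#MLNX_OFED_LINUX-}     # drop the product prefix if present
    ver=${ver%:}                  # drop a trailing colon if present
    major=${ver%%.*}              # text before the first dot
    rest=${ver#*.}                # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    case $major in ''|*[!0-9]*) return 2 ;; esac
    case $minor in '') return 2 ;; esac
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }; then
        echo 0    # PeerDirect APIs matching MOFED 5.1 or newer
    else
        echo 1    # legacy PeerDirect APIs
    fi
}
```

For example, peerdirect_support_for 4.9-0.1.7.0 prints 1, and peerdirect_support_for MLNX_OFED_LINUX-5.3-1.0.0.1: prints 0. A kernel patched outside of MOFED is not covered by this check.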
Currently, there is no service to automatically load
nvidia-peermem.ko. Users need to load
the module manually.
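On systemd-based distributions, one possible workaround is to request the module at boot through the generic modules-load.d mechanism. This is an assumption about the host system rather than something the driver package sets up, and the function name enable_peermem_autoload and its root-directory argument are hypothetical. A minimal sketch:

```shell
#!/bin/sh
# Hypothetical helper: write a modules-load.d entry for nvidia-peermem.
# $1: filesystem root to write under; pass "/" for the live system
#     (requires root) or a staging directory for inspection.
enable_peermem_autoload() {
    root=${1%/}
    dir="$root/etc/modules-load.d"
    mkdir -p "$dir" || return 1
    # modules-load.d files list one module name per line; '#' starts a comment.
    printf '%s\n' \
        '# Load the NVIDIA peer memory module at boot' \
        'nvidia-peermem' > "$dir/nvidia-peermem.conf"
}
```

The entry only asks for the module to be loaded. The prerequisites described earlier (kernel RDMA peer memory support, and nv_peer_mem not being loaded) still apply.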
When loading nvidia-peermem.ko on
a kernel with legacy PeerDirect APIs, the module parameter
peerdirect_support has to be
set to one.
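Module parameters are passed to modprobe as name=value pairs after the module name. The wrapper below is a sketch of how the two cases might be scripted. The function name load_peermem and the DRY_RUN variable are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: load nvidia-peermem with the given peerdirect_support.
# $1: 0 (default PeerDirect APIs) or 1 (legacy PeerDirect APIs)
# Set DRY_RUN=1 to print the command instead of running it.
load_peermem() {
    case $1 in
        0) set -- modprobe nvidia-peermem ;;
        1) set -- modprobe nvidia-peermem peerdirect_support=1 ;;
        *) echo "peerdirect_support must be 0 or 1" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"    # requires root privileges
    fi
}
```

With DRY_RUN=1 set, load_peermem 1 prints modprobe nvidia-peermem peerdirect_support=1 without touching the system.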
The PeerDirect APIs shipping
in MOFED releases 5.1 and later are affected by a lock inversion
bug which may lead to a kernel-side deadlock. This is tracked by
the NVIDIA-internal reference number 2696789. PeerDirect APIs in newer MOFED
releases belonging to some branches, such as 5.3-1.0.0.1.43, offer an opt-in feature to
mitigate that problem. Starting from this release, the nvidia-peermem.ko kernel module explicitly
enables that feature unless peerdirect_support is set to one.