* mempolicy ignored during ODP faults
@ 2022-06-29 22:44 Mor, Omri
From: Mor, Omri @ 2022-06-29 22:44 UTC
  To: linux-rdma

Hello,

I am a PhD student working on a networking runtime, similar to MPI, that works with IBVerbs and other low-level HPC networking interfaces. In my recent work I’ve been using the SDSC Expanse cluster, which uses 2x AMD “Rome” CPUs and HDR InfiniBand supplied by Mellanox/Nvidia Networking.
After running into trouble with a registration cache that could let the CPU and HCA page tables become incoherent, I devised a workaround based on explicit ODP registrations, which force the HCA page tables to be updated when a page is no longer valid on the CPU.

In testing, however, I’ve discovered what appears to be troubling behavior that leads to performance differences and/or degradation under certain circumstances. When a fresh set of pages is mmap’d and registered with explicit ODP, the pages are not necessarily present in physical memory. If an RDMA Write is then performed to these pages and the HCA triggers a page fault, physical pages must be allocated and the new virt-to-phys mappings added to the HCA page tables. This allocation appears to ignore the current mempolicy settings and places all of the pages on the NUMA node close to the HCA (or, alternatively, the NUMA node close to the thread that made the registration; the two are the same under my current configuration).

By modifying each page in the allocation before registering the MR, I can ensure that the pages are present in physical memory and force adherence to the mempolicy. This should not be necessary, however: it causes unneeded memory accesses and pollutes the CPU cache with lines that will be evicted anyway by the incoming RDMA Write.
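
For reference, the pre-touch step described above can be sketched roughly as follows. This is a minimal illustration, not the code from my runtime: the helper names map_fresh, prefault_pages and resident_pages are hypothetical, and the sketch assumes an anonymous private mapping on Linux. It covers only the CPU side; the mempolicy would be set beforehand (for example with set_mempolicy or mbind), and the ODP registration (ibv_reg_mr with IBV_ACCESS_ON_DEMAND) would follow the pre-touch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of fresh anonymous memory.
 * Returns NULL on failure. */
static void *map_fresh(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: write one byte to every page so that the CPU,
 * not the HCA, takes the first-touch fault. The physical page is then
 * allocated under the calling thread's mempolicy.
 * Returns the number of pages touched. */
static size_t prefault_pages(void *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);

    volatile unsigned char *p = buf;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += (size_t)ps) {
        p[off] = 0;   /* a write fault allocates a real page */
        touched++;
    }
    return touched;
}

/* Hypothetical helper: count how many pages of the page-aligned
 * mapping are currently resident, using mincore(2). */
static size_t resident_pages(void *buf, size_t len)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + ps - 1) / ps;
    unsigned char *vec = malloc(npages);
    size_t count = 0;

    if (vec == NULL)
        return 0;
    if (mincore(buf, len, vec) == 0) {
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* bit 0 set means resident */
    }
    free(vec);
    return count;
}
```

The intended order is map_fresh, then prefault_pages, then the ODP registration of the same buffer. resident_pages is only a sanity check that the pages are present; it says nothing about which NUMA node they landed on.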

System details: SDSC Expanse
Distro: Rocky Linux 8.5
Kernel: 4.18 (4.18.0-348.23.1.el8_5.x86_64)
IB Hardware: Mellanox Technologies MT28908 Family [ConnectX-6]
IB Firmware: 20.31.2006

Notably, the kernel here isn’t particularly recent and doesn’t support many of the newest features in IBVerbs, e.g. ibv_advise_mr. This apparent problem may already have been fixed, either intentionally (though I didn’t see any commits specifically referring to the issue) or by happenstance through other changes in ODP behavior. I’d much appreciate some help in a) confirming that my observations and my hypothesis about what is occurring are correct as of Linux 4.18, and b) determining whether this is fixed in more recent kernels.

Many thanks,
Omri Mor
