I have tested MAP_LOCKED; it doesn't help in this case.

I do intend to report this to the kernel developers, but was wondering
whether others have hit it first.

On Tue, May 30, 2023 at 4:35 AM Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Sun, 28 May 2023 23:07:40 +0300
> Baruch Even wrote:
>
> > Hi,
> >
> > We found an issue with newer kernels (5.13+) that are found on newer OSes
> > (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page that was
> > allocated for DPDK was migrated (moved into another physical page) when a
> > 1G page was allocated.
> >
> > From our reading of the kernel commits this started with commit
> > ae37c7ff79f1f030e28ec76c46ee032f8fd07607
> > mm: make alloc_contig_range handle in-use hugetlb pages
> >
> > This caused what looked like memory corruptions to us and cases where the
> > rings were moved from their physical location and communication was no
> > longer possible.
> >
> > I wanted to ask if anyone else hit this issue and what mitigations are
> > available?
> >
> > We are currently looking at using a kernel driver to pin the pages but I
> > expect that this issue will affect others and that a more general approach
> > is needed.
> >
> > Thanks,
> > Baruch
> >
>
> Fix might be as simple as asking kernel to lock the mmap().
>
> diff --git a/lib/eal/linux/eal_hugepage_info.c b/lib/eal/linux/eal_hugepage_info.c
> index 581d9dfc91eb..989c69387233 100644
> --- a/lib/eal/linux/eal_hugepage_info.c
> +++ b/lib/eal/linux/eal_hugepage_info.c
> @@ -48,7 +48,8 @@ map_shared_memory(const char *filename, const size_t mem_size, int flags)
>                 return NULL;
>         }
>         retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
> -                       MAP_SHARED, fd, 0);
> +                       MAP_SHARED_VALIDATE | MAP_LOCKED, fd, 0);
> +
>         close(fd);
>         return retval == MAP_FAILED ? NULL : retval;
>  }
>

--
Baruch Even
Platform Technical Lead, WEKA
E baruch@weka.io  W www.weka.io
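
P.S. For anyone who wants to check whether they are affected: one possible way to notice a migration is to record the physical frame number (PFN) of a hugepage-backed address after it is mapped and compare it again later. The sketch below is only an illustration and is not part of DPDK; the helper names (pagemap_entry_to_pfn, virt_to_pfn, page_has_moved) are hypothetical. It assumes Linux, and it assumes the process has CAP_SYS_ADMIN, since the kernel reports the PFN as zero to unprivileged readers of /proc/self/pagemap.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for pread() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout of one 64-bit /proc/self/pagemap entry:
 * bit 63 = page present in RAM, bits 0-54 = page frame number. */
#define PM_PRESENT  (UINT64_C(1) << 63)
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)

/* Decode a pagemap entry. Returns 0 when the page is not present. */
static uint64_t pagemap_entry_to_pfn(uint64_t entry)
{
	return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Hypothetical helper: PFN currently backing vaddr.
 * Returns 0 on any error, if the page is not present, or if the
 * kernel hides PFNs because the caller lacks CAP_SYS_ADMIN. */
static uint64_t virt_to_pfn(const void *vaddr)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uint64_t entry = 0;
	off_t offset;
	int fd;

	if (page_size <= 0)
		return 0;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return 0;

	/* pagemap is indexed by virtual page number in base-page units,
	 * also for addresses that live inside a hugepage. */
	offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size) *
		 (off_t)sizeof(entry);
	if (pread(fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
		entry = 0;
	close(fd);

	return pagemap_entry_to_pfn(entry);
}

/* Returns 1 only when both PFNs are known and they differ. */
static int page_has_moved(const void *vaddr, uint64_t recorded_pfn)
{
	uint64_t now = virt_to_pfn(vaddr);

	return now != 0 && recorded_pfn != 0 && now != recorded_pfn;
}
```

A caller would store virt_to_pfn(addr) right after the memory is mapped and touched, then call page_has_moved(addr, stored_pfn) after the 1G page allocation. A return of 0 is inconclusive when PFNs are hidden, so this can confirm a migration but cannot rule one out.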