On Tue, May 30, 2023 at 11:04 AM Bruce Richardson <bruce.richardson@intel.com> wrote:
On Sun, May 28, 2023 at 11:07:40PM +0300, Baruch Even wrote:
>    Hi,
>    We found an issue with newer kernels (5.13+) that are found on newer
>    OSes (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page that
>    was allocated for DPDK was migrated (moved into another physical page)
>    when a 1G page was allocated.
>    From our reading of the kernel commits this started with commit
>    ae37c7ff79f1f030e28ec76c46ee032f8fd07607
>        mm: make alloc_contig_range handle in-use hugetlb pages
>    This caused what looked like memory corruptions to us and cases where
>    the rings were moved from their physical location and communication was
>    no longer possible.
>    I wanted to ask if anyone else hit this issue and what mitigations are
>    available?
>    We are currently looking at using a kernel driver to pin the pages but
>    I expect that this issue will affect others and that a more general
>    approach is needed.
>    Thanks,
>    Baruch
>    --

Hi,

What kernel driver was being used for the device I/O part? Was it a
UIO-based driver or "vfio-pci"? When using vfio-pci and configuring IOMMU
mappings, the pages mapped should be pinned by the kernel, I would have
thought, since the kernel knows they are being used by devices.
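
For anyone who wants to check which case applies to them, here is a minimal
sketch that reads the sysfs "driver" symlink for a PCI device. The helper name
bound_driver and the sysfs_root parameter are made up for illustration and are
not part of DPDK; dpdk-devbind.py reports the same information.

```python
import os
import sys


def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    bdf is the PCI address as shown by lspci -D, e.g. "0000:00:05.0".
    sysfs_root is only a parameter so the helper can be pointed at a
    different directory (hypothetical convenience, not a DPDK option).
    """
    link = os.path.join(sysfs_root, bdf, "driver")
    # An unbound device has no "driver" symlink at all.
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    for bdf in sys.argv[1:]:
        print(bdf, bound_driver(bdf))
```

A result of "igb_uio" or "uio_pci_generic" means a UIO-based driver, while
"vfio-pci" means the VFIO path is in use.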

/Bruce

This was using igb_uio on an AWS instance with their ena driver.
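
One way to observe the migration from user space is to compare the physical
frame number behind a virtual address before and after the 1G allocation. This
is only a rough sketch, not what we run in production: the function names are
made up for illustration, it assumes /proc/<pid>/pagemap is readable with
CAP_SYS_ADMIN (without it the kernel reports the PFN as zero), and the caller
must supply the virtual address of the hugepage mapping.

```python
import os
import struct

PAGEMAP_ENTRY_BYTES = 8          # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1         # bits 0-54: page frame number if present
PRESENT_BIT = 1 << 63            # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Split a raw pagemap entry into (present, pfn); pfn is None if absent."""
    present = bool(entry & PRESENT_BIT)
    pfn = (entry & PFN_MASK) if present else None
    return present, pfn


def read_pfn(vaddr, pid="self"):
    """Return the PFN currently backing vaddr, or None if it is not present.

    pagemap is indexed by the base page size even for hugepage mappings.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (vaddr // page_size) * PAGEMAP_ENTRY_BYTES
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        data = f.read(PAGEMAP_ENTRY_BYTES)
    if len(data) != PAGEMAP_ENTRY_BYTES:
        return None
    (entry,) = struct.unpack("=Q", data)   # native-endian unsigned 64-bit
    return decode_pagemap_entry(entry)[1]


def has_migrated(pfn_before, pfn_after):
    """True if both samples are valid and point at different frames."""
    return None not in (pfn_before, pfn_after) and pfn_before != pfn_after
```

Sampling read_pfn() for an address inside a 2M page, triggering the 1G
allocation, and sampling again should show whether the page was moved.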

Baruch

--
Baruch Even
Platform Technical Lead, WEKA
baruch@weka.io www.weka.io