I have tried setting the mask once at probe and leaving it in place.
However, the NVMe driver does not seem to like it, and the VM does not
boot correctly. I have spent a couple of days debugging, but I am a bit
lost now.

Basically, whenever nvme_setup_prp_simple() is mapping with the mask set,
I get timeouts on boot, which to my knowledge indicate an NVMe driver
failure:
[    5.500662] random: crng init done
[    5.502933] random: 7 urandom warning(s) missed due to ratelimiting
[  132.077795] dracut-initqueue[472]: Warning: dracut-initqueue timeout - starting timeout scripts
[  132.614755] dracut-initqueue[472]: Warning: dracut-initqueue timeout - starting timeout scripts

I have checked that all the mappings happened correctly:
[    4.773570] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 200, from fffffc9acd6c6040, mapped at 7affb200
[    4.784540] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 400, from fffffc9acd6c6040, mapped at 7affc400
[    4.794096] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 600, from fffffc9acd6c6040, mapped at 7affd600
[    4.801983] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 800, from fffffc9acd6c6040, mapped at 7affe800
[    4.806873] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset a00, from fffffc9acd6c6040, mapped at 7afffa00
[    4.812382] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset c00, from fffffc9acd6c6040, mapped at 7b000c00
[    4.817423] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset e00, from fffffc9acd6c6040, mapped at 7b001e00
[    4.823652] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 200, from fffffc9acd6c60c0, mapped at 7b003200
[    4.828679] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 400, from fffffc9acd6c60c0, mapped at 7b004400
[    4.833875] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 600, from fffffc9acd6c60c0, mapped at 7b005600
[    4.838926] nvme 0000:00:04.0: nvme setup prp simple is mapping 200 data, with offset 800, from fffffc9acd6c60c0, mapped at 7b006800 
... 
I have compared this to not setting the mask. The only difference in the
result is that instead of being mapped to *200, *400, *600, etc., the
buffers are all mapped to *000. So I believe the mapping is done correctly:
according to NVMe spec figures 108/109, the mapping should keep the offset.
I am not sure what caused the error that eventually led to the failure. Is
there another bug in the NVMe driver?
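
The comparison described above (the low bits of the mapped address should
match the original offset) can be sketched in user-space C. This is only a
minimal sketch under assumptions: the 0xfff mask assumes a 4 KiB NVMe
controller page size, and offset_preserved() and check_logged_mappings()
are hypothetical helper names, not kernel functions. The addresses are
taken from the log lines above, except where marked hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mask = NVMe controller page size (4 KiB) - 1. */
#define MIN_ALIGN_MASK 0xfffULL

/*
 * Hypothetical helper: true when the mapped (bounce buffer) address keeps
 * the same bits under the mask as the original offset.
 */
static bool offset_preserved(uint64_t orig_offset, uint64_t dma_addr)
{
	return (orig_offset & MIN_ALIGN_MASK) == (dma_addr & MIN_ALIGN_MASK);
}

/* Check a few (offset, mapped address) pairs copied from the log. */
static void check_logged_mappings(void)
{
	assert(offset_preserved(0x200, 0x7affb200));
	assert(offset_preserved(0x400, 0x7affc400));
	assert(offset_preserved(0xc00, 0x7b000c00));

	/*
	 * Hypothetical address: without the mask the buffers land on *000,
	 * so the offset is not preserved.
	 */
	assert(!offset_preserved(0x200, 0x7affb000));
}
```

Calling check_logged_mappings() from a small test program exercises the
pairs above. It only checks the address arithmetic and says nothing about
what the driver does with the mapping afterwards.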

On Thu, Jan 28, 2021 at 10:18 AM Christoph Hellwig <hch@lst.de> wrote:
On Thu, Jan 28, 2021 at 06:00:58PM +0000, Robin Murphy wrote:
> If it were possible for this to fail, you might leak the DMA mapping here.
> However if dev->dma_parms somehow disappeared since a dozen lines above
> then I think you've got far bigger problems anyway.
>
> That said, do you really need to keep toggling this back and forth all the
> time? Even if the device does make other mappings elsewhere that don't
> necessarily need the same strict alignment, would it be significantly
> harmful just to set it once at probe and leave it in place anyway?

Yes, we should keep it set all the time.  While some NVMe devices have
the option to use SGLs that do not have this limitation, I have
absolutely no sympathy for anyone running NVMe with swiotlb as that
means their system imposes an addressing limitation.  We need to make
sure it does not corrupt data, but we're not going to make any effort
to optimize for such a degenerate setup.


--
Jianxiong Gao