* Memory failure handling of VFIO-pinned THP
@ 2020-01-23 21:39 Vikram Sethi
  2020-01-24  9:16 ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 2+ messages in thread
From: Vikram Sethi @ 2020-01-23 21:39 UTC (permalink / raw)
  To: linux-mm; +Cc: n-horiguchi, James.Morse, alex.williamson

Hello,

I was looking at memory_failure handling of pinned transparent hugepages (specifically pinned by VFIO for a VM with physical I/O).

AFAICT, when memory_failure is called from the initial "memory error detected" interrupt, it won't be able to split the THP because it is pinned, and will return -EBUSY without unmapping the THP with the uncorrected memory error from any of the processes that map it.

Later, when the VM does a load from the bad location (consumes poison), on the firmware-first path on ARM64 the SEA exception will be forwarded by firmware to the host kernel, where the GHES code will queue work for memory_failure. There, memory_failure will again exit early for the pinned THP, and userspace won't get the SIGBUS with the Action Required code that it needs in order to inject the error into the VM.

Discussing with James, we were wondering why the pinned THP isn't treated like a hugetlbfs memory failure: marking the entire hugepage with the hw_poison flag and unmapping it from the processes that map it when the error is detected (memory_failure_hugetlb calling hwpoison_user_mappings). If that were done, when the VM later tries to load from the bad location, the resulting fault would get the appropriate VM_FAULT_HWPOISON code, which would trigger KVM to send the SIGBUS with the Action Required code to userspace, which could then inject the error into the VM.
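
To make the userspace end of that chain concrete, here is a minimal sketch of how a VMM-like process might tell an "Action Required" SIGBUS apart from other SIGBUS causes. It is only an illustration: toy_classify_sigbus and enum toy_mce_kind are hypothetical names, not taken from QEMU or any other VMM. BUS_MCEERR_AR and BUS_MCEERR_AO are the si_code values Linux uses for SIGBUS raised on memory errors.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Fallbacks for headers that lack the macros; values match the Linux UAPI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification used only in this sketch. */
enum toy_mce_kind {
    TOY_NOT_MCE,          /* not a memory-error SIGBUS */
    TOY_ACTION_REQUIRED,  /* poison was consumed, must be handled now */
    TOY_ACTION_OPTIONAL   /* error detected, not yet consumed */
};

/* Inspect the siginfo a SA_SIGINFO handler (or sigwaitinfo) would receive. */
enum toy_mce_kind toy_classify_sigbus(const siginfo_t *si)
{
    assert(si != NULL);

    if (si->si_signo != SIGBUS)
        return TOY_NOT_MCE;

    switch (si->si_code) {
    case BUS_MCEERR_AR:
        return TOY_ACTION_REQUIRED;
    case BUS_MCEERR_AO:
        return TOY_ACTION_OPTIONAL;
    default:
        return TOY_NOT_MCE;
    }
}
```

In the TOY_ACTION_REQUIRED case, si_addr carries the faulting virtual address, which is what a VMM would translate to a guest address before injecting the error. With the current pinned-THP behavior described above, this path is never reached because no such SIGBUS is delivered.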

I do understand that the page is pinned so that DMAs can happen from the VM's I/O devices without I/O faults, but since the hw_poison flag would be set for the page by memory_failure on the initial "error detected" interrupt, the kernel wouldn't reallocate the page anyway. Any interim DMA writes that hit the bad page wouldn't corrupt anyone else's data, and DMA reads would get poison back or a completer abort.

Am I missing something, or is this currently broken for VFIO and VM THP pages with memory failure (at least as far as signaling user space goes)?

Thanks,

Vikram


* Re: Memory failure handling of VFIO-pinned THP
  2020-01-23 21:39 Memory failure handling of VFIO-pinned THP Vikram Sethi
@ 2020-01-24  9:16 ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 2+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2020-01-24  9:16 UTC (permalink / raw)
  To: Vikram Sethi; +Cc: linux-mm, n-horiguchi, James.Morse, alex.williamson

Hi Vikram,

On Thu, Jan 23, 2020 at 03:39:33PM -0600, Vikram Sethi wrote:
> Hello,
> 
> I was looking at memory_failure handling of pinned transparent hugepages
> (specifically pinned by VFIO for a VM with physical I/O).
> 
> AFAICT, on the initial memory error detected interrupt call memory_failure
> won't be able to split the THP because it is pinned, and will return -EBUSY
> without actually unmapping any processes with mappings to the THP with
> uncorrected memory error.

Yes, that's the current behavior.
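
As a reading aid, the early exit can be modeled in a few lines of userspace C. This is a toy model under stated assumptions, not kernel code: struct toy_thp, toy_split_thp and toy_memory_failure are hypothetical names, and the only behavior modeled is what the quoted paragraph describes (the split fails while the page is pinned, -EBUSY is returned, and no mapping is removed).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transparent hugepage; not a kernel structure. */
struct toy_thp {
    bool pinned;   /* extra reference held, e.g. by a VFIO pin */
    bool split;    /* huge page has been split into base pages */
    int  mappers;  /* number of processes that still map the page */
};

/* In this model, splitting fails for as long as the pin is held. */
int toy_split_thp(struct toy_thp *p)
{
    assert(p != NULL);
    if (p->pinned)
        return -EBUSY;
    p->split = true;
    return 0;
}

/* Models the flow described above: a failed split means an early return,
 * so the unmap step is never reached. */
int toy_memory_failure(struct toy_thp *p)
{
    int ret = toy_split_thp(p);

    if (ret)
        return ret;     /* early exit: mappings are left in place */

    p->mappers = 0;     /* stand-in for unmapping all user mappings */
    return 0;
}
```

Calling toy_memory_failure() on a pinned page returns -EBUSY and leaves mappers unchanged, which is the situation where no process is unmapped and, later, no SIGBUS reaches userspace.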

> 
> Later, when the VM does a load to the bad location (consumes poison), looking
> at the firmware first path on ARM64, the SEA exception will be forwarded by
> Firmware to host kernel, where the GHES code will queue work for
> memory_failure, where again memory_failure will exit early for the pinned THP,
> and userspace won't get the SIGBUS with Action Required code to be able to
> inject the error into the VM.
> 
>  
> 
> Discussing with James, we were wondering why the pinned THP isn't treated like
> hugetlbfs memory failure, marking the entire hugepage with hw_poison flag, and
> unmapping of mapped processes when the error is detected
> (memory_failure_hugetlb calling hwpoison_user_mappings)? If that were done,
> when the VM later tries to load the bad location, the resulting VM fault will
> get the appropriate VM_FAULT_HWPOISON code, which will trigger KVM to send the
> SIGBUS with Action Required code to userspace, which can then inject to the VM?

Generally, a THP can be shared by multiple processes, some of which map it
with pte mappings and others with pmd mappings. So if we treat all pages in
the pinned THP as hwpoisoned, processes with pte mappings could be signaled
when accessing non-error subpages, which seems suboptimal to me. But I agree
that containing the whole THP could improve error reporting when there is no
pte mapping for the pinned THP.
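
One way to read the trade-off above is as a simple decision rule. The sketch below is only an illustration of that reading, not a proposed patch: struct toy_thp_maps, toy_choose_action and toy_spurious_signal are hypothetical names, and the rule (contain the whole THP only when it is pinned and has no pte mappers) is an assumption drawn from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of how a THP is mapped; not a kernel structure. */
struct toy_thp_maps {
    int pmd_mappers;  /* processes mapping the THP with a pmd mapping */
    int pte_mappers;  /* processes mapping subpages with pte mappings */
};

enum toy_action {
    TOY_KEEP_CURRENT,      /* leave handling as it is today */
    TOY_CONTAIN_WHOLE_THP  /* poison and unmap the entire THP */
};

/* Whole-THP containment only when no pte mapper could be affected. */
enum toy_action toy_choose_action(const struct toy_thp_maps *m, bool pinned)
{
    assert(m != NULL);
    if (pinned && m->pte_mappers == 0)
        return TOY_CONTAIN_WHOLE_THP;
    return TOY_KEEP_CURRENT;
}

/* True when a pte mapper would be signaled for a subpage that has no error,
 * which is the suboptimal case described above. */
bool toy_spurious_signal(bool whole_thp_poisoned, int accessed_subpage,
                         int bad_subpage)
{
    return whole_thp_poisoned && accessed_subpage != bad_subpage;
}
```

For the VFIO case in this thread, a THP mapped only by the VM process with a pmd mapping would fall into the TOY_CONTAIN_WHOLE_THP branch of this sketch.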

> I do understand that the page is pinned so that DMAs can happen from the VM's
> I/O devices without I/O faults, but since the hw_poison flag would be set for
> the page on the initial "error detected" interrupt by memory_failure, the
> kernel wouldn't reallocate the page anyway. And any interim DMA writes that hit
> the bad page wouldn't be corrupting anyone else, and DMA reads would be getting
> poison back/completer abort. 
>  
> 
> Am I missing something, or is this currently broken for VFIO and VM THP pages
> with memory failure (at least as far as signaling user space goes)?

You're right, it's simply not implemented.

Thanks,
Naoya Horiguchi
