From: David Woodhouse <dwmw2@infradead.org>
To: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Jerome Glisse <j.glisse@gmail.com>,
	lsf-pc@lists.linux-foundation.org,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Joerg Roedel <joro@8bytes.org>
Subject: Re: [LSF/MM ATTEND] HMM (heterogeneous memory manager) and GPU
Date: Wed, 03 Feb 2016 10:15:08 +0000
Message-ID: <1454494508.4788.154.camel@infradead.org>
In-Reply-To: <CAFCwf13VCoJvWbmxa7mZByseHc97VGzYZvi0zv6ww8_7hqF7Gw@mail.gmail.com>

On Wed, 2016-02-03 at 11:21 +0200, Oded Gabbay wrote:

> OK, so I think I got a little confused, but looking at your code I
> see that you register SVM for the mm notifier (intel_mm_release),
> so I guess what you meant to say is that you don't want to call a
> device driver callback from your mm notifier callback, correct? (As
> amd_iommu_v2 does when it calls ev_state->inv_ctx_cb inside its
> mn_release.)

Right.

> Because you can't really control what the device driver will do, e.g.
> whether it decides to register itself with the mm notifier in its own code.

Right. I can't *prevent* them from doing it. But I don't need to
encourage or facilitate it :)

> And because you don't call the device driver, the driver can/will get
> errors for using this PASID (since you unbound it), and the device
> driver is supposed to handle them. Did I understand that correctly?

In the case of an unclean exit, yes. In an orderly shutdown of the
process, one would hope that the device context is relinquished cleanly
rather than the process simply exiting.

And yes, the device and its driver are expected to handle faults. If
they don't do that, they are broken :)
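The two shutdown paths described here can be illustrated with a toy userspace model. It is only a sketch under stated assumptions: every name in it (toy_binding, toy_driver_unbind, toy_mm_release) is hypothetical and this is not the intel-svm or mmu_notifier interface. The point it shows is that the process-exit path tears down the PASID without calling into the driver, so only an unclean exit leaves the device with a binding that will now fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one PASID binding; not the intel-svm or mmu_notifier API. */
struct toy_binding {
    bool driver_bound;   /* driver still owns a device context for this PASID */
    bool pasid_present;  /* PASID-table entry still points at the process page tables */
};

/* Orderly shutdown: the driver quiesces its hardware, then gives up the PASID. */
static void toy_driver_unbind(struct toy_binding *b)
{
    assert(b->driver_bound);
    /* (a real driver would stop and drain its queues for this PASID here) */
    b->driver_bound = false;
    b->pasid_present = false;
}

/*
 * Process-exit (mm release) path: tear down the PASID-table entry but do NOT
 * call back into the device driver. Returns true if the binding was still
 * live, i.e. an unclean exit, after which any further device access on this
 * PASID would take a 'PASID not present' fault that the driver must cope with.
 */
static bool toy_mm_release(struct toy_binding *b)
{
    bool unclean = b->pasid_present;
    b->pasid_present = false;
    return unclean;
}
```

After an orderly toy_driver_unbind(), toy_mm_release() finds nothing left to do; after an unclean exit it clears the entry while driver_bound stays true, because the driver was never called.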

> If I understood it correctly, doesn't that conflate error/fault handling
> with normal unbinding? Wouldn't it be better to actively notify them and
> indeed *wait* until the device driver has cleared its H/W pipeline before
> "pulling the rug out from under their feet"?
> 
> In our case (AMD GPUs), such an error could leave the GPU stuck.
> That's why we even reset the wavefronts inside the GPU if we
> can't gracefully remove the work from it (see
> kfd_unbind_process_from_device).

But a rogue process can easily trigger faults; just request access to
an address that doesn't exist. My conversation with the hardware
designers was not about the peculiarities of any specific
implementation, but just getting them to confirm my assertion that if a
device *doesn't* cleanly handle faults on *one* PASID without screwing
over all the *other* PASIDs, then it is utterly broken by design and
should never get to production.

I *do* anticipate broken hardware which will crap itself completely
when it takes a fault, and have implemented a callback from the fault
handler so that the driver gets notified when a fault *happens* (even
on a PASID which is still alive), and can prod the broken hardware if
it needs to.

But I wasn't expecting it to be the norm.
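The isolation requirement and the fault-notification hook can be made concrete with a toy userspace model. This is a sketch under assumptions: the PASID table layout, the fault codes, the callback type and toy_count_faults are all hypothetical illustrations, not the actual Intel SVM interface. A fault on one PASID marks only that context as faulted and optionally notifies the driver, whether or not the PASID is still alive; every other PASID keeps running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model: every name here is hypothetical, not a kernel API. */
#define TOY_MAX_PASIDS 8
#define TOY_MAX_PAGES  4
#define TOY_PAGE_SHIFT 12

enum toy_fault { TOY_OK, TOY_PASID_NOT_PRESENT, TOY_PAGE_NOT_PRESENT };

struct toy_pasid_entry {
    bool present;                   /* false once the PASID is unbound */
    uint64_t pages[TOY_MAX_PAGES];  /* virtual page numbers that are mapped */
    size_t nr_pages;
    bool faulted;                   /* set only for the PASID that took the fault */
};

/* Optional driver hook, called on every fault, live PASID or not. */
typedef void (*toy_fault_cb)(void *priv, int pasid, uint64_t addr, enum toy_fault why);

struct toy_dev {
    struct toy_pasid_entry table[TOY_MAX_PASIDS];
    toy_fault_cb fault_cb;          /* NULL if the driver doesn't care */
    void *cb_priv;
};

/* One device access on behalf of @pasid; a fault never touches other PASIDs. */
static enum toy_fault toy_dev_access(struct toy_dev *dev, int pasid, uint64_t addr)
{
    assert(pasid >= 0 && pasid < TOY_MAX_PASIDS);
    struct toy_pasid_entry *e = &dev->table[pasid];
    enum toy_fault why = TOY_PASID_NOT_PRESENT;

    if (e->present) {
        for (size_t i = 0; i < e->nr_pages; i++)
            if (e->pages[i] == addr >> TOY_PAGE_SHIFT)
                return TOY_OK;
        why = TOY_PAGE_NOT_PRESENT;
    }

    e->faulted = true;              /* confine the damage to this context */
    if (dev->fault_cb)
        dev->fault_cb(dev->cb_priv, pasid, addr, why);
    return why;
}

/* Example hook: count faults in the int passed via @priv. */
static void toy_count_faults(void *priv, int pasid, uint64_t addr, enum toy_fault why)
{
    (void)pasid; (void)addr; (void)why;
    ++*(int *)priv;
}
```

Hardware that stalls globally on a fault would correspond to toy_dev_access() setting faulted on every entry; the hook is where a driver for such hardware could prod it back to life.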

> In the patch's comment you wrote:
> "Hardware designers have confirmed that the resulting 'PASID not present'
> faults should be handled just as gracefully as 'page not present' faults"
> 
> Unless *all* the H/W that is going to use SVM is designed by the same
> company, I don't think we can say such a thing. And even then, from my
> experience, H/W designers can be "creative" sometimes.

If we have to turn it into a 'page not present' fault instead of a
'PASID not present' fault, that's easy enough to do by pointing it at a
dummy PML4 (the zero page will do).
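The dummy-PML4 trick can be sketched as a toy model of the first step of an x86-64 4-level walk. All names are hypothetical and only the top-level lookup (VA bits 47:39) is modeled. A zeroed 4 KiB page holds 512 zero 8-byte entries, none with the present bit (bit 0) set, so a walk that starts there fails immediately with an ordinary 'page not present' fault instead of a 'PASID not present' one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; names are hypothetical. One 4 KiB table = 512 8-byte entries. */
#define TOY_PT_ENTRIES  512
#define TOY_PTE_PRESENT 0x1ULL

typedef uint64_t toy_pte;

/* Stand-in for the zero page: all 512 entries have the present bit clear. */
static const toy_pte toy_zero_pml4[TOY_PT_ENTRIES] = {0};

enum toy_walk { TOY_WALK_PASID_NOT_PRESENT, TOY_WALK_PAGE_NOT_PRESENT, TOY_WALK_DESCEND };

struct toy_pasid_root {
    const toy_pte *pml4;   /* NULL means 'PASID not present' */
};

/* Instead of clearing the entry on unbind, point it at the all-zero table. */
static void toy_unbind_to_dummy(struct toy_pasid_root *r)
{
    r->pml4 = toy_zero_pml4;
}

/* First step of a 4-level walk: only the PML4 lookup is modeled. */
static enum toy_walk toy_walk_top(const struct toy_pasid_root *r, uint64_t va)
{
    assert(r != NULL);
    if (r->pml4 == NULL)
        return TOY_WALK_PASID_NOT_PRESENT;   /* needs the OS to resolve */
    if (!(r->pml4[(va >> 39) & (TOY_PT_ENTRIES - 1)] & TOY_PTE_PRESENT))
        return TOY_WALK_PAGE_NOT_PRESENT;    /* ordinary per-PASID fault */
    return TOY_WALK_DESCEND;                 /* a real walk continues to PDPT/PD/PT */
}
```

One shared read-only zero table can back every unbound PASID, since nothing ever writes to it.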

But I stand by my assertion that any hardware which doesn't handle at
least a 'page not present' fault in a given PASID without screwing over
all the other users of the hardware is BROKEN.

We could *almost* forgive hardware for stalling when it sees a 'PASID
not present' fault, since that *does* require OS participation.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation