linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	kvm@vger.kernel.org, rafael@kernel.org,
	David Airlie <airlied@linux.ie>,
	linux-pci@vger.kernel.org,
	Thierry Reding <thierry.reding@gmail.com>,
	Diana Craciun <diana.craciun@oss.nxp.com>,
	Dmitry Osipenko <digetx@gmail.com>, Will Deacon <will@kernel.org>,
	Stuart Yoder <stuyoder@gmail.com>,
	Jonathan Hunter <jonathanh@nvidia.com>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Cornelia Huck <cohuck@redhat.com>,
	linux-kernel@vger.kernel.org, Li Yang <leoyang.li@nxp.com>,
	iommu@lists.linux-foundation.org,
	Jacob jun Pan <jacob.jun.pan@intel.com>,
	Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type
Date: Tue, 22 Feb 2022 21:18:23 +0000	[thread overview]
Message-ID: <3d4c3bf1-fed6-f640-dc20-36d667de7461@arm.com> (raw)
In-Reply-To: <20220222151632.GB10061@nvidia.com>

On 2022-02-22 15:16, Jason Gunthorpe wrote:
> On Tue, Feb 22, 2022 at 10:58:37AM +0000, Robin Murphy wrote:
>> On 2022-02-21 23:48, Jason Gunthorpe wrote:
>>> On Mon, Feb 21, 2022 at 08:43:33PM +0000, Robin Murphy wrote:
>>>> On 2022-02-19 07:32, Christoph Hellwig wrote:
>>>>> So we are back to the callback madness instead of the nice and simple
>>>>> flag?  Sigh.
>>>>
>>>> TBH, I *think* this part could be a fair bit simpler. It looks like this
>>>> whole callback mess is effectively just to decrement
>>>> group->owner_cnt, but
>>>
>>> Right, the new callback is because of Greg's push to put all the work
>>> into the existing bus callback. Having symetrical callbacks is
>>> cleaner.
>>
>> I'll continue to disagree that having tons more code purely for the sake of
>> it is cleaner. The high-level requirements are fundamentally asymmetrical -
>> ownership has to be actively claimed by the bus code at a point during probe
>> where it can block probing if necessary, but it can be released anywhere at
>> all during remove since that cannot fail. I don't personally see the value
>> in a bunch of code bloat for no reason other than trying to pretend that an
>> asymmetrical thing isn't.
> 
> Then we should put this in the share core code like most of us want.
> 
> If we are doing this distorted thing then it may as well make some
> kind of self consistent sense with a configure/unconfigure op pair.
> 
>> group->owner?  Walking the list would only have to be done for *releasing*
>> ownership and I'm pretty sure all the races there are benign - only
>> probe/remove of the driver (or DMA API token) matching a current non-NULL
>> owner matter; if two removes race, the first might end up releasing
>> ownership "early", but the second is waiting to do that anyway so it's OK;
>> if a remove races with a probe, the remove may end up leaving the owner set,
>> but the probe is waiting to do that anyway so it's OK.
> 
> With a lockless algorithm the race is probably wrongly releasing an
> ownership that probe just set in the multi-device group case.
> 
> Still not sure I see what you are thinking though..

What part of "How hard is it to hold group->mutex when reading or 
writing group->owner?" sounded like "complex lockless algorithm", exactly?

To spell it out, the scheme I'm proposing looks like this:

probe/claim:
	void *owner = driver_or_DMA_API_token(dev);//oversimplification!
	if (owner) {
		mutex_lock(group->mutex);
		if (!group->owner)
			group->owner = owner;
		else if (group->owner != owner);
			ret = -EBUSY;
		mutex_unlock(group->mutex);
	}

remove:
	bool still_owned = false;
	mutex_lock(group->mutex);
	list_for_each_entry(tmp, &group->devices, list) {
		void *owner = driver_or_DMA_API_token(tmp);
		if (tmp == dev || !owner || owner != group->owner)
			continue;
		still_owned = true;
		break;
	}
	if (!still_owned)
		group->owner = NULL;
	mutex_unlock(group->mutex);

Of course now that I've made it more concrete I realise that the remove 
hook does need to run *after* dev->driver is cleared, so not quite 
"anywhere at all", but the main point remains: as long as actual changes 
of ownership are always serialised, even if the list walk in the remove 
hook sees "future" information WRT other devices' drivers, at worst it 
should merely short-cut to a corresponding pending reclaim of ownership.

> How did we get from adding a few simple lines to dd.c into building
> some complex lockless algorithm and hoping we did it right?

Because the current alternative to adding a few simple lines to dd.c is 
adding loads of lines all over the place to end up calling back into 
common IOMMU code, to do something I'm 99% certain the common IOMMU code 
could do for itself in private. That said, having worked through the 
above, it does start looking like a bit of a big change for this series 
at this point, so I'm happy to keep it on the back burner for when I 
have to rip .dma_configure to pieces anyway.

According to lockdep, I think I've solved the VFIO locking issue 
provided vfio_group_viable() goes away, so I'm certainly keen not to 
delay that for another cycle!

>>>> It has to be s It should be pretty straightforward for
>>>> iommu_bus_notifier to clear group->owner automatically upon an
>>>> unbind of the matching driver when it's no longer bound to any other
>>>> devices in the group either.
>>>
>>> That not_bound/unbind notifier isn't currently triggred during
>>> necessary failure paths of really_probe().
>>
>> Eh? Just look at the context of patch #2, let alone the rest of the
>> function, and tell me how, if we can't rely on BUS_NOTIFY_DRIVER_NOT_BOUND,
>> calling .dma_cleanup *from the exact same place* is somehow more reliable?
> 
> Yeah, OK
> 
>> AFAICS, a notifier handling both BUS_NOTIFY_UNBOUND_DRIVER and
>> BUS_NOTIFY_DRIVER_NOT_BOUND would be directly equivalent to the callers of
>> .dma_cleanup here.
> 
> Yes, but why hide this in a notifier, it is still spaghetti

Quick quiz!

1: The existing IOMMU group management has spent the last 10 years being 
driven from:

   A - All over random bits of bus code and the driver core
   B - A private bus notifier


2: The functionality that this series replaces and improves upon was 
split between VFIO and...

   A - Random bits of bus code and the driver core
   B - The same private bus notifier

>>>> use-case) then it should be up to VFIO to decide when it's finally
>>>> finished with the whole group, rather than pretending we can keep
>>>> track of nested ownership claims from inside the API.
>>>
>>> What nesting?
>>
>> The current implementation of iommu_group_claim_dma_owner() allows owner_cnt
>> to increase beyond 1, and correspondingly requires
>> iommu_group_release_dma_owner() to be called the same number of times. It
>> doesn't appear that VFIO needs that, and I'm not sure I'd trust any other
>> potential users to get it right either.
> 
> That isn't for "nesting" it is keeping track of multi-device
> groups. Each count represents a device, not a nest.

I was originally going to say "recursion", but then thought that might 
carry too much risk of misinterpretation, oh well. Hold your favourite 
word for "taking a mutual-exclusion token that you already hold" in mind 
and read my paragraph quoted above again. I'm not talking about 
automatic DMA API claiming, that clearly happens per-device; I'm talking 
about explicit callers of iommu_group_claim_dma_owner(). Does VFIO call 
that multiple times for individual devices? No. Should it? No. Is it 
reasonable that any other future callers should need to? I don't think 
so. Would things be easier to reason about if we just disallowed it 
outright? For sure.

>>>> FWIW I have some ideas for re-converging .dma_configure in future
>>>> which I think should probably be able to subsume this into a
>>>> completely generic common path, given a common flag.
>>>
>>> This would be great!
>>
>> Indeed, so if we're enthusiastic about future cleanup that necessitates a
>> generic flag, why not make the flag generic to start with?
> 
> Maybe when someone has patches to delete the bus ops completely they
> can convince Greg. The good news is that it isn't much work to flip
> the flag, Lu has already done it 3 times in the previous versions..
> 
> It has already been 8 weeks on this point, lets just move on please.

Sure, if it was rc7 with the merge window looming I'd be saying "this is 
close enough, let's get it in now and fix the small stuff next cycle". 
However while there's still potentially time to get things right first 
time, I for one am going to continue to point them out because I'm not a 
fan of avoidable churn. I'm sorry I haven't had a chance to look 
properly at this series between v1 and v6, but that's just how things 
have been.

Robin.

  reply	other threads:[~2022-02-22 21:18 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-18  0:55 [PATCH v6 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
2022-02-18  0:55 ` [PATCH v6 01/11] iommu: Add dma ownership management interfaces Lu Baolu
2022-02-19  7:31   ` Christoph Hellwig
2022-02-21  4:02     ` Lu Baolu
2022-02-23 18:00   ` Robin Murphy
2022-02-23 18:02     ` Jason Gunthorpe
2022-02-23 18:20       ` Robin Murphy
2022-02-23 18:32         ` Jason Gunthorpe
2022-02-24  5:16       ` Lu Baolu
2022-02-24  5:29         ` Lu Baolu
2022-02-24  8:58           ` Robin Murphy
2022-02-24  5:21     ` Lu Baolu
2022-02-18  0:55 ` [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type Lu Baolu
2022-02-19  7:32   ` Christoph Hellwig
2022-02-21 20:43     ` Robin Murphy
2022-02-21 23:48       ` Jason Gunthorpe
2022-02-22  4:48         ` Lu Baolu
2022-02-22 10:58         ` Robin Murphy
2022-02-22 15:16           ` Jason Gunthorpe
2022-02-22 21:18             ` Robin Murphy [this message]
2022-02-22 23:53               ` Jason Gunthorpe
2022-02-23  5:01                 ` Lu Baolu
2022-02-23 13:04                   ` Robin Murphy
2022-02-23 13:46                     ` Jason Gunthorpe
2022-02-23 14:06                       ` Greg Kroah-Hartman
2022-02-23 14:09                         ` Jason Gunthorpe
2022-02-23 14:30                           ` Jason Gunthorpe
2022-02-23 16:03                             ` Greg Kroah-Hartman
2022-02-23 17:05                               ` Robin Murphy
2022-02-23 17:47                                 ` Greg Kroah-Hartman
2022-02-18  0:55 ` [PATCH v6 03/11] amba: Stop sharing platform_dma_configure() Lu Baolu
2022-02-18  0:55 ` [PATCH v6 04/11] bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management Lu Baolu
2022-02-18  7:55   ` Greg Kroah-Hartman
2022-02-18  0:55 ` [PATCH v6 05/11] PCI: pci_stub: Set driver_managed_dma Lu Baolu
2022-02-18  0:55 ` [PATCH v6 06/11] PCI: portdrv: " Lu Baolu
2022-02-18  0:55 ` [PATCH v6 07/11] vfio: Set DMA ownership for VFIO devices Lu Baolu
2022-02-18  0:55 ` [PATCH v6 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
2022-02-18  0:55 ` [PATCH v6 09/11] vfio: Delete the unbound_list Lu Baolu
2022-02-18  0:55 ` [PATCH v6 10/11] vfio: Remove iommu group notifier Lu Baolu
2022-02-23 21:53   ` Alex Williamson
2022-02-24  2:49     ` Lu Baolu
2022-02-18  0:55 ` [PATCH v6 11/11] iommu: Remove iommu group changes notifier Lu Baolu
2022-02-18 15:51 ` [PATCH v6 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Jason Gunthorpe
2022-02-21  3:38   ` Lu Baolu
2022-02-28  0:58 ` Lu Baolu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d4c3bf1-fed6-f640-dc20-36d667de7461@arm.com \
    --to=robin.murphy@arm.com \
    --cc=airlied@linux.ie \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=diana.craciun@oss.nxp.com \
    --cc=digetx@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jonathanh@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kch@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=leoyang.li@nxp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=stuyoder@gmail.com \
    --cc=thierry.reding@gmail.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).