kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Joerg Roedel <joro@8bytes.org>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
	will@kernel.org, alex.williamson@redhat.com,
	suravee.suthikulpanit@amd.com, marcan@marcan.st,
	sven@svenpeter.dev, alyssa@rosenzweig.io, robdclark@gmail.com,
	dwmw2@infradead.org, baolu.lu@linux.intel.com,
	mjrosato@linux.ibm.com, gerald.schaefer@linux.ibm.com,
	orsonzhai@gmail.com, baolin.wang@linux.alibaba.com,
	zhang.lyra@gmail.com, thierry.reding@gmail.com,
	vdumpa@nvidia.com, jonathanh@nvidia.com,
	jean-philippe@linaro.org, cohuck@redhat.com, tglx@linutronix.de,
	shameerali.kolothum.thodi@huawei.com, thunder.leizhen@huawei.com,
	christophe.jaillet@wanadoo.fr, yangyingliang@huawei.com,
	jon@solid-run.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, asahi@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-arm-msm@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-tegra@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	kevin.tian@intel.com
Subject: Re: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group
Date: Thu, 8 Sep 2022 11:25:46 +0100	[thread overview]
Message-ID: <7ef259b2-121e-643e-49c2-0b65923d392d@arm.com> (raw)
In-Reply-To: <Yxk6sR4JiAAn3Jf5@nvidia.com>

On 2022-09-08 01:43, Jason Gunthorpe wrote:
> On Wed, Sep 07, 2022 at 08:41:13PM +0100, Robin Murphy wrote:
> 
>>>> FWIW, we're now very close to being able to validate dev->iommu against
>>>> where the domain came from in core code, and so short-circuit ->attach_dev
>>>> entirely if they don't match.
>>>
>>> I don't think this is a long term direction. We have systems now with
>>> a number of SMMU blocks and we really are going to see a need that
>>> they share the iommu_domains so we don't have unncessary overheads
>>> from duplicated io page table memory.
>>>
>>> So ultimately I'd expect to pass the iommu_domain to the driver and
>>> the driver will decide if the page table memory it represents is
>>> compatible or not. Restricting to only the same iommu instance isn't
>>> good..
>>
>> Who said IOMMU instance?
> 
> Ah, I completely misunderstood what 'dev->iommu' was referring too, OK
> I see.
> 
>> Again, not what I was suggesting. In fact the nature of iommu_attach_group()
>> already rules out bogus devices getting this far, so all a driver currently
>> has to worry about is compatibility of a device that it definitely probed
>> with a domain that it definitely allocated. Therefore, from a caller's point
>> of view, if attaching to an existing domain returns -EINVAL, try another
>> domain; multiple different existing domains can be tried, and may also
>> return -EINVAL for the same or different reasons; the final attempt is to
>> allocate a fresh domain and attach to that, which should always be nominally
>> valid and *never* return -EINVAL. If any attempt returns any other error,
>> bail out down the usual "this should have worked but something went wrong"
>> path. Even if any driver did have a nonsensical "nothing went wrong, I just
>> can't attach my device to any of my domains" case, I don't think it would
>> really need distinguishing from any other general error anyway.
> 
> The algorithm you described is exactly what this series does, it just
> used EMEDIUMTYPE instead of EINVAL. Changing it to EINVAL is not a
> fundamental problem, just a bit more work.
> 
> Looking at Nicolin's series there is a bunch of existing errnos that
> would still need converting, ie EXDEV, EBUSY, EOPNOTSUPP, EFAULT, and
> ENXIO are all returned as codes for 'domain incompatible with device'
> in various drivers. So the patch would still look much the same, just
> changing them to EINVAL instead of EMEDIUMTYPE.
> 
> That leaves the question of the remaining EINVAL's that Nicolin did
> not convert to EMEDIUMTYPE.
> 
> eg in the AMD driver:
> 
> 	if (!check_device(dev))
> 		return -EINVAL;
> 
> 	iommu = rlookup_amd_iommu(dev);
> 	if (!iommu)
> 		return -EINVAL;
> 
> These are all cases of 'something is really wrong with the device or
> iommu, everything will fail'. Other drivers are using ENODEV for this
> already, so we'd probably have an additional patch changing various
> places like that to ENODEV.
> 
> This mixture of error codes is the basic reason why a new code was
> used, because none of the existing codes are used with any
> consistency.
> 
> But OK, I'm on board, lets use more common errnos with specific
> meaning, that can be documented in a comment someplace:
>   ENOMEM - out of memory
>   ENODEV - no domain can attach, device or iommu is messed up
>   EINVAL - the domain is incompatible with the device
>   <others> - Same behavior as ENODEV, use is discouraged.
> 
> I think achieving consistency of error codes is a generally desirable
> goal, it makes the error code actually useful.
> 
> Joerg this is a good bit of work, will you be OK with it?
> 
>> Thus as long as we can maintain that basic guarantee that attaching
>> a group to a newly allocated domain can only ever fail for resource
>> allocation reasons and not some spurious "incompatibility", then we
>> don't need any obscure trickery, and a single, clear, error code is
>> in fact enough to say all that needs to be said.
> 
> As above, this is not the case, drivers do seem to have error paths
> that are unconditional on the domain. Perhaps they are just protective
> assertions and never happen.

Right, that's the gist of what I was getting at - I think it's worth 
putting in the effort to audit and fix the drivers so that that *can* be 
the case, then we can have a meaningful error API with standard codes 
effectively for free, rather than just sighing at the existing mess and 
building a slightly esoteric special case on top.

Case in point, the AMD checks quoted above are pointless, since it 
checks the same things in ->probe_device, and if that fails then the 
device won't get a group so there's no way for it to even reach 
->attach_dev any more. I'm sure there's a *lot* of cruft that can be 
cleared out now that per-device and per-domain ops give us this kind of 
inherent robustness.

Cheers,
Robin.

> Regardless, it doesn't matter. If they return ENODEV or EINVAL the
> VFIO side algorithm will continue to work fine, it just does alot more
> work if EINVAL is permanently returned.
> 
> Thanks,
> Jason

  parent reply	other threads:[~2022-09-08 10:26 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-15 18:14 [PATCH v6 0/5] Simplify vfio_iommu_type1 attach/detach routine Nicolin Chen
2022-08-15 18:14 ` [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group Nicolin Chen
2022-09-07 12:41   ` Joerg Roedel
2022-09-07 13:47     ` Jason Gunthorpe
2022-09-07 14:06       ` Joerg Roedel
2022-09-07 17:10         ` Jason Gunthorpe
2022-09-08 13:28           ` Joerg Roedel
2022-09-08 16:14             ` Jason Gunthorpe
2022-09-09  3:17               ` Nicolin Chen
2022-09-09  5:00                 ` Tian, Kevin
2022-09-09 12:07                   ` Jason Gunthorpe
2022-09-13  2:22                     ` Tian, Kevin
2022-09-13  5:07                       ` Nicolin Chen
2022-09-07 14:23       ` Robin Murphy
2022-09-07 17:00         ` Jason Gunthorpe
2022-09-07 19:41           ` Robin Murphy
2022-09-08  0:43             ` Jason Gunthorpe
2022-09-08  9:30               ` Tian, Kevin
2022-09-08 12:08                 ` Jason Gunthorpe
2022-09-10 23:35                   ` Nicolin Chen
2022-09-13  2:24                     ` Tian, Kevin
2022-09-13  8:36                       ` Nicolin Chen
2022-09-08  9:54               ` Tian, Kevin
2022-09-08 10:25               ` Robin Murphy [this message]
2022-08-15 18:14 ` [PATCH v6 2/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency Nicolin Chen
2022-08-15 18:14 ` [PATCH v6 3/5] vfio/iommu_type1: Remove the domain->ops comparison Nicolin Chen
2022-08-15 18:14 ` [PATCH v6 4/5] vfio/iommu_type1: Clean up update_dirty_scope in detach_group() Nicolin Chen
2022-08-15 18:14 ` [PATCH v6 5/5] vfio/iommu_type1: Simplify group attachment Nicolin Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ef259b2-121e-643e-49c2-0b65923d392d@arm.com \
    --to=robin.murphy@arm.com \
    --cc=alex.williamson@redhat.com \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=christophe.jaillet@wanadoo.fr \
    --cc=cohuck@redhat.com \
    --cc=dwmw2@infradead.org \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jean-philippe@linaro.org \
    --cc=jgg@nvidia.com \
    --cc=jon@solid-run.com \
    --cc=jonathanh@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=marcan@marcan.st \
    --cc=mjrosato@linux.ibm.com \
    --cc=nicolinc@nvidia.com \
    --cc=orsonzhai@gmail.com \
    --cc=robdclark@gmail.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=sven@svenpeter.dev \
    --cc=tglx@linutronix.de \
    --cc=thierry.reding@gmail.com \
    --cc=thunder.leizhen@huawei.com \
    --cc=vdumpa@nvidia.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=will@kernel.org \
    --cc=yangyingliang@huawei.com \
    --cc=zhang.lyra@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).