All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Jonathan Corbet <corbet@lwn.net>,
	David Woodhouse <dwmw2@infradead.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Joerg Roedel <joro@8bytes.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Tom Rix <trix@redhat.com>, Will Deacon <will@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Eric Auger <eric.auger@redhat.com>,
	Eric Farman <farman@linux.ibm.com>,
	Jason Wang <jasowang@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	"Martins, Joao" <joao.m.martins@oracle.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH v3 05/15] iommufd: File descriptor, context, kconfig and makefiles
Date: Mon, 7 Nov 2022 13:00:52 -0400	[thread overview]
Message-ID: <Y2k5xIp3TksL94MV@nvidia.com> (raw)
In-Reply-To: <BN9PR11MB527695C6CE5BEA4AFACF3DBB8C389@BN9PR11MB5276.namprd11.prod.outlook.com>

On Thu, Nov 03, 2022 at 07:22:11AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, October 26, 2022 2:12 AM
> > 
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -10714,6 +10714,16 @@ F:	drivers/iommu/dma-iommu.h
> >  F:	drivers/iommu/iova.c
> >  F:	include/linux/iova.h
> > 
> > +IOMMU FD
> 
> remove the space, i.e. IOMMUFD

OK

> > +config IOMMUFD
> > +	tristate "IOMMU Userspace API"
> > +	select INTERVAL_TREE
> > +	select INTERVAL_TREE_SPAN_ITER
> > +	select IOMMU_API
> > +	default n
> > +	help
> > +	  Provides /dev/iommu the user API to control the IOMMU subsystem
> > as
> > +	  it relates to managing IO page tables that point at user space
> > memory.
> > +
> > +	  This would commonly be used in combination with VFIO.
> 
> remove this line

Sure
 
> > +/**
> > + * iommufd_put_object_keep_user() - Release part of the refcount on obj
> 
> what does 'part of the refcount' mean?
> 
> > + * @obj - Object to release
> > + *
> > + * Objects have two protections to ensure that userspace has a consistent
> > + * experience with destruction. Normally objects are locked so that destroy
> > will
> > + * block while there are concurrent users, and wait for the object to be
> > + * unlocked.
> > + *
> > + * However, destroy can also be blocked by holding users reference counts
> > on the
> > + * objects, in that case destroy will immediately return EBUSY and will not
> > wait
> > + * for reference counts to go to zero.
> > + *
> > + * This function switches from blocking userspace to returning EBUSY.
> 
> Not sure where "switch from... to..." comes from. Also this function alone
> doesn't deal anything with EBUSY. Probably it is clearer that this interface
> is used for long-term refcounting which the destroy path should favor to
> not block as it did for transient refcounting in concurrent ioctl paths.

How about this:

/**
 * iommufd_ref_to_users() - Switch from destroy_rwsem to users refcount
 *        protection
 * @obj - Object to release
 *
 * Objects have two refcount protections (destroy_rwsem and the refcount_t
 * users). Holding either of these will prevent the object from being destroyed.
 *
 * Depending on the use case, one protection or the other is appropriate.  In
 * most cases references are being protected by the destroy_rwsem. This allows
 * orderly destruction of the object becuase iommufd_object_destroy_user() will
 * wait for it to become unlocked. However, as a rwsem, it cannot be held across
 * a system call return. So cases that have longer term needs must switch
 * to the weaker users refcount_t.
 *
 * With users protection iommufd_object_destroy_user() will return -EBUSY to
 * userspace and refuse to destroy the object.
 */

With a function rename.

> > + * It should be used in places where the users will be held beyond a single
> > + * system call.
> 
> 'users' or 'external drivers'? Do we ever allow userspace to hold the lock
> of a kernel object for undefined time?

"users" is the name of the variable and the refcount

We don't allow userspace to hold a lock, but we do allow it to hold a
refcount. And the refcount is defined so that it must be unheld
internally in the kernel when the fd is released. Eg this is part of
what the access_ops unmap callback is doing.

 
> > +++ b/drivers/iommu/iommufd/main.c
> > @@ -0,0 +1,345 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright (C) 2021 Intel Corporation
> > + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
> > + *
> > + * iommufd provides control over the IOMMU HW objects created by
> > IOMMU kernel
> > + * drivers. IOMMU HW objects revolve around IO page tables that map
> > incoming DMA
> > + * addresses (IOVA) to CPU addresses.
> 
> "to bus addresses".

CPU address is correct, these are all phys_addr_t's. The IOVA is the
"bus address" in kernel terminology.

> > + *
> > + * The API is divided into a general portion that is intended to work with any
> > + * kernel IOMMU driver, and a device specific portion that  is intended to be
> > + * used with a userspace HW driver paired with the specific kernel driver.
> > This
> > + * mechanism allows all the unique functionalities in individual IOMMUs to
> > be
> > + * exposed to userspace control.
> 
> there is no device specific portion in this series.

Lets drop the paragraph

> > +/*
> > + * Allow concurrent access to the object. This should only be done once the
> > + * system call that created the object is guaranteed to succeed.
> 
> an object is not always created by a system call, e.g. iommufd_access

Yes, but that is actually a special case, "iommufd_access" is a "leaf"
object that doesn't have any way for userspace to block it. Something
like that can always be destroyed, indeed we require it as part of the
drive facing API. So this guidance isn't aimed at that.

How about:


/*
 * Allow concurrent access to the object.
 *
 * Once another thread can see the object pointer it can prevent object
 * destruction. Expect for special kernel-only objects there is no in-kernel way
 * to reliably destroy a single object. Thus all APIs that are creating objects
 * must use iommufd_object_abort() to handle their errors and only call
 * iommufd_object_finalize() once object creation cannot fail.
 */

> > +static int iommufd_destroy(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_destroy *cmd = ucmd->cmd;
> > +	struct iommufd_object *obj;
> > +
> > +	obj = iommufd_get_object(ucmd->ictx, cmd->id,
> > IOMMUFD_OBJ_ANY);
> > +	if (IS_ERR(obj))
> > +		return PTR_ERR(obj);
> > +	iommufd_put_object_keep_user(obj);
> > +	if (!iommufd_object_destroy_user(ucmd->ictx, obj))
> > +		return -EBUSY;
> 
> Add a comment that it implies a refcnt hold by external driver in a
> long time so return error instead of blocking...

Lets do "See iommufd_ref_to_users" 

> > +	nr = _IOC_NR(cmd);
> > +	if (nr < IOMMUFD_CMD_BASE ||
> > +	    (nr - IOMMUFD_CMD_BASE) >= ARRAY_SIZE(iommufd_ioctl_ops))
> > +		return -ENOIOCTLCMD;
> 
> According to the description in iommufd.h:
> 
> 	*  - ENOTTY: The IOCTL number itself is not supported at all

Yes, it is weird, but in ioctl handlers you return ENOTTY by returning
ENOIOCTLCMD which is then converted to ENOTTY in vfs_ioctl()
 
> > +	op = &iommufd_ioctl_ops[nr - IOMMUFD_CMD_BASE];
> > +	if (op->ioctl_num != cmd)
> > +		return -ENOIOCTLCMD;
> > +	if (ucmd.user_size < op->min_size)
> > +		return -EOPNOTSUPP;
> 
> -EINVAL?

Yes

> > +/**
> > + * DOC: General ioctl format
> > + *
> > + * The ioctl mechanims follows a general format to allow for extensibility.
> 
> mechanism

It was changed to "interface" in another comment

Thanks,
Jason 

  reply	other threads:[~2022-11-07 17:00 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-25 18:12 [PATCH v3 00/15] IOMMUFD Generic interface Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 01/15] iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY Jason Gunthorpe
2022-10-26 12:45   ` Baolu Lu
2022-11-03  5:03   ` Tian, Kevin
2022-11-04 19:25     ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 02/15] iommu: Add device-centric DMA ownership interfaces Jason Gunthorpe
2022-11-03  5:11   ` Tian, Kevin
2022-11-04 19:32     ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 03/15] interval-tree: Add a utility to iterate over spans in an interval tree Jason Gunthorpe
2022-11-03  5:31   ` Tian, Kevin
2022-11-04 19:38     ` Jason Gunthorpe
2022-11-05  1:32       ` Tian, Kevin
2022-11-05  1:48       ` Matthew Wilcox
2022-11-07 14:38         ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 04/15] iommufd: Overview documentation Jason Gunthorpe
2022-10-26  4:17   ` Bagas Sanjaya
2022-10-28 19:09     ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 05/15] iommufd: File descriptor, context, kconfig and makefiles Jason Gunthorpe
2022-10-26 12:58   ` Baolu Lu
2022-10-26 17:14     ` Jason Gunthorpe
2022-10-29  3:43       ` Baolu Lu
2022-11-03  7:22   ` Tian, Kevin
2022-11-07 17:00     ` Jason Gunthorpe [this message]
2022-10-25 18:12 ` [PATCH v3 06/15] kernel/user: Allow user::locked_vm to be usable for iommufd Jason Gunthorpe
2022-11-03  7:23   ` Tian, Kevin
2022-10-25 18:12 ` [PATCH v3 07/15] iommufd: PFN handling for iopt_pages Jason Gunthorpe
2022-11-01 19:38   ` Nicolin Chen
2022-11-02 13:13     ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 08/15] iommufd: Algorithms for PFN storage Jason Gunthorpe
2022-10-31 16:01   ` [PATCH v3 8/15] " Jason Gunthorpe
2022-11-01 16:09   ` Jason Gunthorpe
2022-11-03 20:08   ` Jason Gunthorpe
2022-11-04 16:26     ` Jason Gunthorpe
2022-11-04 16:04   ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 09/15] iommufd: Data structure to provide IOVA to PFN mapping Jason Gunthorpe
2022-10-26 18:46   ` [PATCH v3 9/15] " Jason Gunthorpe
2022-10-27 11:37   ` Jason Gunthorpe
2022-10-27 13:35   ` Jason Gunthorpe
2022-10-28 18:52   ` Jason Gunthorpe
2022-11-01 19:17   ` [PATCH v3 09/15] " Nicolin Chen
2022-11-02 13:11     ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 10/15] iommufd: IOCTLs for the io_pagetable Jason Gunthorpe
2022-10-26 17:01   ` Jason Gunthorpe
2022-10-26 23:17   ` Jason Gunthorpe
2022-10-29  7:25   ` Baolu Lu
2022-11-07 14:17     ` Jason Gunthorpe
2022-11-04  8:32   ` Tian, Kevin
2022-11-07 15:02     ` Jason Gunthorpe
2022-11-08  2:05       ` Tian, Kevin
2022-11-08 17:29         ` Jason Gunthorpe
2022-11-09  2:50           ` Tian, Kevin
2022-11-09 13:05             ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 11/15] iommufd: Add a HW pagetable object Jason Gunthorpe
2022-11-04 10:00   ` Tian, Kevin
2022-10-25 18:12 ` [PATCH v3 12/15] iommufd: Add kAPI toward external drivers for physical devices Jason Gunthorpe
2022-10-29  7:19   ` Baolu Lu
2022-11-07 14:14     ` Jason Gunthorpe
2022-11-05  7:17   ` Tian, Kevin
2022-11-07 17:54     ` Jason Gunthorpe
2022-11-08  2:17       ` Tian, Kevin
2022-10-25 18:12 ` [PATCH v3 13/15] iommufd: Add kAPI toward external drivers for kernel access Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 14/15] iommufd: vfio container FD ioctl compatibility Jason Gunthorpe
2022-10-27 14:12   ` Jason Gunthorpe
2022-11-01 19:45   ` Nicolin Chen
2022-11-02 13:15     ` Jason Gunthorpe
2022-11-05  0:07   ` Baolu Lu
2022-11-07 14:23     ` Jason Gunthorpe
2022-11-05  9:31   ` Tian, Kevin
2022-11-07 17:08     ` Jason Gunthorpe
2022-11-07 23:53       ` Tian, Kevin
2022-11-08  0:09         ` Jason Gunthorpe
2022-11-08  0:13           ` Tian, Kevin
2022-11-08  0:17             ` Jason Gunthorpe
2022-10-25 18:12 ` [PATCH v3 15/15] iommufd: Add a selftest Jason Gunthorpe
2022-11-01 20:32   ` Nicolin Chen
2022-11-02 13:17     ` Jason Gunthorpe
2022-11-02 18:49       ` Nathan Chancellor
2022-11-04  1:01   ` Jason Gunthorpe
2022-11-04  5:43     ` Tian, Kevin
2022-11-04 19:42       ` Jason Gunthorpe
2022-10-28 23:57 ` [PATCH v3 00/15] IOMMUFD Generic interface Nicolin Chen
2022-11-04 21:27 ` Alex Williamson
2022-11-04 22:03   ` Alex Williamson
2022-11-07 14:22     ` Jason Gunthorpe
2022-11-07 14:19   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2k5xIp3TksL94MV@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=chaitanyak@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=corbet@lwn.net \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=dwmw2@infradead.org \
    --cc=eric.auger@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jasowang@redhat.com \
    --cc=jean-philippe@linaro.org \
    --cc=joao.m.martins@oracle.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mjrosato@linux.ibm.com \
    --cc=mst@redhat.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=ojeda@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=shuah@kernel.org \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=trix@redhat.com \
    --cc=will@kernel.org \
    --cc=yi.l.liu@intel.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.