kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	Eric Auger <eric.auger@redhat.com>,
	iommu@lists.linux-foundation.org,
	Jason Wang <jasowang@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Joao Martins <joao.m.martins@oracle.com>,
	Kevin Tian <kevin.tian@intel.com>,
	kvm@vger.kernel.org, Matthew Rosato <mjrosato@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>,
	Yi Liu <yi.l.liu@intel.com>, Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable
Date: Thu, 28 Apr 2022 11:22:58 -0300	[thread overview]
Message-ID: <20220428142258.GH8364@nvidia.com> (raw)
In-Reply-To: <YmotBkM103HqanoZ@yekko>

On Thu, Apr 28, 2022 at 03:58:30PM +1000, David Gibson wrote:
> On Thu, Mar 31, 2022 at 09:58:41AM -0300, Jason Gunthorpe wrote:
> > On Thu, Mar 31, 2022 at 03:36:29PM +1100, David Gibson wrote:
> > 
> > > > +/**
> > > > + * struct iommu_ioas_iova_ranges - ioctl(IOMMU_IOAS_IOVA_RANGES)
> > > > + * @size: sizeof(struct iommu_ioas_iova_ranges)
> > > > + * @ioas_id: IOAS ID to read ranges from
> > > > + * @out_num_iovas: Output total number of ranges in the IOAS
> > > > + * @__reserved: Must be 0
> > > > + * @out_valid_iovas: Array of valid IOVA ranges. The array length is the smaller
> > > > + *                   of out_num_iovas or the length implied by size.
> > > > + * @out_valid_iovas.start: First IOVA in the allowed range
> > > > + * @out_valid_iovas.last: Inclusive last IOVA in the allowed range
> > > > + *
> > > > + * Query an IOAS for ranges of allowed IOVAs. Operation outside these ranges is
> > > > + * not allowed. out_num_iovas will be set to the total number of iovas
> > > > + * and the out_valid_iovas[] will be filled in as space permits.
> > > > + * size should include the allocated flex array.
> > > > + */
> > > > +struct iommu_ioas_iova_ranges {
> > > > +	__u32 size;
> > > > +	__u32 ioas_id;
> > > > +	__u32 out_num_iovas;
> > > > +	__u32 __reserved;
> > > > +	struct iommu_valid_iovas {
> > > > +		__aligned_u64 start;
> > > > +		__aligned_u64 last;
> > > > +	} out_valid_iovas[];
> > > > +};
> > > > +#define IOMMU_IOAS_IOVA_RANGES _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_IOVA_RANGES)
> > > 
> > > Is the information returned by this valid for the lifeime of the IOAS,
> > > or can it change?  If it can change, what events can change it?
> > >
> > > If it *can't* change, then how do we have enough information to
> > > determine this at ALLOC time, since we don't necessarily know which
> > > (if any) hardware IOMMU will be attached to it.
> > 
> > It is a good point worth documenting. It can change. Particularly
> > after any device attachment.
> 
> Right.. this is vital and needs to be front and centre in the
> comments/docs here.  Really, I think an interface that *doesn't* have
> magically changing status would be better (which is why I was
> advocating that the user set the constraints, and the kernel supplied
> or failed outright).  Still I recognize that has its own problems.

That is a neat idea, it could be a nice option, it lets userspace
further customize the kernel allocator.

But I don't have a use case in mind? The simplified things I know
about want to attach their devices then allocate valid IOVA, they
don't really have a notion about what IOVA regions they are willing to
accept, or necessarily do hotplug.

What might be interesting is to have some option to load in a machine
specific default ranges - ie the union of every group and and every
iommu_domain. The idea being that after such a call hotplug of a
device should be very likely to succeed.

Though I don't have a user in mind..

> > I added this:
> > 
> >  * Query an IOAS for ranges of allowed IOVAs. Mapping IOVA outside these ranges
> >  * is not allowed. out_num_iovas will be set to the total number of iovas and
> >  * the out_valid_iovas[] will be filled in as space permits. size should include
> >  * the allocated flex array.
> >  *
> >  * The allowed ranges are dependent on the HW path the DMA operation takes, and
> >  * can change during the lifetime of the IOAS. A fresh empty IOAS will have a
> >  * full range, and each attached device will narrow the ranges based on that
> >  * devices HW restrictions.
> 
> I think you need to be even more explicit about this: which exact
> operations on the fd can invalidate exactly which items in the
> information from this call?  Can it only ever be narrowed, or can it
> be broadened with any operations?

I think "attach" is the phrase we are using for that operation - it is
not a specific IOCTL here because it happens on, say, the VFIO device FD.

Let's add "detatching a device can widen the ranges. Userspace should
query ranges after every attach/detatch to know what IOVAs are valid
for mapping."

> > > > +#define IOMMU_IOAS_COPY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_COPY)
> > > 
> > > Since it can only copy a single mapping, what's the benefit of this
> > > over just repeating an IOAS_MAP in the new IOAS?
> > 
> > It causes the underlying pin accounting to be shared and can avoid
> > calling GUP entirely.
> 
> If that's the only purpose, then that needs to be right here in the
> comments too.  So is expected best practice to IOAS_MAP everything you
> might want to map into a sort of "scratch" IOAS, then IOAS_COPY the
> mappings you actually end up wanting into the "real" IOASes for use?

That is one possibility, yes. qemu seems to be using this to establish
a clone ioas of an existing operational one which is another usage
model.

I added this additionally:

 * This may be used to efficiently clone a subset of an IOAS to another, or as a
 * kind of 'cache' to speed up mapping. Copy has an effciency advantage over
 * establishing equivilant new mappings, as internal resources are shared, and
 * the kernel will pin the user memory only once.

> Seems like it would be nicer for the interface to just figure it out
> for you: I can see there being sufficient complications with that to
> have this slightly awkward interface, but I think it needs a rationale
> to accompany it.

It is more than complicates, the kernel has no way to accurately know
when a user pointer is an alias of an existing user pointer or is
something new because the mm has become incoherent.

It is possible that uncoordinated modules in userspace could
experience data corruption if the wrong decision is made - mm
coherence with pinning is pretty weak in Linux.. Since I dislike that
kind of unreliable magic I made it explicit.

Thanks,
Jason


  reply	other threads:[~2022-04-28 14:23 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-18 17:27 [PATCH RFC 00/12] IOMMUFD Generic interface Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 01/12] interval-tree: Add a utility to iterate over spans in an interval tree Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 02/12] iommufd: Overview documentation Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 03/12] iommufd: File descriptor, context, kconfig and makefiles Jason Gunthorpe
2022-03-22 14:18   ` Niklas Schnelle
2022-03-22 14:50     ` Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 04/12] kernel/user: Allow user::locked_vm to be usable for iommufd Jason Gunthorpe
2022-03-22 14:28   ` Niklas Schnelle
2022-03-22 14:57     ` Jason Gunthorpe
2022-03-22 15:29       ` Alex Williamson
2022-03-22 16:15         ` Jason Gunthorpe
2022-03-24  2:11           ` Tian, Kevin
2022-03-24  2:27             ` Jason Wang
2022-03-24  2:42               ` Tian, Kevin
2022-03-24  2:57                 ` Jason Wang
2022-03-24  3:15                   ` Tian, Kevin
2022-03-24  3:50                     ` Jason Wang
2022-03-24  4:29                       ` Tian, Kevin
2022-03-24 11:46                       ` Jason Gunthorpe
2022-03-28  1:53                         ` Jason Wang
2022-03-28 12:22                           ` Jason Gunthorpe
2022-03-29  4:59                             ` Jason Wang
2022-03-29 11:46                               ` Jason Gunthorpe
2022-03-28 13:14                           ` Sean Mooney
2022-03-28 14:27                             ` Jason Gunthorpe
2022-03-24 20:40           ` Alex Williamson
2022-03-24 22:27             ` Jason Gunthorpe
2022-03-24 22:41               ` Alex Williamson
2022-03-22 16:31       ` Niklas Schnelle
2022-03-22 16:41         ` Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 05/12] iommufd: PFN handling for iopt_pages Jason Gunthorpe
2022-03-23 15:37   ` Niklas Schnelle
2022-03-23 16:09     ` Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 06/12] iommufd: Algorithms for PFN storage Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 07/12] iommufd: Data structure to provide IOVA to PFN mapping Jason Gunthorpe
2022-03-22 22:15   ` Alex Williamson
2022-03-23 18:15     ` Jason Gunthorpe
2022-03-24  3:09       ` Tian, Kevin
2022-03-24 12:46         ` Jason Gunthorpe
2022-03-25 13:34   ` zhangfei.gao
2022-03-25 17:19     ` Jason Gunthorpe
2022-04-13 14:02   ` Yi Liu
2022-04-13 14:36     ` Jason Gunthorpe
2022-04-13 14:49       ` Yi Liu
2022-04-17 14:56         ` Yi Liu
2022-04-18 10:47           ` Yi Liu
2022-03-18 17:27 ` [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable Jason Gunthorpe
2022-03-23 19:10   ` Alex Williamson
2022-03-23 19:34     ` Jason Gunthorpe
2022-03-23 20:04       ` Alex Williamson
2022-03-23 20:34         ` Jason Gunthorpe
2022-03-23 22:54           ` Jason Gunthorpe
2022-03-24  7:25             ` Tian, Kevin
2022-03-24 13:46               ` Jason Gunthorpe
2022-03-25  2:15                 ` Tian, Kevin
2022-03-27  2:32                 ` Tian, Kevin
2022-03-27 14:28                   ` Jason Gunthorpe
2022-03-28 17:17                 ` Alex Williamson
2022-03-28 18:57                   ` Jason Gunthorpe
2022-03-28 19:47                     ` Jason Gunthorpe
2022-03-28 21:26                       ` Alex Williamson
2022-03-24  6:46           ` Tian, Kevin
2022-03-30 13:35   ` Yi Liu
2022-03-31 12:59     ` Jason Gunthorpe
2022-04-01 13:30       ` Yi Liu
2022-03-31  4:36   ` David Gibson
2022-03-31  5:41     ` Tian, Kevin
2022-03-31 12:58     ` Jason Gunthorpe
2022-04-28  5:58       ` David Gibson
2022-04-28 14:22         ` Jason Gunthorpe [this message]
2022-04-29  6:00           ` David Gibson
2022-04-29 12:54             ` Jason Gunthorpe
2022-04-30 14:44               ` David Gibson
2022-03-18 17:27 ` [PATCH RFC 09/12] iommufd: Add a HW pagetable object Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 10/12] iommufd: Add kAPI toward external drivers Jason Gunthorpe
2022-03-23 18:10   ` Alex Williamson
2022-03-23 18:15     ` Jason Gunthorpe
2022-05-11 12:54   ` Yi Liu
2022-05-19  9:45   ` Yi Liu
2022-05-19 12:35     ` Jason Gunthorpe
2022-03-18 17:27 ` [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility Jason Gunthorpe
2022-03-23 22:51   ` Alex Williamson
2022-03-24  0:33     ` Jason Gunthorpe
2022-03-24  8:13       ` Eric Auger
2022-03-24 22:04       ` Alex Williamson
2022-03-24 23:11         ` Jason Gunthorpe
2022-03-25  3:10           ` Tian, Kevin
2022-03-25 11:24           ` Joao Martins
2022-04-28 14:53         ` David Gibson
2022-04-28 15:10           ` Jason Gunthorpe
2022-04-29  1:21             ` Tian, Kevin
2022-04-29  6:22               ` David Gibson
2022-04-29 12:50                 ` Jason Gunthorpe
2022-05-02  4:10                   ` David Gibson
2022-04-29  6:20             ` David Gibson
2022-04-29 12:48               ` Jason Gunthorpe
2022-05-02  7:30                 ` David Gibson
2022-05-05 19:07                   ` Jason Gunthorpe
2022-05-06  5:25                     ` David Gibson
2022-05-06 10:42                       ` Tian, Kevin
2022-05-09  3:36                         ` David Gibson
2022-05-06 12:48                       ` Jason Gunthorpe
2022-05-09  6:01                         ` David Gibson
2022-05-09 14:00                           ` Jason Gunthorpe
2022-05-10  7:12                             ` David Gibson
2022-05-10 19:00                               ` Jason Gunthorpe
2022-05-11  3:15                                 ` Tian, Kevin
2022-05-11 16:32                                   ` Jason Gunthorpe
2022-05-11 23:23                                     ` Tian, Kevin
2022-05-13  4:35                                   ` David Gibson
2022-05-11  4:40                                 ` David Gibson
2022-05-11  2:46                             ` Tian, Kevin
2022-05-23  6:02           ` Alexey Kardashevskiy
2022-05-24 13:25             ` Jason Gunthorpe
2022-05-25  1:39               ` David Gibson
2022-05-25  2:09               ` Alexey Kardashevskiy
2022-03-29  9:17     ` Yi Liu
2022-03-18 17:27 ` [PATCH RFC 12/12] iommufd: Add a selftest Jason Gunthorpe
2022-04-12 20:13 ` [PATCH RFC 00/12] IOMMUFD Generic interface Eric Auger
2022-04-12 20:22   ` Jason Gunthorpe
2022-04-12 20:50     ` Eric Auger
2022-04-14 10:56 ` Yi Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220428142258.GH8364@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=chaitanyak@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=eric.auger@redhat.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jasowang@redhat.com \
    --cc=jean-philippe@linaro.org \
    --cc=joao.m.martins@oracle.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=mst@redhat.com \
    --cc=nicolinc@nvidia.com \
    --cc=schnelle@linux.ibm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).