From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
Alex Williamson <alex.williamson@redhat.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
"Lu Baolu" <baolu.lu@linux.intel.com>,
Chaitanya Kulkarni <chaitanyak@nvidia.com>,
Cornelia Huck <cohuck@redhat.com>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
David Gibson <david@gibson.dropbear.id.au>,
Eric Auger <eric.auger@redhat.com>,
"iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
"Martins, Joao" <joao.m.martins@oracle.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
Matthew Rosato <mjrosato@linux.ibm.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
Keqian Zhu <zhukeqian1@huawei.com>
Subject: RE: [PATCH RFC 04/12] kernel/user: Allow user::locked_vm to be usable for iommufd
Date: Thu, 24 Mar 2022 02:42:38 +0000
Message-ID: <BN9PR11MB5276E3566D633CEE245004D08C199@BN9PR11MB5276.namprd11.prod.outlook.com>
In-Reply-To: <CACGkMEutpbOc_+5n3SDuNDyHn19jSH4ukSM9i0SUgWmXDydxnA@mail.gmail.com>
> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, March 24, 2022 10:28 AM
>
> On Thu, Mar 24, 2022 at 10:12 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> >
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, March 23, 2022 12:15 AM
> > >
> > > On Tue, Mar 22, 2022 at 09:29:23AM -0600, Alex Williamson wrote:
> > >
> > > > I'm still picking my way through the series, but the later compat
> > > > interface doesn't mention this difference as an outstanding issue.
> > > > Doesn't this difference need to be accounted in how libvirt manages VM
> > > > resource limits?
> > >
> > > > AFAICT, no, but it should be checked.
> > >
> > > > AIUI libvirt uses some form of prlimit(2) to set process locked
> > > > memory limits.
> > >
> > > Yes, and ulimit does work fully. prlimit adjusts the value:
> > >
> > > > int do_prlimit(struct task_struct *tsk, unsigned int resource,
> > > >                struct rlimit *new_rlim, struct rlimit *old_rlim)
> > > > {
> > > >         rlim = tsk->signal->rlim + resource;
> > > > [..]
> > > >         if (new_rlim)
> > > >                 *rlim = *new_rlim;
> > >
> > > Which vfio reads back here:
> > >
> > > > drivers/vfio/vfio_iommu_type1.c: unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > > > drivers/vfio/vfio_iommu_type1.c: unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > >
> > > And iommufd does the same read back:
> > >
> > > >         lock_limit = task_rlimit(pages->source_task, RLIMIT_MEMLOCK) >>
> > > >                      PAGE_SHIFT;
> > > >         npages = pages->npinned - pages->last_npinned;
> > > >         do {
> > > >                 cur_pages = atomic_long_read(&pages->source_user->locked_vm);
> > > >                 new_pages = cur_pages + npages;
> > > >                 if (new_pages > lock_limit)
> > > >                         return -ENOMEM;
> > > >         } while (atomic_long_cmpxchg(&pages->source_user->locked_vm,
> > > >                                      cur_pages, new_pages) != cur_pages);
> > >
> > > So it does work essentially the same.
> > >
> > > > The difference is more subtle: iouring/etc puts the charge in the user,
> > > > so it is additive with things like iouring and additively spans all
> > > > the user's processes.
> > >
> > > However vfio is accounting only per-process and only for itself - no
> > > other subsystem uses locked as the charge variable for DMA pins.
> > >
> > > The user visible difference will be that a limit X that worked with
> > > VFIO may start to fail after a kernel upgrade as the charge accounting
> > > is now cross user and additive with things like iommufd.
> > >
> > > This whole area is a bit peculiar (eg mlock itself works differently),
> > > IMHO, but with most of the places doing pins voting to use
> > > user->locked_vm as the charge it seems the right path in today's
> > > kernel.
> > >
> > > > Certainly having qemu concurrently using three different subsystems
> > > (vfio, rdma, iouring) issuing FOLL_LONGTERM and all accounting for
> > > RLIMIT_MEMLOCK differently cannot be sane or correct.
> > >
> > > I plan to fix RDMA like this as well so at least we can have
> > > consistency within qemu.
> > >
> >
> > My impression is that iommufd and vfio type1 must use the same
> > accounting scheme, given that the management stack has no insight
> > into which one qemu actually uses and thus cannot adapt to the
> > subtle difference between them. In this regard, either we start
> > fixing vfio type1 to use user->locked_vm now, or we have iommufd
> > follow vfio type1 for upward compatibility and then change them
> > together at a later point.
> >
> > I prefer the former, as IMHO I don't know when that later point
> > would come without kernel changes that actually break the
> > userspace policy built on a wrong accounting scheme...
>
> I wonder if the kernel is the right place to do this. We have new uAPI

I don't follow. This thread is about VFIO using a wrong accounting
scheme, and the discussion is about how fixing it affects userspace.
I don't see where the question of "the right place" comes in.

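To make the userspace impact concrete, here is a rough userspace model of
the two accounting schemes. It is only a sketch and not kernel code: the
struct and function names (user_model, proc_model, charge_per_process,
charge_per_user) are made up for illustration, and C11 atomics stand in for
the kernel's atomic_long_cmpxchg() loop quoted above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-user counter (user->locked_vm),
 * shared by every process that belongs to the same user. */
struct user_model {
	atomic_long locked_vm;
};

/* Hypothetical stand-in for the per-process counter that vfio type1
 * charges today. */
struct proc_model {
	long locked_vm;
};

/* Per-process scheme: only this process's own pins count against the
 * limit, so other processes of the same user are invisible here. */
static bool charge_per_process(struct proc_model *p, long npages, long limit)
{
	if (p->locked_vm + npages > limit)
		return false;
	p->locked_vm += npages;
	return true;
}

/* Per-user scheme: a compare-and-swap loop on the shared counter, so the
 * charge is additive across all processes and subsystems of the user. */
static bool charge_per_user(struct user_model *u, long npages, long limit)
{
	long cur = atomic_load(&u->locked_vm);
	long new_pages;

	do {
		new_pages = cur + npages;
		if (new_pages > limit)
			return false;
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&u->locked_vm, &cur, new_pages));

	return true;
}
```

With a limit of 100 pages and two VMs of the same user each pinning 60
pages, both charges succeed in the per-process model, while the second one
fails in the per-user model. That is the kind of breakage a management
stack would see with an unchanged limit.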
> so management layer can know the difference of the accounting in
> advance by
>
> -device vfio-pci,iommufd=on
>
I suppose iommufd will be used once QEMU supports it, as long as the
compatibility open issues that Jason and Alex discussed in another
thread are properly addressed. It does not necessarily have to be a
control knob exposed to the caller.

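For reference, what the management stack works with today is just the
rlimit value. A minimal sketch of reading it through prlimit(2) follows;
the helper name memlock_limit_pages is hypothetical, and I am assuming
glibc's prlimit() wrapper (hence _GNU_SOURCE). Nothing in this interface
tells the caller which accounting scheme the kernel applies to the value.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the RLIMIT_MEMLOCK soft limit of @pid
 * (0 means the calling process) and convert it to pages, mirroring the
 * ">> PAGE_SHIFT" conversion done on the kernel side.
 * Returns 0 on success, -1 on error. An unlimited rlimit is reported
 * as ULLONG_MAX. */
static int memlock_limit_pages(pid_t pid, unsigned long long *pages)
{
	struct rlimit rl;
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return -1;
	/* NULL new_limit: only query the current value, do not change it. */
	if (prlimit(pid, RLIMIT_MEMLOCK, NULL, &rl) != 0)
		return -1;

	if (rl.rlim_cur == RLIM_INFINITY)
		*pages = ULLONG_MAX;
	else
		*pages = rl.rlim_cur / (rlim_t)page_size;
	return 0;
}
```

The same call with a non-NULL new_limit is how a limit would be set on
another process, subject to the usual permission checks.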
Thanks
Kevin