From: Jason Gunthorpe <jgg@nvidia.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Eric Auger" <eric.auger@redhat.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Rodel, Jorg" <jroedel@suse.de>,
	"Lu Baolu" <baolu.lu@linux.intel.com>,
	"Chaitanya Kulkarni" <chaitanyak@nvidia.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Eric Farman" <farman@linux.ibm.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"Jason Wang" <jasowang@redhat.com>,
	"Jean-Philippe Brucker" <jean-philippe@linaro.org>,
	"Martins, Joao" <joao.m.martins@oracle.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Matthew Rosato" <mjrosato@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Nicolin Chen" <nicolinc@nvidia.com>,
	"Niklas Schnelle" <schnelle@linux.ibm.com>,
	"Shameerali Kolothum Thodi"
	<shameerali.kolothum.thodi@huawei.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Keqian Zhu" <zhukeqian1@huawei.com>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Laine Stump" <laine@redhat.com>
Subject: Re: [PATCH RFC v2 00/13] IOMMUFD Generic interface
Date: Thu, 6 Oct 2022 13:01:49 -0300
Message-ID: <Yz777bJZjTyLrHEQ@nvidia.com>
In-Reply-To: <YyuZwnksf70lj84L@nvidia.com>

On Wed, Sep 21, 2022 at 08:09:54PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 21, 2022 at 03:30:55PM -0400, Steven Sistare wrote:
> 
> > > If Steve wants to keep it then someone needs to fix the deadlock in
> > > the vfio implementation before any userspace starts to appear. 
> > 
> > The only VFIO_DMA_UNMAP_FLAG_VADDR issue I am aware of is broken pinned accounting
> > across exec, which can result in mm->locked_vm becoming negative. I have several 
> > fixes, but none result in limits being reached at exactly the same time as before --
> > the same general issue being discussed for iommufd.  I am still thinking about it.
> 
> Oh, yeah, I noticed this was all busted up too.
> 
> > I am not aware of a deadlock problem.  Please elaborate or point me to an
> > email thread.
> 
> VFIO_DMA_UNMAP_FLAG_VADDR open codes a lock in the kernel where
> userspace can trigger the lock to be taken and then returns to
> userspace with the lock held.
> 
> Any scenario where a kernel thread hits that open-coded lock and then
> userspace does-the-wrong-thing will deadlock the kernel.
> 
> For instance consider a mdev driver. We assert
> VFIO_DMA_UNMAP_FLAG_VADDR, the mdev driver does a DMA in a workqueue
> and becomes blocked on the now locked lock. Userspace then tries to
> close the device FD.
> 
> FD closure will trigger device close and the VFIO core code
> requirement is that mdev driver device teardown must halt all
> concurrent threads touching vfio_device. Thus the mdev will try to
> fence its workqueue and then deadlock - unable to flush/cancel a work
> that is currently blocked on a lock held by userspace that will never
> be unlocked.
> 
> This is just the first scenario that comes to mind. The approach to
> give userspace control of a lock that kernel threads can become
> blocked on is so completely sketchy it is a complete no-go in my
> opinion. If I had seen it when it was posted I would have hard NAK'd
> it.
> 
> My "full" solution in mind for iommufd is to pin all the memory upon
> VFIO_DMA_UNMAP_FLAG_VADDR, so we can continue to satisfy DMA requests
> while the mm_struct is not available. But IMHO this is basically
> useless for any actual user of mdevs.
> 
> The other option is to just exclude mdevs and fail the
> VFIO_DMA_UNMAP_FLAG_VADDR if any are present, then prevent them from
> becoming present while it is asserted. In this way we don't need to do
> anything beyond a simple check as the iommu_domain is already fully
> populated and pinned.

Do we have a solution to this?

If not, I would like to make a patch removing VFIO_DMA_UNMAP_FLAG_VADDR.

Aside from the approach to use the FD, another idea is to just use
fork.

qemu would do something like:

 .. stop all container ioctl activity ..
 fork()
    ioctl(CHANGE_MM) // switch all maps to this mm
    .. signal parent ..
    .. wait parent ..
    exit(0)
 .. wait child ..
 exec()
 ioctl(CHANGE_MM) // switch all maps to this mm
 .. signal child ..
 waitpid(childpid)

This way the kernel is never left without a page provider for the
maps: the dummy mm_struct belonging to the forked child serves that
role for the gap.

And the above is only required if we have mdevs, so we could imagine
userspace optimizing it away for, e.g., vfio-pci-only cases.

It is not as efficient as using an FD backing, but it is super easy
to implement in the kernel.
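
To make the ordering concrete, here is a rough userspace sketch of the
handshake in C. It is only an illustration under assumptions:
CHANGE_MM does not exist, so change_mm_stub() is a hypothetical
placeholder that just logs; the two pipes are one possible way to do
the "signal"/"wait" steps; and the exec() itself is left out so the
example stays in a single program image. The names run_handoff(),
wait_peer() and signal_peer() are made up for this sketch.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the proposed ioctl(CHANGE_MM). No such ioctl
 * exists; this stub only logs which process would adopt the maps.
 */
static int change_mm_stub(const char *who)
{
	fprintf(stderr, "%s (pid %d): CHANGE_MM\n", who, (int)getpid());
	return 0;
}

/* Block until the peer writes one byte; returns 0 on success. */
static int wait_peer(int fd)
{
	char token;

	return read(fd, &token, 1) == 1 ? 0 : -1;
}

/* Wake the peer by writing one byte; returns 0 on success. */
static int signal_peer(int fd)
{
	char token = 0;

	return write(fd, &token, 1) == 1 ? 0 : -1;
}

/* Returns 0 if the handshake completed and the helper exited with status 0. */
int run_handoff(void)
{
	int to_parent[2], to_child[2];
	int status = 0, ok;
	pid_t child;

	if (pipe(to_parent) < 0)
		return -1;
	if (pipe(to_child) < 0) {
		close(to_parent[0]);
		close(to_parent[1]);
		return -1;
	}

	/* .. stop all container ioctl activity .. */

	child = fork();
	if (child < 0) {
		close(to_parent[0]);
		close(to_parent[1]);
		close(to_child[0]);
		close(to_child[1]);
		return -1;
	}

	if (child == 0) {
		/* Helper: its mm backs the maps while the parent execs. */
		close(to_parent[0]);
		close(to_child[1]);
		if (change_mm_stub("child") < 0 ||
		    signal_peer(to_parent[1]) < 0 ||	/* .. signal parent .. */
		    wait_peer(to_child[0]) < 0)		/* .. wait parent .. */
			_exit(1);
		_exit(0);
	}

	close(to_parent[1]);
	close(to_child[0]);

	ok = wait_peer(to_parent[0]) == 0;	/* .. wait child .. */

	/*
	 * exec() would happen here. It is omitted so the sketch stays in one
	 * process image; a real flow would have to carry to_child[1] and the
	 * child pid across the exec.
	 */

	ok = ok && change_mm_stub("parent") == 0;
	ok = ok && signal_peer(to_child[1]) == 0;	/* .. signal child .. */

	/* Closing our write end unblocks the helper even on the error path. */
	close(to_child[1]);
	close(to_parent[0]);

	if (waitpid(child, &status, 0) != child)
		return -1;
	return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling run_handoff() from a main() walks the steps in the same order
as the outline above. The point of the ordering is that the child has
already taken over the maps before the parent's mm goes away, and it
only exits after the new image has taken them back.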

Jason
