All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe via iommu <iommu@lists.linux-foundation.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	Ashok Raj <ashok.raj@intel.com>,
	kvm@vger.kernel.org, rafael@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Cornelia Huck <cohuck@redhat.com>,
	linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	iommu@lists.linux-foundation.org,
	Jacob jun Pan <jacob.jun.pan@intel.com>,
	linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Will Deacon <will@kernel.org>,
	Diana Craciun <diana.craciun@oss.nxp.com>
Subject: Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
Date: Mon, 15 Nov 2021 17:19:52 -0400	[thread overview]
Message-ID: <20211115211952.GA2534655@nvidia.com> (raw)
In-Reply-To: <ab3ae3d1-c4d7-251a-fecc-d21f6b9d87a5@arm.com>

On Mon, Nov 15, 2021 at 08:58:19PM +0000, Robin Murphy wrote:
> > The above scenarios are already blocked by the kernel with
> > LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> > integrity, and these days they almost all have mitigation. I would
> > consider any kernel integrity violation to be a bug today if
> > LOCKDOWN_INTEGRITY_MAX is enabled.
> > 
> > I don't know why you bring this up?
> 
> Because as soon as anyone brings up "security" I'm not going to blindly
> accept the implicit assumption that VFIO is the only possible way to get one
> device to mess with another. That was just a silly example in the most basic
> terms, and obviously I don't expect well-worn generic sysfs interfaces to be
> a genuine threat, but how confident are you that no other subsystem- or
> driver-level interfaces past present and future can ever be tricked into p2p
> DMA?

Given the definition of LOCKDOWN_INTEGRITY_MAX I will consider any
past/present/future p2p attacks as definitive kernel bugs.

Generally, allowing a device to do arbitary DMA to a userspace
controlled address is a pretty serious bug, and directly attacking the
kernel memory is a much more interesting and serious threat vector.

> > What is the preference then? This is the only working API today,
> > right?
> 
> I believe the intent was that everyone should move to
> iommu_group_get()/iommu_attach_group() - precisely *because*
> iommu_attach_device() can't work sensibly for multi-device groups.

And iommu_attach_group() can't work sensibly for anything except VFIO
today, so hum :)

> > > Indeed I wasn't imagining it changing any ownership, just preventing a group
> > > from being attached to a non-default domain if it contains devices bound to
> > > different incompatible drivers.
> > 
> > So this could solve just the domain/DMA API problem, but it leaves the
> > MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> > a layered way.
> > 
> > This seems like half an idea, do you have a solution for the rest?
> 
> Tell me how the p2p DMA issue can manifest if device A is prohibited from
> attaching to VFIO's unmanaged domain while device B still has a driver
> bound, and thus would fail to be assigned to userspace in the first place.
> And conversely if non-VFIO drivers are still prevented from binding to
> device B while device A remains attached to the VFIO domain.

You've assumed that a group is continuously attached to the same
domain during the entire period that userspace has MMIO.

Any domain detatch creates a race where a kernel driver can jump in
and bind, while user space continues to have MMIO control over a
device. That violates the security invariant.

Many new flows, like PASID support, are requiring dynamically changing
the domain bound to a group.

If you want to go in this direction then we also need to have some
kind of compare and swap operation for the domain currently bound to a
group.

From a security perspective I disliek this idea a lot. Instead of
having nice clear barriers indicating the security domain we have a
very subtle 'so long as a domain is attached' idea, which is going to
get broken.

> > The concept of DMA USER is important here, and it is more than just
> > which domain is attached.
> 
> Tell me how a device would be assigned to userspace while its group is still
> attached to a kernel-managed default domain.
> 
> As soon as anyone calls iommu_attach_group() - or indeed
> iommu_attach_device() if more devices may end up hotplugged into the same
> group later - *that's* when the door opens for potential subversion of any
> kind, without ever having to leave kernel space.

The approach in this series can solve both, attach_device could switch
the device to user mode and it will block future hot plugged kernel
drivers.

> > VFIO also has logic related to the file
> 
> Yes, because unsurprisingly VFIO code is tailored for the specific case of
> VFIO usage rather than anything more general.

VFIO represents this class of users exposing the IOMMU to userspace,
I say it is general of that use class.

> > It isn't muddying the water, it is providing common security code that
> > is easy to undertand.
> > 
> > *Which* userspace FD/process owns the iommu_group is important
> > security information because we can't have process A do DMA attacks on
> > some other process B.
> 
> Tell me how a single group could be attached to two domains representing two
> different process address spaces at once.

Again you are focused on domains and ignoring MMIO.

Requiring the users of the API to continuously assert a non-default
domain is a non-trivial ask.

> In case this concept wasn't as clear as I thought, which seems to be so:
>
>                  | dev->iommu_group->domain | dev->driver
> DMA_OWNER_NONE   |          default         |   unbound
> DMA_OWNER_KERNEL |          default         |    bound
> DMA_OWNER_USER   |        non-default       |    bound

Unfortunately this can't use dev->driver. Reading dev->driver of every
device in the group requires holding the device_lock. really_probe()
already holds the device lock so this becomes a user triggerable ABBA
deadlock when scaled up to N devices.

This is why this series uses only the group mutex and tracks if
drivers are bound inside the group. I view that as unavoidable.

Jason
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Kevin Tian <kevin.tian@intel.com>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	Ashok Raj <ashok.raj@intel.com>,
	kvm@vger.kernel.org, rafael@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Cornelia Huck <cohuck@redhat.com>, Will Deacon <will@kernel.org>,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Jacob jun Pan <jacob.jun.pan@intel.com>,
	linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Diana Craciun <diana.craciun@oss.nxp.com>
Subject: Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming
Date: Mon, 15 Nov 2021 17:19:52 -0400	[thread overview]
Message-ID: <20211115211952.GA2534655@nvidia.com> (raw)
In-Reply-To: <ab3ae3d1-c4d7-251a-fecc-d21f6b9d87a5@arm.com>

On Mon, Nov 15, 2021 at 08:58:19PM +0000, Robin Murphy wrote:
> > The above scenarios are already blocked by the kernel with
> > LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> > integrity, and these days they almost all have mitigation. I would
> > consider any kernel integrity violation to be a bug today if
> > LOCKDOWN_INTEGRITY_MAX is enabled.
> > 
> > I don't know why you bring this up?
> 
> Because as soon as anyone brings up "security" I'm not going to blindly
> accept the implicit assumption that VFIO is the only possible way to get one
> device to mess with another. That was just a silly example in the most basic
> terms, and obviously I don't expect well-worn generic sysfs interfaces to be
> a genuine threat, but how confident are you that no other subsystem- or
> driver-level interfaces past present and future can ever be tricked into p2p
> DMA?

Given the definition of LOCKDOWN_INTEGRITY_MAX I will consider any
past/present/future p2p attacks as definitive kernel bugs.

Generally, allowing a device to do arbitary DMA to a userspace
controlled address is a pretty serious bug, and directly attacking the
kernel memory is a much more interesting and serious threat vector.

> > What is the preference then? This is the only working API today,
> > right?
> 
> I believe the intent was that everyone should move to
> iommu_group_get()/iommu_attach_group() - precisely *because*
> iommu_attach_device() can't work sensibly for multi-device groups.

And iommu_attach_group() can't work sensibly for anything except VFIO
today, so hum :)

> > > Indeed I wasn't imagining it changing any ownership, just preventing a group
> > > from being attached to a non-default domain if it contains devices bound to
> > > different incompatible drivers.
> > 
> > So this could solve just the domain/DMA API problem, but it leaves the
> > MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> > a layered way.
> > 
> > This seems like half an idea, do you have a solution for the rest?
> 
> Tell me how the p2p DMA issue can manifest if device A is prohibited from
> attaching to VFIO's unmanaged domain while device B still has a driver
> bound, and thus would fail to be assigned to userspace in the first place.
> And conversely if non-VFIO drivers are still prevented from binding to
> device B while device A remains attached to the VFIO domain.

You've assumed that a group is continuously attached to the same
domain during the entire period that userspace has MMIO.

Any domain detatch creates a race where a kernel driver can jump in
and bind, while user space continues to have MMIO control over a
device. That violates the security invariant.

Many new flows, like PASID support, are requiring dynamically changing
the domain bound to a group.

If you want to go in this direction then we also need to have some
kind of compare and swap operation for the domain currently bound to a
group.

From a security perspective I disliek this idea a lot. Instead of
having nice clear barriers indicating the security domain we have a
very subtle 'so long as a domain is attached' idea, which is going to
get broken.

> > The concept of DMA USER is important here, and it is more than just
> > which domain is attached.
> 
> Tell me how a device would be assigned to userspace while its group is still
> attached to a kernel-managed default domain.
> 
> As soon as anyone calls iommu_attach_group() - or indeed
> iommu_attach_device() if more devices may end up hotplugged into the same
> group later - *that's* when the door opens for potential subversion of any
> kind, without ever having to leave kernel space.

The approach in this series can solve both, attach_device could switch
the device to user mode and it will block future hot plugged kernel
drivers.

> > VFIO also has logic related to the file
> 
> Yes, because unsurprisingly VFIO code is tailored for the specific case of
> VFIO usage rather than anything more general.

VFIO represents this class of users exposing the IOMMU to userspace,
I say it is general of that use class.

> > It isn't muddying the water, it is providing common security code that
> > is easy to undertand.
> > 
> > *Which* userspace FD/process owns the iommu_group is important
> > security information because we can't have process A do DMA attacks on
> > some other process B.
> 
> Tell me how a single group could be attached to two domains representing two
> different process address spaces at once.

Again you are focused on domains and ignoring MMIO.

Requiring the users of the API to continuously assert a non-default
domain is a non-trivial ask.

> In case this concept wasn't as clear as I thought, which seems to be so:
>
>                  | dev->iommu_group->domain | dev->driver
> DMA_OWNER_NONE   |          default         |   unbound
> DMA_OWNER_KERNEL |          default         |    bound
> DMA_OWNER_USER   |        non-default       |    bound

Unfortunately this can't use dev->driver. Reading dev->driver of every
device in the group requires holding the device_lock. really_probe()
already holds the device lock so this becomes a user triggerable ABBA
deadlock when scaled up to N devices.

This is why this series uses only the group mutex and tracks if
drivers are bound inside the group. I view that as unavoidable.

Jason

  reply	other threads:[~2021-11-15 21:20 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-15  2:05 [PATCH 00/11] Fix BUG_ON in vfio_iommu_group_notifier() Lu Baolu
2021-11-15  2:05 ` Lu Baolu
2021-11-15  2:05 ` [PATCH 01/11] iommu: Add device dma ownership set/release interfaces Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15 13:14   ` Christoph Hellwig
2021-11-15 13:14     ` Christoph Hellwig
2021-11-16  1:57     ` Lu Baolu
2021-11-16  1:57       ` Lu Baolu
2021-11-16 13:46       ` Jason Gunthorpe
2021-11-16 13:46         ` Jason Gunthorpe via iommu
2021-11-17  5:22         ` Lu Baolu
2021-11-17  5:22           ` Lu Baolu
2021-11-17 13:35           ` Jason Gunthorpe
2021-11-17 13:35             ` Jason Gunthorpe via iommu
2021-11-18  1:12             ` Lu Baolu
2021-11-18  1:12               ` Lu Baolu
2021-11-18 14:10               ` Jason Gunthorpe
2021-11-18 14:10                 ` Jason Gunthorpe via iommu
2021-11-18  2:39         ` Tian, Kevin
2021-11-18  2:39           ` Tian, Kevin
2021-11-18 13:33           ` Jason Gunthorpe
2021-11-18 13:33             ` Jason Gunthorpe via iommu
2021-11-19  5:44             ` Tian, Kevin
2021-11-19  5:44               ` Tian, Kevin
2021-11-19 11:14               ` Lu Baolu
2021-11-19 11:14                 ` Lu Baolu
2021-11-19 15:06                 ` Jörg Rödel
2021-11-19 15:06                   ` Jörg Rödel
2021-11-19 15:43                   ` Jason Gunthorpe
2021-11-19 15:43                     ` Jason Gunthorpe via iommu
2021-11-20 11:16                   ` Lu Baolu
2021-11-20 11:16                     ` Lu Baolu
2021-11-19 12:56               ` Jason Gunthorpe
2021-11-19 12:56                 ` Jason Gunthorpe via iommu
2021-11-15 20:38   ` Bjorn Helgaas
2021-11-15 20:38     ` Bjorn Helgaas
2021-11-16  1:52     ` Lu Baolu
2021-11-16  1:52       ` Lu Baolu
2021-11-15  2:05 ` [PATCH 02/11] driver core: Set DMA ownership during driver bind/unbind Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15  6:59   ` Greg Kroah-Hartman
2021-11-15  6:59     ` Greg Kroah-Hartman
2021-11-15 13:20     ` Christoph Hellwig
2021-11-15 13:20       ` Christoph Hellwig
2021-11-15 13:38     ` Jason Gunthorpe via iommu
2021-11-15 13:38       ` Jason Gunthorpe
2021-11-15 13:19   ` Christoph Hellwig
2021-11-15 13:19     ` Christoph Hellwig
2021-11-15 13:24     ` Jason Gunthorpe
2021-11-15 13:24       ` Jason Gunthorpe via iommu
2021-11-15 15:37       ` Robin Murphy
2021-11-15 15:37         ` Robin Murphy
2021-11-15 15:56         ` Jason Gunthorpe
2021-11-15 15:56           ` Jason Gunthorpe via iommu
2021-11-15 18:15           ` Christoph Hellwig
2021-11-15 18:15             ` Christoph Hellwig
2021-11-15 18:35           ` Robin Murphy
2021-11-15 18:35             ` Robin Murphy
2021-11-15 19:39             ` Jason Gunthorpe via iommu
2021-11-15 19:39               ` Jason Gunthorpe
2021-11-15  2:05 ` [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15 13:21   ` Christoph Hellwig
2021-11-15 13:21     ` Christoph Hellwig
2021-11-15 13:31     ` Jason Gunthorpe via iommu
2021-11-15 13:31       ` Jason Gunthorpe
2021-11-15 15:14       ` Robin Murphy
2021-11-15 15:14         ` Robin Murphy
2021-11-15 16:17         ` Jason Gunthorpe
2021-11-15 16:17           ` Jason Gunthorpe via iommu
2021-11-15 17:54           ` Robin Murphy
2021-11-15 17:54             ` Robin Murphy
2021-11-15 18:19             ` Christoph Hellwig
2021-11-15 18:19               ` Christoph Hellwig
2021-11-15 18:44               ` Robin Murphy
2021-11-15 18:44                 ` Robin Murphy
2021-11-15 19:22             ` Jason Gunthorpe via iommu
2021-11-15 19:22               ` Jason Gunthorpe
2021-11-15 20:58               ` Robin Murphy
2021-11-15 20:58                 ` Robin Murphy
2021-11-15 21:19                 ` Jason Gunthorpe via iommu [this message]
2021-11-15 21:19                   ` Jason Gunthorpe
2021-11-15 20:48   ` Bjorn Helgaas
2021-11-15 20:48     ` Bjorn Helgaas
2021-11-15 22:17   ` Bjorn Helgaas
2021-11-15 22:17     ` Bjorn Helgaas
2021-11-16  6:05     ` Lu Baolu
2021-11-16  6:05       ` Lu Baolu
2021-11-15  2:05 ` [PATCH 04/11] PCI: portdrv: " Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15 20:44   ` Bjorn Helgaas
2021-11-15 20:44     ` Bjorn Helgaas
2021-11-16  7:24     ` Lu Baolu
2021-11-16  7:24       ` Lu Baolu
2021-11-16 20:22       ` Bjorn Helgaas
2021-11-16 20:22         ` Bjorn Helgaas
2021-11-16 20:48         ` Jason Gunthorpe
2021-11-16 20:48           ` Jason Gunthorpe via iommu
2021-11-15  2:05 ` [PATCH 05/11] iommu: Add security context management for assigned devices Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15 13:22   ` Christoph Hellwig
2021-11-15 13:22     ` Christoph Hellwig
2021-11-16  7:25     ` Lu Baolu
2021-11-16  7:25       ` Lu Baolu
2021-11-15  2:05 ` [PATCH 06/11] iommu: Expose group variants of dma ownership interfaces Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15 13:27   ` Christoph Hellwig
2021-11-15 13:27     ` Christoph Hellwig
2021-11-16  9:42     ` Lu Baolu
2021-11-16  9:42       ` Lu Baolu
2021-11-15  2:05 ` [PATCH 07/11] vfio: Use DMA_OWNER_USER to declaim passthrough devices Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15  2:05 ` [PATCH 08/11] vfio: Remove use of vfio_group_viable() Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15  2:05 ` [PATCH 09/11] vfio: Delete the unbound_list Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15  2:05 ` [PATCH 10/11] vfio: Remove iommu group notifier Lu Baolu
2021-11-15  2:05   ` Lu Baolu
2021-11-15  2:05 ` [PATCH 11/11] iommu: Remove iommu group changes notifier Lu Baolu
2021-11-15  2:05   ` Lu Baolu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211115211952.GA2534655@nvidia.com \
    --to=iommu@lists.linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=cohuck@redhat.com \
    --cc=diana.craciun@oss.nxp.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@infradead.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=kch@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.