LKML Archive on lore.kernel.org
* [PATCH] vfio/type1: Limit DMA mappings per container
@ 2019-04-01 20:16 Alex Williamson
  2019-04-02  2:41 ` Peter Xu
  2019-04-02 14:58 ` Cornelia Huck
  0 siblings, 2 replies; 6+ messages in thread
From: Alex Williamson @ 2019-04-01 20:16 UTC
  To: alex.williamson; +Cc: kvm, linux-kernel, eric.auger, cohuck

Memory backed DMA mappings are accounted against a user's locked
memory limit, including multiple mappings of the same memory.  This
accounting bounds the number of such mappings that a user can create.
However, DMA mappings that are not backed by memory, such as DMA
mappings of device MMIO via mmaps, do not make use of page pinning
and therefore do not count against the user's locked memory limit.
These mappings still consume memory, but the memory is not well
associated with the process for the purposes of OOM-killing a task.

To add bounding on this use case, we introduce a limit to the total
number of concurrent DMA mappings that a user is allowed to create.
This limit is exposed as a tunable module option where the default
value of 64K is expected to be well in excess of any reasonable use
case (a large virtual machine configuration would typically only make
use of tens of concurrent mappings).

This fixes CVE-2019-3882.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio_iommu_type1.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 73652e21efec..7fc8fd7d4dc7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -58,12 +58,18 @@ module_param_named(disable_hugepages,
 MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
+static int dma_entry_limit __read_mostly = U16_MAX;
+module_param_named(dma_entry_limit, dma_entry_limit, int, 0644);
+MODULE_PARM_DESC(dma_entry_limit,
+		 "Maximum number of user DMA mappings per container (65535).");
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
 	struct rb_root		dma_list;
 	struct blocking_notifier_head notifier;
+	atomic_t		dma_avail;
 	bool			v2;
 	bool			nesting;
 };
@@ -836,6 +842,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
 	kfree(dma);
+	atomic_inc(&iommu->dma_avail);
 }
 
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
@@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 		goto out_unlock;
 	}
 
+	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
+		ret = -ENOSPC;
+		goto out_unlock;
+	}
+
 	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
 	if (!dma) {
+		atomic_inc(&iommu->dma_avail);
 		ret = -ENOMEM;
 		goto out_unlock;
 	}
@@ -1583,6 +1596,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 
 	INIT_LIST_HEAD(&iommu->domain_list);
 	iommu->dma_list = RB_ROOT;
+	atomic_set(&iommu->dma_avail, dma_entry_limit);
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
 


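The accounting introduced by the patch above can be modelled in user
space.  What follows is only a minimal sketch, assuming C11 atomics in
place of the kernel's atomic_t; model_add_unless(), struct
model_container and the other model_* helpers are hypothetical names
for illustration, not kernel APIs.  It shows a counter that starts at
the limit, is decremented unless it is already zero on map, and is
incremented again on unmap:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for the kernel's atomic_add_unless():
 * add 'a' to *v unless *v == u.  Returns true if the add was performed.
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, 'cur' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Hypothetical container model: dma_avail counts the remaining slots. */
struct model_container {
	atomic_int dma_avail;
};

static void model_open(struct model_container *c, int limit)
{
	atomic_init(&c->dma_avail, limit);
}

/* Returns 0 on success, -1 once the limit is hit (the patch uses -ENOSPC). */
static int model_map(struct model_container *c)
{
	if (!model_add_unless(&c->dma_avail, -1, 0))
		return -1;
	return 0;
}

/* Removing a mapping hands its slot back. */
static void model_unmap(struct model_container *c)
{
	atomic_fetch_add(&c->dma_avail, 1);
}
```

With a limit of 2, two model_map() calls succeed, a third fails, and
one model_unmap() makes a further model_map() succeed again.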

* Re: [PATCH] vfio/type1: Limit DMA mappings per container
  2019-04-01 20:16 [PATCH] vfio/type1: Limit DMA mappings per container Alex Williamson
@ 2019-04-02  2:41 ` Peter Xu
  2019-04-02  4:34   ` Alex Williamson
  2019-04-02 14:58 ` Cornelia Huck
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Xu @ 2019-04-02  2:41 UTC
  To: Alex Williamson; +Cc: kvm, linux-kernel, eric.auger, cohuck

On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:

[...]

> @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  		goto out_unlock;
>  	}
>  
> +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> +		ret = -ENOSPC;
> +		goto out_unlock;
> +	}
> +
>  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>  	if (!dma) {
> +		atomic_inc(&iommu->dma_avail);

This should be the only special path that has to revert the change.
I'm not sure whether it can be avoided by simply using atomic_read(),
or even READ_ONCE(), in place of atomic_add_unless() above to check
against zero, and then doing the +1/-1 only in vfio_[un]link_dma().
(I feel like we don't need atomic ops for dma_avail because we
already hold the mutex, but of course it doesn't hurt...)  But AFAICT
this patch is correct.

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu


* Re: [PATCH] vfio/type1: Limit DMA mappings per container
  2019-04-02  2:41 ` Peter Xu
@ 2019-04-02  4:34   ` Alex Williamson
  2019-04-02  5:18     ` Peter Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2019-04-02  4:34 UTC
  To: Peter Xu; +Cc: kvm, linux-kernel, eric.auger, cohuck

On Tue, 2 Apr 2019 10:41:15 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> 
> [...]
> 
> > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >  		goto out_unlock;
> >  	}
> >  
> > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > +		ret = -ENOSPC;
> > +		goto out_unlock;
> > +	}
> > +
> >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> >  	if (!dma) {
> > +		atomic_inc(&iommu->dma_avail);  
> 
> This should be the only special path that has to revert the change.
> I'm not sure whether it can be avoided by simply using atomic_read(),
> or even READ_ONCE(), in place of atomic_add_unless() above to check
> against zero, and then doing the +1/-1 only in vfio_[un]link_dma().
> (I feel like we don't need atomic ops for dma_avail because we
> already hold the mutex, but of course it doesn't hurt...)  But AFAICT
> this patch is correct.

Thanks for the review, you're right, we're only twiddling this atomic
while holding the iommu->lock mutex, so it appears unnecessary.  Since
we're within the mutex, I think we don't even need a READ_ONCE.  We can
simply test it before alloc and decrement after.  Am I missing something
that would specifically require READ_ONCE within our mutex critical
section?  Thanks,

Alex
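
The "test before alloc, decrement after" idea can be sketched in user
space as follows.  This is only an illustration under stated
assumptions: a pthread mutex stands in for iommu->lock, calloc() for
kzalloc(), and struct sketch_iommu, struct sketch_dma and the sketch_*
functions are hypothetical names, not the driver's code.  Because the
counter is only touched with the lock held, a plain integer is enough,
and the allocation-failure path has nothing to revert:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures. */
struct sketch_dma {
	int placeholder;
};

struct sketch_iommu {
	pthread_mutex_t lock;
	unsigned int dma_avail;	/* plain counter, protected by 'lock' */
};

static void sketch_open(struct sketch_iommu *iommu, unsigned int limit)
{
	pthread_mutex_init(&iommu->lock, NULL);
	iommu->dma_avail = limit;
}

static int sketch_map(struct sketch_iommu *iommu, struct sketch_dma **out)
{
	struct sketch_dma *dma;
	int ret = 0;

	pthread_mutex_lock(&iommu->lock);

	/* Test before allocating: no slot left means no new mapping. */
	if (!iommu->dma_avail) {
		ret = -ENOSPC;
		goto out_unlock;
	}

	dma = calloc(1, sizeof(*dma));
	if (!dma) {
		ret = -ENOMEM;	/* counter untouched, nothing to undo */
		goto out_unlock;
	}

	/* Decrement only once the mapping object really exists. */
	iommu->dma_avail--;
	*out = dma;

out_unlock:
	pthread_mutex_unlock(&iommu->lock);
	return ret;
}

static void sketch_unmap(struct sketch_iommu *iommu, struct sketch_dma *dma)
{
	pthread_mutex_lock(&iommu->lock);
	free(dma);
	iommu->dma_avail++;
	pthread_mutex_unlock(&iommu->lock);
}
```

The design point is that the lock already serializes every reader and
writer of the counter, so the check and the later decrement cannot be
interleaved with another map or unmap.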


* Re: [PATCH] vfio/type1: Limit DMA mappings per container
  2019-04-02  4:34   ` Alex Williamson
@ 2019-04-02  5:18     ` Peter Xu
  2019-04-02 14:40       ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Xu @ 2019-04-02  5:18 UTC
  To: Alex Williamson; +Cc: kvm, linux-kernel, eric.auger, cohuck

On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote:
> On Tue, 2 Apr 2019 10:41:15 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> > 
> > [...]
> > 
> > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > >  		goto out_unlock;
> > >  	}
> > >  
> > > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > > +		ret = -ENOSPC;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> > >  	if (!dma) {
> > > +		atomic_inc(&iommu->dma_avail);  
> > 
> > This should be the only special path that has to revert the change.
> > I'm not sure whether it can be avoided by simply using atomic_read(),
> > or even READ_ONCE(), in place of atomic_add_unless() above to check
> > against zero, and then doing the +1/-1 only in vfio_[un]link_dma().
> > (I feel like we don't need atomic ops for dma_avail because we
> > already hold the mutex, but of course it doesn't hurt...)  But AFAICT
> > this patch is correct.
> 
> Thanks for the review, you're right, we're only twiddling this atomic
> while holding the iommu->lock mutex, so it appears unnecessary.  Since
> we're within the mutex, I think we don't even need a READ_ONCE.  We can
> simply test it before alloc and decrement after.  Am I missing something
> that would specifically require READ_ONCE within our mutex critical
> section?  Thanks,

I'm not very clear on this and I'd be glad to learn more about it.
My understanding is that [READ|WRITE]_ONCE() is the same as a volatile
memory access and makes sure we don't keep variables in registers.
So if the mutex semantics guarantee that (say, a "*addr = val"
followed by a mutex_unlock() makes sure "val" definitely lands in the
memory at "addr"), then I do think it's fine even without it (which
would correspond to WRITE_ONCE(*addr, val) in this case).

Thanks,

-- 
Peter Xu
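
The "volatile access" reading of these macros can be sketched in user
space.  This is a deliberately simplified illustration, assuming GNU C
(__typeof__) and scalar variables only; SKETCH_READ_ONCE(),
SKETCH_WRITE_ONCE(), sketch_slots and sketch_consume_slot() are
hypothetical names, and the kernel's real READ_ONCE()/WRITE_ONCE()
definitions are more involved than this:

```c
#include <assert.h>

/*
 * Simplified approximations: force exactly one load or store through a
 * volatile-qualified lvalue so the compiler cannot cache the value in a
 * register, merge accesses, or repeat them.  Both take the variable
 * itself, not its address.
 */
#define SKETCH_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical counter of remaining slots. */
static int sketch_slots;

/* Take one slot if any is left; returns 1 on success, 0 otherwise. */
static int sketch_consume_slot(void)
{
	int avail = SKETCH_READ_ONCE(sketch_slots);

	if (!avail)
		return 0;

	SKETCH_WRITE_ONCE(sketch_slots, avail - 1);
	return 1;
}
```

Note that the read and the write here are still two separate steps, so
on their own these macros do not make the check-then-decrement safe
against a concurrent caller; that protection has to come from a lock
or an atomic read-modify-write.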


* Re: [PATCH] vfio/type1: Limit DMA mappings per container
  2019-04-02  5:18     ` Peter Xu
@ 2019-04-02 14:40       ` Alex Williamson
  0 siblings, 0 replies; 6+ messages in thread
From: Alex Williamson @ 2019-04-02 14:40 UTC
  To: Peter Xu; +Cc: kvm, linux-kernel, eric.auger, cohuck

On Tue, 2 Apr 2019 13:18:02 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Apr 01, 2019 at 10:34:13PM -0600, Alex Williamson wrote:
> > On Tue, 2 Apr 2019 10:41:15 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Mon, Apr 01, 2019 at 02:16:52PM -0600, Alex Williamson wrote:
> > > 
> > > [...]
> > >   
> > > > @@ -1081,8 +1088,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > > >  		goto out_unlock;
> > > >  	}
> > > >  
> > > > +	if (!atomic_add_unless(&iommu->dma_avail, -1, 0)) {
> > > > +		ret = -ENOSPC;
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > >  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> > > >  	if (!dma) {
> > > > +		atomic_inc(&iommu->dma_avail);    
> > > 
> > > This should be the only special path that has to revert the change.
> > > I'm not sure whether it can be avoided by simply using atomic_read(),
> > > or even READ_ONCE(), in place of atomic_add_unless() above to check
> > > against zero, and then doing the +1/-1 only in vfio_[un]link_dma().
> > > (I feel like we don't need atomic ops for dma_avail because we
> > > already hold the mutex, but of course it doesn't hurt...)  But AFAICT
> > > this patch is correct.
> > 
> > Thanks for the review, you're right, we're only twiddling this atomic
> > while holding the iommu->lock mutex, so it appears unnecessary.  Since
> > we're within the mutex, I think we don't even need a READ_ONCE.  We can
> > simply test it before alloc and decrement after.  Am I missing something
> > that would specifically require READ_ONCE within our mutex critical
> > section?  Thanks,  
> 
> I'm not very clear on this and I'd be glad to learn more about it.
> My understanding is that [READ|WRITE]_ONCE() is the same as a volatile
> memory access and makes sure we don't keep variables in registers.
> So if the mutex semantics guarantee that (say, a "*addr = val"
> followed by a mutex_unlock() makes sure "val" definitely lands in the
> memory at "addr"), then I do think it's fine even without it (which
> would correspond to WRITE_ONCE(*addr, val) in this case).

The READ/WRITE_ONCE macros add memory barriers, but we have the mutex
for protecting concurrent access to the data.  I don't see that there's
anything special about a counter on the iommu object that needs special
attention vs any other elements that might get modified in these
sections.  Thanks,

Alex


* Re: [PATCH] vfio/type1: Limit DMA mappings per container
  2019-04-01 20:16 [PATCH] vfio/type1: Limit DMA mappings per container Alex Williamson
  2019-04-02  2:41 ` Peter Xu
@ 2019-04-02 14:58 ` Cornelia Huck
  1 sibling, 0 replies; 6+ messages in thread
From: Cornelia Huck @ 2019-04-02 14:58 UTC
  To: Alex Williamson; +Cc: kvm, linux-kernel, eric.auger

On Mon, 01 Apr 2019 14:16:52 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> Memory backed DMA mappings are accounted against a user's locked
> memory limit, including multiple mappings of the same memory.  This
> accounting bounds the number of such mappings that a user can create.
> However, DMA mappings that are not backed by memory, such as DMA
> mappings of device MMIO via mmaps, do not make use of page pinning
> and therefore do not count against the user's locked memory limit.
> These mappings still consume memory, but the memory is not well
> associated with the process for the purposes of OOM-killing a task.
> 
> To add bounding on this use case, we introduce a limit to the total
> number of concurrent DMA mappings that a user is allowed to create.
> This limit is exposed as a tunable module option where the default
> value of 64K is expected to be well in excess of any reasonable use
> case (a large virtual machine configuration would typically only make
> use of tens of concurrent mappings).
> 
> This fixes CVE-2019-3882.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
