IOMMU Archive on lore.kernel.org
* Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
       [not found] <alpine.DEB.2.21.1909041434580.160038@chino.kir.corp.google.com>
@ 2019-09-05  6:06 ` Christoph Hellwig
  2019-09-05 22:37   ` David Rientjes via iommu
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2019-09-05  6:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: Jens Axboe, Tom Lendacky, Brijesh Singh, x86, linux-kernel,
	Ming Lei, iommu, Peter Gonda, Christoph Hellwig, Jianxiong Gao

On Wed, Sep 04, 2019 at 02:40:44PM -0700, David Rientjes wrote:
> Hi Christoph, Jens, and Ming,
> 
> While booting a 5.2 SEV-enabled guest, we have encountered the following
> WARNING, followed by a BUG, because we are in atomic context while
> trying to call set_memory_decrypted:

Well, this really is an x86 / DMA API issue, unfortunately.  Drivers
are allowed to do GFP_ATOMIC DMA allocations under locks / RCU critical
sections and from interrupts, and it seems like the SEV case can't
handle that.  We have some semi-generic code for a fixed-size pool
in kernel/dma for non-coherent platforms with similar issues
that we could try to wire up, but I wonder if there is a better way
to handle the issue, so I've added Tom and the x86 maintainers.

Now, independent of that issue, using DMA coherent memory for the nvme
PRPs/SGLs doesn't actually feel very optimal.  We could just use
normal kmalloc allocations and sync them to the device and back.
I wonder if we should create some general mempool-like helpers for that.
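
To make the constraint concrete, here is a minimal sketch (not taken from
any real driver) of the pattern the DMA API has to support, and where the
SEV case currently falls over:

/*
 * Minimal sketch, not from a real driver: a GFP_ATOMIC coherent
 * allocation from atomic context is legal per the DMA API.  On an SEV
 * guest this path ends up in set_memory_decrypted(), which can sleep
 * (it takes the vmap purge mutex), hence the splat.
 */
#include <linux/dma-mapping.h>
#include <linux/spinlock.h>

static void *example_alloc_in_atomic(struct device *dev, spinlock_t *lock,
				     dma_addr_t *handle)
{
	unsigned long flags;
	void *buf;

	spin_lock_irqsave(lock, flags);		/* atomic context */
	buf = dma_alloc_coherent(dev, PAGE_SIZE, handle, GFP_ATOMIC);
	/* On SEV, dma_direct_alloc() has to mark the page decrypted via
	 * set_memory_decrypted(), which may sleep -> WARN/BUG here. */
	spin_unlock_irqrestore(lock, flags);

	return buf;
}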

* Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
  2019-09-05  6:06 ` [bug] __blk_mq_run_hw_queue suspicious rcu usage Christoph Hellwig
@ 2019-09-05 22:37   ` David Rientjes via iommu
  2019-09-16 23:45     ` David Rientjes via iommu
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes via iommu @ 2019-09-05 22:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Tom Lendacky, Brijesh Singh, x86, linux-kernel,
	Ming Lei, iommu, Peter Gonda, Jianxiong Gao

On Thu, 5 Sep 2019, Christoph Hellwig wrote:

> > Hi Christoph, Jens, and Ming,
> > 
> > While booting a 5.2 SEV-enabled guest, we have encountered the following
> > WARNING, followed by a BUG, because we are in atomic context while
> > trying to call set_memory_decrypted:
> 
> Well, this really is an x86 / DMA API issue, unfortunately.  Drivers
> are allowed to do GFP_ATOMIC DMA allocations under locks / RCU critical
> sections and from interrupts, and it seems like the SEV case can't
> handle that.  We have some semi-generic code for a fixed-size pool
> in kernel/dma for non-coherent platforms with similar issues
> that we could try to wire up, but I wonder if there is a better way
> to handle the issue, so I've added Tom and the x86 maintainers.
> 
> Now, independent of that issue, using DMA coherent memory for the nvme
> PRPs/SGLs doesn't actually feel very optimal.  We could just use
> normal kmalloc allocations and sync them to the device and back.
> I wonder if we should create some general mempool-like helpers for that.
> 

Thanks for looking into this.  I assume it's a non-starter to try to
address this in _vm_unmap_aliases() itself, i.e. rely on a purge spinlock
to do all synchronization (or a trylock when the purge is not forced) for
purge_vmap_area_lazy() rather than only the vmap_area_lock within it.  In
other words, no mutex.

If that's the case, and set_memory_encrypted() can't be fixed so that it
doesn't need to sleep by changing the _vm_unmap_aliases() locking, then I
assume a dmapool is our only alternative?  I have no idea how large it
should be.
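
For reference, the dmapool alternative would look roughly like the
following sketch (names and sizes made up; note that dma_pool allocates
its backing pages lazily, so on SEV those pages would have to be
populated up front, from sleepable context, for later GFP_ATOMIC
allocations to avoid set_memory_decrypted()):

#include <linux/dmapool.h>

static struct dma_pool *example_pool;

/* Sleepable probe context: create (and ideally pre-populate) the pool. */
static int example_probe(struct device *dev)
{
	example_pool = dma_pool_create("example-pool", dev, PAGE_SIZE,
				       PAGE_SIZE, 0);
	return example_pool ? 0 : -ENOMEM;
}

/* Atomic context: only safe if a backing page is already in the pool. */
static void *example_get_buf(dma_addr_t *dma)
{
	return dma_pool_alloc(example_pool, GFP_ATOMIC, dma);
}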

* Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
  2019-09-05 22:37   ` David Rientjes via iommu
@ 2019-09-16 23:45     ` David Rientjes via iommu
  2019-09-17 18:23       ` David Rientjes via iommu
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes via iommu @ 2019-09-16 23:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Tom Lendacky, Brijesh Singh, x86, linux-kernel,
	Ming Lei, iommu, Peter Gonda, Jianxiong Gao

On Thu, 5 Sep 2019, David Rientjes wrote:

> > > Hi Christoph, Jens, and Ming,
> > > 
> > > While booting a 5.2 SEV-enabled guest, we have encountered the following
> > > WARNING, followed by a BUG, because we are in atomic context while
> > > trying to call set_memory_decrypted:
> > 
> > Well, this really is an x86 / DMA API issue, unfortunately.  Drivers
> > are allowed to do GFP_ATOMIC DMA allocations under locks / RCU critical
> > sections and from interrupts, and it seems like the SEV case can't
> > handle that.  We have some semi-generic code for a fixed-size pool
> > in kernel/dma for non-coherent platforms with similar issues
> > that we could try to wire up, but I wonder if there is a better way
> > to handle the issue, so I've added Tom and the x86 maintainers.
> > 
> > Now, independent of that issue, using DMA coherent memory for the nvme
> > PRPs/SGLs doesn't actually feel very optimal.  We could just use
> > normal kmalloc allocations and sync them to the device and back.
> > I wonder if we should create some general mempool-like helpers for that.
> > 
> 
> Thanks for looking into this.  I assume it's a non-starter to try to
> address this in _vm_unmap_aliases() itself, i.e. rely on a purge spinlock
> to do all synchronization (or a trylock when the purge is not forced) for
> purge_vmap_area_lazy() rather than only the vmap_area_lock within it.  In
> other words, no mutex.
> 
> If that's the case, and set_memory_encrypted() can't be fixed so that it
> doesn't need to sleep by changing the _vm_unmap_aliases() locking, then I
> assume a dmapool is our only alternative?  I have no idea how large it
> should be.
> 

Brijesh and Tom, we currently hit this any time we boot an SEV-enabled
Ubuntu 18.04 guest; I assume that guest kernels, especially those of such
major distributions, are expected to work without warnings and BUGs when
certain drivers are enabled.

If the vmap purge lock is to remain a mutex (is there any other reason
that unmapping aliases can block?), then it appears that allocating a
dmapool is the only alternative.  Is this something that you'll be
addressing generically, or do we need to get buy-in from the maintainers
of this specific driver?

* Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
  2019-09-16 23:45     ` David Rientjes via iommu
@ 2019-09-17 18:23       ` David Rientjes via iommu
  2019-09-17 18:32         ` Jens Axboe
  2019-09-17 18:41         ` Lendacky, Thomas
  0 siblings, 2 replies; 7+ messages in thread
From: David Rientjes via iommu @ 2019-09-17 18:23 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch, Jens Axboe
  Cc: Tom Lendacky, Brijesh Singh, x86, linux-kernel, Ming Lei, iommu,
	Peter Gonda, Jianxiong Gao

On Mon, 16 Sep 2019, David Rientjes wrote:

> Brijesh and Tom, we currently hit this any time we boot an SEV-enabled
> Ubuntu 18.04 guest; I assume that guest kernels, especially those of such
> major distributions, are expected to work without warnings and BUGs when
> certain drivers are enabled.
> 
> If the vmap purge lock is to remain a mutex (is there any other reason
> that unmapping aliases can block?), then it appears that allocating a
> dmapool is the only alternative.  Is this something that you'll be
> addressing generically, or do we need to get buy-in from the maintainers
> of this specific driver?
> 

We've found that the following applied on top of 5.2.14 suppresses the 
warnings.

Christoph, Keith, Jens, is this something that we could do for the nvme 
driver?  I'll happily propose it formally if it would be acceptable.

Thanks!


diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
 		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
-		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
+		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED |
+					  BLK_MQ_F_BLOCKING;
 		dev->admin_tagset.driver_data = dev;
 
 		if (blk_mq_alloc_tag_set(&dev->admin_tagset))
@@ -2262,7 +2263,8 @@ static int nvme_dev_add(struct nvme_dev *dev)
 		dev->tagset.queue_depth =
 				min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
 		dev->tagset.cmd_size = sizeof(struct nvme_iod);
-		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
+		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE |
+				    BLK_MQ_F_BLOCKING;
 		dev->tagset.driver_data = dev;
 
 		ret = blk_mq_alloc_tag_set(&dev->tagset);

* Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
  2019-09-17 18:23       ` David Rientjes via iommu
@ 2019-09-17 18:32         ` Jens Axboe
  2019-09-17 18:41         ` Lendacky, Thomas
  1 sibling, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-09-17 18:32 UTC (permalink / raw)
  To: David Rientjes, Christoph Hellwig, Keith Busch
  Cc: Tom Lendacky, Brijesh Singh, x86, linux-kernel, Ming Lei, iommu,
	Peter Gonda, Jianxiong Gao

On 9/17/19 12:23 PM, David Rientjes wrote:
> On Mon, 16 Sep 2019, David Rientjes wrote:
> 
>> Brijesh and Tom, we currently hit this any time we boot an SEV-enabled
>> Ubuntu 18.04 guest; I assume that guest kernels, especially those of such
>> major distributions, are expected to work without warnings and BUGs when
>> certain drivers are enabled.
>>
>> If the vmap purge lock is to remain a mutex (is there any other reason
>> that unmapping aliases can block?), then it appears that allocating a
>> dmapool is the only alternative.  Is this something that you'll be
>> addressing generically, or do we need to get buy-in from the maintainers
>> of this specific driver?
>>
> 
> We've found that the following applied on top of 5.2.14 suppresses the
> warnings.
> 
> Christoph, Keith, Jens, is this something that we could do for the nvme
> driver?  I'll happily propose it formally if it would be acceptable.

No, this is not going to be acceptable, I'm afraid. This tells blk-mq
that the driver always needs blocking context for queueing IO, which
will increase latencies for the cases where we'd otherwise issue IO
directly from the context that queues it.
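
To illustrate what that means, roughly (a paraphrased sketch of the
run-queue decision, not verbatim blk-mq code):

/* Paraphrased sketch of the blk-mq run path, not verbatim kernel code. */
static void run_hw_queue_sketch(struct blk_mq_hw_ctx *hctx, bool async,
				unsigned long msecs)
{
	if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
		/* Fast path: dispatch directly from the submitting context. */
		__blk_mq_run_hw_queue(hctx);
		return;
	}

	/* With BLK_MQ_F_BLOCKING every run is punted to the kblockd
	 * workqueue, adding a context switch to each submission. */
	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
				    &hctx->run_work,
				    msecs_to_jiffies(msecs));
}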

-- 
Jens Axboe


* Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
  2019-09-17 18:23       ` David Rientjes via iommu
  2019-09-17 18:32         ` Jens Axboe
@ 2019-09-17 18:41         ` Lendacky, Thomas
  2019-09-18 13:22           ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Lendacky, Thomas @ 2019-09-17 18:41 UTC (permalink / raw)
  To: David Rientjes, Christoph Hellwig, Keith Busch, Jens Axboe
  Cc: Singh, Brijesh, x86, linux-kernel, Ming Lei, iommu, Peter Gonda,
	Jianxiong Gao

On 9/17/19 1:23 PM, David Rientjes wrote:
> On Mon, 16 Sep 2019, David Rientjes wrote:
> 
>> Brijesh and Tom, we currently hit this any time we boot an SEV-enabled
>> Ubuntu 18.04 guest; I assume that guest kernels, especially those of such
>> major distributions, are expected to work without warnings and BUGs when
>> certain drivers are enabled.
>>
>> If the vmap purge lock is to remain a mutex (is there any other reason
>> that unmapping aliases can block?), then it appears that allocating a
>> dmapool is the only alternative.  Is this something that you'll be
>> addressing generically, or do we need to get buy-in from the maintainers
>> of this specific driver?
>>
> 
> We've found that the following applied on top of 5.2.14 suppresses the 
> warnings.
> 
> Christoph, Keith, Jens, is this something that we could do for the nvme 
> driver?  I'll happily propose it formally if it would be acceptable.
> 
> Thanks!
> 
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
>  		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
>  		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
>  		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
> -		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
> +		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED |
> +					  BLK_MQ_F_BLOCKING;

I think you want to set BLK_MQ_F_BLOCKING only if the DMA is required to
be unencrypted. Unfortunately, force_dma_unencrypted() can't be called
from a module. Is there a DMA API that could be called to get that info?
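
If such a query existed, the conditional would be roughly (sketch only;
dma_wants_unencrypted() is a hypothetical helper, not an existing API):

	/* In nvme_dev_add(); dma_wants_unencrypted() is hypothetical and
	 * stands in for whatever exported DMA API query gets added. */
	dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
	if (dma_wants_unencrypted(dev->dev))
		dev->tagset.flags |= BLK_MQ_F_BLOCKING;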

Thanks,
Tom

>  		dev->admin_tagset.driver_data = dev;
>  
>  		if (blk_mq_alloc_tag_set(&dev->admin_tagset))
> @@ -2262,7 +2263,8 @@ static int nvme_dev_add(struct nvme_dev *dev)
>  		dev->tagset.queue_depth =
>  				min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
>  		dev->tagset.cmd_size = sizeof(struct nvme_iod);
> -		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
> +		dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE |
> +				    BLK_MQ_F_BLOCKING;
>  		dev->tagset.driver_data = dev;
>  
>  		ret = blk_mq_alloc_tag_set(&dev->tagset);
> 

* Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
  2019-09-17 18:41         ` Lendacky, Thomas
@ 2019-09-18 13:22           ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2019-09-18 13:22 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: Jens Axboe, Singh, Brijesh, x86, linux-kernel, Ming Lei, iommu,
	Peter Gonda, David Rientjes, Keith Busch, Christoph Hellwig,
	Jianxiong Gao

On Tue, Sep 17, 2019 at 06:41:02PM +0000, Lendacky, Thomas wrote:
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
> >  		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
> >  		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
> >  		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
> > -		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
> > +		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED |
> > +					  BLK_MQ_F_BLOCKING;
> 
> I think you want to set BLK_MQ_F_BLOCKING only if the DMA is required to
> be unencrypted. Unfortunately, force_dma_unencrypted() can't be called
> from a module. Is there a DMA API that could be called to get that info?

The DMA API must support non-blocking calls, and various drivers rely
on that, so we need to provide that even for the SEV case.  If the
actual blocking can't be made to work, we'll need to wire up the DMA
pool in kernel/dma/remap.c for it (and probably move it to a separate
file).
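
A rough sketch of that kind of wiring (illustrative only; pool size and
names are made up, and the real code would sit next to the existing
atomic pool handling in kernel/dma):

#include <linux/genalloc.h>
#include <linux/set_memory.h>
#include <linux/sizes.h>
#include <linux/gfp.h>
#include <linux/io.h>

#define SKETCH_POOL_SIZE	SZ_256K

static struct gen_pool *sketch_unencrypted_pool;

/* Boot time, sleepable context: carve out a pool and decrypt it once. */
static int __init sketch_pool_init(void)
{
	void *vaddr = (void *)__get_free_pages(GFP_KERNEL,
					       get_order(SKETCH_POOL_SIZE));
	if (!vaddr)
		return -ENOMEM;

	set_memory_decrypted((unsigned long)vaddr,
			     SKETCH_POOL_SIZE >> PAGE_SHIFT);

	sketch_unencrypted_pool = gen_pool_create(PAGE_SHIFT, -1);
	if (!sketch_unencrypted_pool)
		return -ENOMEM;
	gen_pool_add_virt(sketch_unencrypted_pool, (unsigned long)vaddr,
			  virt_to_phys(vaddr), SKETCH_POOL_SIZE, -1);
	return 0;
}

/* Non-blocking allocation path: no page attribute changes needed here. */
static void *sketch_alloc_atomic(size_t size, phys_addr_t *phys)
{
	unsigned long vaddr = gen_pool_alloc(sketch_unencrypted_pool, size);

	if (!vaddr)
		return NULL;
	*phys = gen_pool_virt_to_phys(sketch_unencrypted_pool, vaddr);
	return (void *)vaddr;
}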