All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	iommu@lists.linux-foundation.org,
	LKML <linux-kernel@vger.kernel.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Jean-Philippe Brucker <jean-philippe@linaro.com>,
	Yi Liu <yi.l.liu@intel.com>, "Tian, Kevin" <kevin.tian@intel.com>,
	Raj Ashok <ashok.raj@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	Jonathan Cameron <jic23@kernel.org>,
	Eric Auger <eric.auger@redhat.com>, "Wu, Hao" <hao.wu@intel.com>,
	jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH 00/10] IOASID extensions for guest SVA
Date: Thu, 2 Apr 2020 09:09:09 -0700	[thread overview]
Message-ID: <20200402090909.4475d430@jacob-builder> (raw)
In-Reply-To: <20200402122633.GC1176452@myrica>

On Thu, 2 Apr 2020 14:26:33 +0200
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> On Wed, Apr 01, 2020 at 04:38:42PM -0700, Jacob Pan wrote:
> > On Wed, 1 Apr 2020 16:03:01 +0200
> > Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> >   
> > > Hi Jacob,
> > > 
> > > On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:  
> > > > IOASID was introduced in v5.5 as a generic kernel allocator
> > > > service for both PCIe Process Address Space ID (PASID) and ARM
> > > > SMMU's Sub Stream ID. In addition to basic ID allocation,
> > > > ioasid_set was introduced as a token that is shared by a group
> > > > of IOASIDs. This set token can be used for permission checking
> > > > but lack of some features needed by guest Shared Virtual
> > > > Address (SVA). In addition, IOASID support for life cycle
> > > > management is needed among multiple users.
> > > > 
> > > > This patchset introduces two extensions to the IOASID code,
> > > > 1. IOASID set operations
> > > > 2. Notifications for IOASID state synchronization    
> > > 
> > > My main concern with this series is patch 7 changing the spinlock
> > > to a mutex, which prevents SVA from calling ioasid_free() from
> > > the RCU callback of MMU notifiers. Could we use atomic notifiers,
> > > or do the FREE notification another way?
> > >   
> > Maybe I am looking at the wrong code, I thought
> > mmu_notifier_ops.free_notifier() is called outside spinlock with
> > call_srcu(), which will be invoked in the thread context.
> > in mmu_notifier.c mmu_notifier_put()
> > 	spin_unlock(&mm->notifier_subscriptions->lock);
> > 
> > 	call_srcu(&srcu, &subscription->rcu,
> > mmu_notifier_free_rcu);  
> 
> free_notifier() is called from RCU callback, and according to
> Documentation/RCU/checklist.txt:
> 
> 5.      If call_rcu() or call_srcu() is used, the callback function
> will be called from softirq context.  In particular, it cannot block.
> 
> When applying the patch I get the sleep-in-atomic warning:
> 
> [   87.861793] BUG: sleeping function called from invalid context at
> kernel/locking/mutex.c:935 [   87.863293] in_atomic(): 1,
> irqs_disabled(): 0, non_block: 0, pid: 74, name: kworker/6:1
> [   87.863993] 2 locks held by kworker/6:1/74: [   87.864493]  #0:
> ffffff885ac12538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at:
> process_one_work+0x740/0x1880 [   87.865593]  #1: ffffff88591efd30
> ((work_completion)(&sdp->work)){+.+.}-{0:0}, at:
> process_one_work+0x740/0x1880 [   87.866993] CPU: 6 PID: 74 Comm:
> kworker/6:1 Not tainted 5.6.0-next-20200331+ #121 [   87.867393]
> Hardware name: FVP Base (DT) [   87.867893] Workqueue: rcu_gp
> srcu_invoke_callbacks [   87.868393] Call trace: [   87.868793]
> dump_backtrace+0x0/0x310 [   87.869293]  show_stack+0x14/0x20
> [   87.869693]  dump_stack+0x124/0x180 [   87.870193]
> ___might_sleep+0x2ac/0x428 [   87.870693]  __might_sleep+0x88/0x168
> [   87.871094]  __mutex_lock+0xa0/0x1270
> [   87.871593]  mutex_lock_nested+0x1c/0x28
> [   87.872093]  ioasid_free+0x28/0x48
> [   87.872493]  io_mm_free+0x1d0/0x608
> [   87.872993]  mmu_notifier_free_rcu+0x74/0xe8
> [   87.873393]  srcu_invoke_callbacks+0x1d0/0x2c8
> [   87.873893]  process_one_work+0x858/0x1880
> [   87.874393]  worker_thread+0x314/0xcd0
> [   87.874793]  kthread+0x318/0x400
> [   87.875293]  ret_from_fork+0x10/0x18
> 
You are right, I was reading call_srcu comments too fast. I guess
rcu callbacks are still in softirq not offloaded to kernel threads.

 *
 * The callback will be invoked from process context, but must
 *  nevertheless be fast and must not block.
 */
So even atomic works in principle but not a good idea since it may take
a long time.

> > 
> > Anyway, if we have to use atomic. I tried atomic notifier first,
> > there are two subscribers to the free event on x86.
> > 1. IOMMU
> > 2. KVM
> > 
> > For #1, the problem is that in the free operation, VT-d driver
> > needs to do a lot of clean up in thread context.
> > - hold a mutex to traverse a list of devices
> > - clear PASID entry and flush cache
> > 
> > For #2, KVM might be able to deal with spinlocks for updating VMCS
> > PASID translation table. +Hao
> > 
> > Perhaps two solutions I can think of:
> > 1. Use a cyclic IOASID allocator. The main reason of clean up at
> > free is to prevent race with IOASID alloc. Similar to PID, 2M IOASID
> > will take long time overflow. Then we can use atomic notifier and a
> > deferred workqueue to do IOMMU cleanup. The downside is a large and
> > growing PASID table, may not be a performance issue since it has
> > TLB.  
> 
> That might be a problem for SMMU, which has 1024 * 64kB leaf PASID
> tables, for a total of 64MB per endpoint if there is too much
> fragmentation in the IOASID space.
> 
OK. Not an option here :(

> > 2. Let VFIO ensure free always happen after unbind. Then there is no
> > need to do cleanup. But that requires VFIO to keep track of all the
> > PASIDs within each VM. When the VM terminates, VFIO is responsible
> > for the clean up. That was Yi's original proposal. I also tried to
> > provide an IOASID set iterator for VFIO to free the IOASIDs within
> > each VM/set, but the private data belongs to IOMMU driver.  
> 
> Not really my place to comment on this, but I find it nicer to use the
> same gpasid_unbind() path when VFIO frees a PASID as when the guest
> explicitly unbinds before freeing. 
> 

Might be the only option now.

Thanks,

Jacob

> Thanks,
> Jean
> 

[Jacob Pan]

WARNING: multiple messages have this Message-ID (diff)
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	Raj Ashok <ashok.raj@intel.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.com>,
	LKML <linux-kernel@vger.kernel.org>,
	iommu@lists.linux-foundation.org,
	Alex Williamson <alex.williamson@redhat.com>,
	"Wu, Hao" <hao.wu@intel.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Jonathan Cameron <jic23@kernel.org>
Subject: Re: [PATCH 00/10] IOASID extensions for guest SVA
Date: Thu, 2 Apr 2020 09:09:09 -0700	[thread overview]
Message-ID: <20200402090909.4475d430@jacob-builder> (raw)
In-Reply-To: <20200402122633.GC1176452@myrica>

On Thu, 2 Apr 2020 14:26:33 +0200
Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:

> On Wed, Apr 01, 2020 at 04:38:42PM -0700, Jacob Pan wrote:
> > On Wed, 1 Apr 2020 16:03:01 +0200
> > Jean-Philippe Brucker <jean-philippe@linaro.org> wrote:
> >   
> > > Hi Jacob,
> > > 
> > > On Wed, Mar 25, 2020 at 10:55:21AM -0700, Jacob Pan wrote:  
> > > > IOASID was introduced in v5.5 as a generic kernel allocator
> > > > service for both PCIe Process Address Space ID (PASID) and ARM
> > > > SMMU's Sub Stream ID. In addition to basic ID allocation,
> > > > ioasid_set was introduced as a token that is shared by a group
> > > > of IOASIDs. This set token can be used for permission checking
> > > > but lack of some features needed by guest Shared Virtual
> > > > Address (SVA). In addition, IOASID support for life cycle
> > > > management is needed among multiple users.
> > > > 
> > > > This patchset introduces two extensions to the IOASID code,
> > > > 1. IOASID set operations
> > > > 2. Notifications for IOASID state synchronization    
> > > 
> > > My main concern with this series is patch 7 changing the spinlock
> > > to a mutex, which prevents SVA from calling ioasid_free() from
> > > the RCU callback of MMU notifiers. Could we use atomic notifiers,
> > > or do the FREE notification another way?
> > >   
> > Maybe I am looking at the wrong code, I thought
> > mmu_notifier_ops.free_notifier() is called outside spinlock with
> > call_srcu(), which will be invoked in the thread context.
> > in mmu_notifier.c mmu_notifier_put()
> > 	spin_unlock(&mm->notifier_subscriptions->lock);
> > 
> > 	call_srcu(&srcu, &subscription->rcu,
> > mmu_notifier_free_rcu);  
> 
> free_notifier() is called from RCU callback, and according to
> Documentation/RCU/checklist.txt:
> 
> 5.      If call_rcu() or call_srcu() is used, the callback function
> will be called from softirq context.  In particular, it cannot block.
> 
> When applying the patch I get the sleep-in-atomic warning:
> 
> [   87.861793] BUG: sleeping function called from invalid context at
> kernel/locking/mutex.c:935 [   87.863293] in_atomic(): 1,
> irqs_disabled(): 0, non_block: 0, pid: 74, name: kworker/6:1
> [   87.863993] 2 locks held by kworker/6:1/74: [   87.864493]  #0:
> ffffff885ac12538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at:
> process_one_work+0x740/0x1880 [   87.865593]  #1: ffffff88591efd30
> ((work_completion)(&sdp->work)){+.+.}-{0:0}, at:
> process_one_work+0x740/0x1880 [   87.866993] CPU: 6 PID: 74 Comm:
> kworker/6:1 Not tainted 5.6.0-next-20200331+ #121 [   87.867393]
> Hardware name: FVP Base (DT) [   87.867893] Workqueue: rcu_gp
> srcu_invoke_callbacks [   87.868393] Call trace: [   87.868793]
> dump_backtrace+0x0/0x310 [   87.869293]  show_stack+0x14/0x20
> [   87.869693]  dump_stack+0x124/0x180 [   87.870193]
> ___might_sleep+0x2ac/0x428 [   87.870693]  __might_sleep+0x88/0x168
> [   87.871094]  __mutex_lock+0xa0/0x1270
> [   87.871593]  mutex_lock_nested+0x1c/0x28
> [   87.872093]  ioasid_free+0x28/0x48
> [   87.872493]  io_mm_free+0x1d0/0x608
> [   87.872993]  mmu_notifier_free_rcu+0x74/0xe8
> [   87.873393]  srcu_invoke_callbacks+0x1d0/0x2c8
> [   87.873893]  process_one_work+0x858/0x1880
> [   87.874393]  worker_thread+0x314/0xcd0
> [   87.874793]  kthread+0x318/0x400
> [   87.875293]  ret_from_fork+0x10/0x18
> 
You are right, I was reading call_srcu comments too fast. I guess
rcu callbacks are still in softirq not offloaded to kernel threads.

 *
 * The callback will be invoked from process context, but must
 *  nevertheless be fast and must not block.
 */
So even atomic works in principle but not a good idea since it may take
a long time.

> > 
> > Anyway, if we have to use atomic. I tried atomic notifier first,
> > there are two subscribers to the free event on x86.
> > 1. IOMMU
> > 2. KVM
> > 
> > For #1, the problem is that in the free operation, VT-d driver
> > needs to do a lot of clean up in thread context.
> > - hold a mutex to traverse a list of devices
> > - clear PASID entry and flush cache
> > 
> > For #2, KVM might be able to deal with spinlocks for updating VMCS
> > PASID translation table. +Hao
> > 
> > Perhaps two solutions I can think of:
> > 1. Use a cyclic IOASID allocator. The main reason of clean up at
> > free is to prevent race with IOASID alloc. Similar to PID, 2M IOASID
> > will take long time overflow. Then we can use atomic notifier and a
> > deferred workqueue to do IOMMU cleanup. The downside is a large and
> > growing PASID table, may not be a performance issue since it has
> > TLB.  
> 
> That might be a problem for SMMU, which has 1024 * 64kB leaf PASID
> tables, for a total of 64MB per endpoint if there is too much
> fragmentation in the IOASID space.
> 
OK. Not an option here :(

> > 2. Let VFIO ensure free always happen after unbind. Then there is no
> > need to do cleanup. But that requires VFIO to keep track of all the
> > PASIDs within each VM. When the VM terminates, VFIO is responsible
> > for the clean up. That was Yi's original proposal. I also tried to
> > provide an IOASID set iterator for VFIO to free the IOASIDs within
> > each VM/set, but the private data belongs to IOMMU driver.  
> 
> Not really my place to comment on this, but I find it nicer to use the
> same gpasid_unbind() path when VFIO frees a PASID as when the guest
> explicitly unbinds before freeing. 
> 

Might be the only option now.

Thanks,

Jacob

> Thanks,
> Jean
> 

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2020-04-02 16:03 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-25 17:55 [PATCH 00/10] IOASID extensions for guest SVA Jacob Pan
2020-03-25 17:55 ` Jacob Pan
2020-03-25 17:55 ` [PATCH 01/10] iommu/ioasid: Introduce system-wide capacity Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27  8:07   ` Tian, Kevin
2020-03-27  8:07     ` Tian, Kevin
2020-03-27 16:08     ` Jacob Pan
2020-03-27 16:08       ` Jacob Pan
2020-04-01 13:45   ` Jean-Philippe Brucker
2020-04-01 13:45     ` Jean-Philippe Brucker
2020-04-01 22:50     ` Jacob Pan
2020-04-01 22:50       ` Jacob Pan
2020-03-25 17:55 ` [PATCH 02/10] iommu/vt-d: Set IOASID capacity when SVM is enabled Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27  8:08   ` Tian, Kevin
2020-03-27  8:08     ` Tian, Kevin
2020-03-25 17:55 ` [PATCH 03/10] iommu/ioasid: Introduce per set allocation APIs Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-26  2:12   ` Lu Baolu
2020-03-26  2:12     ` Lu Baolu
2020-03-26 21:30     ` Jacob Pan
2020-03-26 21:30       ` Jacob Pan
2020-03-27  8:38   ` Tian, Kevin
2020-03-27  8:38     ` Tian, Kevin
2020-03-27 16:59     ` Jacob Pan
2020-03-27 16:59       ` Jacob Pan
2020-03-28  6:32       ` Tian, Kevin
2020-03-28  6:32         ` Tian, Kevin
2020-04-01 13:47   ` Jean-Philippe Brucker
2020-04-01 13:47     ` Jean-Philippe Brucker
2020-04-06 20:02     ` Jacob Pan
2020-04-06 20:02       ` Jacob Pan
2020-04-07 11:01       ` Jean-Philippe Brucker
2020-04-07 11:01         ` Jean-Philippe Brucker
2020-04-21 21:51         ` Jacob Pan
2020-04-21 21:51           ` Jacob Pan
2020-03-25 17:55 ` [PATCH 04/10] iommu/ioasid: Rename ioasid_set_data to avoid confusion with ioasid_set Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27  9:35   ` Tian, Kevin
2020-03-27  9:35     ` Tian, Kevin
2020-03-25 17:55 ` [PATCH 05/10] iommu/ioasid: Create an IOASID set for host SVA use Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27  9:41   ` Tian, Kevin
2020-03-27  9:41     ` Tian, Kevin
2020-03-27 17:28     ` Jacob Pan
2020-03-27 17:28       ` Jacob Pan
2020-03-28  6:33       ` Tian, Kevin
2020-03-28  6:33         ` Tian, Kevin
2020-04-01 13:53   ` Jean-Philippe Brucker
2020-04-01 13:53     ` Jean-Philippe Brucker
2020-04-06 15:33     ` Jacob Pan
2020-04-06 15:33       ` Jacob Pan
2020-04-07 11:01       ` Jean-Philippe Brucker
2020-04-07 11:01         ` Jean-Philippe Brucker
2020-04-13 22:06         ` Jacob Pan
2020-04-13 22:06           ` Jacob Pan
2020-04-15 15:10           ` Jean-Philippe Brucker
2020-04-15 15:10             ` Jean-Philippe Brucker
2020-03-25 17:55 ` [PATCH 06/10] iommu/ioasid: Convert to set aware allocations Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27  9:54   ` Tian, Kevin
2020-03-27  9:54     ` Tian, Kevin
2020-03-27 17:41     ` Jacob Pan
2020-03-27 17:41       ` Jacob Pan
2020-03-28  6:40       ` Tian, Kevin
2020-03-28  6:40         ` Tian, Kevin
2020-04-06 20:07         ` Jacob Pan
2020-04-06 20:07           ` Jacob Pan
2020-04-01 13:55   ` Jean-Philippe Brucker
2020-04-01 13:55     ` Jean-Philippe Brucker
2020-04-01 22:45     ` Jacob Pan
2020-04-01 22:45       ` Jacob Pan
2020-03-25 17:55 ` [PATCH 07/10] iommu/ioasid: Use mutex instead of spinlock Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27  9:55   ` Tian, Kevin
2020-03-27  9:55     ` Tian, Kevin
2020-04-01 13:58   ` Jean-Philippe Brucker
2020-04-01 13:58     ` Jean-Philippe Brucker
2020-03-25 17:55 ` [PATCH 08/10] iommu/ioasid: Introduce notifier APIs Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27 10:03   ` Tian, Kevin
2020-03-27 10:03     ` Tian, Kevin
2020-03-27 18:36     ` Jacob Pan
2020-03-27 18:36       ` Jacob Pan
2020-03-28  6:43       ` Tian, Kevin
2020-03-28  6:43         ` Tian, Kevin
2020-03-31 15:13         ` Jacob Pan
2020-03-31 15:13           ` Jacob Pan
2020-04-01 14:00   ` Jean-Philippe Brucker
2020-04-01 14:00     ` Jean-Philippe Brucker
2020-04-10 15:43     ` Jacob Pan
2020-04-10 15:43       ` Jacob Pan
2020-03-25 17:55 ` [PATCH 09/10] iommu/ioasid: Support ioasid_set quota adjustment Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27 10:09   ` Tian, Kevin
2020-03-27 10:09     ` Tian, Kevin
2020-03-27 23:30     ` Jacob Pan
2020-03-27 23:30       ` Jacob Pan
2020-03-28  6:44       ` Tian, Kevin
2020-03-28  6:44         ` Tian, Kevin
2020-03-25 17:55 ` [PATCH 10/10] iommu/vt-d: Register PASID notifier for status change Jacob Pan
2020-03-25 17:55   ` Jacob Pan
2020-03-27 10:22   ` Tian, Kevin
2020-03-27 10:22     ` Tian, Kevin
2020-03-27 23:47     ` Jacob Pan
2020-03-27 23:47       ` Jacob Pan
2020-04-01 14:03 ` [PATCH 00/10] IOASID extensions for guest SVA Jean-Philippe Brucker
2020-04-01 14:03   ` Jean-Philippe Brucker
2020-04-01 23:38   ` Jacob Pan
2020-04-01 23:38     ` Jacob Pan
2020-04-02 12:26     ` Jean-Philippe Brucker
2020-04-02 12:26       ` Jean-Philippe Brucker
2020-04-02 16:09       ` Jacob Pan [this message]
2020-04-02 16:09         ` Jacob Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200402090909.4475d430@jacob-builder \
    --to=jacob.jun.pan@linux.intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=eric.auger@redhat.com \
    --cc=hao.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jean-philippe@linaro.com \
    --cc=jean-philippe@linaro.org \
    --cc=jic23@kernel.org \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.