Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv
@ 2020-07-28 13:58 Daniel Vetter
  2020-07-30 12:03 ` [Linaro-mm-sig] " Christian König
  2020-07-30 12:17 ` Thomas Hellström (Intel)
  0 siblings, 2 replies; 5+ messages in thread
From: Daniel Vetter @ 2020-07-28 13:58 UTC (permalink / raw)
  To: DRI Development
  Cc: Intel Graphics Development, Daniel Vetter, Daniel Vetter,
	Sumit Semwal, Christian König, linux-media, linaro-mm-sig,
	Dave Chinner, Qian Cai, linux-xfs, linux-fsdevel,
	Thomas Hellström, Andrew Morton, Jason Gunthorpe, linux-mm,
	linux-rdma, Maarten Lankhorst

GPU drivers need this in their shrinkers, to be able to throw out
mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already
have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding
relevant people.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: Dave Chinner <david@fromorbit.com>
Cc: Qian Cai <cai@lca.pw>
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Thomas Hellström (Intel) <thomas_os@shipmail.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-mm@kvack.org
Cc: linux-rdma@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/dma-buf/dma-resv.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 0e6675ec1d11..9678162a4ac5 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -104,12 +104,14 @@ static int __init dma_resv_lockdep(void)
 	struct mm_struct *mm = mm_alloc();
 	struct ww_acquire_ctx ctx;
 	struct dma_resv obj;
+	struct address_space mapping;
 	int ret;
 
 	if (!mm)
 		return -ENOMEM;
 
 	dma_resv_init(&obj);
+	address_space_init_once(&mapping);
 
 	mmap_read_lock(mm);
 	ww_acquire_init(&ctx, &reservation_ww_class);
@@ -117,6 +119,9 @@ static int __init dma_resv_lockdep(void)
 	if (ret == -EDEADLK)
 		dma_resv_lock_slow(&obj, &ctx);
 	fs_reclaim_acquire(GFP_KERNEL);
+	/* for unmap_mapping_range on trylocked buffer objects in shrinkers */
+	i_mmap_lock_write(&mapping);
+	i_mmap_unlock_write(&mapping);
 #ifdef CONFIG_MMU_NOTIFIER
 	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
 	__dma_fence_might_wait();
-- 
2.27.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Linaro-mm-sig] [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv
  2020-07-28 13:58 [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv Daniel Vetter
@ 2020-07-30 12:03 ` Christian König
  2020-07-30 12:17 ` Thomas Hellström (Intel)
  1 sibling, 0 replies; 5+ messages in thread
From: Christian König @ 2020-07-30 12:03 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: linux-xfs, Maarten Lankhorst, linux-rdma,
	Intel Graphics Development, Dave Chinner, Christian König,
	linaro-mm-sig, linux-mm, Jason Gunthorpe, Qian Cai,
	linux-fsdevel, Daniel Vetter, Andrew Morton, linux-media

Am 28.07.20 um 15:58 schrieb Daniel Vetter:
> GPU drivers need this in their shrinkers, to be able to throw out
> mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
> but that loop is resolved by trylocking in shrinkers.
>
> So full hierarchy is now (ignore some of the other branches we already
> have primed):
>
> mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write
>
> I hope that's not inconsistent with anything mm or fs does, adding
> relevant people.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: linux-xfs@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: Thomas Hellström (Intel) <thomas_os@shipmail.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: linux-mm@kvack.org
> Cc: linux-rdma@vger.kernel.org
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/dma-buf/dma-resv.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 0e6675ec1d11..9678162a4ac5 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -104,12 +104,14 @@ static int __init dma_resv_lockdep(void)
>   	struct mm_struct *mm = mm_alloc();
>   	struct ww_acquire_ctx ctx;
>   	struct dma_resv obj;
> +	struct address_space mapping;
>   	int ret;
>   
>   	if (!mm)
>   		return -ENOMEM;
>   
>   	dma_resv_init(&obj);
> +	address_space_init_once(&mapping);
>   
>   	mmap_read_lock(mm);
>   	ww_acquire_init(&ctx, &reservation_ww_class);
> @@ -117,6 +119,9 @@ static int __init dma_resv_lockdep(void)
>   	if (ret == -EDEADLK)
>   		dma_resv_lock_slow(&obj, &ctx);
>   	fs_reclaim_acquire(GFP_KERNEL);
> +	/* for unmap_mapping_range on trylocked buffer objects in shrinkers */
> +	i_mmap_lock_write(&mapping);
> +	i_mmap_unlock_write(&mapping);
>   #ifdef CONFIG_MMU_NOTIFIER
>   	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
>   	__dma_fence_might_wait();


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv
  2020-07-28 13:58 [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv Daniel Vetter
  2020-07-30 12:03 ` [Linaro-mm-sig] " Christian König
@ 2020-07-30 12:17 ` Thomas Hellström (Intel)
  2020-07-30 13:17   ` Daniel Vetter
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Hellström (Intel) @ 2020-07-30 12:17 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development
  Cc: Intel Graphics Development, Daniel Vetter, Sumit Semwal,
	Christian König, linux-media, linaro-mm-sig, Dave Chinner,
	Qian Cai, linux-xfs, linux-fsdevel, Andrew Morton,
	Jason Gunthorpe, linux-mm, linux-rdma, Maarten Lankhorst


On 7/28/20 3:58 PM, Daniel Vetter wrote:
> GPU drivers need this in their shrinkers, to be able to throw out
> mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
> but that loop is resolved by trylocking in shrinkers.
>
> So full hierarchy is now (ignore some of the other branches we already
> have primed):
>
> mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write
>
> I hope that's not inconsistent with anything mm or fs does, adding
> relevant people.
>
Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but 
don't allocate any memory AFAICT.

Since huge page-table-entry splitting may happen under the i_mmap_lock 
from unmap_mapping_range() it might be worth figuring out how new page 
directory pages are allocated, though.

/Thomas




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv
  2020-07-30 12:17 ` Thomas Hellström (Intel)
@ 2020-07-30 13:17   ` Daniel Vetter
  2020-07-30 16:45     ` Thomas Hellström (Intel)
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Vetter @ 2020-07-30 13:17 UTC (permalink / raw)
  To: Thomas Hellström (Intel)
  Cc: DRI Development, Intel Graphics Development, Daniel Vetter,
	Sumit Semwal, Christian König,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Dave Chinner,
	Qian Cai, linux-xfs, linux-fsdevel, Andrew Morton,
	Jason Gunthorpe, Linux MM, linux-rdma, Maarten Lankhorst

On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel)
<thomas_os@shipmail.org> wrote:
>
>
> On 7/28/20 3:58 PM, Daniel Vetter wrote:
> > GPU drivers need this in their shrinkers, to be able to throw out
> > mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
> > but that loop is resolved by trylocking in shrinkers.
> >
> > So full hierarchy is now (ignore some of the other branches we already
> > have primed):
> >
> > mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write
> >
> > I hope that's not inconsistent with anything mm or fs does, adding
> > relevant people.
> >
> Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but
> don't allocate any memory AFAICT.
>
> Since huge page-table-entry splitting may happen under the i_mmap_lock
> from unmap_mapping_range() it might be worth figuring out how new page
> directory pages are allocated, though.

ofc I'm not an mm expert at all, but I did try to scroll through all
i_mmap_lock_write/read callers. Found the following:

- kernel/events/uprobes.c in build_map_info:

            /*
             * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
             * reclaim. This is optimistic, no harm done if it fails.
             */

- I got lost in the hugetlb.c code and couldn't convince myself it's
not allocating page directories at various levels with something else
than GFP_KERNEL.

So looks like the recursion is clearly there and known, but the
hugepage code is too complex and flying over my head.
-Daniel

>
> /Thomas
>
>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv
  2020-07-30 13:17   ` Daniel Vetter
@ 2020-07-30 16:45     ` Thomas Hellström (Intel)
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellström (Intel) @ 2020-07-30 16:45 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, Intel Graphics Development, Daniel Vetter,
	Sumit Semwal, Christian König,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Dave Chinner,
	Qian Cai, linux-xfs, linux-fsdevel, Andrew Morton,
	Jason Gunthorpe, Linux MM, linux-rdma, Maarten Lankhorst


On 7/30/20 3:17 PM, Daniel Vetter wrote:
> On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel)
> <thomas_os@shipmail.org> wrote:
>>
>> On 7/28/20 3:58 PM, Daniel Vetter wrote:
>>> GPU drivers need this in their shrinkers, to be able to throw out
>>> mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers,
>>> but that loop is resolved by trylocking in shrinkers.
>>>
>>> So full hierarchy is now (ignore some of the other branches we already
>>> have primed):
>>>
>>> mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write
>>>
>>> I hope that's not inconsistent with anything mm or fs does, adding
>>> relevant people.
>>>
>> Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but
>> don't allocate any memory AFAICT.
>>
>> Since huge page-table-entry splitting may happen under the i_mmap_lock
>> from unmap_mapping_range() it might be worth figuring out how new page
>> directory pages are allocated, though.
> ofc I'm not an mm expert at all, but I did try to scroll through all
> i_mmap_lock_write/read callers. Found the following:
>
> - kernel/events/uprobes.c in build_map_info:
>
>              /*
>               * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
>               * reclaim. This is optimistic, no harm done if it fails.
>               */
>
> - I got lost in the hugetlb.c code and couldn't convince myself it's
> not allocating page directories at various levels with something else
> than GFP_KERNEL.
>
> So looks like the recursion is clearly there and known, but the
> hugepage code is too complex and flying over my head.
> -Daniel

OK, so I inverted your annotation and ran a memory hog, and got the 
below splat. So clearly your proposed reclaim->i_mmap_lock locking order 
is an already established one.

So

Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>

8<---------------------------------------------------------------------------------------------

[  308.324654] WARNING: possible circular locking dependency detected
[  308.324655] 5.8.0-rc2+ #16 Not tainted
[  308.324656] ------------------------------------------------------
[  308.324657] kswapd0/98 is trying to acquire lock:
[  308.324658] ffff92a16f758428 (&mapping->i_mmap_rwsem){++++}-{3:3}, 
at: rmap_walk_file+0x1c0/0x2f0
[  308.324663]
                but task is already holding lock:
[  308.324664] ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at: 
__fs_reclaim_acquire+0x5/0x30
[  308.324666]
                which lock already depends on the new lock.

[  308.324667]
                the existing dependency chain (in reverse order) is:
[  308.324667]
                -> #1 (fs_reclaim){+.+.}-{0:0}:
[  308.324670]        fs_reclaim_acquire+0x34/0x40
[  308.324672]        dma_resv_lockdep+0x186/0x224
[  308.324675]        do_one_initcall+0x5d/0x2c0
[  308.324676]        kernel_init_freeable+0x222/0x288
[  308.324678]        kernel_init+0xa/0x107
[  308.324679]        ret_from_fork+0x1f/0x30
[  308.324680]
                -> #0 (&mapping->i_mmap_rwsem){++++}-{3:3}:
[  308.324682]        __lock_acquire+0x119f/0x1fc0
[  308.324683]        lock_acquire+0xa4/0x3b0
[  308.324685]        down_read+0x2d/0x110
[  308.324686]        rmap_walk_file+0x1c0/0x2f0
[  308.324687]        page_referenced+0x133/0x150
[  308.324689]        shrink_active_list+0x142/0x610
[  308.324690]        balance_pgdat+0x229/0x620
[  308.324691]        kswapd+0x200/0x470
[  308.324693]        kthread+0x11f/0x140
[  308.324694]        ret_from_fork+0x1f/0x30
[  308.324694]
                other info that might help us debug this:

[  308.324695]  Possible unsafe locking scenario:

[  308.324695]        CPU0                    CPU1
[  308.324696]        ----                    ----
[  308.324696]   lock(fs_reclaim);
[  308.324697] lock(&mapping->i_mmap_rwsem);
[  308.324698]                                lock(fs_reclaim);
[  308.324699]   lock(&mapping->i_mmap_rwsem);
[  308.324699]
                 *** DEADLOCK ***

[  308.324700] 1 lock held by kswapd0/98:
[  308.324701]  #0: ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at: 
__fs_reclaim_acquire+0x5/0x30
[  308.324702]
                stack backtrace:
[  308.324704] CPU: 1 PID: 98 Comm: kswapd0 Not tainted 5.8.0-rc2+ #16
[  308.324705] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 07/29/2019
[  308.324706] Call Trace:
[  308.324710]  dump_stack+0x92/0xc8
[  308.324711]  check_noncircular+0x12d/0x150
[  308.324713]  __lock_acquire+0x119f/0x1fc0
[  308.324715]  lock_acquire+0xa4/0x3b0
[  308.324716]  ? rmap_walk_file+0x1c0/0x2f0
[  308.324717]  ? __lock_acquire+0x394/0x1fc0
[  308.324719]  down_read+0x2d/0x110
[  308.324720]  ? rmap_walk_file+0x1c0/0x2f0
[  308.324721]  rmap_walk_file+0x1c0/0x2f0
[  308.324722]  page_referenced+0x133/0x150
[  308.324724]  ? __page_set_anon_rmap+0x70/0x70
[  308.324725]  ? page_get_anon_vma+0x190/0x190
[  308.324726]  shrink_active_list+0x142/0x610
[  308.324728]  balance_pgdat+0x229/0x620
[  308.324730]  kswapd+0x200/0x470
[  308.324731]  ? lockdep_hardirqs_on_prepare+0xf5/0x170
[  308.324733]  ? finish_wait+0x80/0x80
[  308.324734]  ? balance_pgdat+0x620/0x620
[  308.324736]  kthread+0x11f/0x140
[  308.324737]  ? kthread_create_worker_on_cpu+0x40/0x40
[  308.324739]  ret_from_fork+0x1f/0x30



>> /Thomas
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-28 13:58 [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv Daniel Vetter
2020-07-30 12:03 ` [Linaro-mm-sig] " Christian König
2020-07-30 12:17 ` Thomas Hellström (Intel)
2020-07-30 13:17   ` Daniel Vetter
2020-07-30 16:45     ` Thomas Hellström (Intel)

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git