linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
       [not found]     ` <20190822003131.GR1119@dread.disaster.area>
@ 2019-08-22  7:59       ` Christoph Hellwig
  2019-08-22  8:51         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2019-08-22  7:59 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-xfs, Peter Zijlstra, Ingo Molnar, Will Deacon,
	linux-kernel, linux-mm

On Thu, Aug 22, 2019 at 10:31:32AM +1000, Dave Chinner wrote:
> > Btw, I think we should eventually kill off KM_NOFS and just use
> > PF_MEMALLOC_NOFS in XFS, as the interface makes so much more sense.
> > But that's something for the future.
> 
> Yeah, and it's not quite as simple as just using PF_MEMALLOC_NOFS
> at high levels - we'll still need to annotate callers that use KM_NOFS
> to avoid lockdep false positives. i.e. any code that can be called from
> GFP_KERNEL and reclaim context will throw false positives from
> lockdep if we don't annotate tehm correctly....

Oh well.  For now we have the XFS kmem_wrappers to turn that into
GFP_NOFS so we shouldn't be too worried, but I think that is something
we should fix in lockdep to ensure it is generally useful.  I've added
the maintainers and relevant lists to kick off a discussion.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22  7:59       ` [PATCH 2/3] xfs: add kmem_alloc_io() Christoph Hellwig
@ 2019-08-22  8:51         ` Peter Zijlstra
  2019-08-22  9:10           ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2019-08-22  8:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, linux-xfs, Ingo Molnar, Will Deacon, linux-kernel,
	linux-mm, penguin-kernel

On Thu, Aug 22, 2019 at 12:59:48AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 22, 2019 at 10:31:32AM +1000, Dave Chinner wrote:
> > > Btw, I think we should eventually kill off KM_NOFS and just use
> > > PF_MEMALLOC_NOFS in XFS, as the interface makes so much more sense.
> > > But that's something for the future.
> > 
> > Yeah, and it's not quite as simple as just using PF_MEMALLOC_NOFS
> > at high levels - we'll still need to annotate callers that use KM_NOFS
> > to avoid lockdep false positives. i.e. any code that can be called from
> > GFP_KERNEL and reclaim context will throw false positives from
> > lockdep if we don't annotate tehm correctly....
> 
> Oh well.  For now we have the XFS kmem_wrappers to turn that into
> GFP_NOFS so we shouldn't be too worried, but I think that is something
> we should fix in lockdep to ensure it is generally useful.  I've added
> the maintainers and relevant lists to kick off a discussion.

Strictly speaking the fs_reclaim annotation is no longer part of the
lockdep core, but is simply a fake lock in page_alloc.c and thus falls
under the mm people's purview.

That said; it should be fairly straight forward to teach
__need_fs_reclaim() about PF_MEMALLOC_NOFS, much like how it already
knows about PF_MEMALLOC.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22  8:51         ` Peter Zijlstra
@ 2019-08-22  9:10           ` Peter Zijlstra
  2019-08-22 10:14             ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2019-08-22  9:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, linux-xfs, Ingo Molnar, Will Deacon, linux-kernel,
	linux-mm, penguin-kernel

On Thu, Aug 22, 2019 at 10:51:30AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 22, 2019 at 12:59:48AM -0700, Christoph Hellwig wrote:
> > On Thu, Aug 22, 2019 at 10:31:32AM +1000, Dave Chinner wrote:
> > > > Btw, I think we should eventually kill off KM_NOFS and just use
> > > > PF_MEMALLOC_NOFS in XFS, as the interface makes so much more sense.
> > > > But that's something for the future.
> > > 
> > > Yeah, and it's not quite as simple as just using PF_MEMALLOC_NOFS
> > > at high levels - we'll still need to annotate callers that use KM_NOFS
> > > to avoid lockdep false positives. i.e. any code that can be called from
> > > GFP_KERNEL and reclaim context will throw false positives from
> > > lockdep if we don't annotate tehm correctly....
> > 
> > Oh well.  For now we have the XFS kmem_wrappers to turn that into
> > GFP_NOFS so we shouldn't be too worried, but I think that is something
> > we should fix in lockdep to ensure it is generally useful.  I've added
> > the maintainers and relevant lists to kick off a discussion.
> 
> Strictly speaking the fs_reclaim annotation is no longer part of the
> lockdep core, but is simply a fake lock in page_alloc.c and thus falls
> under the mm people's purview.
> 
> That said; it should be fairly straight forward to teach
> __need_fs_reclaim() about PF_MEMALLOC_NOFS, much like how it already
> knows about PF_MEMALLOC.

Ah, current_gfp_context() already seems to transfer PF_MEMALLOC_NOFS
into the GFP flags.

So are we sure it is broken and needs mending?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22  9:10           ` Peter Zijlstra
@ 2019-08-22 10:14             ` Dave Chinner
  2019-08-22 11:14               ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2019-08-22 10:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, linux-xfs, Ingo Molnar, Will Deacon,
	linux-kernel, linux-mm, penguin-kernel

On Thu, Aug 22, 2019 at 11:10:57AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 22, 2019 at 10:51:30AM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 22, 2019 at 12:59:48AM -0700, Christoph Hellwig wrote:
> > > On Thu, Aug 22, 2019 at 10:31:32AM +1000, Dave Chinner wrote:
> > > > > Btw, I think we should eventually kill off KM_NOFS and just use
> > > > > PF_MEMALLOC_NOFS in XFS, as the interface makes so much more sense.
> > > > > But that's something for the future.
> > > > 
> > > > Yeah, and it's not quite as simple as just using PF_MEMALLOC_NOFS
> > > > at high levels - we'll still need to annotate callers that use KM_NOFS
> > > > to avoid lockdep false positives. i.e. any code that can be called from
> > > > GFP_KERNEL and reclaim context will throw false positives from
> > > > lockdep if we don't annotate tehm correctly....
> > > 
> > > Oh well.  For now we have the XFS kmem_wrappers to turn that into
> > > GFP_NOFS so we shouldn't be too worried, but I think that is something
> > > we should fix in lockdep to ensure it is generally useful.  I've added
> > > the maintainers and relevant lists to kick off a discussion.
> > 
> > Strictly speaking the fs_reclaim annotation is no longer part of the
> > lockdep core, but is simply a fake lock in page_alloc.c and thus falls
> > under the mm people's purview.
> > 
> > That said; it should be fairly straight forward to teach
> > __need_fs_reclaim() about PF_MEMALLOC_NOFS, much like how it already
> > knows about PF_MEMALLOC.
> 
> Ah, current_gfp_context() already seems to transfer PF_MEMALLOC_NOFS
> into the GFP flags.
> 
> So are we sure it is broken and needs mending?

Well, that's what we are trying to work out. The problem is that we
have code that takes locks and does allocations that is called both
above and below the reclaim "lock" context. Once it's been seen
below the reclaim lock context, calling it with GFP_KERNEL context
above the reclaim lock context throws a deadlock warning.

The only way around that was to mark these allocation sites as
GFP_NOFS so lockdep is never allowed to see that recursion through
reclaim occur. Even though it isn't a deadlock vector.

What we're looking at is whether PF_MEMALLOC_NOFS changes this - I
don't think it does solve this problem. i.e. if we define the
allocation as GFP_KERNEL and then use PF_MEMALLOC_NOFS where reclaim
is not allowed, we still have GFP_KERNEL allocations in code above
reclaim that has also been seen below relcaim. And so we'll get
false positive warnings again.

What I think we are going to have to do here is manually audit
each of the KM_NOFS call sites as we remove the NOFS from them and
determine if ___GFP_NOLOCKDEP is needed to stop lockdep from trying
to track these allocation sites. We've never used this tag because
we'd already fixed most of these false positives with explicit
GFP_NOFS tags long before ___GFP_NOLOCKDEP was created.

But until someone starts doing the work, I don't know if it will
work or even whether conversion PF_MEMALLOC_NOFS is going to
introduce a bunch of new ways to get false positives from lockdep...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22 10:14             ` Dave Chinner
@ 2019-08-22 11:14               ` Vlastimil Babka
  2019-08-22 12:07                 ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-08-22 11:14 UTC (permalink / raw)
  To: Dave Chinner, Peter Zijlstra
  Cc: Christoph Hellwig, linux-xfs, Ingo Molnar, Will Deacon,
	linux-kernel, linux-mm, penguin-kernel

On 8/22/19 12:14 PM, Dave Chinner wrote:
> On Thu, Aug 22, 2019 at 11:10:57AM +0200, Peter Zijlstra wrote:
>> 
>> Ah, current_gfp_context() already seems to transfer PF_MEMALLOC_NOFS
>> into the GFP flags.
>> 
>> So are we sure it is broken and needs mending?
> 
> Well, that's what we are trying to work out. The problem is that we
> have code that takes locks and does allocations that is called both
> above and below the reclaim "lock" context. Once it's been seen
> below the reclaim lock context, calling it with GFP_KERNEL context
> above the reclaim lock context throws a deadlock warning.
> 
> The only way around that was to mark these allocation sites as
> GFP_NOFS so lockdep is never allowed to see that recursion through
> reclaim occur. Even though it isn't a deadlock vector.
> 
> What we're looking at is whether PF_MEMALLOC_NOFS changes this - I
> don't think it does solve this problem. i.e. if we define the
> allocation as GFP_KERNEL and then use PF_MEMALLOC_NOFS where reclaim
> is not allowed, we still have GFP_KERNEL allocations in code above
> reclaim that has also been seen below relcaim. And so we'll get
> false positive warnings again.

If I understand both you and the code directly, the code sites won't call
__fs_reclaim_acquire when called with current->flags including PF_MEMALLOC_NOFS.
So that would mean they "won't be seen below the reclaim" and all would be fine,
right?

> What I think we are going to have to do here is manually audit
> each of the KM_NOFS call sites as we remove the NOFS from them and
> determine if ___GFP_NOLOCKDEP is needed to stop lockdep from trying
> to track these allocation sites. We've never used this tag because
> we'd already fixed most of these false positives with explicit
> GFP_NOFS tags long before ___GFP_NOLOCKDEP was created.
> 
> But until someone starts doing the work, I don't know if it will
> work or even whether conversion PF_MEMALLOC_NOFS is going to
> introduce a bunch of new ways to get false positives from lockdep...
> 
> Cheers,
> 
> Dave.
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22 11:14               ` Vlastimil Babka
@ 2019-08-22 12:07                 ` Dave Chinner
  2019-08-22 12:19                   ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2019-08-22 12:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Peter Zijlstra, Christoph Hellwig, linux-xfs, Ingo Molnar,
	Will Deacon, linux-kernel, linux-mm, penguin-kernel

On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
> On 8/22/19 12:14 PM, Dave Chinner wrote:
> > On Thu, Aug 22, 2019 at 11:10:57AM +0200, Peter Zijlstra wrote:
> >> 
> >> Ah, current_gfp_context() already seems to transfer PF_MEMALLOC_NOFS
> >> into the GFP flags.
> >> 
> >> So are we sure it is broken and needs mending?
> > 
> > Well, that's what we are trying to work out. The problem is that we
> > have code that takes locks and does allocations that is called both
> > above and below the reclaim "lock" context. Once it's been seen
> > below the reclaim lock context, calling it with GFP_KERNEL context
> > above the reclaim lock context throws a deadlock warning.
> > 
> > The only way around that was to mark these allocation sites as
> > GFP_NOFS so lockdep is never allowed to see that recursion through
> > reclaim occur. Even though it isn't a deadlock vector.
> > 
> > What we're looking at is whether PF_MEMALLOC_NOFS changes this - I
> > don't think it does solve this problem. i.e. if we define the
> > allocation as GFP_KERNEL and then use PF_MEMALLOC_NOFS where reclaim
> > is not allowed, we still have GFP_KERNEL allocations in code above
> > reclaim that has also been seen below relcaim. And so we'll get
> > false positive warnings again.
> 
> If I understand both you and the code directly, the code sites won't call
> __fs_reclaim_acquire when called with current->flags including PF_MEMALLOC_NOFS.
> So that would mean they "won't be seen below the reclaim" and all would be fine,
> right?

No, the problem is this (using kmalloc as a general term for
allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)

   some random kernel code
    kmalloc(GFP_KERNEL)
     reclaim
     PF_MEMALLOC
     shrink_slab
      xfs_inode_shrink
       XFS_ILOCK
        xfs_buf_allocate_memory()
         kmalloc(GFP_KERNEL)

And so locks on inodes in reclaim are seen below reclaim. Then
somewhere else we have:

   some high level read-only xfs code like readdir
    XFS_ILOCK
     xfs_buf_allocate_memory()
      kmalloc(GFP_KERNEL)
       reclaim

And this one throws false positive lockdep warnings because we
called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc
context. So the only solution we had at the tiem to shut it up was:

   some high level read-only xfs code like readdir
    XFS_ILOCK
     xfs_buf_allocate_memory()
      kmalloc(GFP_NOFS)

So that lockdep sees it's not going to recurse into reclaim and
doesn't throw a warning...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22 12:07                 ` Dave Chinner
@ 2019-08-22 12:19                   ` Vlastimil Babka
  2019-08-22 13:17                     ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-08-22 12:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Peter Zijlstra, Christoph Hellwig, linux-xfs, Ingo Molnar,
	Will Deacon, linux-kernel, linux-mm, penguin-kernel

On 8/22/19 2:07 PM, Dave Chinner wrote:
> On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
> 
> No, the problem is this (using kmalloc as a general term for
> allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)
> 
>    some random kernel code
>     kmalloc(GFP_KERNEL)
>      reclaim
>      PF_MEMALLOC
>      shrink_slab
>       xfs_inode_shrink
>        XFS_ILOCK
>         xfs_buf_allocate_memory()
>          kmalloc(GFP_KERNEL)
> 
> And so locks on inodes in reclaim are seen below reclaim. Then
> somewhere else we have:
> 
>    some high level read-only xfs code like readdir
>     XFS_ILOCK
>      xfs_buf_allocate_memory()
>       kmalloc(GFP_KERNEL)
>        reclaim
> 
> And this one throws false positive lockdep warnings because we
> called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc

OK, and what exactly makes this positive a false one? Why can't it continue like
the first example where reclaim leads to another XFS_ILOCK, thus deadlock?

> context. So the only solution we had at the tiem to shut it up was:
> 
>    some high level read-only xfs code like readdir
>     XFS_ILOCK
>      xfs_buf_allocate_memory()
>       kmalloc(GFP_NOFS)
> 
> So that lockdep sees it's not going to recurse into reclaim and
> doesn't throw a warning...

AFAICS that GFP_NOFS would fix not only a warning but also a real deadlock
(depending on the answer to my previous question).

> Cheers,
> 
> Dave.
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22 12:19                   ` Vlastimil Babka
@ 2019-08-22 13:17                     ` Dave Chinner
  2019-08-22 14:26                       ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2019-08-22 13:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Peter Zijlstra, Christoph Hellwig, linux-xfs, Ingo Molnar,
	Will Deacon, linux-kernel, linux-mm, penguin-kernel

On Thu, Aug 22, 2019 at 02:19:04PM +0200, Vlastimil Babka wrote:
> On 8/22/19 2:07 PM, Dave Chinner wrote:
> > On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
> > 
> > No, the problem is this (using kmalloc as a general term for
> > allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)
> > 
> >    some random kernel code
> >     kmalloc(GFP_KERNEL)
> >      reclaim
> >      PF_MEMALLOC
> >      shrink_slab
> >       xfs_inode_shrink
> >        XFS_ILOCK
> >         xfs_buf_allocate_memory()
> >          kmalloc(GFP_KERNEL)
> > 
> > And so locks on inodes in reclaim are seen below reclaim. Then
> > somewhere else we have:
> > 
> >    some high level read-only xfs code like readdir
> >     XFS_ILOCK
> >      xfs_buf_allocate_memory()
> >       kmalloc(GFP_KERNEL)
> >        reclaim
> > 
> > And this one throws false positive lockdep warnings because we
> > called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc
> 
> OK, and what exactly makes this positive a false one? Why can't it continue like
> the first example where reclaim leads to another XFS_ILOCK, thus deadlock?

Because above reclaim we only have operations being done on
referenced inodes, and below reclaim we only have unreferenced
inodes. We never lock the same inode both above and below reclaim
at the same time.

IOWs, an operation above reclaim cannot see, access or lock
unreferenced inodes, except in inode write clustering, and that uses
trylocks so cannot deadlock with reclaim.

An operation below reclaim cannot see, access or lock referenced
inodes except during inode write clustering, and that uses trylocks
so cannot deadlock with code above reclaim.

FWIW, I'm trying to make the inode writeback clustering go away from
reclaim at the moment, so even that possibility is going away soon.
That will change everything to trylocks in reclaim context, so
lockdep is going to stop tracking it entirely.

Hmmm - maybe we're getting to the point where we actually
don't need GFP_NOFS/PF_MEMALLOC_NOFS at all in XFS anymore.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22 13:17                     ` Dave Chinner
@ 2019-08-22 14:26                       ` Vlastimil Babka
  2019-08-26 12:21                         ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-08-22 14:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Peter Zijlstra, Christoph Hellwig, linux-xfs, Ingo Molnar,
	Will Deacon, linux-kernel, linux-mm, penguin-kernel

On 8/22/19 3:17 PM, Dave Chinner wrote:
> On Thu, Aug 22, 2019 at 02:19:04PM +0200, Vlastimil Babka wrote:
>> On 8/22/19 2:07 PM, Dave Chinner wrote:
>> > On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
>> > 
>> > No, the problem is this (using kmalloc as a general term for
>> > allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)
>> > 
>> >    some random kernel code
>> >     kmalloc(GFP_KERNEL)
>> >      reclaim
>> >      PF_MEMALLOC
>> >      shrink_slab
>> >       xfs_inode_shrink
>> >        XFS_ILOCK
>> >         xfs_buf_allocate_memory()
>> >          kmalloc(GFP_KERNEL)
>> > 
>> > And so locks on inodes in reclaim are seen below reclaim. Then
>> > somewhere else we have:
>> > 
>> >    some high level read-only xfs code like readdir
>> >     XFS_ILOCK
>> >      xfs_buf_allocate_memory()
>> >       kmalloc(GFP_KERNEL)
>> >        reclaim
>> > 
>> > And this one throws false positive lockdep warnings because we
>> > called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc
>> 
>> OK, and what exactly makes this positive a false one? Why can't it continue like
>> the first example where reclaim leads to another XFS_ILOCK, thus deadlock?
> 
> Because above reclaim we only have operations being done on
> referenced inodes, and below reclaim we only have unreferenced
> inodes. We never lock the same inode both above and below reclaim
> at the same time.
> 
> IOWs, an operation above reclaim cannot see, access or lock
> unreferenced inodes, except in inode write clustering, and that uses
> trylocks so cannot deadlock with reclaim.
> 
> An operation below reclaim cannot see, access or lock referenced
> inodes except during inode write clustering, and that uses trylocks
> so cannot deadlock with code above reclaim.

Thanks for elaborating. Perhaps lockdep experts (not me) would know how to
express that. If not possible, then replacing GFP_NOFS with __GFP_NOLOCKDEP
should indeed suppress the warning, while allowing FS reclaim.

> FWIW, I'm trying to make the inode writeback clustering go away from
> reclaim at the moment, so even that possibility is going away soon.
> That will change everything to trylocks in reclaim context, so
> lockdep is going to stop tracking it entirely.

That's also a nice solution :)

> Hmmm - maybe we're getting to the point where we actually
> don't need GFP_NOFS/PF_MEMALLOC_NOFS at all in XFS anymore.....
> 
> Cheers,
> 
> Dave.
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: add kmem_alloc_io()
  2019-08-22 14:26                       ` Vlastimil Babka
@ 2019-08-26 12:21                         ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2019-08-26 12:21 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Dave Chinner, Peter Zijlstra, Christoph Hellwig, linux-xfs,
	Ingo Molnar, Will Deacon, linux-kernel, linux-mm, penguin-kernel

On Thu 22-08-19 16:26:42, Vlastimil Babka wrote:
> On 8/22/19 3:17 PM, Dave Chinner wrote:
> > On Thu, Aug 22, 2019 at 02:19:04PM +0200, Vlastimil Babka wrote:
> >> On 8/22/19 2:07 PM, Dave Chinner wrote:
> >> > On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
> >> > 
> >> > No, the problem is this (using kmalloc as a general term for
> >> > allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)
> >> > 
> >> >    some random kernel code
> >> >     kmalloc(GFP_KERNEL)
> >> >      reclaim
> >> >      PF_MEMALLOC
> >> >      shrink_slab
> >> >       xfs_inode_shrink
> >> >        XFS_ILOCK
> >> >         xfs_buf_allocate_memory()
> >> >          kmalloc(GFP_KERNEL)
> >> > 
> >> > And so locks on inodes in reclaim are seen below reclaim. Then
> >> > somewhere else we have:
> >> > 
> >> >    some high level read-only xfs code like readdir
> >> >     XFS_ILOCK
> >> >      xfs_buf_allocate_memory()
> >> >       kmalloc(GFP_KERNEL)
> >> >        reclaim
> >> > 
> >> > And this one throws false positive lockdep warnings because we
> >> > called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc
> >> 
> >> OK, and what exactly makes this positive a false one? Why can't it continue like
> >> the first example where reclaim leads to another XFS_ILOCK, thus deadlock?
> > 
> > Because above reclaim we only have operations being done on
> > referenced inodes, and below reclaim we only have unreferenced
> > inodes. We never lock the same inode both above and below reclaim
> > at the same time.
> > 
> > IOWs, an operation above reclaim cannot see, access or lock
> > unreferenced inodes, except in inode write clustering, and that uses
> > trylocks so cannot deadlock with reclaim.
> > 
> > An operation below reclaim cannot see, access or lock referenced
> > inodes except during inode write clustering, and that uses trylocks
> > so cannot deadlock with code above reclaim.
> 
> Thanks for elaborating. Perhaps lockdep experts (not me) would know how to
> express that. If not possible, then replacing GFP_NOFS with __GFP_NOLOCKDEP
> should indeed suppress the warning, while allowing FS reclaim.

This was certainly my hope to happen when introducing __GFP_NOLOCKDEP.
I couldn't have done the second step because that requires a deep
understanding of the code in question which is beyond my capacity. It
seems we still haven't found a brave soul to start converting GFP_NOFS
to __GFP_NOLOCKDEP. And it would be really appreciated.

Thanks.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-08-26 12:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190821083820.11725-1-david@fromorbit.com>
     [not found] ` <20190821083820.11725-3-david@fromorbit.com>
     [not found]   ` <20190821232440.GB24904@infradead.org>
     [not found]     ` <20190822003131.GR1119@dread.disaster.area>
2019-08-22  7:59       ` [PATCH 2/3] xfs: add kmem_alloc_io() Christoph Hellwig
2019-08-22  8:51         ` Peter Zijlstra
2019-08-22  9:10           ` Peter Zijlstra
2019-08-22 10:14             ` Dave Chinner
2019-08-22 11:14               ` Vlastimil Babka
2019-08-22 12:07                 ` Dave Chinner
2019-08-22 12:19                   ` Vlastimil Babka
2019-08-22 13:17                     ` Dave Chinner
2019-08-22 14:26                       ` Vlastimil Babka
2019-08-26 12:21                         ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).