linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <mr@neil.brown.name>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Chinner <david@fromorbit.com>,
	"Theodore Ts'o" <tytso@mit.edu>, Chris Mason <clm@fb.com>,
	Jan Kara <jack@suse.cz>,
	ceph-devel@vger.kernel.org, cluster-devel@redhat.com,
	linux-nfs@vger.kernel.org, logfs@logfs.org, xfs@oss.sgi.com,
	linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-mtd@lists.infradead.org, reiserfs-devel@vger.kernel.org,
	linux-ntfs-dev@lists.sourceforge.net,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-afs@lists.infradead.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/2] scop GFP_NOFS api
Date: Wed, 04 May 2016 09:26:15 +1000	[thread overview]
Message-ID: <87futyd8q0.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20160503151312.GA4470@dhcp22.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 3742 bytes --]

On Wed, May 04 2016, Michal Hocko wrote:

> Hi,
>
> On Sun 01-05-16 07:55:31, NeilBrown wrote:
> [...]
>> One particular problem with your process-context idea is that it isn't
>> inherited across threads.
>> Steve Whitehouse's example in gfs shows how allocation dependencies can
>> even cross into user space.
>
> Hmm, I am still not sure I understand that example completely but making
> a dependency between direct reclaim and userspace can hardly work.

No it can't.  Specifically: if direct reclaim blocks on user-space that
must be a problem.
I think the point of this example is that some filesystem things can
block on user-space in ways that are very hard to encode in with flags
as they are multi-level indirect.
So the conclusion (my conclusion) is that direct reclaim mustn't block.

When I was working on deadlock avoidance in loop-back NFS I went down
the path of adding GFP flags and extended the PF_FSTRANS flag and got it
working (think) but no-one liked it.  It was way too intrusive.

Some how I landed on the idea of making nfs_release_page non blocking
and everything suddenly became much much simpler.  Problems evaporated.

NFS has a distinct advantage here.  The "Close-to-open" cache semantic
means that all dirty pages must be flushed on last close.  So when
->evict_inode is finally called there is nothing much to do - just free
everything up.  So I could fix NFS without worrying about (or even
noticing) evict_inode.


> Especially when the direct reclaim might be sitting on top of hard to
> guess pile of locks. So unless I've missed anything what Steve has
> described is a clear NOFS context.
>
>> A more localized one that I have seen is that NFSv4 sometimes needs to
>> start up a state-management thread (particularly if the server
>> restarted).
>> It uses kthread_run(), which doesn't actually create the thread but asks
>> kthreadd to do it.  If NFS writeout is waiting for state management it
>> would need to make sure that kthreadd runs in allocation context to
>> avoid deadlock.
>> I feel that I've forgotten some important detail here and this might
>> have been fixed somehow, but the point still stands that the allocation
>> context can cross from thread to thread and can effectively become
>> anything and everything.
>
> Not sure I understand your point here but relying on kthread_run
> from GFP_NOFS context has always been deadlock prone with or without
> scope GFP_NOFS semantic so I am not really sure I see your point
> here. Similarly relying on a work item which doesn't have a dedicated
> WQ_MEM_RECLAIM WQ is deadlock prone.  You simply shouldn't do that.

The point is really that saying "You shouldn't do that" isn't much good
when "that" is exactly what the fs developer wants to do and it seems to
work and never triggers a warning.

You can create lots of rules about what is or is not allowed, or
you can make everything that it not explicit forbidden (ideally at
compile time but possibly at runtime with might_sleep or lockdep),
permitted.

I prefer the latter.

>
>> It is OK to wait for memory to be freed.  It is not OK to wait for any
>> particular piece of memory to be freed because you don't always know who
>> is waiting for you, or who you really are waiting on to free that
>> memory.
>> 
>> Whenever trying to free memory I think you need to do best-effort
>> without blocking.
>
> I agree with that. Or at least you have to wait on something that is
> _guaranteed_ to make a forward progress. I am not really that sure this
> is easy to achieve with the current code base.

I accept that it isn't "easy".  But I claim that it isn't particularly
difficult either.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

  reply	other threads:[~2016-05-03 23:26 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-26 11:56 [PATCH 0/2] scop GFP_NOFS api Michal Hocko
2016-04-26 11:56 ` [PATCH 1/2] mm: add PF_MEMALLOC_NOFS Michal Hocko
2016-04-26 23:07   ` Dave Chinner
2016-04-27  7:51     ` Michal Hocko
2016-04-27 10:53   ` Tetsuo Handa
2016-04-27 11:15     ` Michal Hocko
2016-04-27 14:44       ` Tetsuo Handa
2016-04-27 20:05         ` Michal Hocko
2016-04-27 11:54   ` [PATCH 1.1/2] xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS Michal Hocko
2016-04-27 11:54     ` [PATCH 1.2/2] mm: introduce memalloc_nofs_{save,restore} API Michal Hocko
2016-04-27 13:07       ` Michal Hocko
2016-04-27 20:09       ` Michal Hocko
2016-04-27 20:30         ` Michal Hocko
2016-04-27 21:14       ` Michal Hocko
2016-04-27 17:41     ` [PATCH 1.1/2] xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS Andreas Dilger
2016-04-27 19:43       ` Michal Hocko
2016-04-26 11:56 ` [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context Michal Hocko
2016-04-26 22:58   ` Dave Chinner
2016-04-27  8:03     ` Michal Hocko
2016-04-27 22:55       ` Dave Chinner
2016-04-28  8:17     ` Michal Hocko
2016-04-28 21:51       ` Dave Chinner
2016-04-29 12:12         ` Michal Hocko
2016-04-29 23:40           ` Dave Chinner
2016-05-03 15:38             ` Michal Hocko
2016-05-04  0:07               ` Dave Chinner
2016-04-29  5:35 ` [PATCH 0/2] scop GFP_NOFS api NeilBrown
2016-04-29 10:20   ` [Cluster-devel] " Steven Whitehouse
2016-04-30 21:17     ` NeilBrown
2016-04-29 12:04   ` Michal Hocko
2016-04-30  0:24     ` Dave Chinner
2016-04-30 21:55     ` NeilBrown
2016-05-03 15:13       ` Michal Hocko
2016-05-03 23:26         ` NeilBrown [this message]
2016-04-30  0:11   ` Dave Chinner
2016-04-30 22:19     ` NeilBrown
2016-05-04  1:00       ` Dave Chinner
2016-05-06  3:20         ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87futyd8q0.fsf@notabene.neil.brown.name \
    --to=mr@neil.brown.name \
    --cc=akpm@linux-foundation.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=clm@fb.com \
    --cc=cluster-devel@redhat.com \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-ntfs-dev@lists.sourceforge.net \
    --cc=logfs@logfs.org \
    --cc=mhocko@kernel.org \
    --cc=reiserfs-devel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).