Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Rientjes" <rientjes@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Masahiro Yamada" <yamada.masahiro@socionext.com>,
	"Wei Wang" <wvw@google.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Date: Thu, 15 Aug 2019 18:00:41 +0200
Message-ID: <20190815155950.GN9477@dhcp22.suse.cz> (raw)
In-Reply-To: <20190815141219.GF21596@ziepe.ca>

On Thu 15-08-19 11:12:19, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 03:21:27PM +0200, Michal Hocko wrote:
> > On Thu 15-08-19 09:23:44, Jason Gunthorpe wrote:
> > > On Thu, Aug 15, 2019 at 08:58:29AM +0200, Daniel Vetter wrote:
> > > > On Wed, Aug 14, 2019 at 08:58:05PM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, Aug 14, 2019 at 10:20:24PM +0200, Daniel Vetter wrote:
> > > > > > In some special cases we must not block, but there's not a
> > > > > > spinlock, preempt-off, irqs-off or similar critical section already
> > > > > > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > > > > > pair to annotate these.
> > > > > > 
> > > > > > This will be used in the oom paths of mmu-notifiers, where blocking is
> > > > > > not allowed to make sure there's forward progress. Quoting Michal:
> > > > > > 
> > > > > > "The notifier is called from quite a restricted context - oom_reaper -
> > > > > > which shouldn't depend on any locks or sleepable conditionals. The code
> > > > > > should be swift as well but we mostly do care about it to make a forward
> > > > > > progress. Checking for sleepable context is the best thing we could come
> > > > > > up with that would describe these demands at least partially."
> > > > > 
> > > > > But this describes fs_reclaim_acquire() - is there some reason we are
> > > > > conflating fs_reclaim with non-sleeping?
> > > > 
> > > > No idea why you tie this into fs_reclaim. We can definitly sleep in there,
> > > > and for e.g. kswapd (which also wraps everything in fs_reclaim) we're
> > > > event supposed to I thought. To make sure we can get at the last bit of
> > > > memory by flushing all the queues and waiting for everything to be cleaned
> > > > out.
> > > 
> > > AFAIK the point of fs_reclaim is to prevent "indirect dependency upon
> > > the page allocator" ie a justification that was given this !blockable
> > > stuff.
> > > 
> > > For instance:
> > > 
> > >   fs_reclaim_acquire()
> > >   kmalloc(GFP_KERNEL) <- lock dep assertion
> > > 
> > > And further, Michal's concern about indirectness through locks is also
> > > handled by lockdep:
> > > 
> > >        CPU0                                 CPU1
> > >                                         mutex_lock()
> > >                                         kmalloc(GFP_KERNEL)
> > >                                         mutex_unlock()
> > >   fs_reclaim_acquire()
> > >   mutex_lock() <- lock dep assertion
> > > 
> > > In other words, to prevent recursion into the page allocator you use
> > > fs_reclaim_acquire(), and lockdep verfies it in its usual robust way.
> > 
> > fs_reclaim_acquire is about FS/IO recursions IIUC. We are talking about
> > any !GFP_NOWAIT allocation context here and any {in}direct dependency on
> > it. 
> 
> AFAIK 'GFP_NOWAIT' is characterized by the lack of __GFP_FS and
> __GFP_DIRECT_RECLAIM..
>
> This matches the existing test in __need_fs_reclaim() - so if you are
> OK with GFP_NOFS, aka __GFP_IO which triggers try_to_compact_pages(),
> allocations during OOM, then I think fs_reclaim already matches what
> you described?

No GFP_NOFS is equally bad. Please read my other email explaining what
the oom_reaper actually requires. In short no blocking on direct or
indirect dependecy on memory allocation that might sleep. If you can
express that in the existing lockdep machinery. All fine. But then
consider deployments where lockdep is no-no because of the overhead.

> > No, non-blocking is a very coarse approximation of what we really need.
> > But it should give us even a stronger condition. Essentially any sleep
> > other than a preemption shouldn't be allowed in that context.
> 
> But it is a nonsense API to give the driver invalidate_range_start,
> the blocking alternative to the non-blocking invalidate_range and then
> demand it to be non-blocking.
> 
> Inspecting the code, no drivers are actually able to progress their
> side in non-blocking mode.
> 
> The best we got was drivers tested the VA range and returned success
> if they had no interest. Which is a big win to be sure, but it looks
> like getting any more is not really posssible.

And that is already a great win! Because many notifiers only do care
about particular mappings. Please note that backing off unconditioanlly
will simply cause that the oom reaper will have to back off not doing
any tear down anything.

> However, we could (probably even should) make the drivers fs_reclaim
> safe.
> 
> If that is enough to guarantee progress of OOM, then lets consider
> something like using current_gfp_context() to force PF_MEMALLOC_NOFS
> allocation behavior on the driver callback and lockdep to try and keep
> pushing on the the debugging, and dropping !blocking.

How are you going to enforce indirect dependency? E.g. a lock that is
also used in other context which depend on sleepable memory allocation
to move forward.

Really, this was aimed to give a very simple debugging aid. If it is
considered to be so controversial, even though I really do not see why,
then let's just drop it on the floor.
-- 
Michal Hocko
SUSE Labs


  reply index

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-14 20:20 [PATCH 0/5] hmm & mmu_notifier debug/lockdep annotations Daniel Vetter
2019-08-14 20:20 ` [PATCH 1/5] mm: Check if mmu notifier callbacks are allowed to fail Daniel Vetter
2019-08-14 22:14   ` Andrew Morton
2019-08-14 23:22     ` Jason Gunthorpe
2019-08-14 23:34     ` Ralph Campbell
2019-08-16 17:19   ` Jason Gunthorpe
2019-08-14 20:20 ` [PATCH 2/5] kernel.h: Add non_block_start/end() Daniel Vetter
2019-08-14 20:45   ` Andrew Morton
2019-08-15  6:52     ` Daniel Vetter
2019-08-15  8:44     ` Michal Hocko
2019-08-15 13:04       ` Jason Gunthorpe
2019-08-15 13:12         ` Daniel Vetter
2019-08-15 14:37           ` Jason Gunthorpe
2019-08-15 14:43             ` Daniel Vetter
2019-08-15 15:10               ` Jason Gunthorpe
2019-08-15 16:25                 ` Daniel Vetter
2019-08-15 17:35                   ` Jason Gunthorpe
2019-08-15 17:39                     ` Jerome Glisse
2019-08-15 18:01                       ` Jason Gunthorpe
2019-08-15 18:27                         ` Jerome Glisse
2019-08-15 18:57                           ` Jason Gunthorpe
2019-08-15 16:32                 ` Jerome Glisse
2019-08-15 17:16                   ` Jason Gunthorpe
2019-08-15 17:21                     ` Daniel Vetter
2019-08-15 17:35                       ` Jerome Glisse
2019-08-15 13:24         ` Michal Hocko
2019-08-15 22:15       ` Andrew Morton
2019-08-16  8:24         ` Michal Hocko
2019-08-14 23:58   ` Jason Gunthorpe
2019-08-15  6:58     ` Daniel Vetter
2019-08-15 12:23       ` Jason Gunthorpe
2019-08-15 13:21         ` Michal Hocko
2019-08-15 14:12           ` Jason Gunthorpe
2019-08-15 16:00             ` Michal Hocko [this message]
2019-08-15 16:56               ` Jason Gunthorpe
2019-08-15 17:11                 ` Jerome Glisse
2019-08-15 17:17                   ` Jason Gunthorpe
2019-08-15 17:42                 ` Michal Hocko
2019-08-15 17:57                   ` Jerome Glisse
2019-08-15 18:24                   ` Jason Gunthorpe
2019-08-15 19:05                     ` Michal Hocko
2019-08-15 19:18                       ` Jason Gunthorpe
2019-08-15 19:35                         ` Michal Hocko
2019-08-15 20:13                           ` Jason Gunthorpe
2019-08-16  8:10                             ` Michal Hocko
2019-08-16 12:19                               ` Jason Gunthorpe
2019-08-16 12:26                                 ` Michal Hocko
2019-08-15 20:16                           ` [Intel-gfx] " Daniel Vetter
2019-08-15 20:27                             ` Jason Gunthorpe
2019-08-15 20:49                               ` Daniel Vetter
2019-08-16  1:00                                 ` Jason Gunthorpe
2019-08-16  6:20                                   ` Daniel Vetter
2019-08-16 12:12                                     ` Jason Gunthorpe
2019-08-16 14:11                                       ` Daniel Vetter
2019-08-16 14:38                                         ` Jason Gunthorpe
2019-08-16 16:36                                           ` Daniel Vetter
2019-08-16 16:54                                             ` Jason Gunthorpe
2019-08-16  8:27                             ` Michal Hocko
2019-08-14 20:20 ` [PATCH 3/5] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
2019-08-15  0:00   ` Jason Gunthorpe
2019-08-15  7:02     ` Daniel Vetter
     [not found]       ` <20190815123556.GB21596@ziepe.ca>
2019-08-17 16:09         ` Daniel Vetter
2019-08-14 20:20 ` [PATCH 4/5] mm, notifier: Add a lockdep map for invalidate_range_start Daniel Vetter
2019-08-15  0:09   ` Jason Gunthorpe
2019-08-15  7:10     ` Daniel Vetter
2019-08-15 12:53       ` Jason Gunthorpe
2019-08-14 20:20 ` [PATCH 5/5] mm/hmm: WARN on illegal ->sync_cpu_device_pagetables errors Daniel Vetter
2019-08-15  0:11   ` Jason Gunthorpe
2019-08-15  7:14     ` Daniel Vetter

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190815155950.GN9477@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=feng.tang@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=wvw@google.com \
    --cc=yamada.masahiro@socionext.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git