Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Michal Hocko <mhocko@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Rientjes" <rientjes@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Masahiro Yamada" <yamada.masahiro@socionext.com>,
	"Wei Wang" <wvw@google.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Date: Thu, 15 Aug 2019 13:56:31 -0300
Message-ID: <20190815165631.GK21596@ziepe.ca> (raw)
In-Reply-To: <20190815155950.GN9477@dhcp22.suse.cz>

On Thu, Aug 15, 2019 at 06:00:41PM +0200, Michal Hocko wrote:

> > AFAIK 'GFP_NOWAIT' is characterized by the lack of __GFP_FS and
> > __GFP_DIRECT_RECLAIM..
> >
> > This matches the existing test in __need_fs_reclaim() - so if you are
> > OK with GFP_NOFS, aka __GFP_IO which triggers try_to_compact_pages(),
> > allocations during OOM, then I think fs_reclaim already matches what
> > you described?
> 
> No GFP_NOFS is equally bad. Please read my other email explaining what
> the oom_reaper actually requires. In short no blocking on direct or
> indirect dependecy on memory allocation that might sleep.

It is much easier to follow with some hints on code, so the true
requirement is that the OOM repear not block on GFP_FS and GFP_IO
allocations, great, that constraint is now clear.

> If you can express that in the existing lockdep machinery. All
> fine. But then consider deployments where lockdep is no-no because
> of the overhead.

This is all for driver debugging. The point of lockdep is to find all
these paths without have to hit them as actual races, using debug
kernels.

I don't think we need this kind of debugging on production kernels?

> > The best we got was drivers tested the VA range and returned success
> > if they had no interest. Which is a big win to be sure, but it looks
> > like getting any more is not really posssible.
> 
> And that is already a great win! Because many notifiers only do care
> about particular mappings. Please note that backing off unconditioanlly
> will simply cause that the oom reaper will have to back off not doing
> any tear down anything.

Well, I'm working to propose that we do the VA range test under core
mmu notifier code that cannot block and then we simply remove the idea
of blockable from drivers using this new 'range notifier'. 

I think this pretty much solves the concern?

> > However, we could (probably even should) make the drivers fs_reclaim
> > safe.
> > 
> > If that is enough to guarantee progress of OOM, then lets consider
> > something like using current_gfp_context() to force PF_MEMALLOC_NOFS
> > allocation behavior on the driver callback and lockdep to try and keep
> > pushing on the the debugging, and dropping !blocking.
> 
> How are you going to enforce indirect dependency? E.g. a lock that is
> also used in other context which depend on sleepable memory allocation
> to move forward.

You mean like this:

       CPU0                                 CPU1
                                        mutex_lock()
                                        kmalloc(GFP_KERNEL)
                                        mutex_unlock()
  fs_reclaim_acquire()
  mutex_lock() <- illegal: lock dep assertion

?

lockdep handles this - that is what it does, it builds a graph of all
lock dependencies and requires the graph to be acyclic. So mutex_lock
depends on fs_reclaim_lock on CPU1 while on CPU0, fs_reclaim_lock
depends on mutex_lock. This is an ABBA locking cycle and lockdep will
reliably trigger.

So, if we wanted to do this, I'd probably suggest we retool
fs_reclaim's interface be more like

  prevent_gfp_flags(__GFP_IO | __GFP_FS);
  [..]
  restore_gfp_flags(__GFP_IO | __GFP_FS);

Which is lockdep based and follows the indirect lock dependencies you
talked about.

Then OOM and reclaim can specify the different flags they want
blocked.  We could probably use the same API with WQ_MEM_RECLAIM as I
was chatting with Tejun about:

https://www.spinics.net/lists/linux-rdma/msg77362.html

IMHO this stuff is *super complicated* for those of us outside the mm
subsystem, so having some really clear annotations like the above
would go a long way to help understand these special constraints.

I'm pretty sure there are lots of driver bugs related to using the
wrong GFP flags in the kernel.

> Really, this was aimed to give a very simple debugging aid. If it is
> considered to be so controversial, even though I really do not see why,
> then let's just drop it on the floor.

My concern is that the requirement was very unclear. I'm trying to
understand all the bits of how these notifiers work and the exact
semantic of this OOM path have been vauge if it is really some GFP
flag restriction or truely related to not sleeping.

Jason


  reply index

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-14 20:20 [PATCH 0/5] hmm & mmu_notifier debug/lockdep annotations Daniel Vetter
2019-08-14 20:20 ` [PATCH 1/5] mm: Check if mmu notifier callbacks are allowed to fail Daniel Vetter
2019-08-14 22:14   ` Andrew Morton
2019-08-14 23:22     ` Jason Gunthorpe
2019-08-14 23:34     ` Ralph Campbell
2019-08-16 17:19   ` Jason Gunthorpe
2019-08-14 20:20 ` [PATCH 2/5] kernel.h: Add non_block_start/end() Daniel Vetter
2019-08-14 20:45   ` Andrew Morton
2019-08-15  6:52     ` Daniel Vetter
2019-08-15  8:44     ` Michal Hocko
2019-08-15 13:04       ` Jason Gunthorpe
2019-08-15 13:12         ` Daniel Vetter
2019-08-15 14:37           ` Jason Gunthorpe
2019-08-15 14:43             ` Daniel Vetter
2019-08-15 15:10               ` Jason Gunthorpe
2019-08-15 16:25                 ` Daniel Vetter
2019-08-15 17:35                   ` Jason Gunthorpe
2019-08-15 17:39                     ` Jerome Glisse
2019-08-15 18:01                       ` Jason Gunthorpe
2019-08-15 18:27                         ` Jerome Glisse
2019-08-15 18:57                           ` Jason Gunthorpe
2019-08-15 16:32                 ` Jerome Glisse
2019-08-15 17:16                   ` Jason Gunthorpe
2019-08-15 17:21                     ` Daniel Vetter
2019-08-15 17:35                       ` Jerome Glisse
2019-08-15 13:24         ` Michal Hocko
2019-08-15 22:15       ` Andrew Morton
2019-08-16  8:24         ` Michal Hocko
2019-08-14 23:58   ` Jason Gunthorpe
2019-08-15  6:58     ` Daniel Vetter
2019-08-15 12:23       ` Jason Gunthorpe
2019-08-15 13:21         ` Michal Hocko
2019-08-15 14:12           ` Jason Gunthorpe
2019-08-15 16:00             ` Michal Hocko
2019-08-15 16:56               ` Jason Gunthorpe [this message]
2019-08-15 17:11                 ` Jerome Glisse
2019-08-15 17:17                   ` Jason Gunthorpe
2019-08-15 17:42                 ` Michal Hocko
2019-08-15 17:57                   ` Jerome Glisse
2019-08-15 18:24                   ` Jason Gunthorpe
2019-08-15 19:05                     ` Michal Hocko
2019-08-15 19:18                       ` Jason Gunthorpe
2019-08-15 19:35                         ` Michal Hocko
2019-08-15 20:13                           ` Jason Gunthorpe
2019-08-16  8:10                             ` Michal Hocko
2019-08-16 12:19                               ` Jason Gunthorpe
2019-08-16 12:26                                 ` Michal Hocko
2019-08-15 20:16                           ` [Intel-gfx] " Daniel Vetter
2019-08-15 20:27                             ` Jason Gunthorpe
2019-08-15 20:49                               ` Daniel Vetter
2019-08-16  1:00                                 ` Jason Gunthorpe
2019-08-16  6:20                                   ` Daniel Vetter
2019-08-16 12:12                                     ` Jason Gunthorpe
2019-08-16 14:11                                       ` Daniel Vetter
2019-08-16 14:38                                         ` Jason Gunthorpe
2019-08-16 16:36                                           ` Daniel Vetter
2019-08-16 16:54                                             ` Jason Gunthorpe
2019-08-16  8:27                             ` Michal Hocko
2019-08-14 20:20 ` [PATCH 3/5] mm, notifier: Catch sleeping/blocking for !blockable Daniel Vetter
2019-08-15  0:00   ` Jason Gunthorpe
2019-08-15  7:02     ` Daniel Vetter
     [not found]       ` <20190815123556.GB21596@ziepe.ca>
2019-08-17 16:09         ` Daniel Vetter
2019-08-14 20:20 ` [PATCH 4/5] mm, notifier: Add a lockdep map for invalidate_range_start Daniel Vetter
2019-08-15  0:09   ` Jason Gunthorpe
2019-08-15  7:10     ` Daniel Vetter
2019-08-15 12:53       ` Jason Gunthorpe
2019-08-14 20:20 ` [PATCH 5/5] mm/hmm: WARN on illegal ->sync_cpu_device_pagetables errors Daniel Vetter
2019-08-15  0:11   ` Jason Gunthorpe
2019-08-15  7:14     ` Daniel Vetter

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190815165631.GK21596@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=feng.tang@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jannh@google.com \
    --cc=jglisse@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=wvw@google.com \
    --cc=yamada.masahiro@socionext.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org linux-mm@archiver.kernel.org
	public-inbox-index linux-mm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox