From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1760989AbdEVSES (ORCPT <rfc822;w@1wt.eu>);
        Mon, 22 May 2017 14:04:18 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37354 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752289AbdEVSER (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 22 May 2017 14:04:17 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com A730080471
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=msnitzer@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com A730080471
Date: Mon, 22 May 2017 14:04:15 -0400
From: Mike Snitzer <snitzer@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
        Junaid Shahid <junaids@google.com>,
        David Rientjes <rientjes@google.com>, Alasdair Kergon <agk@redhat.com>,
        Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org,
        andreslc@google.com, gthelen@google.com, vbabka@suse.cz,
        linux-kernel@vger.kernel.org
Subject: Re: dm ioctl: Restore __GFP_HIGH in copy_params()
Message-ID: <20170522180415.GA25340@redhat.com>
References: <20170518190406.GB2330@dhcp22.suse.cz>
 <alpine.DEB.2.10.1705181338090.132717@chino.kir.corp.google.com>
 <1508444.i5EqlA1upv@js-desktop.svl.corp.google.com>
 <20170519074647.GC13041@dhcp22.suse.cz>
 <alpine.LRH.2.02.1705191934340.17646@file01.intranet.prod.int.rdu2.redhat.com>
 <20170522093725.GF8509@dhcp22.suse.cz>
 <alpine.LRH.2.02.1705220759001.27401@file01.intranet.prod.int.rdu2.redhat.com>
 <20170522120937.GI8509@dhcp22.suse.cz>
 <alpine.LRH.2.02.1705221026430.20076@file01.intranet.prod.int.rdu2.redhat.com>
 <20170522150321.GM8509@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170522150321.GM8509@dhcp22.suse.cz>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 22 May 2017 18:04:16 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 22 2017 at 11:03am -0400,
Michal Hocko <mhocko@kernel.org> wrote:

> On Mon 22-05-17 10:52:44, Mikulas Patocka wrote:
> > 
> > 
> > On Mon, 22 May 2017, Michal Hocko wrote:
> [...] 
> > > I am not sure I understand. OOM killer is invoked for _all_ allocations
> > > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
> > > OOM killer is not disabled (oom_killer_disable) and that only happens
> > > from the PM suspend path which makes sure that no userspace is active at
> > > the time. AFAIU this is a userspace triggered path and so the later
> > > shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
> > > Relying to a portion of memory reserves to prevent from deadlock seems
> > > fundamentaly broken  to me.
> > > 
> > 
> > The lvm2 was designed this way - it is broken, but there is not much that 
> > can be done about it - fixing this would mean major rewrite. The only 
> > thing we can do about it is to lower the deadlock probability with 
> > __GFP_HIGH (or PF_MEMALLOC that was used some times ago).

Yes, lvm2 was originally designed to to have access to memory reserves
to ensure forward progress.  But if the mm subsystem has improved to
allow for the required progress without lvm2 trying to stake a claim on
those reserves then we'll gladly avoid (ab)using them.

> But let me repeat. GFP_KERNEL allocation for order-0 page will not fail.

OK, but will it be serviced immediately?  Not failing isn't useful if it
never completes.

> If you need non-failing semantic then just make it clear by adding
> __GFP_NOFAIL rather than __GFP_HIGH. Memory reserves are a scarce
> resource and there are users which might really need it from atomic
> contexts.

While adding the __GFP_NOFAIL flag would serve to document expectations
I'm left unconvinced that the memory allocator will _not fail_ for an
order-0 page -- as Mikulas said most ioctls don't need more than 4K.
(Apologies if you've already covered _why_ we can have confidence in the
mm subsystem's ability to ensure forward progress for these allocations).

Mike