From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 27 Feb 2020 13:02:59 +0100
From: Michal Hocko
To: Vlastimil Babka
Cc: Christopher Lameter, Sachin Sant, Pekka Enberg, David Rientjes,
 Joonsoo Kim, Kirill Tkhai, Linux-Next Mailing List,
 linuxppc-dev@lists.ozlabs.org
Subject: Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9
Message-ID: <20200227120259.GD3771@dhcp22.suse.cz>
References: <20200218115525.GD4151@dhcp22.suse.cz>
 <20200218142620.GF4151@dhcp22.suse.cz>
 <35EE65CF-40E3-4870-AEBC-D326977176DA@linux.vnet.ibm.com>
 <20200218152441.GH4151@dhcp22.suse.cz>
 <20200224085812.GB22443@dhcp22.suse.cz>
 <20200226184152.GQ3771@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: linux-next-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-next@vger.kernel.org

On Wed 26-02-20 22:45:52, Vlastimil Babka wrote:
> On 2/26/20 7:41 PM, Michal Hocko wrote:
> > On Wed 26-02-20 18:25:28, Christopher Lameter wrote:
> >> On Mon, 24 Feb 2020, Michal Hocko wrote:
> >>
> >>> Hmm, nasty. Is there any reason why kmalloc_node behaves differently
> >>> from the page allocator?
> >>
> >> The page allocator will do the same thing if you pass GFP_THISNODE and
> >> insist on allocating memory from a node that does not exist.
> >
> > I do not think that the page allocator would blow up even with
> > GFP_THISNODE. The allocation would just fail on a memoryless node.
> >
> > Besides that, kmalloc_node shouldn't really have an implicit GFP_THISNODE
> > semantic, right? At least I do not see anything like that documented
> > anywhere.
>
> Seems like SLAB at least behaves like the page allocator. See
> ____cache_alloc_node() where it basically does:
>
>         page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid);
>         ...
>         if (!page)
>                 fallback_alloc(cachep, flags)
>
> gfp_exact_node() adds __GFP_THISNODE among other things, so the initial
> attempt does try to stick only to the given node. But fallback_alloc()
> doesn't. In fact, even if kmalloc_node() was called with __GFP_THISNODE
> then it wouldn't work as intended, as fallback_alloc() doesn't get the
> nodeid, but instead will use numa_mem_id(). That part could probably be
> improved.
>
> SLUB's ___slab_alloc() has for example this:
>         if (node != NUMA_NO_NODE && !node_present_pages(node))

Hmm, just a quick note. Shouldn't this be node_managed_pages? In most
cases the difference is negligible but I can imagine crazy setups where
all present pages are simply consumed.

>                 searchnode = node_to_mem_node(node);
>
> That's from Joonsoo's 2014 commit a561ce00b09e ("slub: fall back to
> node_to_mem_node() node if allocating on memoryless node"), suggesting
> that the scenario in this bug report should work. Perhaps it just got
> broken unintentionally later.

A very good reference. Thanks!

> And AFAICS the whole path leading to alloc_slab_page() also doesn't add
> __GFP_THISNODE, but will keep it if caller passed it, and ultimately it
> does:
>
>
>         if (node == NUMA_NO_NODE)
>                 page = alloc_pages(flags, order);
>         else
>                 page = __alloc_pages_node(node, flags, order);
>
> So yeah looks like SLUB's kmalloc_node() is supposed to behave like the
> page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not
> enforce it by itself.
> There's probably just some missing data structure
> initialization somewhere right now for memoryless nodes.

Thanks for the confirmation!
-- 
Michal Hocko
SUSE Labs
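
For readers following the node_present_pages() versus node_managed_pages()
question raised in the message, here is a minimal user-space sketch in C. It
is not kernel code: the toy_node structure, the toy_nodes table and both
toy_searchnode_*() helpers are hypothetical names invented for illustration,
and the page counts are made up. It only models how the quoted SLUB check
would pick a search node, and how the result could differ on a node whose
present pages are all consumed, if the check looked at managed pages instead.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)
#define TOY_MAX_NODES 4 /* hypothetical topology size, illustration only */

/* Toy per-node counters; names are illustrative, not the kernel's pg_data_t. */
struct toy_node {
	unsigned long present_pages; /* pages physically present on the node */
	unsigned long managed_pages; /* pages available to the page allocator */
	int mem_node;                /* stand-in for node_to_mem_node(node) */
};

/* Made-up topology: the values are assumptions chosen to show each case. */
static const struct toy_node toy_nodes[TOY_MAX_NODES] = {
	{ 1024, 1000, 0 }, /* node 0: ordinary node with memory */
	{    0,    0, 0 }, /* node 1: memoryless, nearest memory is node 0 */
	{  512,    0, 0 }, /* node 2: pages present, but none managed */
	{ 2048, 2000, 3 }, /* node 3: ordinary node with memory */
};

/* Models the quoted check: redirect only if the node has no present pages. */
int toy_searchnode_present(int node)
{
	if (node == NUMA_NO_NODE)
		return NUMA_NO_NODE;
	assert(node >= 0 && node < TOY_MAX_NODES);
	return toy_nodes[node].present_pages ? node : toy_nodes[node].mem_node;
}

/* Models the suggested variant: redirect if the node has no managed pages. */
int toy_searchnode_managed(int node)
{
	if (node == NUMA_NO_NODE)
		return NUMA_NO_NODE;
	assert(node >= 0 && node < TOY_MAX_NODES);
	return toy_nodes[node].managed_pages ? node : toy_nodes[node].mem_node;
}
```

In this toy table both helpers agree on nodes 0, 1 and 3. They differ only on
node 2, where the present-pages check keeps searching node 2 while the
managed-pages variant redirects to node 0.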