Date: Thu, 20 Aug 2020 10:14:26 +0200
From: Greg KH
To: Michal Hocko
Cc: Oscar Salvador, stable@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, vbabka@suse.com, david@redhat.com, Vlastimil Babka
Subject: Re: [PATCH STABLE 4.9] mm: Avoid calling build_all_zonelists_init under hotplug context
Message-ID: <20200820081426.GF4049659@kroah.com>
References: <20200818110046.6664-1-osalvador@suse.de>
    <20200818122446.GA15067@dhcp22.suse.cz>
In-Reply-To: <20200818122446.GA15067@dhcp22.suse.cz>

On Tue, Aug 18, 2020 at 02:24:46PM +0200, Michal Hocko wrote:
> On Tue 18-08-20 13:00:46, Oscar Salvador wrote:
> > Recently a customer of ours experienced a crash when booting the
> > system while enabling memory-hotplug.
> >
> > The problem is that Normal zones on different nodes don't get their private
> > zone->pageset allocated, and keep sharing the initial boot_pageset.
> > The sharing between zones is normally safe as explained by the comment for
> > boot_pageset - it's a percpu structure, and manipulations are done with
> > disabled interrupts, and boot_pageset is set up in a way that any page placed
> > on its pcplist is immediately flushed to shared zone's freelist, because
> > pcp->high == 1.
> > However, the hotplug operation updates pcp->high to a higher value as it
> > expects to be operating on a private pageset.
> >
> > The problem is in build_all_zonelists(), which is called when the first range
> > of pages is onlined for the Normal zone of node X or Y:
> >
> > 	if (system_state == SYSTEM_BOOTING) {
> > 		build_all_zonelists_init();
> > 	} else {
> > #ifdef CONFIG_MEMORY_HOTPLUG
> > 		if (zone)
> > 			setup_zone_pageset(zone);
> > #endif
> > 		/* we have to stop all cpus to guarantee there is no user
> > 		   of zonelist */
> > 		stop_machine(__build_all_zonelists, pgdat, NULL);
> > 		/* cpuset refresh routine should be here */
> > 	}
> >
> > When called during hotplug, it should execute the setup_zone_pageset(zone)
> > which allocates the private pageset.
> > However, with memhp_default_state=online, this happens early while
> > system_state == SYSTEM_BOOTING is still true, hence this step is skipped.
> > (and build_all_zonelists_init() is probably unsafe anyway at this point).
> >
> > Another hotplug operation on the same zone then leads to zone_pcp_update(zone)
> > called from online_pages(), which updates the pcp->high for the shared
> > boot_pageset to a value higher than 1.
> > At that point, pages freed from Node X and Y Normal zones can end up on the same
> > pcplist and from there they can be freed to the wrong zone's freelist,
> > leading to the corruption and crashes.
> >
> > Please note that upstream has fixed that differently (and unintentionally) by
> > adding another boot state (SYSTEM_SCHEDULING), which is set before smp_init().
> > That should happen before memory hotplug events even with memhp_default_state=online.
> > Backporting that would be too intrusive.
> >
> > Signed-off-by: Oscar Salvador
> > Debugged-by: Vlastimil Babka
>
> Yes, I believe this is the easiest and the least scary way to fix the
> issue for stable kernel users. Feel free to add
> Acked-by: Michal Hocko # for stable trees
>
> for that purpose.

Now queued up, thanks!

greg k-h