Date: Wed, 2 Sep 2020 13:26:24 +0200
From: Michal Hocko
To: Vlastimil Babka
Cc: Pavel Tatashin, Roman Gushchin, Bharata B Rao,
	"linux-mm@kvack.org", Andrew Morton, Johannes Weiner,
	Shakeel Butt, Vladimir Davydov, "linux-kernel@vger.kernel.org",
	Kernel Team, Yafang Shao, stable, Linus Torvalds, Sasha Levin,
	Greg Kroah-Hartman, David Hildenbrand
Subject: Re: [PATCH v2 00/28] The new cgroup slab memory controller
Message-ID: <20200902112624.GC4617@dhcp22.suse.cz>
References: <20200127173453.2089565-1-guro@fb.com>
	<20200130020626.GA21973@in.ibm.com>
	<20200130024135.GA14994@xps.DHCP.thefacebook.com>
	<20200813000416.GA1592467@carbon.dhcp.thefacebook.com>
	<6469324e-afa2-18b4-81fb-9e96466c1bf3@suse.cz>
In-Reply-To: <6469324e-afa2-18b4-81fb-9e96466c1bf3@suse.cz>

On Wed 02-09-20 11:53:00, Vlastimil Babka wrote:
> On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> > There appears to be another problem that is related to the
> > cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >
> > In the original deadlock that I described, the workaround is to
> > replace crash dump from piping to Linux traditional save to files
> > method. However, after trying this workaround, I still observed
> > hardware watchdog resets during machine shutdown.
> >
> > The new problem occurs for the following reason: upon shutdown systemd
> > calls a service that hot-removes memory, and if hot-removing fails for
>
> Why is that hotremove even needed if we're shutting down? Are there any
> (virtualization?) platforms where it makes some difference over plain
> shutdown/restart?

Yes, this sounds quite dubious.

> > some reason systemd kills that service after timeout. However, systemd
> > is never able to kill the service, and we get hardware reset caused by
> > watchdog or a hang during shutdown:
> >
> > Thread #1: memory hot-remove systemd service
> > Loops indefinitely, because if there is something still to be migrated
> > this loop never terminates. However, this loop can be terminated via
> > signal from systemd after timeout.
> > __offline_pages()
> >       do {
> >           pfn = scan_movable_pages(pfn, end_pfn);
> >                   # Returns 0, meaning there is nothing available to
> >                   # migrate, no page is PageLRU(page)
> >           ...
> >           ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
> >                                       NULL, check_pages_isolated_cb);
> >                   # Returns -EBUSY, meaning there is at least one PFN that
> >                   # still has to be migrated.
> >       } while (ret);

This shouldn't really happen. What prevents this from proceeding?
Did you manage to catch the specific pfn, and what is it used for?
start_isolate_page_range and scan_movable_pages should fail if there is
any memory that permanently cannot be migrated. This is something that
we should focus on when debugging.
-- 
Michal Hocko
SUSE Labs