From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v6 05/10] mm/memory_hotplug: Shrink zones when offlining memory
To: linux-kernel@vger.kernel.org, Andrew Morton
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
 linux-ia64@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, x86@kernel.org,
 Catalin Marinas, Will Deacon, Tony Luck, Fenghua Yu,
 Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
 Heiko Carstens, Vasily Gorbik, Christian Borntraeger, Yoshinori Sato,
 Rich Felker, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
 Mark Rutland, Steve Capper, Mike Rapoport, Anshuman Khandual, Yu Zhao,
 Jun Yao, Robin Murphy, Michal Hocko, Oscar Salvador,
 "Matthew Wilcox (Oracle)", Christophe Leroy, "Aneesh Kumar K.V",
 Pavel Tatashin, Gerald Schaefer, Halil Pasic, Tom Lendacky,
 Greg Kroah-Hartman, Masahiro Yamada, Dan Williams, Wei Yang, Qian Cai,
 Jason Gunthorpe, Logan Gunthorpe, Ira Weiny
References: <20191006085646.5768-1-david@redhat.com>
 <20191006085646.5768-6-david@redhat.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Date: Mon, 14 Oct 2019 11:39:13 +0200
In-Reply-To: <20191006085646.5768-6-david@redhat.com>

On 06.10.19 10:56, David Hildenbrand wrote:
> We currently try to shrink a single zone when removing memory. We use the
> zone of the first page of the memory we are removing. If that memmap was
> never initialized (e.g., memory was never onlined), we will read garbage
> and can trigger kernel BUGs (due to a stale pointer):
>
> :/# [ 23.912993] BUG: unable to handle page fault for address: 000000000000353d
> [ 23.914219] #PF: supervisor write access in kernel mode
> [ 23.915199] #PF: error_code(0x0002) - not-present page
> [ 23.916160] PGD 0 P4D 0
> [ 23.916627] Oops: 0002 [#1] SMP PTI
> [ 23.917256] CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
> [ 23.918900] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [ 23.921194] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [ 23.922249] RIP: 0010:clear_zone_contiguous+0x5/0x10
> [ 23.923173] Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
> [ 23.926876] RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
> [ 23.927928] RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
> [ 23.929458] RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
> [ 23.930899] RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
> [ 23.932362] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
> [ 23.933603] R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
> [ 23.934913] FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
> [ 23.936294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 23.937481] CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
> [ 23.938687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 23.939889] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 23.941168] Call Trace:
> [ 23.941580] __remove_pages+0x4b/0x640
> [ 23.942303] ? mark_held_locks+0x49/0x70
> [ 23.943149] arch_remove_memory+0x63/0x8d
> [ 23.943921] try_remove_memory+0xdb/0x130
> [ 23.944766] ? walk_memory_blocks+0x7f/0x9e
> [ 23.945616] __remove_memory+0xa/0x11
> [ 23.946274] acpi_memory_device_remove+0x70/0x100
> [ 23.947308] acpi_bus_trim+0x55/0x90
> [ 23.947914] acpi_device_hotplug+0x227/0x3a0
> [ 23.948714] acpi_hotplug_work_fn+0x1a/0x30
> [ 23.949433] process_one_work+0x221/0x550
> [ 23.950190] worker_thread+0x50/0x3b0
> [ 23.950993] kthread+0x105/0x140
> [ 23.951644] ? process_one_work+0x550/0x550
> [ 23.952508] ? kthread_park+0x80/0x80
> [ 23.953367] ret_from_fork+0x3a/0x50
> [ 23.954025] Modules linked in:
> [ 23.954613] CR2: 000000000000353d
> [ 23.955248] ---[ end trace 93d982b1fb3e1a69 ]---
>
> Instead, shrink the zones when offlining memory or when onlining failed.
> Introduce and use remove_pfn_range_from_zone() for that. We now properly
> shrink the zones, even if we have DIMMs whereby
> - Some memory blocks fall into no zone (never onlined)
> - Some memory blocks fall into multiple zones (offlined+re-onlined)
> - Multiple memory blocks that fall into different zones
>
> Drop the zone parameter (with a potential dubious value) from
> __remove_pages() and __remove_section().
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Tony Luck
> Cc: Fenghua Yu
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Michael Ellerman
> Cc: Heiko Carstens
> Cc: Vasily Gorbik
> Cc: Christian Borntraeger
> Cc: Yoshinori Sato
> Cc: Rich Felker
> Cc: Dave Hansen
> Cc: Andy Lutomirski
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: "H. Peter Anvin"
> Cc: x86@kernel.org
> Cc: Andrew Morton
> Cc: Mark Rutland
> Cc: Steve Capper
> Cc: Mike Rapoport
> Cc: Anshuman Khandual
> Cc: Yu Zhao
> Cc: Jun Yao
> Cc: Robin Murphy
> Cc: Michal Hocko
> Cc: Oscar Salvador
> Cc: "Matthew Wilcox (Oracle)"
> Cc: Christophe Leroy
> Cc: "Aneesh Kumar K.V"
> Cc: Pavel Tatashin
> Cc: Gerald Schaefer
> Cc: Halil Pasic
> Cc: Tom Lendacky
> Cc: Greg Kroah-Hartman
> Cc: Masahiro Yamada
> Cc: Dan Williams
> Cc: Wei Yang
> Cc: Qian Cai
> Cc: Jason Gunthorpe
> Cc: Logan Gunthorpe
> Cc: Ira Weiny
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-s390@vger.kernel.org
> Cc: linux-sh@vger.kernel.org
> Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")

@Andrew, can you convert that to

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

While cc'ing stable@vger.kernel.org # v4.13+ would be nice, I doubt this
will be easy to backport, as we are missing some prereq patches (e.g.,
from Oscar, like 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to
arch_remove_memory")). But it could be done with some work.

I think "Cc: stable@vger.kernel.org # v5.0+" could be done more easily.

Maybe it's okay not to cc:stable this one. We usually online all memory
(except on s390x); however, s390x never removes that memory. Devmem with
driver-reserved memory, however, would make backporting this worthwhile.

Thoughts?

-- 

Thanks,

David / dhildenb