From: Li Zhijian
To: linux-kernel@vger.kernel.org
Cc: y-goto@fujitsu.com, Alison Schofield, Andrew Morton, Baoquan He, Borislav Petkov, Dan Williams, Dave Hansen, Dave Jiang, Greg Kroah-Hartman, hpa@zytor.com, Ingo Molnar, Ira Weiny, Thomas Gleixner, Vishal Verma, linux-cxl@vger.kernel.org, linux-mm@kvack.org, nvdimm@lists.linux.dev, x86@kernel.org, kexec@lists.infradead.org, Li Zhijian
Subject: [RFC PATCH v3 0/7] device backed vmemmap crash dump support
Date: Wed, 6 Mar 2024 18:28:39 +0800
Message-Id: <20240306102846.1020868-1-lizhijian@fujitsu.com>

Hello folks,

Compared with the V2[1] I posted a long time ago, this version is a
completely new design.

### Background and motivation overview ###
---
Crash dump is an important feature for troubleshooting the kernel. It is the
last resort for finding out what happened at a kernel panic, slowdown, and so
on. It is one of the most important tools for customer support.

Currently, there are 2 syscalls (kexec_file_load(2) and kexec_load(2)) to
configure the dumpable regions. Generally, (A) iomem resources registered with
flags (IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY) for kexec_file_load(2) or
(B) iomem resources registered with the "System RAM" name prefix for
kexec_load(2) are dumpable.
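As a rough illustration of rule (B), the sketch below filters /proc/iomem-style text by name prefix. It is only a minimal sketch under stated assumptions: the helper name `dumpable_ranges` and the sample addresses are hypothetical, and it does not claim to reproduce how any real kexec user-space tool parses the file.

```python
import re

# One /proc/iomem line: optional indentation, then "start-end : name".
_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def dumpable_ranges(iomem_text, prefix="System RAM"):
    """Return (start, end, name) for entries whose name starts with prefix.

    Hypothetical helper: it mirrors rule (B) only, i.e. selection by
    resource name, and ignores resource flags entirely.
    """
    ranges = []
    for line in iomem_text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # skip "..." and anything that is not a range line
        start = int(m.group(1), 16)
        end = int(m.group(2), 16)
        name = m.group(3).strip()
        if name.startswith(prefix):
            ranges.append((start, end, name))
    return ranges


# Made-up sample input; a real /proc/iomem differs per machine and shows
# zeroed addresses to unprivileged readers.
SAMPLE = """\
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
00100000-bffeffff : System RAM
  01000000-01ffffff : Kernel code
100000000-13fffffff : Persistent Memory
"""

for start, end, name in dumpable_ranges(SAMPLE):
    print(f"{start:#x}-{end:#x} {name}")
```

In this sample the "Persistent Memory" range is not selected, because its name does not carry the "System RAM" prefix.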

The pmem use cases, including fsdax and devdax, can map their vmemmap to their
own devices. In this case, this part of the vmemmap will not be dumped when a
crash happens, since these regions satisfy neither (A) nor (B) above.

In fsdax, the vmemmap (struct page array) becomes very important: it is one of
the key pieces of data for finding the status of the reverse map. Lacking this
information may make it difficult to analyze trouble around pmem (especially
Filesystem-DAX). That means troubleshooters are unable to check more details
about pmem from the dumpfile.

### Proposal ###
---
In this proposal, register the device backed vmemmap as a separate resource.
This resource has its own new flag and name; kexec_file_load(2) and
kexec_load(2) are then taught to mark it as dumpable.

Proposed flag: IORESOURCE_DEVICE_BACKED_VMEMMAP
Proposed name: "Device Backed Vmemmap"

NOTE: crash-utils also needs to adapt to this new name for kexec_load()

With the current proposal, /proc/iomem should show the following for device
backed vmemmap:

# cat /proc/iomem
...
fffc0000-ffffffff : Reserved
100000000-13fffffff : Persistent Memory
  100000000-10fffffff : namespace0.0
    100000000-1005fffff : Device Backed Vmemmap  # fsdax
a80000000-b7fffffff : CXL Window 0
  a80000000-affffffff : Persistent Memory
    a80000000-affffffff : region1
      a80000000-a811fffff : namespace1.0
        a80000000-a811fffff : Device Backed Vmemmap # devdax
      a81200000-abfffffff : dax1.0
b80000000-c7fffffff : CXL Window 1
c80000000-147fffffff : PCI Bus 0000:00
  c80000000-c801fffff : PCI Bus 0000:01
...

### Kdump service reloading ###
---
Once the kdump service is loaded, if changes to CPUs or memory occur, either
by hot un/plug or off/onlining, the crash elfcorehdr should also be updated.
There are 2 approaches to make the reloading more efficient.
1) Use udev rules to watch CPU and memory events, then reload kdump
2) Enable kernel crash hotplug to automatically reload elfcorehdr (>= 6.5)

This reloading is also needed when device backed vmemmap layouts change.
Similar to what 1) does now, one could add the following as the first lines
of the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules:

# namespace updated: watch daxX.Y(devdax) and pfnX.Y(fsdax) of nd
SUBSYSTEM=="nd", KERNEL=="[dp][af][xn][0-9].*", ACTION=="bind", GOTO="kdump_reload"
SUBSYSTEM=="nd", KERNEL=="[dp][af][xn][0-9].*", ACTION=="unbind", GOTO="kdump_reload"
# devdax <-> system-ram updated: watch daxX.Y of dax
SUBSYSTEM=="dax", KERNEL=="dax[0-9].*", ACTION=="bind", GOTO="kdump_reload"
SUBSYSTEM=="dax", KERNEL=="dax[0-9].*", ACTION=="unbind", GOTO="kdump_reload"

Regarding 2), my idea is that it would need to call memory_notify() in
devm_memremap_pages_release() and devm_memremap_pages() to trigger the crash
hotplug. This part is not yet mature, but it does not block the feature as a
whole because method 1) can still be used instead.

[1] https://lore.kernel.org/lkml/02066f0f-dbc0-0388-4233-8e24b6f8435b@fujitsu.com/T/
--------------------------------------------
changes from V2[1]
- new proposal design

CC: Alison Schofield
CC: Andrew Morton
CC: Baoquan He
CC: Borislav Petkov
CC: Dan Williams
CC: Dave Hansen
CC: Dave Jiang
CC: Greg Kroah-Hartman
CC: "H. Peter Anvin"
CC: Ingo Molnar
CC: Ira Weiny
CC: Thomas Gleixner
CC: Vishal Verma
CC: linux-cxl@vger.kernel.org
CC: linux-mm@kvack.org
CC: nvdimm@lists.linux.dev
CC: x86@kernel.org

Li Zhijian (7):
  mm: memremap: register/unregister altmap region to a separate resource
  mm: memremap: add pgmap_parent_resource() helper
  nvdimm: pmem: assign a parent resource for vmemmap region for the fsdax
  dax: pmem: assign a parent resource for vmemmap region for the devdax
  resource: Introduce walk device_backed_vmemmap res() helper
  x86/crash: make device backed vmemmap dumpable for kexec_file_load
  nvdimm: set force_raw=1 in kdump kernel

 arch/x86/kernel/crash.c         |  5 +++++
 drivers/dax/pmem.c              |  8 ++++++--
 drivers/nvdimm/namespace_devs.c |  3 +++
 drivers/nvdimm/pmem.c           |  9 ++++++---
 include/linux/ioport.h          |  4 ++++
 include/linux/memremap.h        |  4 ++++
 kernel/resource.c               | 13 +++++++++++++
 mm/memremap.c                   | 30 +++++++++++++++++++++++++++++-
 8 files changed, 70 insertions(+), 6 deletions(-)

-- 
2.29.2