linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Sasha Levin <sashal@kernel.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: [PATCH AUTOSEL 5.2 21/59] powerpc/nvdimm: Pick nearby online node if the device node is not online
Date: Tue,  6 Aug 2019 17:32:41 -0400	[thread overview]
Message-ID: <20190806213319.19203-21-sashal@kernel.org> (raw)
In-Reply-To: <20190806213319.19203-1-sashal@kernel.org>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>

[ Upstream commit da1115fdbd6e86c62185cdd2b4bf7add39f2f82b ]

Currently, nvdimm subsystem expects the device numa node for SCM device to be
an online node. It also doesn't try to bring the device numa node online. Hence
if we use a non-online numa node as device node we hit crashes like below. This
is because we try to access uninitialized NODE_DATA in different code paths.

cpu 0x0: Vector: 300 (Data Access) at [c0000000fac53170]
    pc: c0000000004bbc50: ___slab_alloc+0x120/0xca0
    lr: c0000000004bc834: __slab_alloc+0x64/0xc0
    sp: c0000000fac53400
   msr: 8000000002009033
   dar: 73e8
 dsisr: 80000
  current = 0xc0000000fabb6d80
  paca    = 0xc000000003870000   irqmask: 0x03   irq_happened: 0x01
    pid   = 7, comm = kworker/u16:0
Linux version 5.2.0-06234-g76bd729b2644 (kvaneesh@ltc-boston123) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #135 SMP Thu Jul 11 05:36:30 CDT 2019
enter ? for help
[link register   ] c0000000004bc834 __slab_alloc+0x64/0xc0
[c0000000fac53400] c0000000fac53480 (unreliable)
[c0000000fac53500] c0000000004bc818 __slab_alloc+0x48/0xc0
[c0000000fac53560] c0000000004c30a0 __kmalloc_node_track_caller+0x3c0/0x6b0
[c0000000fac535d0] c000000000cfafe4 devm_kmalloc+0x74/0xc0
[c0000000fac53600] c000000000d69434 nd_region_activate+0x144/0x560
[c0000000fac536d0] c000000000d6b19c nd_region_probe+0x17c/0x370
[c0000000fac537b0] c000000000d6349c nvdimm_bus_probe+0x10c/0x230
[c0000000fac53840] c000000000cf3cc4 really_probe+0x254/0x4e0
[c0000000fac538d0] c000000000cf429c driver_probe_device+0x16c/0x1e0
[c0000000fac53950] c000000000cf0b44 bus_for_each_drv+0x94/0x130
[c0000000fac539b0] c000000000cf392c __device_attach+0xdc/0x200
[c0000000fac53a50] c000000000cf231c bus_probe_device+0x4c/0xf0
[c0000000fac53a90] c000000000ced268 device_add+0x528/0x810
[c0000000fac53b60] c000000000d62a58 nd_async_device_register+0x28/0xa0
[c0000000fac53bd0] c0000000001ccb8c async_run_entry_fn+0xcc/0x1f0
[c0000000fac53c50] c0000000001bcd9c process_one_work+0x46c/0x860
[c0000000fac53d20] c0000000001bd4f4 worker_thread+0x364/0x5f0
[c0000000fac53db0] c0000000001c7260 kthread+0x1b0/0x1c0
[c0000000fac53e20] c00000000000b954 ret_from_kernel_thread+0x5c/0x68

The patch tries to fix this by picking the nearest online node as the SCM node.
This does have a problem of us losing the information that SCM node is
equidistant from two other online nodes. If applications need to understand these
fine-grained details we should express then like x86 does via
/sys/devices/system/node/nodeX/accessY/initiators/

With the patch we get

 # numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 130865 MB
node 1 free: 129130 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
 # cat /sys/bus/nd/devices/region0/numa_node
0
 # dmesg | grep papr_scm
[   91.332305] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Region registered with target node 2 and online node 0

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190729095128.23707-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/platforms/pseries/papr_scm.c | 29 +++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 96c53b23e58f9..30af5019a68fe 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -194,12 +194,32 @@ static const struct attribute_group *papr_scm_dimm_groups[] = {
 	NULL,
 };
 
+static inline int papr_scm_node(int node)
+{
+	int min_dist = INT_MAX, dist;
+	int nid, min_node;
+
+	if ((node == NUMA_NO_NODE) || node_online(node))
+		return node;
+
+	min_node = first_online_node;
+	for_each_online_node(nid) {
+		dist = node_distance(node, nid);
+		if (dist < min_dist) {
+			min_dist = dist;
+			min_node = nid;
+		}
+	}
+	return min_node;
+}
+
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
 	struct device *dev = &p->pdev->dev;
 	struct nd_mapping_desc mapping;
 	struct nd_region_desc ndr_desc;
 	unsigned long dimm_flags;
+	int target_nid, online_nid;
 
 	p->bus_desc.ndctl = papr_scm_ndctl;
 	p->bus_desc.module = THIS_MODULE;
@@ -238,8 +258,10 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 
 	memset(&ndr_desc, 0, sizeof(ndr_desc));
 	ndr_desc.attr_groups = region_attr_groups;
-	ndr_desc.numa_node = dev_to_node(&p->pdev->dev);
-	ndr_desc.target_node = ndr_desc.numa_node;
+	target_nid = dev_to_node(&p->pdev->dev);
+	online_nid = papr_scm_node(target_nid);
+	ndr_desc.numa_node = online_nid;
+	ndr_desc.target_node = target_nid;
 	ndr_desc.res = &p->res;
 	ndr_desc.of_node = p->dn;
 	ndr_desc.provider_data = p;
@@ -254,6 +276,9 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 				ndr_desc.res, p->dn);
 		goto err;
 	}
+	if (target_nid != online_nid)
+		dev_info(dev, "Region registered with target node %d and online node %d",
+			 target_nid, online_nid);
 
 	return 0;
 
-- 
2.20.1


  parent reply	other threads:[~2019-08-06 21:44 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-06 21:32 [PATCH AUTOSEL 5.2 01/59] RDMA/hns: Fix sg offset non-zero issue Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 02/59] IB/mlx5: Replace kfree with kvfree Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 03/59] clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1 Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 04/59] clk: sprd: Select REGMAP_MMIO to avoid compile errors Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 05/59] clk: renesas: cpg-mssr: Fix reset control race condition Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 06/59] dma-mapping: check pfn validity in dma_common_{mmap,get_sgtable} Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 07/59] xtensa: fix build for cores with coprocessors Sasha Levin
2019-08-06 21:55   ` Max Filippov
2019-08-18  1:45     ` Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 08/59] platform/x86: pcengines-apuv2: Fix softdep statement Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 09/59] platform/x86: intel_pmc_core: Add ICL-NNPI support to PMC Core Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 10/59] mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 11/59] xen/pciback: remove set but not used variable 'old_state' Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 12/59] irqchip/gic-v3-its: Free unused vpt_page when alloc vpe table fail Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 13/59] irqchip/irq-imx-gpcv2: Forward irq type to parent Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 14/59] f2fs: fix to read source block before invalidating it Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 15/59] tools perf beauty: Fix usbdevfs_ioctl table generator to handle _IOC() Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 16/59] perf header: Fix divide by zero error if f_header.attr_size==0 Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 17/59] perf header: Fix use of unitialized value warning Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 18/59] RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 19/59] ALSA: pcm: fix lost wakeup event scenarios in snd_pcm_drain Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 20/59] libata: zpodd: Fix small read overflow in zpodd_get_mech_type() Sasha Levin
2019-08-06 21:32 ` Sasha Levin [this message]
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 22/59] drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 23/59] drm/bridge: tc358764: Fix build error Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 24/59] Btrfs: fix deadlock between fiemap and transaction commits Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 25/59] scsi: hpsa: correct scsi command status issue after reset Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 26/59] scsi: qla2xxx: Fix possible fcport null-pointer dereferences Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 27/59] exit: make setting exit_state consistent Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 28/59] tracing: Fix header include guards in trace event headers Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 29/59] drm/amdkfd: Fix byte align on VegaM Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 30/59] drm/amd/powerplay: fix null pointer dereference around dpm state relates Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 31/59] drm/amdgpu: fix error handling in amdgpu_cs_process_fence_dep Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 32/59] drm/amdgpu: fix a potential information leaking bug Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 33/59] ata: libahci: do not complain in case of deferred probe Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 34/59] kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 35/59] kbuild: Check for unknown options with cc-option usage in Kconfig and clang Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 36/59] arm64/efi: fix variable 'si' set but not used Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 37/59] drm/vgem: fix cache synchronization on arm/arm64 Sasha Levin
2019-08-06 22:45   ` Rob Clark
2019-08-18  1:45     ` Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 38/59] riscv: Fix perf record without libelf support Sasha Levin
2019-08-06 21:32 ` [PATCH AUTOSEL 5.2 39/59] i2c: iproc: Fix i2c master read more than 63 bytes Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 40/59] arm64: Lower priority mask for GIC_PRIO_IRQON Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 41/59] arm64: unwind: Prohibit probing on return_address() Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 42/59] arm64/mm: fix variable 'pud' set but not used Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 43/59] arm64/mm: fix variable 'tag' " Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 44/59] IB/core: Add mitigation for Spectre V1 Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 45/59] IB/mlx5: Fix MR registration flow to use UMR properly Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 46/59] RDMA/restrack: Track driver QP types in resource tracker Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 47/59] IB/mad: Fix use-after-free in ib mad completion handling Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 48/59] RDMA/mlx5: Release locks during notifier unregister Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 49/59] drm: msm: Fix add_gpu_components Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 50/59] RDMA/hns: Fix error return code in hns_roce_v1_rsv_lp_qp() Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 51/59] drm/exynos: fix missing decrement of retry counter Sasha Levin
2019-08-07  8:49   ` David Laight
2019-08-18  1:47     ` Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 52/59] arm64: kprobes: Recover pstate.D in single-step exception handler Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 53/59] arm64: Make debug exception handlers visible from RCU Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 54/59] Revert "kmemleak: allow to coexist with fault injection" Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 55/59] ocfs2: remove set but not used variable 'last_hash' Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 56/59] page flags: prioritize kasan bits over last-cpuid Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 57/59] coredump: split pipe command whitespace before expanding template Sasha Levin
2019-08-07  1:41   ` Paul Wise
2019-08-18  1:48     ` Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 58/59] asm-generic: fix -Wtype-limits compiler warnings Sasha Levin
2019-08-06 21:33 ` [PATCH AUTOSEL 5.2 59/59] tpm: tpm_ibm_vtpm: Fix unallocated banks Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190806213319.19203-21-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).