All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jijie Shao <shaojijie@huawei.com>
To: <will@kernel.org>, <jonathan.cameron@huawei.com>,
	<mark.rutland@arm.com>, <yangyicong@hisilicon.com>
Cc: <shaojijie@huawei.com>, <chenhao418@huawei.com>,
	<shenjian15@huawei.com>, <wangjie125@huawei.com>,
	<liuyonglong@huawei.com>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>
Subject: [PATCH V2 drivers/perf: hisi:] drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hns3 pmu uninit process
Date: Mon, 16 Oct 2023 18:51:39 +0800	[thread overview]
Message-ID: <20231016105139.1436425-1-shaojijie@huawei.com> (raw)

From: Hao Chen <chenhao418@huawei.com>

For pmu uninit process, we unregister pmu and then call cpuhp_state_remove_instance()
to call callback function to migrate pmu context. the logic is unreasonable and
may result in NULL pointer call trace.

Found NULL pointer call trace for kernel-5.13 at the first time, log shows as below:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[27169.471153] Mem abort info:
[27169.478859]   ESR = 0x96000006
[27169.485117]   EC = 0x25: DABT (current EL), IL = 32 bits
[27169.501386]   SET = 0, FnV = 0
[27169.508038]   EA = 0, S1PTW = 0
[27169.559839] Data abort info:
[27169.567296]   ISV = 0, ISS = 0x00000006
[27169.585332]   CM = 0, WnR = 0
[27169.592627] user pgtable: 4k pages, 48-bit VAs, pgdp=00000820ac0c2000
[27169.615618] [0000000000000000] pgd=00000820af2bf003, p4d=00000820af2bf003, pud=00000820af2bd003, pmd=0000000000000000
[27169.641348] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[27169.649879] Modules linked in: hisi_hns3_pmu(-) hns3 hclge hclgevf hnae3 hns3_cae(O) vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [last unloaded: hisi_hns3_pmu]
[27169.675150] CPU: 0 PID: 15 Comm: cpuhp/0 Tainted: G        W  O      5.12.0-rc4+ #1
[27169.686538] Hardware name:  , BIOS KpxxxFPGA 1P B600 V143 04/22/2021
[27169.694884] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[27169.704199] pc : perf_pmu_migrate_context+0x98/0x38c
[27169.713848] lr : perf_pmu_migrate_context+0x94/0x38c
[27169.723069] sp : ffff80001020bc80
[27169.727938] x29: ffff80001020bc80 x28: 0000000000000000
[27169.737236] x27: 0000000000000001 x26: fffffbffeff951d8
[27169.745245] x25: ffffb7ae1ce36fd0 x24: fffffbffeff95178
[27169.753116] x23: ffff0821f6f82668 x22: 0000000000000001
[27169.760949] x21: ffff80001020bd08 x20: fffffbffeffb5188
[27169.768467] x19: fffffbffeffb5178 x18: 0000000000000020
[27169.776699] x17: 0000000000000000 x16: 0000000000000000
[27169.785042] x15: 0000000000000004 x14: ffff08208093d290
[27169.793254] x13: 0000000000000000 x12: 0000497f0149bafe
[27169.801390] x11: 01262d2f8bbfdccc x10: 3dc70ddec1e5f4f9
[27169.808896] x9 : ffffb7ae1a7eb108 x8 : 0000000000000002
[27169.817000] x7 : 0000000000000000 x6 : 000000000000000e
[27169.824726] x5 : 0000000000000001 x4 : 0000000000000000
[27169.832924] x3 : 0000000000000000 x2 : ffff082080a30e80
[27169.841135] x1 : 0000000000000000 x0 : fffffbffeffb5188
[27169.849059] Call trace:
[27169.853177]  perf_pmu_migrate_context+0x98/0x38c
[27169.862256]  hisi_hns3_pmu_offline_cpu+0x104/0x12c [hisi_hns3_pmu]
[27169.873943]  cpuhp_invoke_callback+0x118/0x634
[27169.882775]  cpuhp_thread_fun+0xe8/0x190
[27169.890345]  smpboot_thread_fn+0x25c/0x290
[27169.898241]  kthread+0x168/0x16c
[27169.904178]  ret_from_fork+0x10/0x18
[27169.911943] Code: 944beda1 aa1403e0 944bed9f f8460f5c (f9400394)

And tested some other uncore pmu driver such as hisi pcie pmu and hisi uncore i3c pmu,
hisi pcie pmu driver can reproduce it also, and change hisi uncore i3c pmu hotplug
function from cpuhp_state_remove_instance_nocalls() to cpuhp_state_remove_instance(),
it has the same call trace.

This patch change cpuhp_state_remove_instance() to cpuhp_state_remove_instance_nocalls()
after pmu unregistered to fix the problem.

This problem is fixed by commit bd2756811766 ("perf: Rewrite core context handling")
for kernel-6.2.

Hisi hns3 pmu driver is supported for kernel-5.19, so, use this patch as bugfix for
stable 6.0 and 6.1 tag, and use this patch as cleanup for newest kernel version.

Fixes: 66637ab137b4 ("drivers/perf: hisi: add driver for HNS3 PMU")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
ChangeLog:
	1. Add more details for NULL pointer call trace suggested by Yicong
	V1: https://lore.kernel.org/all/20231009105038.126040-1-shaojijie@huawei.com/
---
 drivers/perf/hisilicon/hns3_pmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
index e0457d84af6b..16869bf5bf4c 100644
--- a/drivers/perf/hisilicon/hns3_pmu.c
+++ b/drivers/perf/hisilicon/hns3_pmu.c
@@ -1556,8 +1556,8 @@ static int hns3_pmu_init_pmu(struct pci_dev *pdev, struct hns3_pmu *hns3_pmu)
 	ret = perf_pmu_register(&hns3_pmu->pmu, hns3_pmu->pmu.name, -1);
 	if (ret) {
 		pci_err(pdev, "failed to register perf PMU, ret = %d.\n", ret);
-		cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
-					    &hns3_pmu->node);
+		cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
+						    &hns3_pmu->node);
 	}
 
 	return ret;
@@ -1568,8 +1568,8 @@ static void hns3_pmu_uninit_pmu(struct pci_dev *pdev)
 	struct hns3_pmu *hns3_pmu = pci_get_drvdata(pdev);
 
 	perf_pmu_unregister(&hns3_pmu->pmu);
-	cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
-				    &hns3_pmu->node);
+	cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
+					    &hns3_pmu->node);
 }
 
 static int hns3_pmu_init_dev(struct pci_dev *pdev)
-- 
2.30.0


WARNING: multiple messages have this Message-ID (diff)
From: Jijie Shao <shaojijie@huawei.com>
To: <will@kernel.org>, <jonathan.cameron@huawei.com>,
	<mark.rutland@arm.com>, <yangyicong@hisilicon.com>
Cc: <shaojijie@huawei.com>, <chenhao418@huawei.com>,
	<shenjian15@huawei.com>, <wangjie125@huawei.com>,
	<liuyonglong@huawei.com>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>
Subject: [PATCH V2 drivers/perf: hisi:] drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hns3 pmu uninit process
Date: Mon, 16 Oct 2023 18:51:39 +0800	[thread overview]
Message-ID: <20231016105139.1436425-1-shaojijie@huawei.com> (raw)

From: Hao Chen <chenhao418@huawei.com>

For pmu uninit process, we unregister pmu and then call cpuhp_state_remove_instance()
to call callback function to migrate pmu context. the logic is unreasonable and
may result in NULL pointer call trace.

Found NULL pointer call trace for kernel-5.13 at the first time, log shows as below:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[27169.471153] Mem abort info:
[27169.478859]   ESR = 0x96000006
[27169.485117]   EC = 0x25: DABT (current EL), IL = 32 bits
[27169.501386]   SET = 0, FnV = 0
[27169.508038]   EA = 0, S1PTW = 0
[27169.559839] Data abort info:
[27169.567296]   ISV = 0, ISS = 0x00000006
[27169.585332]   CM = 0, WnR = 0
[27169.592627] user pgtable: 4k pages, 48-bit VAs, pgdp=00000820ac0c2000
[27169.615618] [0000000000000000] pgd=00000820af2bf003, p4d=00000820af2bf003, pud=00000820af2bd003, pmd=0000000000000000
[27169.641348] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[27169.649879] Modules linked in: hisi_hns3_pmu(-) hns3 hclge hclgevf hnae3 hns3_cae(O) vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [last unloaded: hisi_hns3_pmu]
[27169.675150] CPU: 0 PID: 15 Comm: cpuhp/0 Tainted: G        W  O      5.12.0-rc4+ #1
[27169.686538] Hardware name:  , BIOS KpxxxFPGA 1P B600 V143 04/22/2021
[27169.694884] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[27169.704199] pc : perf_pmu_migrate_context+0x98/0x38c
[27169.713848] lr : perf_pmu_migrate_context+0x94/0x38c
[27169.723069] sp : ffff80001020bc80
[27169.727938] x29: ffff80001020bc80 x28: 0000000000000000
[27169.737236] x27: 0000000000000001 x26: fffffbffeff951d8
[27169.745245] x25: ffffb7ae1ce36fd0 x24: fffffbffeff95178
[27169.753116] x23: ffff0821f6f82668 x22: 0000000000000001
[27169.760949] x21: ffff80001020bd08 x20: fffffbffeffb5188
[27169.768467] x19: fffffbffeffb5178 x18: 0000000000000020
[27169.776699] x17: 0000000000000000 x16: 0000000000000000
[27169.785042] x15: 0000000000000004 x14: ffff08208093d290
[27169.793254] x13: 0000000000000000 x12: 0000497f0149bafe
[27169.801390] x11: 01262d2f8bbfdccc x10: 3dc70ddec1e5f4f9
[27169.808896] x9 : ffffb7ae1a7eb108 x8 : 0000000000000002
[27169.817000] x7 : 0000000000000000 x6 : 000000000000000e
[27169.824726] x5 : 0000000000000001 x4 : 0000000000000000
[27169.832924] x3 : 0000000000000000 x2 : ffff082080a30e80
[27169.841135] x1 : 0000000000000000 x0 : fffffbffeffb5188
[27169.849059] Call trace:
[27169.853177]  perf_pmu_migrate_context+0x98/0x38c
[27169.862256]  hisi_hns3_pmu_offline_cpu+0x104/0x12c [hisi_hns3_pmu]
[27169.873943]  cpuhp_invoke_callback+0x118/0x634
[27169.882775]  cpuhp_thread_fun+0xe8/0x190
[27169.890345]  smpboot_thread_fn+0x25c/0x290
[27169.898241]  kthread+0x168/0x16c
[27169.904178]  ret_from_fork+0x10/0x18
[27169.911943] Code: 944beda1 aa1403e0 944bed9f f8460f5c (f9400394)

And tested some other uncore pmu driver such as hisi pcie pmu and hisi uncore i3c pmu,
hisi pcie pmu driver can reproduce it also, and change hisi uncore i3c pmu hotplug
function from cpuhp_state_remove_instance_nocalls() to cpuhp_state_remove_instance(),
it has the same call trace.

This patch change cpuhp_state_remove_instance() to cpuhp_state_remove_instance_nocalls()
after pmu unregistered to fix the problem.

This problem is fixed by commit bd2756811766 ("perf: Rewrite core context handling")
for kernel-6.2.

Hisi hns3 pmu driver is supported for kernel-5.19, so, use this patch as bugfix for
stable 6.0 and 6.1 tag, and use this patch as cleanup for newest kernel version.

Fixes: 66637ab137b4 ("drivers/perf: hisi: add driver for HNS3 PMU")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
ChangeLog:
	1. Add more details for NULL pointer call trace suggested by Yicong
	V1: https://lore.kernel.org/all/20231009105038.126040-1-shaojijie@huawei.com/
---
 drivers/perf/hisilicon/hns3_pmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/perf/hisilicon/hns3_pmu.c b/drivers/perf/hisilicon/hns3_pmu.c
index e0457d84af6b..16869bf5bf4c 100644
--- a/drivers/perf/hisilicon/hns3_pmu.c
+++ b/drivers/perf/hisilicon/hns3_pmu.c
@@ -1556,8 +1556,8 @@ static int hns3_pmu_init_pmu(struct pci_dev *pdev, struct hns3_pmu *hns3_pmu)
 	ret = perf_pmu_register(&hns3_pmu->pmu, hns3_pmu->pmu.name, -1);
 	if (ret) {
 		pci_err(pdev, "failed to register perf PMU, ret = %d.\n", ret);
-		cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
-					    &hns3_pmu->node);
+		cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
+						    &hns3_pmu->node);
 	}
 
 	return ret;
@@ -1568,8 +1568,8 @@ static void hns3_pmu_uninit_pmu(struct pci_dev *pdev)
 	struct hns3_pmu *hns3_pmu = pci_get_drvdata(pdev);
 
 	perf_pmu_unregister(&hns3_pmu->pmu);
-	cpuhp_state_remove_instance(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
-				    &hns3_pmu->node);
+	cpuhp_state_remove_instance_nocalls(CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE,
+					    &hns3_pmu->node);
 }
 
 static int hns3_pmu_init_dev(struct pci_dev *pdev)
-- 
2.30.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

             reply	other threads:[~2023-10-16 10:55 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 10:51 Jijie Shao [this message]
2023-10-16 10:51 ` [PATCH V2 drivers/perf: hisi:] drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hns3 pmu uninit process Jijie Shao
2023-10-19  1:52 ` Yicong Yang
2023-10-19  1:52   ` Yicong Yang
2023-10-19  8:47   ` Jijie Shao
2023-10-19  8:47     ` Jijie Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231016105139.1436425-1-shaojijie@huawei.com \
    --to=shaojijie@huawei.com \
    --cc=chenhao418@huawei.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liuyonglong@huawei.com \
    --cc=mark.rutland@arm.com \
    --cc=shenjian15@huawei.com \
    --cc=wangjie125@huawei.com \
    --cc=will@kernel.org \
    --cc=yangyicong@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.