Linux-EDAC Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register()
@ 2020-08-25 13:01 Shiju Jose
  2020-08-26  8:52 ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Shiju Jose @ 2020-08-25 13:01 UTC (permalink / raw)
  To: linux-edac, linux-kernel, bp, mchehab, tony.luck, james.morse, rrichter
  Cc: linuxarm

After the 'commit b9cae27728d1 ("EDAC/ghes: Scan the system once on driver init")'
applied, following error has occurred in ghes_edac_register() when
CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled. The null ghes_hw.dimms pointer
in the mci_for_each_dimm() of ghes_edac_register() caused the error.

The error occurs when all the previously initialized ghes instances are
removed and then probe a new ghes instance. In this case, the ghes_refcount
would be 0, ghes_hw.dimms and mci already freed. The ghes_hw.dimms would
be null because ghes_scan_system() would not call enumerate_dimms() again.

Following is the error log:

EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
EDAC MC: Removed device 0 for ghes_edac.c ghes_edac: DEV ghes
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000330
Mem abort info:
  ESR = 0x96000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
[0000000000000330] user address but active_mm is swapper
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 34 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc1-00085-g06a4ec1d9dc6-dirty #29
Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020
pstate: 60c00009 (nZCv daif +PAN +UAO BTYPE=--)
pc : ghes_edac_register+0x19c/0x340
lr : ghes_edac_register+0x12c/0x340
sp : ffff80001041bad0
x29: ffff80001041bad0 x28: ffffc56e16f210a0
x27: 0000000000000000 x26: ffffc56e175d0000
x25: 0000000000000000 x24: ffff007ef7e2a010
x23: ffff007ef5c3a6ec x22: ffffc56e17606000
x21: ffffc56e17409000 x20: ffff007ef5c3a000
x19: ffffc56e176a7000 x18: 000000000000000e
x17: ffff80001007dfff x16: 0000008000000000
x15: ffff80001007dfff x14: 0000000044011000
x13: 0000000040000000 x12: ffff80001007e000
x11: 00000000ffffffff x10: 00000000ffffffff
x9 : 0000000000000002 x8 : ffff207ef6c502fc
x7 : 0000000000000360 x6 : 0000000000000000
x5 : 00000000fffffffc x4 : ffff007ef5c3a6e0
x3 : 0000000000000020 x2 : ffff207ef6c27c00
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 ghes_edac_register+0x19c/0x340
 ghes_probe+0x1f0/0x3dc
 platform_drv_probe+0x4c/0xb0
 really_probe+0x1c4/0x444
 driver_probe_device+0x54/0xb0
 device_driver_attach+0x68/0x70
 __driver_attach+0x94/0xdc
 bus_for_each_dev+0x64/0xc0
 driver_attach+0x20/0x28
 bus_add_driver+0x138/0x1f8
 driver_register+0x60/0x10c
 __platform_driver_register+0x4c/0x54
 ghes_init+0x94/0x110
 do_one_initcall+0x58/0x1ac
 kernel_init_freeable+0x204/0x274
 kernel_init+0x10/0x10c
 ret_from_fork+0x10/0x18
Code: 52800000 52806c07 f9401026 9b271801 (b9433023)
---[ end trace f7c77f8c8dfe4b4a ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
SMP: stopping secondary CPUs
Kernel Offset: 0x456e05a60000 from 0xffff800010000000
PHYS_OFFSET: 0xffffc58400000000
CPU features: 0x0040002,22808a18
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/edac/ghes_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index da60c29468a7..7930643c6811 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -227,7 +227,7 @@ static void ghes_scan_system(void)
 {
 	static bool scanned;
 
-	if (scanned)
+	if (scanned && refcount_read(&ghes_refcount))
 		return;
 
 	dmi_walk(enumerate_dimms, &ghes_hw);
-- 
2.17.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register()
  2020-08-25 13:01 [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register() Shiju Jose
@ 2020-08-26  8:52 ` Borislav Petkov
  2020-08-27 14:02   ` Shiju Jose
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2020-08-26  8:52 UTC (permalink / raw)
  To: Shiju Jose
  Cc: linux-edac, linux-kernel, mchehab, tony.luck, james.morse,
	rrichter, linuxarm

On Tue, Aug 25, 2020 at 02:01:08PM +0100, Shiju Jose wrote:
> After the 'commit b9cae27728d1 ("EDAC/ghes: Scan the system once on driver init")'
> applied, following error has occurred in ghes_edac_register() when
> CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled. The null ghes_hw.dimms pointer
> in the mci_for_each_dimm() of ghes_edac_register() caused the error.
> 
> The error occurs when all the previously initialized ghes instances are
> removed and then probe a new ghes instance. In this case, the ghes_refcount
> would be 0, ghes_hw.dimms and mci already freed. The ghes_hw.dimms would
> be null because ghes_scan_system() would not call enumerate_dimms() again.

Try the below instead and see if it fixes the issue for you too.

If it does, pls send it as v2 but do not add the splat to the commit
message - that's a lot of noise for something which is clear why it
happens and you explain it properly in text anyway.

Thx.

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index da60c29468a7..54ebc8afc6b1 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -55,6 +55,8 @@ static DEFINE_SPINLOCK(ghes_lock);
 static bool __read_mostly force_load;
 module_param(force_load, bool, 0);
 
+static bool system_scanned;
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -225,14 +227,12 @@ static void enumerate_dimms(const struct dmi_header *dh, void *arg)
 
 static void ghes_scan_system(void)
 {
-	static bool scanned;
-
-	if (scanned)
+	if (system_scanned)
 		return;
 
 	dmi_walk(enumerate_dimms, &ghes_hw);
 
-	scanned = true;
+	system_scanned = true;
 }
 
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
@@ -631,6 +631,8 @@ void ghes_edac_unregister(struct ghes *ghes)
 
 	mutex_lock(&ghes_reg_mutex);
 
+	system_scanned = false;
+
 	if (!refcount_dec_and_test(&ghes_refcount))
 		goto unlock;
 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register()
  2020-08-26  8:52 ` Borislav Petkov
@ 2020-08-27 14:02   ` Shiju Jose
  2020-09-11 16:48     ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Shiju Jose @ 2020-08-27 14:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, linux-kernel, mchehab, tony.luck, james.morse,
	rrichter, Linuxarm

Hello Boris,

Thanks for reviewing.

>-----Original Message-----
>From: linux-edac-owner@vger.kernel.org [mailto:linux-edac-
>owner@vger.kernel.org] On Behalf Of Borislav Petkov
>Sent: 26 August 2020 09:52
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
>mchehab@kernel.org; tony.luck@intel.com; james.morse@arm.com;
>rrichter@marvell.com; Linuxarm <linuxarm@huawei.com>
>Subject: Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in
>ghes_edac_register()
>
>On Tue, Aug 25, 2020 at 02:01:08PM +0100, Shiju Jose wrote:
>> After the 'commit b9cae27728d1 ("EDAC/ghes: Scan the system once on
>driver init")'
>> applied, following error has occurred in ghes_edac_register() when
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled. The null
>ghes_hw.dimms
>> pointer in the mci_for_each_dimm() of ghes_edac_register() caused the
>error.
>>
>> The error occurs when all the previously initialized ghes instances
>> are removed and then probe a new ghes instance. In this case, the
>> ghes_refcount would be 0, ghes_hw.dimms and mci already freed. The
>> ghes_hw.dimms would be null because ghes_scan_system() would not call
>enumerate_dimms() again.
>
>Try the below instead and see if it fixes the issue for you too.
>
>If it does, pls send it as v2 but do not add the splat to the commit message -
>that's a lot of noise for something which is clear why it happens and you
>explain it properly in text anyway.

I tested with your changes and it fixes the issue.  I will send v2.
 
>
>Thx.
>
>---
>diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index
>da60c29468a7..54ebc8afc6b1 100644
>--- a/drivers/edac/ghes_edac.c
>+++ b/drivers/edac/ghes_edac.c
>@@ -55,6 +55,8 @@ static DEFINE_SPINLOCK(ghes_lock);  static bool
>__read_mostly force_load;  module_param(force_load, bool, 0);
>
>+static bool system_scanned;
>+
> /* Memory Device - Type 17 of SMBIOS spec */  struct memdev_dmi_entry {
> 	u8 type;
>@@ -225,14 +227,12 @@ static void enumerate_dimms(const struct
>dmi_header *dh, void *arg)
>
> static void ghes_scan_system(void)
> {
>-	static bool scanned;
>-
>-	if (scanned)
>+	if (system_scanned)
> 		return;
>
> 	dmi_walk(enumerate_dimms, &ghes_hw);
>
>-	scanned = true;
>+	system_scanned = true;
> }
>
> void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err
>*mem_err) @@ -631,6 +631,8 @@ void ghes_edac_unregister(struct ghes
>*ghes)
>
> 	mutex_lock(&ghes_reg_mutex);
>
>+	system_scanned = false;
>+
> 	if (!refcount_dec_and_test(&ghes_refcount))
> 		goto unlock;
>
>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register()
  2020-08-27 14:02   ` Shiju Jose
@ 2020-09-11 16:48     ` Borislav Petkov
  2020-09-11 16:49       ` [PATCH] EDAC/ghes: Check whether the driver is on the safe list correctly Borislav Petkov
  2020-09-15 10:54       ` [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register() Shiju Jose
  0 siblings, 2 replies; 7+ messages in thread
From: Borislav Petkov @ 2020-09-11 16:48 UTC (permalink / raw)
  To: Shiju Jose
  Cc: linux-edac, linux-kernel, mchehab, tony.luck, james.morse,
	Linuxarm, Robert Richter

On Thu, Aug 27, 2020 at 02:02:27PM +0000, Shiju Jose wrote:
> I tested with your changes and it fixes the issue.  I will send v2.

Btw, I don't know how it managed to work on your machine because even
with this patch, it isn't all fixed because num_dimms needs to be
cleared too, see here:

---
From: Borislav Petkov <bp@suse.de>
Date: Fri, 11 Sep 2020 12:55:55 +0200
Subject: [PATCH] EDAC/ghes: Clear scanned data on unload

Commit

  b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()")

didn't clear all the information from the scanned system and, more
specifically, left ghes_hw.num_dimms to its previous value. On a
second load (CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use
the leftover num_dimms value which is not 0 and thus the 0 check in
enumerate_dimms() will get bypassed and it would go directly to the
pointer deref:

  d = &hw->dimms[hw->num_dimms];

which is, of course, NULL:

  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP
  CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:enumerate_dimms.cold+0x7b/0x375

Reset the whole ghes_hw on driver unregister so that no stale values are
used on a second system scan.

Fixes: b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()")
Cc: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/edac/ghes_edac.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index a6b9c0b2a15c..eb6034a6fbbb 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -632,6 +632,7 @@ void ghes_edac_unregister(struct ghes *ghes)
 	mutex_lock(&ghes_reg_mutex);
 
 	system_scanned = false;
+	memset(&ghes_hw, 0, sizeof(struct ghes_hw_desc));
 
 	if (!refcount_dec_and_test(&ghes_refcount))
 		goto unlock;
-- 
2.21.0

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] EDAC/ghes: Check whether the driver is on the safe list correctly
  2020-09-11 16:48     ` Borislav Petkov
@ 2020-09-11 16:49       ` Borislav Petkov
  2020-09-15 10:54       ` [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register() Shiju Jose
  1 sibling, 0 replies; 7+ messages in thread
From: Borislav Petkov @ 2020-09-11 16:49 UTC (permalink / raw)
  To: Shiju Jose
  Cc: linux-edac, linux-kernel, mchehab, tony.luck, james.morse,
	Linuxarm, Robert Richter

And here's one more I triggered with this driver removing thing:

---
From: Borislav Petkov <bp@suse.de>

With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe,
unregister and probe again a driver.

When ghes_edac is attempted to be loaded on a system which is not on
the safe platforms list, ghes_edac_register() would return early. The
unregister counterpart ghes_edac_unregister() would still attempt to
unregister and exit early at the refcount test, leading to the refcount
underflow below.

In order to not do *anything* on the unregister path too, reuse the
force_load parameter and check it on that path too, before fumbling with
the refcount.

  ghes_edac: ghes_edac_register: entry
  ghes_edac: ghes_edac_register: return -ENODEV
  ------------[ cut here ]------------
  refcount_t: underflow; use-after-free.
  WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100
  Modules linked in:
  CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:refcount_warn_saturate+0xb9/0x100
  Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c
  RSP: 0018:ffffc90000037d58 EFLAGS: 00010292
  RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff
  RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0
  Call Trace:
   ghes_edac_unregister
   ghes_remove
   platform_drv_remove
   really_probe
   driver_probe_device
   device_driver_attach
   __driver_attach
   ? device_driver_attach
   ? device_driver_attach
   bus_for_each_dev
   bus_add_driver
   driver_register
   ? bert_init
   ghes_init
   do_one_initcall
   ? rcu_read_lock_sched_held
   kernel_init_freeable
   ? rest_init
   kernel_init
   ret_from_fork
   ...
  ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 drivers/edac/ghes_edac.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index eb6034a6fbbb..dbbefd2798c5 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -508,6 +508,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		if (!force_load && idx < 0)
 			return -ENODEV;
 	} else {
+		force_load = true;
 		idx = 0;
 	}
 
@@ -629,6 +630,9 @@ void ghes_edac_unregister(struct ghes *ghes)
 	struct mem_ctl_info *mci;
 	unsigned long flags;
 
+	if (!force_load)
+		return;
+
 	mutex_lock(&ghes_reg_mutex);
 
 	system_scanned = false;
-- 
2.21.0


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register()
  2020-09-11 16:48     ` Borislav Petkov
  2020-09-11 16:49       ` [PATCH] EDAC/ghes: Check whether the driver is on the safe list correctly Borislav Petkov
@ 2020-09-15 10:54       ` Shiju Jose
  2020-09-15 12:25         ` Borislav Petkov
  1 sibling, 1 reply; 7+ messages in thread
From: Shiju Jose @ 2020-09-15 10:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-edac, linux-kernel, mchehab, tony.luck, james.morse,
	Linuxarm, Robert Richter

Hi Boris,

Sorry for the delay.

>-----Original Message-----
>From: Borislav Petkov [mailto:bp@alien8.de]
>Sent: 11 September 2020 17:48
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
>mchehab@kernel.org; tony.luck@intel.com; james.morse@arm.com;
>Linuxarm <linuxarm@huawei.com>; Robert Richter <rric@kernel.org>
>Subject: Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in
>ghes_edac_register()
>
>On Thu, Aug 27, 2020 at 02:02:27PM +0000, Shiju Jose wrote:
>> I tested with your changes and it fixes the issue.  I will send v2.
>
>Btw, I don't know how it managed to work on your machine because even
>with this patch, it isn't all fixed because num_dimms needs to be cleared too,
>see here:

I debug with adding more logs. 
I found that in our platform hw->num_dimms was 32 when called ghes_edac_register() second time
when probe a new ghes instance,  the check !(hw->num_dimms % 16) in the enumerate_dimms() passed and 
it allocated memory for  hw->dimms. Thus it did not fail with NULL pointer dereference in ghes_edac_register().
With the your new fix hw->num_dimms reset to 0.

>
>---
>From: Borislav Petkov <bp@suse.de>
>Date: Fri, 11 Sep 2020 12:55:55 +0200
>Subject: [PATCH] EDAC/ghes: Clear scanned data on unload
>
>Commit
>
>  b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in
>ghes_edac_register()")
>
>didn't clear all the information from the scanned system and, more
>specifically, left ghes_hw.num_dimms to its previous value. On a second load
>(CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use the
>leftover num_dimms value which is not 0 and thus the 0 check in
>enumerate_dimms() will get bypassed and it would go directly to the pointer
>deref:
>
>  d = &hw->dimms[hw->num_dimms];
>
>which is, of course, NULL:
>
>  #PF: supervisor write access in kernel mode
>  #PF: error_code(0x0002) - not-present page
>  PGD 0 P4D 0
>  Oops: 0002 [#1] PREEMPT SMP
>  CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7
>  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02
>08/29/2018
>  RIP: 0010:enumerate_dimms.cold+0x7b/0x375
>
>Reset the whole ghes_hw on driver unregister so that no stale values are
>used on a second system scan.
>
>Fixes: b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in
>ghes_edac_register()")
>Cc: Shiju Jose <shiju.jose@huawei.com>
>Signed-off-by: Borislav Petkov <bp@suse.de>
>---
> drivers/edac/ghes_edac.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index
>a6b9c0b2a15c..eb6034a6fbbb 100644
>--- a/drivers/edac/ghes_edac.c
>+++ b/drivers/edac/ghes_edac.c
>@@ -632,6 +632,7 @@ void ghes_edac_unregister(struct ghes *ghes)
> 	mutex_lock(&ghes_reg_mutex);
>
> 	system_scanned = false;
>+	memset(&ghes_hw, 0, sizeof(struct ghes_hw_desc));
>
> 	if (!refcount_dec_and_test(&ghes_refcount))
> 		goto unlock;
>--
>2.21.0
>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register()
  2020-09-15 10:54       ` [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register() Shiju Jose
@ 2020-09-15 12:25         ` Borislav Petkov
  0 siblings, 0 replies; 7+ messages in thread
From: Borislav Petkov @ 2020-09-15 12:25 UTC (permalink / raw)
  To: Shiju Jose
  Cc: linux-edac, linux-kernel, mchehab, tony.luck, james.morse,
	Linuxarm, Robert Richter

On Tue, Sep 15, 2020 at 10:54:33AM +0000, Shiju Jose wrote:
> I debug with adding more logs. I found that in our platform
> hw->num_dimms was 32 when called ghes_edac_register() second time when
> probe a new ghes instance, the check !(hw->num_dimms % 16) in the
> enumerate_dimms() passed and it allocated memory for hw->dimms. Thus
> it did not fail with NULL pointer dereference in ghes_edac_register().

Oh, what a lucky coincidence ;-\

> With the your new fix hw->num_dimms reset to 0.

Thanks for testing and digging out what is happening on your machine. So
I've queued them here:

https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-for-next

Will send them to Linus on the weekend so that they make it into 5.9.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25 13:01 [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register() Shiju Jose
2020-08-26  8:52 ` Borislav Petkov
2020-08-27 14:02   ` Shiju Jose
2020-09-11 16:48     ` Borislav Petkov
2020-09-11 16:49       ` [PATCH] EDAC/ghes: Check whether the driver is on the safe list correctly Borislav Petkov
2020-09-15 10:54       ` [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in ghes_edac_register() Shiju Jose
2020-09-15 12:25         ` Borislav Petkov

Linux-EDAC Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-edac/0 linux-edac/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-edac linux-edac/ https://lore.kernel.org/linux-edac \
		linux-edac@vger.kernel.org
	public-inbox-index linux-edac

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-edac


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git