All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Robert Richter <rrichter@marvell.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	James Morse <james.morse@arm.com>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	wanghuiqiang <wanghuiqiang@huawei.com>,
	Xiaofei Tan <tanxiaofei@huawei.com>,
	Linuxarm <linuxarm@huawei.com>,
	"Huangming (Mark)" <huangming23@huawei.com>
Subject: Re: linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot)
Date: Thu, 21 Nov 2019 15:23:42 +0000	[thread overview]
Message-ID: <4ff7631f-fbb7-e45f-87dd-9223beca4da7@huawei.com> (raw)
In-Reply-To: <20191121142302.rhvgkgqpiubidhtu@rric.localdomain>

On 21/11/2019 14:23, Robert Richter wrote:
> Hi John,
> 
> thanks for testing and reporting this. See inline.
> 
> On 21.11.19 12:34:22, John Garry wrote:
>> On 14/10/2019 16:18, John Garry wrote:
>> JFYI, I see an issue on linuxnext-2019119, as follows:
>>
>>     21.645388] io scheduler kyber registered
>> [   21.734011] input: Power Button as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
>> [   21.743295] ACPI: Power Button [PWRB]
>> [   21.809644] [Firmware Bug]: APEI: Invalid bit width + offset in GAR
>> [0x94110034/64/0/3/0]
>> [   21.821974] EDAC MC0: Giving out device to module ghes_edac.c controller
>> ghes_edac: DEV ghes (INTERRUPT)
>> [   21.831763] ------------[ cut here ]------------
>> [   21.836374] refcount_t: increment on 0; use-after-free.
>> [   21.841620] WARNING: CPU: 36 PID: 1 at lib/refcount.c:156
>> refcount_inc_checked+0x44/0x50
>> [   21.849697] Modules linked in:
>> [   21.852745] CPU: 36 PID: 1 Comm: swapper/0 Not tainted
>> 5.4.0-rc8-next-20191119-00003-g141a9fef5092-dirty #650
>> [   21.862645] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
>> V1.16.01 03/15/2019
>> [   21.871157] pstate: 60c00009 (nZCv daif +PAN +UAO)
>> [   21.875936] pc : refcount_inc_checked+0x44/0x50
>> [   21.880455] lr : refcount_inc_checked+0x44/0x50
> 
> This is a warning from the refcount framework. It warns if we
> increment from zero. This is reasonable as typically a kernel object
> is created with a refcount of 1 and thrown away once the refcount is
> zero. Afterwards the object is used-after-free.
> 
> For ghes the refcount is initialized with zero, and that is why we see
> this message. However, we protect the refcount with the ghes_reg_mutex
> and thus there is no use after free. The device is allocated and
> registered if the refcount is zero. So this works fine.
> 
> Enclosed a fix that avoids the warning, please test.
> 
> But see below...
> 
>> [   21.884972] sp : ffff00236ffbf8a0
>> [   21.888274] x29: ffff00236ffbf8a0 x28: 0000000000000002
>> [   21.893576] x27: ffff00236cd07900 x26: ffff002369063010
>> [   21.898876] x25: 0000000000000000 x24: ffff00233c236824
>> [   21.904177] x23: ffffa000137b9000 x22: ffffa00016fbb7c0
>> [   21.909477] x21: ffffa00012dfd000 x20: 1fffe0046dff7f24
>> [   21.914777] x19: ffff00233c236000 x18: 0000000000000000
>> [   21.920077] x17: 0000000000000000 x16: 0000000000000000
>> [   21.925377] x15: 0000000000007700 x14: 64655f7365686720
>> [   21.930677] x13: 72656c6c6f72746e x12: 1ffff40002719618
>> [   21.935977] x11: ffff940002719618 x10: dfffa00000000000
>> [   21.941278] x9 : ffff940002719619 x8 : 0000000000000001
>> [   21.946578] x7 : 0000000000000000 x6 : 0000000000000001
>> [   21.951877] x5 : ffff940002719618 x4 : ffff00236ffb0010
>> [   21.957178] x3 : ffffa000112415e4 x2 : ffff80046dff7ede
>> [   21.962478] x1 : 5aff78756b1cf400 x0 : 0000000000000000
>> [   21.967779] Call trace:
>> [   21.970214]  refcount_inc_checked+0x44/0x50
>> [   21.974389]  ghes_edac_register+0x258/0x388
>> [   21.978562]  ghes_probe+0x28c/0x5f0
>> [   21.982041]  platform_drv_probe+0x70/0xd8
>> [   21.986039]  really_probe+0x174/0x468
>> [   21.989690]  driver_probe_device+0x7c/0x148
>> [   21.993862]  device_driver_attach+0x94/0xa0
>> [   21.998033]  __driver_attach+0xa4/0x110
>> [   22.001857]  bus_for_each_dev+0xe8/0x158
>> [   22.005768]  driver_attach+0x30/0x40
>> [   22.009331]  bus_add_driver+0x234/0x2f0
>> [   22.013156]  driver_register+0xbc/0x1d0
>> [   22.016981]  __platform_driver_register+0x7c/0x88
>> [   22.021675]  ghes_init+0xbc/0x14c
>> [   22.024979]  do_one_initcall+0xb4/0x254
>> [   22.028805]  kernel_init_freeable+0x248/0x2f4
>> [   22.033151]  kernel_init+0x10/0x118
>> [   22.036628]  ret_from_fork+0x10/0x18
>> [   22.040194] ---[ end trace 33655bb65a9835fe ]---
>> [   22.046666] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.046666]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.058311] ghes_edac: Can't register at EDAC core
>> [   22.065402] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.065402]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.077080] ghes_edac: Can't register at EDAC core
>> [   22.084140] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.084140]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.095789] ghes_edac: Can't register at EDAC core
>> [   22.102873] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.102873]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.115442] ghes_edac: Can't register at EDAC core
>> [   22.122536] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.122536]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.134344] ghes_edac: Can't register at EDAC core
>> [   22.141441] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.141441]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.153089] ghes_edac: Can't register at EDAC core
>> [   22.160161] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.160161]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.171810] ghes_edac: Can't register at EDAC core
> 
> What I am more concerned is this here. In total this implies 8 ghes
> users that all try to register a (single-instance) ghes mc device. For
> non-x86 only one instance is allowed (see ghes_edac_register(), idx =
> 0).
> 

[cc some guys about HEST]

> So on your platform, when parsing the HEST table
> (hest_ghes_dev_register()), more than one "GHES" device is parsed,
> allocated and registered. Mind sending me your HEST table
> (/sys/firmware/acpi/tables/HEST), or explain what happens here? 

I think that this should be the same:
https://github.com/tianocore/edk2-platforms/tree/master/Silicon/Hisilicon/Hi1620/Drivers/Apei/Hest


If
> this is a valid use case, we need to change ghes_edac_register() to
> support more than one instance.
> 
> Again, please try the patch below.
> 
> Thanks,
> 
> -Robert
> 
> 
>>From 6962f8af4a7c1051c9e87a5ac60571f70d2b6814 Mon Sep 17 00:00:00 2001
> From: Robert Richter <rrichter@marvell.com>
> Date: Thu, 21 Nov 2019 15:01:28 +0100
> Subject: [PATCH] EDAC/ghes: Do not warn on when increment refcount on 0
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>   drivers/edac/ghes_edac.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 47f4e7f90ef0..b99080d8a10c 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -556,8 +556,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>   	ghes_pvt = pvt;
>   	spin_unlock_irqrestore(&ghes_lock, flags);
>   
> -	/* only increment on success */
> -	refcount_inc(&ghes_refcount);
> +	/* only set on success */
> +	refcount_set(&ghes_refcount, 1);

Yep, that seems to have silenced it all:

[   21.739895] input: Power Button as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[   21.749848] ACPI: Power Button [PWRB]
[   21.816977] [Firmware Bug]: APEI: Invalid bit width + offset in GAR 
[0x94110034/64/0/3/0]
[   21.827880] EDAC MC0: Giving out device to module ghes_edac.c 
controller ghes_edac: DEV ghes (INTERRUPT)
[   21.841313] GHES: APEI firmware first mode is enabled by APEI bit and 
WHEA _OSC.
[   21.849176] EINJ: Error INJection is initialized.
[   21.855135] ACPI GTDT: found 1 SBSA generic Watchdog(s).

Thanks,
John

>   
>   unlock:
>   	mutex_unlock(&ghes_reg_mutex);
> 


Cheers,
John

  reply	other threads:[~2019-11-21 15:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-14 15:18 edac KASAN warning in experimental arm64 allmodconfig boot John Garry
2019-10-14 16:09 ` Borislav Petkov
2019-10-14 16:44   ` John Garry
2019-10-14 16:15 ` James Morse
2019-10-14 16:56   ` John Garry
2019-10-14 16:57     ` Borislav Petkov
2019-11-21 12:34 ` linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot) John Garry
2019-11-21 14:23   ` Robert Richter
2019-11-21 15:23     ` John Garry [this message]
2019-11-21 21:36       ` [PATCH] EDAC/ghes: Do not warn when incrementing refcount on 0 Robert Richter
2019-11-22  9:01         ` Borislav Petkov
2019-11-26  9:57           ` John Garry
2019-11-22 11:28       ` linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot) Robert Richter
2019-11-26  9:59         ` John Garry
2019-11-27 17:07           ` linuxnext-2019127 " John Garry
2019-11-27 20:54             ` Robert Richter
2019-11-28 11:02               ` linuxnext-20191127 " John Garry
2019-11-28 16:44                 ` Borislav Petkov
2019-11-28 21:12             ` linuxnext-2019127 " Robert Richter
2019-12-02 10:23               ` John Garry
2019-12-02 11:46                 ` Robert Richter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ff7631f-fbb7-e45f-87dd-9223beca4da7@huawei.com \
    --to=john.garry@huawei.com \
    --cc=bp@alien8.de \
    --cc=huangming23@huawei.com \
    --cc=james.morse@arm.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=rrichter@marvell.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=wanghuiqiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.