All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] Two x86 fixes
@ 2022-03-10  1:53 Ammar Faizi
  2022-03-10  1:53 ` [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Ammar Faizi @ 2022-03-10  1:53 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Alviro Iskandar Setiawan,
	Dave Hansen, Greg Kroah-Hartman, H. Peter Anvin, Ingo Molnar,
	Tony Luck, Yazen Ghannam, linux-edac, linux-kernel, stable, gwml,
	x86

Hi,

Two x86 fixes in this series.

1) x86/delay: Fix the wrong Assembly constraint in delay_loop() function.
2) x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails.

## Changelog

v5:
  - Mark patch #1 for stable.
  - Commit message improvement for patch #1 and #2.
  - Fold in changes from Yazen and Alviro (for patch #2).

v4:
  - Address comment from Greg, sha1 commit Fixes only needs
    to be 12 chars.
  - Add the author of the fixed commit to the CC list.

v3:
  - Fold in changes from Alviro, the previous version is still
    leaking @bank[n].

v2:
  - Fix wrong copy/paste.

Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Ammar Faizi (2):
  x86/delay: Fix the wrong asm constraint in `delay_loop()`
  x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails

 arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++-------------
 arch/x86/lib/delay.c          |  4 ++--
 2 files changed, 21 insertions(+), 15 deletions(-)


base-commit: 7e57714cd0ad2d5bb90e50b5096a0e671dec1ef3
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-10  1:53 [PATCH v5 0/2] Two x86 fixes Ammar Faizi
@ 2022-03-10  1:53 ` Ammar Faizi
  2022-03-27 21:38   ` Borislav Petkov
  2022-03-10  1:53 ` [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
  2022-03-17  8:19 ` [PATCH v5 0/2] Two x86 fixes Ammar Faizi
  2 siblings, 1 reply; 13+ messages in thread
From: Ammar Faizi @ 2022-03-10  1:53 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Alviro Iskandar Setiawan,
	Dave Hansen, Greg Kroah-Hartman, H. Peter Anvin, Ingo Molnar,
	Tony Luck, Yazen Ghannam, linux-edac, linux-kernel, stable, gwml,
	x86, David Laight, Jiri Hladky

The asm constraint does not reflect that the asm statement can modify
the value of @loops. But the asm statement in delay_loop() does modify
the @loops.

Specifiying the wrong constraint may lead to undefined behavior, it may
clobber random stuff (e.g. local variable, important temporary value in
regs, etc.).

Fix this by changing the constraint from "a" (as an input) to "+a" (as
an input and output).

Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Hladky <hladky.jiri@googlemail.com>
Cc: stable@vger.kernel.org # v2.6.27+
Fixes: e01b70ef3eb ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 arch/x86/lib/delay.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 65d15df6212d..0e65d00e2339 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -54,8 +54,8 @@ static void delay_loop(u64 __loops)
 		"	jnz 2b		\n"
 		"3:	dec %0		\n"
 
-		: /* we don't need output */
-		:"a" (loops)
+		: "+a" (loops)
+		:
 	);
 }
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-03-10  1:53 [PATCH v5 0/2] Two x86 fixes Ammar Faizi
  2022-03-10  1:53 ` [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
@ 2022-03-10  1:53 ` Ammar Faizi
  2022-03-27 22:52   ` Borislav Petkov
  2022-03-17  8:19 ` [PATCH v5 0/2] Two x86 fixes Ammar Faizi
  2 siblings, 1 reply; 13+ messages in thread
From: Ammar Faizi @ 2022-03-10  1:53 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Alviro Iskandar Setiawan,
	Dave Hansen, Greg Kroah-Hartman, H. Peter Anvin, Ingo Molnar,
	Tony Luck, Yazen Ghannam, linux-edac, linux-kernel, stable, gwml,
	x86

In mce_threshold_create_device(), if threshold_create_bank() fails, the
@bp will be leaked, because the call to mce_threshold_remove_device()
will not free the @bp. mce_threshold_remove_device() frees
@threshold_banks. At that point, the @bp has not been written to
@threshold_banks, @threshold_banks is NULL, so the call is just a nop.

Fix this by extracting the cleanup part into a new static function
_mce_threshold_remove_device(), then call it from create/remove device
functions.

Also, eliminate the "goto out_err", just early return inside the loop
if the creation fails.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org # v5.8+
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Link: https://lore.kernel.org/lkml/9dfe087a-f941-1bc4-657d-7e7c198888ff@gnuweeb.org
Co-authored-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Co-authored-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9f4b508886dd..e492efab68a2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1293,10 +1293,23 @@ static void threshold_remove_bank(struct threshold_bank *bank)
 	kfree(bank);
 }
 
+static void _mce_threshold_remove_device(struct threshold_bank **bp,
+					 unsigned int numbanks)
+{
+	unsigned int bank;
+
+	for (bank = 0; bank < numbanks; bank++) {
+		if (bp[bank]) {
+			threshold_remove_bank(bp[bank]);
+			bp[bank] = NULL;
+		}
+	}
+	kfree(bp);
+}
+
 int mce_threshold_remove_device(unsigned int cpu)
 {
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
-	unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
 
 	if (!bp)
 		return 0;
@@ -1307,13 +1320,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 	 */
 	this_cpu_write(threshold_banks, NULL);
 
-	for (bank = 0; bank < numbanks; bank++) {
-		if (bp[bank]) {
-			threshold_remove_bank(bp[bank]);
-			bp[bank] = NULL;
-		}
-	}
-	kfree(bp);
+	_mce_threshold_remove_device(bp, this_cpu_read(mce_num_banks));
 	return 0;
 }
 
@@ -1350,15 +1357,14 @@ int mce_threshold_create_device(unsigned int cpu)
 		if (!(this_cpu_read(bank_map) & (1 << bank)))
 			continue;
 		err = threshold_create_bank(bp, cpu, bank);
-		if (err)
-			goto out_err;
+		if (err) {
+			_mce_threshold_remove_device(bp, numbanks);
+			return err;
+		}
 	}
 	this_cpu_write(threshold_banks, bp);
 
 	if (thresholding_irq_en)
 		mce_threshold_vector = amd_threshold_interrupt;
 	return 0;
-out_err:
-	mce_threshold_remove_device(cpu);
-	return err;
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 0/2] Two x86 fixes
  2022-03-10  1:53 [PATCH v5 0/2] Two x86 fixes Ammar Faizi
  2022-03-10  1:53 ` [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
  2022-03-10  1:53 ` [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
@ 2022-03-17  8:19 ` Ammar Faizi
  2022-03-17  9:27   ` Borislav Petkov
  2 siblings, 1 reply; 13+ messages in thread
From: Ammar Faizi @ 2022-03-17  8:19 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Alviro Iskandar Setiawan,
	David.Laight, Dave Hansen, Greg Kroah-Hartman, H. Peter Anvin,
	Ingo Molnar, Tony Luck, Yazen Ghannam, linux-edac, linux-kernel,
	stable, GNU/Weeb Mailing List, x86

On Thu, Mar 10, 2022 at 8:53 AM Ammar Faizi wrote:
> Two x86 fixes in this series.
>
> 1) x86/delay: Fix the wrong Assembly constraint in delay_loop() function.
> 2) x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails.

Ping (1)!
Borislav? Thomas?

Ref: https://lore.kernel.org/lkml/20220310015306.445359-1-ammarfaizi2@gnuweeb.org/

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 0/2] Two x86 fixes
  2022-03-17  8:19 ` [PATCH v5 0/2] Two x86 fixes Ammar Faizi
@ 2022-03-17  9:27   ` Borislav Petkov
  2022-03-17  9:50     ` Ammar Faizi
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2022-03-17  9:27 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, David.Laight, Dave Hansen,
	Greg Kroah-Hartman, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Yazen Ghannam, linux-edac, linux-kernel, stable,
	GNU/Weeb Mailing List, x86

On Thu, Mar 17, 2022 at 03:19:07PM +0700, Ammar Faizi wrote:
> On Thu, Mar 10, 2022 at 8:53 AM Ammar Faizi wrote:
> > Two x86 fixes in this series.
> >
> > 1) x86/delay: Fix the wrong Assembly constraint in delay_loop() function.
> > 2) x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails.
> 
> Ping (1)!
> Borislav? Thomas?

Yes, what's up?

Are those urgent fixes which break some use case or can you simply sit
patiently and wait?

Because we have an upcoming merge window and we need to prepare for
that. And there are real bugs that need fixing too.

So what's the rush here?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 0/2] Two x86 fixes
  2022-03-17  9:27   ` Borislav Petkov
@ 2022-03-17  9:50     ` Ammar Faizi
  0 siblings, 0 replies; 13+ messages in thread
From: Ammar Faizi @ 2022-03-17  9:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, David.Laight, Dave Hansen,
	Greg Kroah-Hartman, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Yazen Ghannam, linux-edac, linux-kernel, stable,
	GNU/Weeb Mailing List, x86

On Thu, Mar 17, 2022 at 4:27 PM Borislav Petkov wrote:
> Yes, what's up?
>
> Are those urgent fixes which break some use case or can you simply sit
> patiently and wait?

Sorry for pinging at the wrong time. Excuse my weekly ping. They are
not urgent fixes. So no rush.

> Because we have an upcoming merge window and we need to prepare for
> that. And there are real bugs that need fixing too.

Hopefully, the 5.17 release and 5.18 merge window go well.

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-10  1:53 ` [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
@ 2022-03-27 21:38   ` Borislav Petkov
  2022-03-28  4:16     ` Ammar Faizi
  2022-03-28  4:29     ` Ammar Faizi
  0 siblings, 2 replies; 13+ messages in thread
From: Borislav Petkov @ 2022-03-27 21:38 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, Dave Hansen, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	linux-edac, linux-kernel, stable, gwml, x86, David Laight,
	Jiri Hladky

On Thu, Mar 10, 2022 at 08:53:05AM +0700, Ammar Faizi wrote:
> The asm constraint does not reflect that the asm statement can modify
> the value of @loops. But the asm statement in delay_loop() does modify
> the @loops.
> 
> Specifiying the wrong constraint may lead to undefined behavior, it may
> clobber random stuff (e.g. local variable, important temporary value in
> regs, etc.).

This is especially dangerous when the compiler decides to inline the
function and since it doesn't know that the value gets modified, it
might decide to use it from a register directly without reloading it.

Add that to the commit message pls.

> Fix this by changing the constraint from "a" (as an input) to "+a" (as
> an input and output).
> 
> Cc: David Laight <David.Laight@ACULAB.COM>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Jiri Hladky <hladky.jiri@googlemail.com>

All those Ccs in the commit message are not really needed -
get_maintainers.pl gives the correct list already.

> Cc: stable@vger.kernel.org # v2.6.27+

I don't see the need for the stable Cc. Or do you have a case where
a corruption really does happen?

> Fixes: e01b70ef3eb ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")

Commit sha1 (e01b70ef3eb) needs to be at least 12 chars long:

e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")

This is best fixed by doing:

[core]
        abbrev = 12

in your .git/config

> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
>  arch/x86/lib/delay.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
> index 65d15df6212d..0e65d00e2339 100644
> --- a/arch/x86/lib/delay.c
> +++ b/arch/x86/lib/delay.c
> @@ -54,8 +54,8 @@ static void delay_loop(u64 __loops)
>  		"	jnz 2b		\n"
>  		"3:	dec %0		\n"
>  
> -		: /* we don't need output */
> -		:"a" (loops)
> +		: "+a" (loops)
> +		:
>  	);

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-03-10  1:53 ` [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
@ 2022-03-27 22:52   ` Borislav Petkov
  2022-03-28  4:12     ` Ammar Faizi
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2022-03-27 22:52 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, Dave Hansen, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	linux-edac, linux-kernel, stable, gwml, x86

On Thu, Mar 10, 2022 at 08:53:06AM +0700, Ammar Faizi wrote:
> In mce_threshold_create_device(), if threshold_create_bank() fails, the
> @bp will be leaked, because the call to mce_threshold_remove_device()
> will not free the @bp. mce_threshold_remove_device() frees
> @threshold_banks. At that point, the @bp has not been written to
> @threshold_banks, @threshold_banks is NULL, so the call is just a nop.
> 
> Fix this by extracting the cleanup part into a new static function
> _mce_threshold_remove_device(), then call it from create/remove device
> functions.
> 
> Also, eliminate the "goto out_err", just early return inside the loop
> if the creation fails.
> 
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: stable@vger.kernel.org # v5.8+
> Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")

How did you decide this is the commit that this is fixing?

> Link: https://lore.kernel.org/lkml/9dfe087a-f941-1bc4-657d-7e7c198888ff@gnuweeb.org

That Link tag is not needed.

> Co-authored-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
> Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
> Co-authored-by: Yazen Ghannam <yazen.ghannam@amd.com>

There's no "Co-authored-by".

The correct tag is described in

Documentation/process/submitting-patches.rst

Please make sure you've read that file before sending patches.

> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
>  arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)

...

> @@ -1350,15 +1357,14 @@ int mce_threshold_create_device(unsigned int cpu)
>  		if (!(this_cpu_read(bank_map) & (1 << bank)))
>  			continue;
>  		err = threshold_create_bank(bp, cpu, bank);
> -		if (err)
> -			goto out_err;
> +		if (err) {
> +			_mce_threshold_remove_device(bp, numbanks);
> +			return err;
> +		}
>  	}
>  	this_cpu_write(threshold_banks, bp);

Do I see it correctly that the publishing of the @bp pointer - i.e.,
this line - should be moved right above the for loop?

Then mce_threshold_remove_device() would properly free it in the error
case and your patch turns into a oneliner?

And then your Fixes: tag would be correct too...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-03-27 22:52   ` Borislav Petkov
@ 2022-03-28  4:12     ` Ammar Faizi
  2022-03-28  8:05       ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Ammar Faizi @ 2022-03-28  4:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, Dave Hansen, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	linux-edac, linux-kernel, stable, gwml, x86

On 3/28/22 5:52 AM, Borislav Petkov wrote:
[...]
>> Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
> 
> How did you decide this is the commit that this is fixing?

I examined the history in those lines by git blame. Will recheck after the below
doubt is cleared.

>> Link: https://lore.kernel.org/lkml/9dfe087a-f941-1bc4-657d-7e7c198888ff@gnuweeb.org
> 
> That Link tag is not needed.
> 
>> Co-authored-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
>> Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
>> Co-authored-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> There's no "Co-authored-by".
> 
> The correct tag is described in
> 
> Documentation/process/submitting-patches.rst

Will fix them in the v6.

> ...
> 
>> @@ -1350,15 +1357,14 @@ int mce_threshold_create_device(unsigned int cpu)
>>   		if (!(this_cpu_read(bank_map) & (1 << bank)))
>>   			continue;
>>   		err = threshold_create_bank(bp, cpu, bank);
>> -		if (err)
>> -			goto out_err;
>> +		if (err) {
>> +			_mce_threshold_remove_device(bp, numbanks);
>> +			return err;
>> +		}
>>   	}
>>   	this_cpu_write(threshold_banks, bp);
> 
> Do I see it correctly that the publishing of the @bp pointer - i.e.,
> this line - should be moved right above the for loop?
> 
> Then mce_threshold_remove_device() would properly free it in the error
> case and your patch turns into a oneliner?

Previously, in v4 I did that too. But after discussion with Yazen, we got a
conclusion that placing `this_cpu_write(threshold_banks, bp);` before the for loop
is not the right thing to do.

> And then your Fixes: tag would be correct too...
The reason is based on the discussion with Yazen, the full discussion can be read in
the Link tag above.

==================
The point is:

On Wed, 2 Mar 2022 17:26:32 +0000, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> The threshold interrupt handler uses this pointer. I think the goal here is to
> set this pointer when the list is fully formed and clear this pointer before
> making any changes to the list. Otherwise, the interrupt handler will operate
> on incomplete data if an interrupt comes in the middle of these updates.
==================

Also, looking at the comment in mce_threshold_remove_device() function:

	/*
	 * Clear the pointer before cleaning up, so that the interrupt won't
	 * touch anything of this.
	 */
	this_cpu_write(threshold_banks, NULL);

I think it's reasonable to place `this_cpu_write(threshold_banks, bp);` after
the "for loop" on the creation process for the similar reason. In short, don't
let the interrupt sees incomplete data.

Although, I am not sure if that 100% guarantees mce_threshold_remove_device()
will not mess up with the interrupt (e.g. freeing the data while the interrupt
reading it), unless we're using RCU stuff.

What do you think?

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-27 21:38   ` Borislav Petkov
@ 2022-03-28  4:16     ` Ammar Faizi
  2022-03-28  4:29     ` Ammar Faizi
  1 sibling, 0 replies; 13+ messages in thread
From: Ammar Faizi @ 2022-03-28  4:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, Dave Hansen, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	linux-edac, linux-kernel, stable, gwml, x86, David Laight,
	Jiri Hladky

On 3/28/22 4:38 AM, Borislav Petkov wrote:
> All those Ccs in the commit message are not really needed -
> get_maintainers.pl gives the correct list already.
> 
>> Cc: stable@vger.kernel.org # v2.6.27+
> 
> I don't see the need for the stable Cc. Or do you have a case where
> a corruption really does happen?
> 
>> Fixes: e01b70ef3eb ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")
> 
> Commit sha1 (e01b70ef3eb) needs to be at least 12 chars long:
> 
> e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")
> 
> This is best fixed by doing:
> 
> [core]
>          abbrev = 12
> 
> in your .git/config

Will fix that in the v6.

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-27 21:38   ` Borislav Petkov
  2022-03-28  4:16     ` Ammar Faizi
@ 2022-03-28  4:29     ` Ammar Faizi
  2022-03-28  7:56       ` Borislav Petkov
  1 sibling, 1 reply; 13+ messages in thread
From: Ammar Faizi @ 2022-03-28  4:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, Dave Hansen, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	linux-edac, linux-kernel, stable, gwml, x86, David Laight,
	Jiri Hladky

On 3/28/22 4:38 AM, Borislav Petkov wrote:
> On Thu, Mar 10, 2022 at 08:53:05AM +0700, Ammar Faizi wrote:
>> The asm constraint does not reflect that the asm statement can modify
>> the value of @loops. But the asm statement in delay_loop() does modify
>> the @loops.
>>
>> Specifiying the wrong constraint may lead to undefined behavior, it may
>> clobber random stuff (e.g. local variable, important temporary value in
>> regs, etc.).
> 
> This is especially dangerous when the compiler decides to inline the
> function and since it doesn't know that the value gets modified, it
> might decide to use it from a register directly without reloading it.
> 
> Add that to the commit message pls.

Will add that in the v6.

>> Cc: stable@vger.kernel.org # v2.6.27+
> 
> I don't see the need for the stable Cc. Or do you have a case where
> a corruption really does happen?

I don't find any visible issue on this. But that's undefined behavior,
different compiler may yield different result (e.g. there is no guarantee
newer compilers will produce the appropriate result due to UB). So it's not
something we should rely on.

============
Side note for inline:
Even if it is not inlined, it's still dangerous, because if the compiler is
able to see that the function to be called doesn't clobber some call-clobbered
regs, the compiler can assume the call-clobbered regs are not clobbered and it
reuses the value without reloading.

See the example from Alviro here:

   https://lore.kernel.org/lkml/CAOG64qPgTv5tQNknuG9d-=oL2EPQQ1ys7xu2FoBpNLyzv1qYzA@mail.gmail.com/

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-28  4:29     ` Ammar Faizi
@ 2022-03-28  7:56       ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2022-03-28  7:56 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, Dave Hansen, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	linux-edac, linux-kernel, stable, gwml, x86, David Laight,
	Jiri Hladky

On Mon, Mar 28, 2022 at 11:29:26AM +0700, Ammar Faizi wrote:
> See the example from Alviro here:
> 
>   https://lore.kernel.org/lkml/CAOG64qPgTv5tQNknuG9d-=oL2EPQQ1ys7xu2FoBpNLyzv1qYzA@mail.gmail.com/

Not the same thing - see David's reply there.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-03-28  4:12     ` Ammar Faizi
@ 2022-03-28  8:05       ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2022-03-28  8:05 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan,
	Alviro Iskandar Setiawan, Dave Hansen, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	linux-edac, linux-kernel, stable, gwml, x86

On Mon, Mar 28, 2022 at 11:12:53AM +0700, Ammar Faizi wrote:
> Although, I am not sure if that 100% guarantees mce_threshold_remove_device()
> will not mess up with the interrupt (e.g. freeing the data while the interrupt
> reading it), unless we're using RCU stuff.
> 
> What do you think?

I would've said it doesn't matter but that thresholding device creation
is part of hotplug and it can happen multiple times even *after* the
interrupt vector has been set during setup so a potential teardown and
concurrent thresholding interrupt firing might really hit in a not fully
initialized/cleaned up state so yeah, let's do Yazen's thing.

The alternative would be the temporarily re-assign mce_threshold_vector
to default_threshold_interrupt while setup is being done but that's not
really necessary atm.

But call that helper function __threshold_remove_device().

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-03-28  8:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-10  1:53 [PATCH v5 0/2] Two x86 fixes Ammar Faizi
2022-03-10  1:53 ` [PATCH v5 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
2022-03-27 21:38   ` Borislav Petkov
2022-03-28  4:16     ` Ammar Faizi
2022-03-28  4:29     ` Ammar Faizi
2022-03-28  7:56       ` Borislav Petkov
2022-03-10  1:53 ` [PATCH v5 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
2022-03-27 22:52   ` Borislav Petkov
2022-03-28  4:12     ` Ammar Faizi
2022-03-28  8:05       ` Borislav Petkov
2022-03-17  8:19 ` [PATCH v5 0/2] Two x86 fixes Ammar Faizi
2022-03-17  9:27   ` Borislav Petkov
2022-03-17  9:50     ` Ammar Faizi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.