linuxppc-dev.lists.ozlabs.org archive mirror
* hardcoded SIGSEGV in __die() ?
@ 2020-03-23 14:17 Joakim Tjernlund
  2020-03-23 14:43 ` Christophe Leroy
  0 siblings, 1 reply; 11+ messages in thread
From: Joakim Tjernlund @ 2020-03-23 14:17 UTC (permalink / raw)
  To: linuxppc-dev

In __die(), see below, there is a call to notify_die() with SIGSEGV hardcoded. This seems odd
to me, as the variable "err" holds the true signal (in my case SIGBUS).
Should SIGSEGV not be replaced with the true signal number?

  Jocke

static int __die(const char *str, struct pt_regs *regs, long err)
{
	printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);

	if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN))
		printk("LE ");
	else
		printk("BE ");

	if (IS_ENABLED(CONFIG_PREEMPT))
		pr_cont("PREEMPT ");

	if (IS_ENABLED(CONFIG_SMP))
		pr_cont("SMP NR_CPUS=%d ", NR_CPUS);

	if (debug_pagealloc_enabled())
		pr_cont("DEBUG_PAGEALLOC ");

	if (IS_ENABLED(CONFIG_NUMA))
		pr_cont("NUMA ");

	pr_cont("%s\n", ppc_md.name ? ppc_md.name : "");

	if (notify_die(DIE_OOPS, str, regs, err, 255, SIGSEGV) == NOTIFY_STOP)
		return 1;

	print_modules();
	show_regs(regs);

	return 0;
}


* Re: hardcoded SIGSEGV in __die() ?
  2020-03-23 14:17 hardcoded SIGSEGV in __die() ? Joakim Tjernlund
@ 2020-03-23 14:43 ` Christophe Leroy
  2020-03-23 14:45   ` Christophe Leroy
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2020-03-23 14:43 UTC (permalink / raw)
  To: Joakim Tjernlund, linuxppc-dev



On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
> In __die(), see below, there is this call to notify_send() with SIGSEGV hardcoded, this seems odd
> to me as the variable "err" holds the true signal(in my case SIGBUS)
> Should not SIGSEGV be replaced with the true signal no.?

As far as I can see, it comes from
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059

Christophe


* Re: hardcoded SIGSEGV in __die() ?
  2020-03-23 14:43 ` Christophe Leroy
@ 2020-03-23 14:45   ` Christophe Leroy
  2020-03-23 15:08     ` Joakim Tjernlund
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2020-03-23 14:45 UTC (permalink / raw)
  To: Joakim Tjernlund, linuxppc-dev



On 23/03/2020 at 15:43, Christophe Leroy wrote:
> 
> 
> On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
>> In __die(), see below, there is this call to notify_send() with 
>> SIGSEGV hardcoded, this seems odd
>> to me as the variable "err" holds the true signal(in my case SIGBUS)
>> Should not SIGSEGV be replaced with the true signal no.?
> 
> As far as I can see, comes from 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059 
> 

And 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad 
shows it is (was?) similar on x86.

Christophe


* Re: hardcoded SIGSEGV in __die() ?
  2020-03-23 14:45   ` Christophe Leroy
@ 2020-03-23 15:08     ` Joakim Tjernlund
  2020-03-23 15:31       ` Christophe Leroy
  2020-03-26  0:28       ` Michael Ellerman
  0 siblings, 2 replies; 11+ messages in thread
From: Joakim Tjernlund @ 2020-03-23 15:08 UTC (permalink / raw)
  To: christophe.leroy, linuxppc-dev

On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
> On 23/03/2020 at 15:43, Christophe Leroy wrote:
> > 
> > On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
> > > In __die(), see below, there is this call to notify_send() with
> > > SIGSEGV hardcoded, this seems odd
> > > to me as the variable "err" holds the true signal(in my case SIGBUS)
> > > Should not SIGSEGV be replaced with the true signal no.?
> > 
> > As far as I can see, comes from
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
> > 
> 
> And
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
> shows it is (was?) similar on x86.
> 

I tried to follow that chain, thinking it would end up sending a signal to user space, but I cannot see
that happening. It seems to be related to debugging.

In short, I cannot see any signal being delivered to user space. If so, that would explain why
our user space process never dies.
Is there a signal hidden in the machine_check handler for SIGBUS that I cannot see?

     Jocke


* Re: hardcoded SIGSEGV in __die() ?
  2020-03-23 15:08     ` Joakim Tjernlund
@ 2020-03-23 15:31       ` Christophe Leroy
  2020-03-23 15:44         ` Joakim Tjernlund
  2020-03-26  0:28       ` Michael Ellerman
  1 sibling, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2020-03-23 15:31 UTC (permalink / raw)
  To: Joakim Tjernlund, linuxppc-dev



On 23/03/2020 at 16:08, Joakim Tjernlund wrote:
> On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
>> On 23/03/2020 at 15:43, Christophe Leroy wrote:
>>>
>>> On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
>>>> In __die(), see below, there is this call to notify_send() with
>>>> SIGSEGV hardcoded, this seems odd
>>>> to me as the variable "err" holds the true signal(in my case SIGBUS)
>>>> Should not SIGSEGV be replaced with the true signal no.?
>>>
>>> As far as I can see, comes from
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
>>>
>>
>> And
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
>> shows it is (was?) similar on x86.
>>
> 
> I tried to follow that chain thinking it would end up sending a signal to user space but I cannot see
> that happens. Seems to be related to debugging.
> 
> In short, I cannot see any signal being delivered to user space. If so that would explain why
> our user space process never dies.
> Is there a signal hidden in machine_check handler for SIGBUS I cannot see?
> 

Isn't it done in do_exit(), called from oops_end() ?

Christophe


* Re: hardcoded SIGSEGV in __die() ?
  2020-03-23 15:31       ` Christophe Leroy
@ 2020-03-23 15:44         ` Joakim Tjernlund
  2020-03-25 17:02           ` David Laight
  0 siblings, 1 reply; 11+ messages in thread
From: Joakim Tjernlund @ 2020-03-23 15:44 UTC (permalink / raw)
  To: christophe.leroy, linuxppc-dev

On Mon, 2020-03-23 at 16:31 +0100, Christophe Leroy wrote:
> 
> On 23/03/2020 at 16:08, Joakim Tjernlund wrote:
> > On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
> > > On 23/03/2020 at 15:43, Christophe Leroy wrote:
> > > > On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
> > > > > In __die(), see below, there is this call to notify_send() with
> > > > > SIGSEGV hardcoded, this seems odd
> > > > > to me as the variable "err" holds the true signal(in my case SIGBUS)
> > > > > Should not SIGSEGV be replaced with the true signal no.?
> > > > 
> > > > As far as I can see, comes from
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
> > > > 
> > > 
> > > And
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
> > > shows it is (was?) similar on x86.
> > > 
> > 
> > I tried to follow that chain thinking it would end up sending a signal to user space but I cannot see
> > that happens. Seems to be related to debugging.
> > 
> > In short, I cannot see any signal being delivered to user space. If so that would explain why
> > our user space process never dies.
> > Is there a signal hidden in machine_check handler for SIGBUS I cannot see?
> > 
> 
> Isn't it done in do_exit(), called from oops_end() ?

Hmm, so it seems. The odd thing, though, is that do_exit() takes an exit code, not a signal number.
It also feels a bit odd to force an exit (which we haven't seen happening) rather than just send a signal.

     Jocke


* RE: hardcoded SIGSEGV in __die() ?
  2020-03-23 15:44         ` Joakim Tjernlund
@ 2020-03-25 17:02           ` David Laight
  2020-03-25 17:09             ` Joakim Tjernlund
  0 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2020-03-25 17:02 UTC (permalink / raw)
  To: 'Joakim Tjernlund', christophe.leroy, linuxppc-dev

From: Joakim Tjernlund
> Sent: 23 March 2020 15:45
...
> > > I tried to follow that chain thinking it would end up sending a signal to user space but I cannot
> see
> > > that happens. Seems to be related to debugging.
> > >
> > > In short, I cannot see any signal being delivered to user space. If so that would explain why
> > > our user space process never dies.
> > > Is there a signal hidden in machine_check handler for SIGBUS I cannot see?
> > >
> >
> > Isn't it done in do_exit(), called from oops_end() ?
> 
> hmm, so it seems. The odd thing though is that do_exit takes an exit code, not signal number.
> Also, feels a bit odd to force an exit(that we haven't seen happening) rather than just a signal.

Isn't there something 'magic' that converts EFAULT into SIGSEGV?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: hardcoded SIGSEGV in __die() ?
  2020-03-25 17:02           ` David Laight
@ 2020-03-25 17:09             ` Joakim Tjernlund
  0 siblings, 0 replies; 11+ messages in thread
From: Joakim Tjernlund @ 2020-03-25 17:09 UTC (permalink / raw)
  To: christophe.leroy, linuxppc-dev, David.Laight

On Wed, 2020-03-25 at 17:02 +0000, David Laight wrote:
> From: Joakim Tjernlund
> > Sent: 23 March 2020 15:45
> ...
> > > > I tried to follow that chain thinking it would end up sending a
> > > > signal to user space but I cannot
> > see
> > > > that happens. Seems to be related to debugging.
> > > > 
> > > > In short, I cannot see any signal being delivered to user
> > > > space. If so that would explain why
> > > > our user space process never dies.
> > > > Is there a signal hidden in machine_check handler for SIGBUS I
> > > > cannot see?
> > > > 
> > > 
> > > Isn't it done in do_exit(), called from oops_end() ?
> > 
> > hmm, so it seems. The odd thing though is that do_exit takes an
> > exit code, not signal number.
> > Also, feels a bit odd to force an exit(that we haven't seen
> > happening) rather than just a signal.
> 
> Isn't there something 'magic' that converts EFAULT into SIGSEGV?

I have tried to find out, and I cannot see a signal being sent.
Also, SEGV is wrong; this is a SIGBUS fault.



* Re: hardcoded SIGSEGV in __die() ?
  2020-03-23 15:08     ` Joakim Tjernlund
  2020-03-23 15:31       ` Christophe Leroy
@ 2020-03-26  0:28       ` Michael Ellerman
  2020-03-27 10:10         ` Joakim Tjernlund
  2020-03-30 17:16         ` Joakim Tjernlund
  1 sibling, 2 replies; 11+ messages in thread
From: Michael Ellerman @ 2020-03-26  0:28 UTC (permalink / raw)
  To: Joakim Tjernlund, christophe.leroy, linuxppc-dev

Joakim Tjernlund <Joakim.Tjernlund@infinera.com> writes:
> On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
>> On 23/03/2020 at 15:43, Christophe Leroy wrote:
>> > On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
>> > > In __die(), see below, there is this call to notify_send() with
>> > > SIGSEGV hardcoded, this seems odd
>> > > to me as the variable "err" holds the true signal(in my case SIGBUS)
>> > > Should not SIGSEGV be replaced with the true signal no.?
>> > 
>> > As far as I can see, comes from
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
>> > 
>> 
>> And
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
>> shows it is (was?) similar on x86.
>> 
>
> I tried to follow that chain thinking it would end up sending a signal to user space but I cannot see
> that happens. Seems to be related to debugging.
>
> In short, I cannot see any signal being delivered to user space. If so that would explain why
> our user space process never dies.
> Is there a signal hidden in machine_check handler for SIGBUS I cannot see?

It's platform specific. What platform are you on?

See the ppc_md & cur_cpu_spec calls here:

void machine_check_exception(struct pt_regs *regs)
{
	int recover = 0;
	bool nested = in_nmi();
	if (!nested)
		nmi_enter();

	__this_cpu_inc(irq_stat.mce_exceptions);

	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);

	/* See if any machine dependent calls. In theory, we would want
	 * to call the CPU first, and call the ppc_md. one if the CPU
	 * one returns a positive number. However there is existing code
	 * that assumes the board gets a first chance, so let's keep it
	 * that way for now and fix things later. --BenH.
	 */
	if (ppc_md.machine_check_exception)
		recover = ppc_md.machine_check_exception(regs);
	else if (cur_cpu_spec->machine_check)
		recover = cur_cpu_spec->machine_check(regs);

	if (recover > 0)
		goto bail;


Either the ppc_md or cpu_spec handlers can send a signal, but after a
bit of grepping I think only the pseries and powernv ones do.

If you get into die() then it's an oops, which is not the same as a
normal signal.

cheers


* Re: hardcoded SIGSEGV in __die() ?
  2020-03-26  0:28       ` Michael Ellerman
@ 2020-03-27 10:10         ` Joakim Tjernlund
  2020-03-30 17:16         ` Joakim Tjernlund
  1 sibling, 0 replies; 11+ messages in thread
From: Joakim Tjernlund @ 2020-03-27 10:10 UTC (permalink / raw)
  To: christophe.leroy, mpe, linuxppc-dev

On Thu, 2020-03-26 at 11:28 +1100, Michael Ellerman wrote:
> Joakim Tjernlund <Joakim.Tjernlund@infinera.com> writes:
> > On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
> > > On 23/03/2020 at 15:43, Christophe Leroy wrote:
> > > > On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
> > > > > In __die(), see below, there is this call to notify_send() with
> > > > > SIGSEGV hardcoded, this seems odd
> > > > > to me as the variable "err" holds the true signal(in my case SIGBUS)
> > > > > Should not SIGSEGV be replaced with the true signal no.?
> > > > 
> > > > As far as I can see, comes from
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
> > > > 
> > > 
> > > And
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
> > > shows it is (was?) similar on x86.
> > > 
> > 
> > I tried to follow that chain thinking it would end up sending a signal to user space but I cannot see
> > that happens. Seems to be related to debugging.
> > 
> > In short, I cannot see any signal being delivered to user space. If so that would explain why
> > our user space process never dies.
> > Is there a signal hidden in machine_check handler for SIGBUS I cannot see?
> 
> It's platform specific. What platform are you on?

I am on e500, e5500(e500mc) and 83xx :)


> 
> See the ppc_md & cur_cpu_spec calls here:
> 
> void machine_check_exception(struct pt_regs *regs)
> {
>         int recover = 0;
>         bool nested = in_nmi();
>         if (!nested)
>                 nmi_enter();
> 
>         __this_cpu_inc(irq_stat.mce_exceptions);
> 
>         add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> 
>         /* See if any machine dependent calls. In theory, we would want
>          * to call the CPU first, and call the ppc_md. one if the CPU
>          * one returns a positive number. However there is existing code
>          * that assumes the board gets a first chance, so let's keep it
>          * that way for now and fix things later. --BenH.
>          */
>         if (ppc_md.machine_check_exception)
>                 recover = ppc_md.machine_check_exception(regs);
>         else if (cur_cpu_spec->machine_check)
>                 recover = cur_cpu_spec->machine_check(regs);
> 
>         if (recover > 0)
>                 goto bail;
> 
> 
> Either the ppc_md or cpu_spec handlers can send a signal, but after a
> bit of grepping I think only the pseries and powernv ones do.

Seems so

> 
> If you get into die() then it's an oops, which is not the same as a
> normal signal.

Exactly, and the die/OOPS path does not seem to work as intended either. The system tries to limp along,
generates more similar OOPSes, and may even hang.

> 
> cheers



* Re: hardcoded SIGSEGV in __die() ?
  2020-03-26  0:28       ` Michael Ellerman
  2020-03-27 10:10         ` Joakim Tjernlund
@ 2020-03-30 17:16         ` Joakim Tjernlund
  1 sibling, 0 replies; 11+ messages in thread
From: Joakim Tjernlund @ 2020-03-30 17:16 UTC (permalink / raw)
  To: christophe.leroy, mpe, linuxppc-dev

On Thu, 2020-03-26 at 11:28 +1100, Michael Ellerman wrote:
> Joakim Tjernlund <Joakim.Tjernlund@infinera.com> writes:
> > On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
> > > On 23/03/2020 at 15:43, Christophe Leroy wrote:
> > > > On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
> > > > > In __die(), see below, there is this call to notify_send() with
> > > > > SIGSEGV hardcoded, this seems odd
> > > > > to me as the variable "err" holds the true signal(in my case SIGBUS)
> > > > > Should not SIGSEGV be replaced with the true signal no.?
> > > > 
> > > > As far as I can see, comes from
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
> > > > 
> > > 
> > > And
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
> > > shows it is (was?) similar on x86.
> > > 
> > 
> > I tried to follow that chain thinking it would end up sending a signal to user space but I cannot see
> > that happens. Seems to be related to debugging.
> > 
> > In short, I cannot see any signal being delivered to user space. If so that would explain why
> > our user space process never dies.
> > Is there a signal hidden in machine_check handler for SIGBUS I cannot see?
> 
> It's platform specific. What platform are you on?
> 
> See the ppc_md & cur_cpu_spec calls here:
> 
> void machine_check_exception(struct pt_regs *regs)
> {
>         int recover = 0;
>         bool nested = in_nmi();
>         if (!nested)
>                 nmi_enter();
> 
>         __this_cpu_inc(irq_stat.mce_exceptions);
> 
>         add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> 
>         /* See if any machine dependent calls. In theory, we would want
>          * to call the CPU first, and call the ppc_md. one if the CPU
>          * one returns a positive number. However there is existing code
>          * that assumes the board gets a first chance, so let's keep it
>          * that way for now and fix things later. --BenH.
>          */
>         if (ppc_md.machine_check_exception)
>                 recover = ppc_md.machine_check_exception(regs);
>         else if (cur_cpu_spec->machine_check)
>                 recover = cur_cpu_spec->machine_check(regs);
> 
>         if (recover > 0)
>                 goto bail;
> 
> 
> Either the ppc_md or cpu_spec handlers can send a signal, but after a
> bit of grepping I think only the pseries and powernv ones do.
> 
> If you get into die() then it's an oops, which is not the same as a
> normal signal.

I had a look at opal_machine_check and friends and came up with:

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0381242920d9..12715d24141c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -621,6 +621,11 @@ int machine_check_e500mc(struct pt_regs *regs)
                       reason & MCSR_MEA ? "Effective" : "Physical", addr);
        }
 
+       if (user_mode(regs)) {
+               _exception(SIGBUS, regs, reason, regs->nip);
+               recoverable = 1;
+       }
+
 silent_out:
        mtspr(SPRN_MCSR, mcsr);
        return mfspr(SPRN_MCSR) == 0 && recoverable;
@@ -665,6 +670,10 @@ int machine_check_e500(struct pt_regs *regs)
        if (reason & MCSR_BUS_RPERR)
                printk("Bus - Read Parity Error\n");
 
+       if (user_mode(regs)) {
+               _exception(SIGBUS, regs, reason, regs->nip);
+               return 1;
+       }
        return 0;
 }
 
@@ -695,6 +704,10 @@ int machine_check_e200(struct pt_regs *regs)
        if (reason & MCSR_BUS_WRERR)
                printk("Bus - Write Bus Error on buffered store or cache line push\n");
 
+       if (user_mode(regs)) {
+               _exception(SIGBUS, regs, reason, regs->nip);
+               return 1;
+       }
        return 0;
 }
 #elif defined(CONFIG_PPC32)
@@ -731,6 +744,10 @@ int machine_check_generic(struct pt_regs *regs)
        default:
                printk("Unknown values in msr\n");
        }
+       if (user_mode(regs)) {
+               _exception(SIGBUS, regs, reason, regs->nip);
+               return 1;
+       }
        return 0;
 }
 #endif /* everything else */

I don't really know what I am doing, does the above make sense to you?

     Jocke


end of thread, other threads:[~2020-03-30 17:18 UTC | newest]

Thread overview: 11+ messages
2020-03-23 14:17 hardcoded SIGSEGV in __die() ? Joakim Tjernlund
2020-03-23 14:43 ` Christophe Leroy
2020-03-23 14:45   ` Christophe Leroy
2020-03-23 15:08     ` Joakim Tjernlund
2020-03-23 15:31       ` Christophe Leroy
2020-03-23 15:44         ` Joakim Tjernlund
2020-03-25 17:02           ` David Laight
2020-03-25 17:09             ` Joakim Tjernlund
2020-03-26  0:28       ` Michael Ellerman
2020-03-27 10:10         ` Joakim Tjernlund
2020-03-30 17:16         ` Joakim Tjernlund
