From: Changbin Du <changbin.du@gmail.com>
To: Jisheng Zhang <jszhang@kernel.org>
Cc: Changbin Du <changbin.du@gmail.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] riscv: fix oops caused by irq on/off tracer
Date: Fri, 11 Feb 2022 11:06:13 +0800
Message-ID: <20220211030613.s75irqxhflc25t7a@mail.google.com>
In-Reply-To: <YgUxIgMJRhJD6/GP@xhacker>

I reconsidered the problem and found that my previous analysis was flawed, so let me explain it again.

The fault happens in the code generated for CALLER_ADDR1 (a.k.a. __builtin_return_address(1)):
   0xffffffff8011510e <+80>:    ld      a1,-16(s0)
   0xffffffff80115112 <+84>:    ld      s2,-8(a1)  # <-- page fault here, a1=0x0000000000000100

This is because the assembly entry code doesn't set up a valid frame pointer, and the fp (a.k.a. s0) register is used for other purposes:
resume_kernel:
	REG_L s0, TASK_TI_PREEMPT_COUNT(tp)
	bnez s0, restore_all
	REG_L s0, TASK_TI_FLAGS(tp)
	andi s0, s0, _TIF_NEED_RESCHED
	beqz s0, restore_all
	call preempt_schedule_irq
	j restore_all
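
To make the failing walk concrete, here is a minimal user-space sketch of what the two loads above do. It is only a model, not kernel code: the stack is a small array, addresses are byte offsets into it, and the names stack_mem, fp_is_valid, push_record and return_address_1 are made up for illustration. It assumes the frame record layout implied by the disassembly (saved fp at fp-16, return address at fp-8).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128u

/* Simulated stack; an "address" is a byte offset into this array. */
static uint64_t stack_mem[STACK_BYTES / 8];

/* A frame pointer is usable only if fp-16 and fp-8 lie inside the stack. */
static int fp_is_valid(uint64_t fp)
{
	return fp >= 16 && fp <= STACK_BYTES && fp % 8 == 0;
}

/* Store a frame record {saved fp, ra} just below fp, as a prologue would. */
static void push_record(uint64_t fp, uint64_t saved_fp, uint64_t ra)
{
	assert(fp_is_valid(fp));
	stack_mem[(fp - 16) / 8] = saved_fp;
	stack_mem[(fp - 8) / 8] = ra;
}

/*
 * Model of __builtin_return_address(1) on a frame-pointer build.
 * Returns 0 and stores the address in *ra, or -1 at the point where
 * real hardware would take a page fault.
 */
static int return_address_1(uint64_t fp, uint64_t *ra)
{
	uint64_t caller_fp;

	if (!fp_is_valid(fp))
		return -1;
	caller_fp = stack_mem[(fp - 16) / 8];	/* ld a1,-16(s0) */
	if (!fp_is_valid(caller_fp))
		return -1;			/* ld s2,-8(a1) would fault */
	*ra = stack_mem[(caller_fp - 8) / 8];
	return 0;
}
```

If the callee's record holds a real caller fp, the walk succeeds. If the caller had reused s0 as a scratch register, so that the saved value is something like 0x100, the second load is where the model reports the fault.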

So there are two solutions:
 1) Invoke trace_hardirqs_on/off from a C function, so the compiler takes care of the frame pointer. This is what I did.
 2) Always set up a valid frame pointer in the assembly entry code. I think this is what Jisheng suggested?

I prefer #1, since we then don't need to set up a frame pointer when the irqsoff tracer is not enabled.
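
As a rough illustration of the shape of option 1, the sketch below mocks the tracer hook in user space and puts a plain C wrapper in front of it. The wrapper name entry_trace_hardirqs_on and the two bookkeeping variables are hypothetical, and the mock only records that it was called. The point is that the assembly would call the wrapper, and on a build that keeps frame pointers (e.g. -fno-omit-frame-pointer) the compiler-generated prologue of the wrapper gives the hook a real frame to walk through.

```c
#include <stddef.h>

/* Mock state: how often the hook ran and who called it last. */
static int trace_on_calls;
static void *last_caller;

/* Stand-in for the real tracer hook; only records the call. */
__attribute__((noinline)) static void trace_hardirqs_on(void)
{
	trace_on_calls++;
	/* Level 0 needs only this function's own frame, so it is safe. */
	last_caller = __builtin_return_address(0);
}

/*
 * Hypothetical C wrapper that the entry assembly would call instead of
 * calling the hook directly. Because it is compiled C, its prologue
 * sets up fp before the hook runs.
 */
__attribute__((noinline)) void entry_trace_hardirqs_on(void)
{
	trace_hardirqs_on();
}
```

Calling entry_trace_hardirqs_on() once bumps trace_on_calls to 1 and leaves a non-NULL address in last_caller.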

On Thu, Feb 10, 2022 at 11:37:06PM +0800, Jisheng Zhang wrote:
> On Thu, Feb 10, 2022 at 11:27:21PM +0800, Jisheng Zhang wrote:
> > On Thu, Feb 10, 2022 at 09:37:58PM +0800, Changbin Du wrote:
> > > On Thu, Feb 10, 2022 at 01:32:59AM +0800, Jisheng Zhang wrote:
> > > [snip]
> > > > Hi Changbin,
> > > > 
> > > > I read the code and find that current riscv frame records during
> > > > exception isn't as completed as other architectures. riscv only
> > > > records frames from the ret_from_exception(). If we add completed
> > > What do you mean for 'record'?
> > > 
> > 
> > stack frame record.
> > 
> > > > frame records as other arch do, then the issue you saw can also
> > > > be fixed at the same time.
> > > > 
> > > I don't think so. The problem is __builtin_return_address(1) trigger page fault
> > > here.
> > 
> > There's misunderstanding here. I interpret this bug as incomplete
> > stackframes.
> > 
> > This is current riscv stackframe during exception:
> > 
> > high
> >  	----------------
> > top	|		|  <- ret_from_exception
> > 	----------------
> > 	|		|  <- trace_hardirqs_on
> > 	-----------------
> > low
> 
> sorry, the "top" is wrongly placed.
>  high
>   	----------------
>  	|		|  <- ret_from_exception
>  	----------------
>  	|		|  <- trace_hardirqs_on
>  	-----------------
> top
> 
>  low
> 
> 
> 
> > 
> > As you said, the CALLER_ADDR1 a.k.a __builtin_return_address(1) needs
> > at least two parent call frames. 
> > 
> > If we complete the stackframes during exception as other arch does:
> > 
> > high
> >  	----------------
> > top	|		|  <- the synthetic stackframe from the interrupted point
> >  	----------------
> > 	      .....	      
> >         ----------------
> > 	|		|  <- ret_from_exception
> > 	----------------
> > 	|		|  <- trace_hardirqs_on
> > 	-----------------
> > low
> 
> ditto
> 
> > 
> > 
> > Then we meet the "at least two parent call frames" requirement. IOW, my
> > solution solve the problem from the entry.S side. One of the advantages
> > would be we let interrupted point show up in dump_stack() as other arch
> > do. What I'm not sure is whether it's safe to do so now since rc3 is
> > released.
> > 

-- 
Cheers,
Changbin Du

