* Possible to annotate ARM64 IRQ handling to help gdb?
From: Doug Anderson @ 2019-02-01 21:38 UTC
  To: Linux ARM; +Cc: kgdb-bugreport, Will Deacon, Dave Martin, Stephen Boyd

Hi,

I was wondering if anyone out there has given any thought to
annotating the ARM64 IRQ handling in such a way that we could stack
crawl past el1_irq() when in gdb.

I spent a bit of time on this a few months ago and documented all my
findings in:

https://bugs.chromium.org/p/chromium/issues/detail?id=908721

I can copy and paste all the discussion from that bug here, but since
it's public hopefully folks can read the discussion / investigation
there.  To put it briefly, though: I can stack crawl past "el1_irq"
with the normal linux stack crawl (which is what kdb uses) but I can't
crawl past "el1_irq" in gdb.  After talking to some of our tools
guys here I'm fairly certain that we could solve this with the right
CFI directives, but when I poked at it I wasn't able to figure out the
magic.


Anyway, I figured I'd check to see if anyone here happens to know the
right magic.


-Doug

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: Possible to annotate ARM64 IRQ handling to help gdb?
From: Dave Martin @ 2019-02-04 12:31 UTC
  To: Doug Anderson; +Cc: kgdb-bugreport, Will Deacon, Linux ARM, Stephen Boyd

On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> Hi,
> 
> I was wondering if anyone out there has given any thought to
> annotating the ARM64 IRQ handling in such a way that we could stack
> crawl past el1_irq() when in gdb.
> 
> I spent a bit of time on this a few months ago and documented all my
> findings in:
> 
> https://bugs.chromium.org/p/chromium/issues/detail?id=908721
> 
> I can copy and paste all the discussion from that bug here, but since
> it's public hopefully folks can read the discussion / investigation
> there.  To put it briefly, though: I can stack crawl past "el1_irq"
> with the normal linux stack crawl (which is what kdb uses) but I can't
> crawl past "el1_irq" in gdb().  After talking to some of our tools
> guys here I'm fairly certain that we could solve this with the right
> CFI directives, but when I poked at it I wasn't able to figure out the
> magic.
> 
> 
> Anyway, I figured I'd check to see if anyone here happens to know the
> right magic.

The kernel (appears to) generate a valid frame record for el1_irq:

   0xffffff8008082b94 <+84>:    mrs     x22, elr_el1

	[...]

   0xffffff8008082ba0 <+96>:    stp     x29, x22, [sp, #304]
   0xffffff8008082ba4 <+100>:   add     x29, sp, #0x130

(I note that 0x130 == 304.  Yay binutils.)
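
To make the frame record concrete, here is a minimal sketch in C of what
following such a chain amounts to.  It assumes the AAPCS64 layout (the
saved x29 followed by a return address) and walks host pointers rather
than real kernel stack memory.  struct frame_record and walk_frames()
are names invented for this illustration; this is not the kernel's or
gdb's unwinder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of an AArch64 frame record: the pair written by
 * "stp x29, x30, [sp, #N]".  In el1_irq the second slot holds the
 * interrupted PC taken from ELR_EL1 (x22 above) instead of x30.
 */
struct frame_record {
	const struct frame_record *fp;	/* caller's x29; NULL ends the chain */
	uint64_t pc;			/* return address (or saved ELR) */
};

/*
 * Follow the chain starting at fp and store up to max PCs in out.
 * Returns the number of frames visited.
 */
static size_t walk_frames(const struct frame_record *fp,
			  uint64_t *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	while (fp != NULL && n < max) {
		out[n++] = fp->pc;
		fp = fp->fp;
	}
	return n;
}
```

Nothing beyond x29 and the saved return address is needed for this kind
of walk, which is why a bare list of frames should not depend on any
extra annotations.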


From the bug report, I don't see any real investigation into what
precisely causes gdb to choke on this frame.

Do you have evidence that CFI annotations help in this case?  And can
you explain _why_ they help (i.e., precisely how is gdb relying on the
annotations)?

Cheers
---Dave


* Re: Possible to annotate ARM64 IRQ handling to help gdb?
From: Mark Rutland @ 2019-02-04 13:12 UTC
  To: Doug Anderson
  Cc: kgdb-bugreport, Will Deacon, Dave Martin, Linux ARM, Stephen Boyd

On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> Hi,

Hi Doug,

> I was wondering if anyone out there has given any thought to
> annotating the ARM64 IRQ handling in such a way that we could stack
> crawl past el1_irq() when in gdb.
> 
> I spent a bit of time on this a few months ago and documented all my
> findings in:
> 
> https://bugs.chromium.org/p/chromium/issues/detail?id=908721

There, the error from GDB is:

    Backtrace stopped: previous frame identical to this frame (corrupt
    stack?)

... is that misleading?

... or do we have some duplicate stack frame that we somehow skip in
the kernel unwinder?

> I can copy and paste all the discussion from that bug here, but since
> it's public hopefully folks can read the discussion / investigation
> there.  To put it briefly, though: I can stack crawl past "el1_irq"
> with the normal linux stack crawl (which is what kdb uses) but I can't
> crawl past "el1_irq" in gdb().  After talking to some of our tools
> guys here I'm fairly certain that we could solve this with the right
> CFI directives, but when I poked at it I wasn't able to figure out the
> magic.

AFAICT, we don't know why GDB is terminating early. Could we please
figure that out first? e.g. by looking for the above message in the GDB
sources.

If we do need CFI annotations, I'd rather move that entry code to C
first, to minimize how painful that is. I have an ongoing project [1] to
do just that...

Thanks,
Mark.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/entry-deasm


* Re: Possible to annotate ARM64 IRQ handling to help gdb?
From: Doug Anderson @ 2019-02-11 17:27 UTC
  To: Dave Martin
  Cc: Caroline Tice, kgdb-bugreport, Will Deacon, Linux ARM, Stephen Boyd

Hi,

On Mon, Feb 4, 2019 at 4:31 AM Dave Martin <Dave.Martin@arm.com> wrote:
>
> On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> > Hi,
> >
> > I was wondering if anyone out there has given any thought to
> > annotating the ARM64 IRQ handling in such a way that we could stack
> > crawl past el1_irq() when in gdb.
> >
> > I spent a bit of time on this a few months ago and documented all my
> > findings in:
> >
> > https://bugs.chromium.org/p/chromium/issues/detail?id=908721
> >
> > I can copy and paste all the discussion from that bug here, but since
> > it's public hopefully folks can read the discussion / investigation
> > there.  To put it briefly, though: I can stack crawl past "el1_irq"
> > with the normal linux stack crawl (which is what kdb uses) but I can't
> > crawl past "el1_irq" in gdb().  After talking to some of our tools
> > guys here I'm fairly certain that we could solve this with the right
> > CFI directives, but when I poked at it I wasn't able to figure out the
> > magic.
> >
> >
> > Anyway, I figured I'd check to see if anyone here happens to know the
> > right magic.
>
> The kernel (appears to) generate a valid frame record for el1_irq:
>
>    0xffffff8008082b94 <+84>:    mrs     x22, elr_el1
>
>         [...]
>
>    0xffffff8008082ba0 <+96>:    stp     x29, x22, [sp, #304]
>    0xffffff8008082ba4 <+100>:   add     x29, sp, #0x130
>
> (I note that 0x130 == 304.  Yay binutils.)

Right, this is how the kernel is able to do the crawl.  It's also why
I was able to manually do the crawl in the bug by chaining together
frame pointers.


> From the bug report, I don't see any real investigation into what
> precisely causes gdb to choke on this frame.

Right.  I just don't know gdb well enough.  :(  I've had it on my list
to dig into it, but I need to find time.  ;-)


> Do you have evidence that CFI annotations help in this case?  And can
> you explain _why_ they help (i.e., precisely how is gdb relying on the
> annotations)?

I spent a tiny bit of time playing around with CFI annotations.
Mostly it was stumbling around in the dark since I had a hard time
finding good arm/arm64 examples and the documentation was a little
hard for me to parse.

...but from my experience with gdb, my guess is that gdb wants more
than just the simple frame pointers.  It wants to know where _all_ the
registers are stored on the stack and the only way it's going to get
that from assembly code (especially assembly code that barfed the
registers onto the stack somewhere that's not between FUNC and
ENDFUNC) is with some type of annotation.  My guess is that it doesn't
fall back to just looking at frame pointer chains.  Specifically as
you move up the stack frame in gdb and you type "info reg", the set of
registers changes to be those registers that are correct for the stack
frame you're on.  Here's a quick example showing how gdb behaves with
a random register that was barfed, $x22:

(gdb) frame 3
#3  0xffffff800846a088 in __handle_sysrq (key=103, check_mask=<optimized out>) at .../drivers/tty/sysrq.c:620
620                             op_p->handler(key);

(gdb) disass
Dump of assembler code for function __handle_sysrq:
   0xffffff8008469f64 <+0>:     str     x23, [sp, #-64]!
   0xffffff8008469f68 <+4>:     stp     x22, x21, [sp, #16]
   0xffffff8008469f6c <+8>:     stp     x20, x19, [sp, #32]
   0xffffff8008469f70 <+12>:    stp     x29, x30, [sp, #48]
   0xffffff8008469f74 <+16>:    add     x29, sp, #0x30

(gdb) print /x $x22
$13 = 0xffffff8009035000

(gdb) print /x *(void**)($x29 - 0x30 + 16)
$14 = 0x8000100

(gdb) up
#4  0xffffff800846a0dc in handle_sysrq (key=103) at .../drivers/tty/sysrq.c:649
649                     __handle_sysrq(key, true);

(gdb) print /x $x22
$15 = 0x8000100
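
To spell out the arithmetic in that example, here is a small sketch in
C.  The table is hypothetical and derived by hand from the
__handle_sysrq prologue shown above (a 64-byte frame with
x29 = sp + 0x30); it is roughly the kind of per-register information
that ".cfi_offset"-style annotations carry, not anything read out of
gdb.  The names saved_reg, handle_sysrq_rules and saved_reg_addr() are
made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Where one callee-saved register lives, relative to the CFA
 * (the caller's sp at the call site). */
struct saved_reg {
	int regno;
	int64_t cfa_offset;
};

/* Hand-derived from the prologue above: the frame is 64 bytes, so a
 * slot at [sp, #N] sits at CFA - 64 + N. */
static const struct saved_reg handle_sysrq_rules[] = {
	{ 23, -64 }, { 22, -48 }, { 21, -40 }, { 20, -32 },
	{ 19, -24 }, { 29, -16 }, { 30,  -8 },
};

/* x29 = sp + 0x30 and CFA = sp + 64, hence CFA = x29 + 0x10. */
static uint64_t cfa_from_fp(uint64_t x29)
{
	return x29 + 0x10;
}

/* Address of the stack slot holding the caller's value of regno,
 * or 0 if this prologue does not save that register. */
static uint64_t saved_reg_addr(uint64_t x29, int regno)
{
	size_t n = sizeof(handle_sysrq_rules) / sizeof(handle_sysrq_rules[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (handle_sysrq_rules[i].regno == regno)
			return cfa_from_fp(x29) +
			       (uint64_t)handle_sysrq_rules[i].cfa_offset;
	}
	return 0;
}
```

For x22 this gives x29 - 0x30 + 16, the same address dereferenced by
hand in the gdb session above.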


* Re: Possible to annotate ARM64 IRQ handling to help gdb?
From: Doug Anderson @ 2019-02-11 18:05 UTC
  To: Mark Rutland
  Cc: kgdb-bugreport, Will Deacon, Stephen Boyd, Caroline Tice,
	Dave Martin, Linux ARM

Hi,

On Mon, Feb 4, 2019 at 5:12 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> > Hi,
>
> Hi Doug,
>
> > I was wondering if anyone out there has given any thought to
> > annotating the ARM64 IRQ handling in such a way that we could stack
> > crawl past el1_irq() when in gdb.
> >
> > I spent a bit of time on this a few months ago and documented all my
> > findings in:
> >
> > https://bugs.chromium.org/p/chromium/issues/detail?id=908721
>
> There, the error from GDB is:
>
>     Backtrace stopped: previous frame identical to this frame (corrupt
>     stack?)
>
> ... is that misleading?
>
> ... or do we have some duplicate stack frame that we somehow skip in
> the kernel unwinder?

If I had to guess I'd say that when gdb doesn't see a frame it
recognizes then it just returns the previous one, which causes it to
stop.  I don't think gdb falls back to just looking at the link
register because it needs more.


> > I can copy and paste all the discussion from that bug here, but since
> > it's public hopefully folks can read the discussion / investigation
> > there.  To put it briefly, though: I can stack crawl past "el1_irq"
> > with the normal linux stack crawl (which is what kdb uses) but I can't
> > crawl past "el1_irq" in gdb().  After talking to some of our tools
> > guys here I'm fairly certain that we could solve this with the right
> > CFI directives, but when I poked at it I wasn't able to figure out the
> > magic.
>
> AFAICT, we don't know why GDB is terminating early. Could we please
> figure that out first? e.g. by looking for the above message in the GDB
> sources.
>
> If we do need CFI annotations, I'd rather move that entry code to C
> first, to minimize how painful that is. I have an ongoing project [1] to
> do just that...
>
> Thanks,
> Mark.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/entry-deasm

OK, I tried this.  It _changes_ the behavior but doesn't magically get
me a full crawl.  If something like this is likely to merge to
mainline before too long then it makes sense to spend the time
debugging it instead of the old code...

---

Vanilla v5.0-rc6 on kevin:

#13 0xffffff801013e08c in generic_handle_irq_desc
    (desc=0x1)
    at .../include/linux/irqdesc.h:154
#14 generic_handle_irq
    (irq=<optimized out>)
    at .../kernel/irq/irqdesc.c:628
#15 0xffffff801013e110 in __handle_domain_irq
    (domain=0xffffffc000211880, hwirq=<optimized out>,
     lookup=<optimized out>, regs=0xffffff8011003ce0)
    at .../kernel/irq/irqdesc.c:665
#16 0xffffff8010081124 in handle_domain_irq
    (domain=0x1, hwirq=<optimized out>, regs=<optimized out>)
    at .../include/linux/irqdesc.h:172
#17 gic_handle_irq (regs=0xffffff8011003ce0)
    at .../drivers/irqchip/irq-gic-v3.c:367
#18 0xffffff8010082bf4 in el1_irq ()
    at .../arch/arm64/kernel/entry.S:609
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

---

Vanilla v5.0-rc6 + your patches on kevin:

#13 0xffffff801013e3cc in generic_handle_irq_desc
    (desc=0x1)
    at .../include/linux/irqdesc.h:154
#14 generic_handle_irq
    (irq=<optimized out>)
    at .../kernel/irq/irqdesc.c:628
#15 0xffffff801013e450 in __handle_domain_irq
    (domain=0xffffffc000211880, hwirq=<optimized out>,
     lookup=<optimized out>, regs=0xffffff8011003ce0)
    at .../kernel/irq/irqdesc.c:665
#16 0xffffff80100810c4 in handle_domain_irq
    (domain=0x1, hwirq=<optimized out>, regs=<optimized out>)
    at .../include/linux/irqdesc.h:172
#17 gic_handle_irq
    (regs=0xffffff8011003ce0)
    at .../drivers/irqchip/irq-gic-v3.c:367
#18 0xffffff8010084fd0 in call_on_stack
    ()
    at .../arch/arm64/kernel/entry.S:718
Backtrace stopped: Cannot access memory at address 0xffffff8010004008


-Doug


* Re: Possible to annotate ARM64 IRQ handling to help gdb?
From: Dave Martin @ 2019-02-11 19:57 UTC
  To: Doug Anderson
  Cc: Caroline Tice, kgdb-bugreport, Will Deacon, Linux ARM, Stephen Boyd

On Mon, Feb 11, 2019 at 09:27:11AM -0800, Doug Anderson wrote:
> Hi,
> 
> On Mon, Feb 4, 2019 at 4:31 AM Dave Martin <Dave.Martin@arm.com> wrote:
> >
> > On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> > > Hi,
> > >
> > > I was wondering if anyone out there has given any thought to
> > > annotating the ARM64 IRQ handling in such a way that we could stack
> > > crawl past el1_irq() when in gdb.
> > >
> > > I spent a bit of time on this a few months ago and documented all my
> > > findings in:
> > >
> > > https://bugs.chromium.org/p/chromium/issues/detail?id=908721
> > >
> > > I can copy and paste all the discussion from that bug here, but since
> > > it's public hopefully folks can read the discussion / investigation
> > > there.  To put it briefly, though: I can stack crawl past "el1_irq"
> > > with the normal linux stack crawl (which is what kdb uses) but I can't
> > > crawl past "el1_irq" in gdb().  After talking to some of our tools
> > > guys here I'm fairly certain that we could solve this with the right
> > > CFI directives, but when I poked at it I wasn't able to figure out the
> > > magic.
> > >
> > >
> > > Anyway, I figured I'd check to see if anyone here happens to know the
> > > right magic.
> >
> > The kernel (appears to) generate a valid frame record for el1_irq:
> >
> >    0xffffff8008082b94 <+84>:    mrs     x22, elr_el1
> >
> >         [...]
> >
> >    0xffffff8008082ba0 <+96>:    stp     x29, x22, [sp, #304]
> >    0xffffff8008082ba4 <+100>:   add     x29, sp, #0x130
> >
> > (I note that 0x130 == 304.  Yay binutils.)
> 
> Right, this is how the kernel is able to do the crawl.  It's also why
> I was able to manually do the crawl in the bug by chaining together
> frame pointers.
> 
> 
> > From the bug report, I don't see any real investigation into what
> > precisely causes gdb to choke on this frame.
> 
> Right.  I just don't know gdb well enough.  :(  I've had it on my list
> to dig into it, but I need to find time.  ;-)
> 
> 
> > Do you have evidence that CFI annotations help in this case?  And can
> > you explain _why_ they help (i.e., precisely how is gdb relying on the
> > annotations)?
> 
> I spent a tiny bit of time playing around with CFI annotations.
> Mostly it was stumbling around in the dark since I had a hard time
> finding good arm/arm64 examples and the documentation was a little
> hard for me to parse.

You could try compiling a few simple C functions with gcc -S
-fexceptions and see what the compiler spits out.
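
For instance, a file as small as the following should be enough.  The
file name, the function names and the toolchain prefix mentioned in the
comment are only assumptions for illustration; use whatever AArch64
compiler you have to hand.

```c
/*
 * cfi_demo.c (hypothetical file name)
 *
 * Build to assembly with something like:
 *   aarch64-linux-gnu-gcc -O1 -S -fexceptions cfi_demo.c
 * and read cfi_demo.s.  The toolchain prefix is an assumption.
 *
 * caller() is not a leaf, so the compiler has to save x29/x30, and
 * the output will typically contain directives such as
 * .cfi_startproc, .cfi_def_cfa_offset, .cfi_offset and .cfi_endproc
 * around the stp/ldp that build and tear down the frame record.
 */

/* noinline keeps this as a real call so that caller() needs a frame. */
__attribute__((noinline)) int leaf(int x)
{
	return x * 2 + 1;
}

int caller(int x)
{
	return leaf(x) + leaf(x + 1);
}
```

Comparing the directives against the prologue instructions they follow
is probably the quickest way to see which annotation describes which
store.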

> ...but from my experience with gdb, my guess is that gdb wants more
> than just the simple frame pointers.  It wants to know where _all_ the
> registers are stored on the stack and the only way it's going to get
> that from assembly code (especially assembly code that barfed the
> registers onto the stack somewhere that's not between FUNC and
> ENDFUNC) is with some type of annotation.  My guess is that it doesn't
> fall back to just looking at frame pointer chains.  Specifically as
> you move up the stack frame in gdb and you type "info reg", the set of
> registers changes to be those registers that are correct for the stack
> frame you're on.  Here's a quick example showing how gdb behaves with
> a random register that was barfed, $x22:
> 
> (gdb) frame 3
> #3  0xffffff800846a088 in __handle_sysrq (key=103,
> check_mask=<optimized out>) at .../drivers/tty/sysrq.c:620
> 620                             op_p->handler(key);
> 
> (gdb) disass
> Dump of assembler code for function __handle_sysrq:
>    0xffffff8008469f64 <+0>:     str     x23, [sp, #-64]!
>    0xffffff8008469f68 <+4>:     stp     x22, x21, [sp, #16]
>    0xffffff8008469f6c <+8>:     stp     x20, x19, [sp, #32]
>    0xffffff8008469f70 <+12>:    stp     x29, x30, [sp, #48]
>    0xffffff8008469f74 <+16>:    add     x29, sp, #0x30
> 
> (gdb) print /x $x22
> $13 = 0xffffff8009035000
> 
> (gdb) print /x *(void**)($x29 - 0x30 + 16)
> $14 = 0x8000100
> 
> (gdb) up
> #4  0xffffff800846a0dc in handle_sysrq (key=103) at .../drivers/tty/sysrq.c:649
> 649                     __handle_sysrq(key, true);
> 
> (gdb) print /x $x22
> $15 = 0x8000100


Indeed, but this requires full DWARF or .eh_frame info, which is not
generally available in the kernel.

Except for code built with -fomit-frame-pointer, you should at least
be able to see a list of frames though: this doesn't require all the
registers of ancestor frames to be recovered, just x29 and lr (which is
what the frame records on the stack contain -- so no other magic info
is required in order to recover these).

gdb tries various methods to unwind a frame, and ought to fall back to
this approach if all else fails.  Frame chains that appear to loop
are a problem though, with no straightforward solution.

My hunch is that gdb sees the frame chain attempt to loop backwards
after el1_irq and bails out.  Is your task stack at a lower address than
the IRQ stack?

In the kernel we gave up attempting to fully detect backtrace loops
because of this issue, but this involves some cruddy heuristics which
may not be considered acceptable for gdb.  For one thing, our rules are
specific for the kernel, not general-purpose.
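
As a rough sketch of the problem (in C, with invented names and stack
bounds that a real unwinder would have to obtain from somewhere): a
"callers live at higher addresses" sanity check rejects the hop from the
IRQ stack to the task stack when the task stack happens to sit lower,
whereas a check that knows about the two stacks can let that one
transition through.  This is neither gdb's actual logic nor the exact
rule the kernel uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one stack: the range [low, high). */
struct stack_range {
	uint64_t low;
	uint64_t high;
};

static bool on_stack(const struct stack_range *s, uint64_t addr)
{
	return addr >= s->low && addr < s->high;
}

/* Naive rule: the stack grows down, so each caller's frame record
 * must be at a strictly higher address than its callee's. */
static bool step_ok_naive(uint64_t fp, uint64_t next_fp)
{
	return next_fp > fp;
}

/* Stack-aware rule: same check within one stack, but also accept the
 * hop from the IRQ stack to the task stack in either direction of
 * address.  Going from the task stack back to the IRQ stack is
 * rejected, which still catches simple loops. */
static bool step_ok_stack_aware(const struct stack_range *irq,
				const struct stack_range *task,
				uint64_t fp, uint64_t next_fp)
{
	if (on_stack(irq, fp) && on_stack(task, next_fp))
		return true;
	if (on_stack(irq, fp) && on_stack(irq, next_fp))
		return next_fp > fp;
	if (on_stack(task, fp) && on_stack(task, next_fp))
		return next_fp > fp;
	return false;
}
```

The cost is exactly what is described above: the unwinder needs
kernel-specific knowledge of where the stacks are.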

Cheers
---Dave


* Re: Possible to annotate ARM64 IRQ handling to help gdb?
From: Doug Anderson @ 2019-02-13 21:19 UTC
  To: Dave Martin
  Cc: Caroline Tice, kgdb-bugreport, Will Deacon, Linux ARM, Stephen Boyd

Hi,

On Mon, Feb 11, 2019 at 11:58 AM Dave Martin <Dave.Martin@arm.com> wrote:
>
> On Mon, Feb 11, 2019 at 09:27:11AM -0800, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Feb 4, 2019 at 4:31 AM Dave Martin <Dave.Martin@arm.com> wrote:
> > >
> > > On Fri, Feb 01, 2019 at 01:38:05PM -0800, Doug Anderson wrote:
> > > > Hi,
> > > >
> > > > I was wondering if anyone out there has given any thought to
> > > > annotating the ARM64 IRQ handling in such a way that we could stack
> > > > crawl past el1_irq() when in gdb.
> > > >
> > > > I spent a bit of time on this a few months ago and documented all my
> > > > findings in:
> > > >
> > > > https://bugs.chromium.org/p/chromium/issues/detail?id=908721
> > > >
> > > > I can copy and paste all the discussion from that bug here, but since
> > > > it's public hopefully folks can read the discussion / investigation
> > > > there.  To put it briefly, though: I can stack crawl past "el1_irq"
> > > > with the normal linux stack crawl (which is what kdb uses) but I can't
> > > > crawl past "el1_irq" in gdb().  After talking to some of our tools
> > > > guys here I'm fairly certain that we could solve this with the right
> > > > CFI directives, but when I poked at it I wasn't able to figure out the
> > > > magic.
> > > >
> > > >
> > > > Anyway, I figured I'd check to see if anyone here happens to know the
> > > > right magic.
> > >
> > > The kernel (appears to) generate a valid frame record for el1_irq:
> > >
> > >    0xffffff8008082b94 <+84>:    mrs     x22, elr_el1
> > >
> > >         [...]
> > >
> > >    0xffffff8008082ba0 <+96>:    stp     x29, x22, [sp, #304]
> > >    0xffffff8008082ba4 <+100>:   add     x29, sp, #0x130
> > >
> > > (I note that 0x130 == 304.  Yay binutils.)
> >
> > Right, this is how the kernel is able to do the crawl.  It's also why
> > I was able to manually do the crawl in the bug by chaining together
> > frame pointers.
> >
> >
> > > From the bug report, I don't see any real investigation into what
> > > precisely causes gdb to choke on this frame.
> >
> > Right.  I just don't know gdb well enough.  :(  I've had it on my list
> > to dig into it, but I need to find time.  ;-)
> >
> >
> > > Do you have evidence that CFI annotations help in this case?  And can
> > > you explain _why_ they help (i.e., precisely how is gdb relying on the
> > > annotations)?
> >
> > I spent a tiny bit of time playing around with CFI annotations.
> > Mostly it was stumbling around in the dark since I had a hard time
> > finding good arm/arm64 examples and the documentation was a little
> > hard for me to parse.
>
> You could try compiling a few simple C functions with gcc -S
> -fexceptions and see what the compiler spits out.

Thanks, this definitely helped!


> > ...but from my experience with gdb, my guess is that gdb wants more
> > than just the simple frame pointers.  It wants to know where _all_ the
> > registers are stored on the stack and the only way it's going to get
> > that from assembly code (especially assembly code that barfed the
> > registers onto the stack somewhere that's not between FUNC and
> > ENDFUNC) is with some type of annotation.  My guess is that it doesn't
> > fall back to just looking at frame pointer chains.  Specifically as
> > you move up the stack frame in gdb and you type "info reg", the set of
> > registers changes to be those registers that are correct for the stack
> > frame you're on.  Here's a quick example showing how gdb behaves with
> > a random register that was barfed, $x22:
> >
> > (gdb) frame 3
> > #3  0xffffff800846a088 in __handle_sysrq (key=103,
> > check_mask=<optimized out>) at .../drivers/tty/sysrq.c:620
> > 620                             op_p->handler(key);
> >
> > (gdb) disass
> > Dump of assembler code for function __handle_sysrq:
> >    0xffffff8008469f64 <+0>:     str     x23, [sp, #-64]!
> >    0xffffff8008469f68 <+4>:     stp     x22, x21, [sp, #16]
> >    0xffffff8008469f6c <+8>:     stp     x20, x19, [sp, #32]
> >    0xffffff8008469f70 <+12>:    stp     x29, x30, [sp, #48]
> >    0xffffff8008469f74 <+16>:    add     x29, sp, #0x30
> >
> > (gdb) print /x $x22
> > $13 = 0xffffff8009035000
> >
> > (gdb) print /x *(void**)($x29 - 0x30 + 16)
> > $14 = 0x8000100
> >
> > (gdb) up
> > #4  0xffffff800846a0dc in handle_sysrq (key=103) at .../drivers/tty/sysrq.c:649
> > 649                     __handle_sysrq(key, true);
> >
> > (gdb) print /x $x22
> > $15 = 0x8000100
>
>
> Indeed, but this requires full DWARF or .eh_frame info, which is not
> generally available in the kernel.

Yup, but I have it for gdb and right now the problem I'm trying to
solve is being able to crawl in gdb since the kernel seems to be OK.
I guess I was thinking that perhaps the DWARF info could be confusing
gdb?



> Except for code built with -fomit-frame-pointer, you should at least
> be able to see a list of frames though: this doesn't require all the
> registers of ancestor frames to be recovered, just x29 and lr (which is
> what the frame records on the stack contain -- so no other magic info
> is required in order to recover these).
>
> gdb tries various methods to unwind a frame, and ought to fall back to
> this approach if all else fails.  Frame chains that appear to loop
> are a problem though, with no straightforward solution.
>
> My hunch is that gdb sees the frame chain attempt to loop backwards
> after el1_irq and bails out.  Is your task stack at a lower address than
> the IRQ stack?

Here's what I've got (not lower)

#16 0xffffff8008082bf0 in el1_irq () at /mnt/host/source/src/third_party/kernel/v4.19/arch/arm64/kernel/entry.S:622
622             irq_handler

(gdb) print /x $sp
$11 = 0xffffff8008004000
(gdb) print /x $x29
$12 = 0xffffff8009003e90
(gdb) print /x ((void**)$x29)[0]
$13 = 0xffffff8009003ed0
(gdb) print /x (*(void***)$x29)[0]
$14 = 0xffffff8009003ee0


...but then I poked a bit more and found out one really big problem is
that "irq_stack_entry" swaps the stack before calling
gic_handle_irq() and this seemed to be confusing gdb.  Specifically
the value of "sp" when I point gdb at the "el1_irq" frame is actually
"irq_stack_ptr" AKA 0xffffff8008004000.


I've been fighting a bit with trying to figure out how to make .cfi
directives do what I want and I managed a stupid/ugly hack that at
least seems to get my stack pointer to be correct in el1_irq now:

---

 static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 {
        u32 irqnr;
+       asm volatile (".cfi_register 31, 19");

---

...when I do that, my stack pointer is sane when I point gdb at el1_irq
(it matches x19), but I still can't get a trace.  I also haven't yet
been able to figure out how to accomplish that without hacking it into
gic_handle_irq().


While it would be nice to get all this solved, it's probably not high
priority right now, so I might have to punt unless there's some other
obvious / low hanging fruit to try.


-Doug
