Linux-OMAP Archive on lore.kernel.org
* Lockup inside omap4_prminst_read_inst_reg on OMAP5 uEVM
From: David Shah @ 2020-07-26 17:59 UTC
  To: Discussions about the Letux Kernel, kernel@pyra-handheld.com, Linux-OMAP

Hi all,

I am looking into random lockups that affect both the uEVM and the Pyra
(also OMAP5432-based) on newer kernels (currently testing with 5.6, but I
have seen lockups with 5.7 too). They are significantly rarer than once a
day in typical usage, though some patterns, such as lots of bursty network
traffic, increase the frequency.

Currently I'm working with the uEVM as it is a bit easier to connect
the JTAG adapter. I managed to get a lockup with the JTAG attached.
Unfortunately the processor is locked up badly enough (presumably a
stuck memory bus?) that JTAG isn't able to get a register dump or
stack trace. But I do get the following error, which at least gives a
PC:

CortexA15_0: Trouble Halting Target CPU: (Error -1323 @ 0xC0223E0C)
Device failed to enter debug/halt mode because pipeline is stalled.
Power-cycle the board. If error persists, confirm configuration and/or
try more reliable JTAG settings (e.g. lower TCLK). (Emulation package
9.2.0.00002) 

The second core is just sitting at WFI; I don't think there is anything
suspicious about that.

Looking at the kernel disassembly, that PC (0xC0223E0C) is the actual
register read (ldr r0, [r1]) in omap4_prminst_read_inst_reg.

My best guess is that it is trying to read from a register that doesn't
exist or isn't responding due to the current power configuration. Has
anyone seen this before, or does anyone have more clues on how to debug
it? It's a shame that I can't seem to see what r1 is or get a backtrace.
It looks like it might be possible to set some kind of timeout on the
interconnect. Has anyone tried something like that to debug this kind of
issue?
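
To make concrete what knowing r1 would tell us, here is a minimal
userspace sketch. It assumes the function forms the address as a
per-partition base plus an instance offset plus a register index; that
is an assumption for illustration, not something confirmed in this
thread. All names and base values below are hypothetical and are not
the real OMAP5 mappings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partition bases; these values are invented for the sketch
 * and are NOT the real OMAP5 mappings. Index 0 models an unmapped partition. */
#define SKETCH_MAX_PARTITIONS 3
static const uint32_t sketch_bases[SKETCH_MAX_PARTITIONS] = {
    0x00000000u, 0xfa000000u, 0xfb000000u
};

/* Modelled address computation: base + instance offset + register index.
 * Returns 0 for an invalid or unmapped partition. */
static uint32_t sketch_reg_addr(uint8_t part, int16_t inst, uint16_t idx)
{
    if (part >= SKETCH_MAX_PARTITIONS || sketch_bases[part] == 0)
        return 0;
    return sketch_bases[part] + (uint32_t)(int32_t)inst + idx;
}

/* Given a captured address (what r1 would hold) and the partition it
 * belongs to, recover the combined (inst + idx) offset. */
static uint32_t sketch_offset_of(uint8_t part, uint32_t addr)
{
    assert(part < SKETCH_MAX_PARTITIONS && sketch_bases[part] != 0);
    return addr - sketch_bases[part];
}
```

Under that assumption, a captured r1 would narrow the failing access
down to one partition and one combined offset, which could then be
looked up in the TRM.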

Best

David Shah




* Re: Lockup inside omap4_prminst_read_inst_reg on OMAP5 uEVM
From: Tony Lindgren @ 2020-07-27  6:47 UTC
  To: David Shah
  Cc: Discussions about the Letux Kernel, kernel@pyra-handheld.com, Linux-OMAP

* David Shah <dave@ds0.me> [200726 17:59]:
> Hi all,
> 
> I am looking into random lockups - significantly rarer than once a day
> in typical usage, various patterns like lots of bursty network traffic
> increase frequency - that affect both the uEVM and the Pyra (also
> OMAP5432 based) on newer kernels (currently testing with 5.6 but I have
> seen lockups with 5.7 too).

Just wondering: is this with USB Ethernet or with WLAN?

Regards,

Tony


* Re: Lockup inside omap4_prminst_read_inst_reg on OMAP5 uEVM
From: David Shah @ 2020-08-01 20:57 UTC
  To: Discussions about the Letux Kernel, kernel, Linux-OMAP

A tiny bit more information, if anyone has any more ideas.

I can confirm that this happened once with the device idle and no
network connection.

Based on the information I have been able to extract, the call stack does
seem to involve omap4_enter_lowpower but I can't be certain.

The main JTAG access I have is the ability to read out what seems to be
kernel virtual memory via the other core, which is not locked up but is
sitting in WFI. I attempted to add some tracing by writing a value to a
global variable inside the problem function and then flushing the D$, but
the delay this adds (or the cache flush itself) seems to stop the lockup
from occurring most of the time. It did lock up once with this added, but
then reading out that area of memory failed, possibly because the
locked-up core was confusing the cache coherency magic inside the cores.

Since that lock-up I added 20 NOPs after the cache flush, to try and make
sure the cache flush really does work, and with those added it does not
lock up at all.

Is there a better way to take advantage of this ability to read out
memory for debugging?
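
For anyone wanting to try the same approach, the tracing described
above could look roughly like the following userspace sketch. It is
only a model: the struct, the function names and the fake read are
hypothetical, and crumb_publish() is a plain memory fence standing in
for whatever D-cache clean and barrier the real target would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical breadcrumb record. On the target it would sit at a known
 * address so it can be found in the memory view of the other core. */
struct prm_crumb {
    volatile uint32_t seq;   /* bumped before every traced read */
    volatile uint32_t part;  /* arguments of the read in flight */
    volatile int32_t  inst;
    volatile uint32_t idx;
    volatile uint32_t done;  /* set to seq once the read has returned */
};

static struct prm_crumb crumb;

/* Stand-in for "make the breadcrumb visible to the debugger". Here it is
 * only a fence; a real port would need a cache clean as well (assumption). */
static void crumb_publish(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

typedef uint32_t (*prm_read_fn)(uint8_t part, int16_t inst, uint16_t idx);

/* Record the arguments, publish them, perform the read, then mark it done.
 * If the read never returns, seq != done and the fields name the access. */
static uint32_t traced_read(prm_read_fn fn, uint8_t part, int16_t inst,
                            uint16_t idx)
{
    assert(fn != 0);
    crumb.part = part;
    crumb.inst = inst;
    crumb.idx = idx;
    crumb.seq = crumb.seq + 1;
    crumb_publish();

    uint32_t val = fn(part, inst, idx);

    crumb.done = crumb.seq;
    crumb_publish();
    return val;
}

/* Fake register read used only to exercise the sketch on a host machine. */
static uint32_t fake_read(uint8_t part, int16_t inst, uint16_t idx)
{
    return ((uint32_t)part << 24) | ((uint32_t)(uint16_t)inst << 8) |
           (idx & 0xffu);
}
```

Note that the extra stores and the publish step add delay before the
read, so this may well mask the lockup in the same way the original
tracing attempt did.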

Best

David

