* Debugging a kernel freeze
@ 2017-12-01 14:05 Victor Ascroft
  2017-12-01 15:48 ` Daniel.
  2017-12-01 16:53 ` phil
  0 siblings, 2 replies; 3+ messages in thread
From: Victor Ascroft @ 2017-12-01 14:05 UTC (permalink / raw)
  To: kernelnewbies

Hello,

I have an iMX6 running a 4.9 kernel with a custom kernel driver that communicates
with an FPGA over PCIe. The driver is not built into the kernel; it is loaded as
a module once boot-up is complete. After the system has been running for a few
hours, the kernel freezes completely: no kernel panic, no stack trace, nothing.
I have access to the serial console.

In such a scenario, what are the ways to debug and locate the source of
the problem? I am not looking for a solution to my specific problem, but for
general approaches one can try when faced with a freeze like this.

Thank you for any pointers in advance.

Regards.


* Debugging a kernel freeze
From: Daniel. @ 2017-12-01 15:48 UTC (permalink / raw)
  To: kernelnewbies

In this scenario I would isolate parts of the system one at a time until the
domain of the problem is found. You say that it happens after some hours: could
it be temperature-related? What is the workload?
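
One way to check the temperature theory is to log the SoC temperature to the serial console and see what the last reading was before the freeze. The following is a minimal sketch, not a tested recipe for this board: it assumes the kernel exposes a thermal zone at /sys/class/thermal/thermal_zone0/temp (the zone index and its presence depend on the kernel configuration), and the function names are made up for illustration.

```python
import time
from pathlib import Path

# Assumed sysfs path; the zone index and whether it exists at all
# depend on the board and kernel configuration.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millicelsius(raw: str) -> float:
    """Convert the sysfs value (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def log_temperature(path: Path = THERMAL_ZONE,
                    interval_s: float = 10.0,
                    samples: int = 0) -> None:
    """Print a timestamped reading every interval_s seconds.

    samples=0 means keep going until interrupted (or until the freeze).
    """
    count = 0
    while samples == 0 or count < samples:
        temp_c = parse_millicelsius(path.read_text())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        # flush so the line reaches the console before a possible hang
        print(f"{stamp} temp={temp_c:.1f}C", flush=True)
        count += 1
        if samples == 0 or count < samples:
            time.sleep(interval_s)
```

Calling log_temperature() in a session on the serial console leaves a trail of readings; if the last values before each freeze are consistently high, temperature becomes a more likely suspect.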




-- 
"If you're going to try, go all the way. Otherwise, don't even start. ..."
  Charles Bukowski


* Debugging a kernel freeze
From: phil @ 2017-12-01 16:53 UTC (permalink / raw)
  To: kernelnewbies

On 01/12/17 14:05, Victor Ascroft wrote:
> I have a iMX6 running a 4.9 kernel with a custom kernel driver communicating
> with a FPGA over PCIe. The driver is not built in to the kernel but loaded as
> a module after complete boot up. During the running of the system, after a few
> hours the kernel completely freezes. No kernel panics or stack traces, nothing.
> I have access to the serial console.

I've done a lot of work with the imx6 and an Altera Cyclone IV FPGA 
connected via PCIe bus and I've not experienced any major issues with 
this setup.

> In such a scenario what are the ways to debug and try locating the source of
> the problem? I am not looking for a solution for my problem but things or
> approaches one can go about trying while trying to fix such a scenario?

This is a difficult situation and it will take a lot of time to debug, 
but you really just need to spend time picking apart the driver. You 
should try disabling various parts of it and adding dynamic debug messages or 
tracing.
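
As an illustration of the dynamic debug suggestion, here is a minimal sketch that turns the pr_debug()/dev_dbg() messages of one module on or off at run time. It assumes a kernel built with CONFIG_DYNAMIC_DEBUG and debugfs mounted at /sys/kernel/debug; the module name "fpga_pcie" is purely hypothetical, as are the helper names.

```python
from pathlib import Path

# Standard dynamic debug control file; only present when the kernel was
# built with CONFIG_DYNAMIC_DEBUG and debugfs is mounted here.
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dyndbg_query(module: str, enable: bool = True, flags: str = "p") -> str:
    """Build a dynamic debug query for every debug call site in a module.

    Flags: p = print the message, f = add function name, l = add line number.
    """
    op = "+" if enable else "-"
    return f"module {module} {op}{flags}"


def apply_query(query: str, control: Path = CONTROL) -> None:
    """Write the query to the control file (needs root)."""
    control.write_text(query + "\n")


# Example (hypothetical module name), run as root:
#   apply_query(dyndbg_query("fpga_pcie", flags="pfl"))
```

With the messages enabled, the last lines on the serial console before the freeze narrow down which part of the driver was running.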

My first suspicion in these cases, however, is always interrupts. 
There have been a few times when our FPGA code had a fault and the 
interrupts failed, so my first port of call is usually to disable 
interrupts in my driver and replace them with high-resolution timers. You 
might also want to look at load balancing the interrupts: ARM processors keep 
interrupts on one core (or they did in the kernels I've been using), and 
you can either assign the interrupts to other cores manually or use 
irqbalance to do so automatically. I preferred the manual solution, as 
irqbalance didn't seem to distribute my workload efficiently across the 
cores. At any rate, you should probably be monitoring the interrupts.
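
A rough sketch of what monitoring the interrupts and assigning them manually could look like from user space. It parses two snapshots of /proc/interrupts and reports which counters moved, and it builds the hex CPU mask that /proc/irq/<N>/smp_affinity expects. The function names are made up for illustration, and whether the affinity of a given interrupt can be changed at all depends on the interrupt controller, so the write may fail.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_interrupts(text: str) -> Dict[str, List[int]]:
    """Map each IRQ label in /proc/interrupts to its per-CPU counts."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts: Dict[str, List[int]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        per_cpu: List[int] = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device description
            per_cpu.append(int(field))
        if per_cpu:
            counts[label.strip()] = per_cpu
    return counts


def interrupt_deltas(before: Dict[str, List[int]],
                     after: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return per-CPU count increases for IRQs that fired between snapshots."""
    deltas: Dict[str, List[int]] = {}
    for irq, now in after.items():
        prev = before.get(irq, [0] * len(now))
        if len(prev) != len(now):
            continue  # CPU set changed between snapshots; skip this line
        diff = [n - p for n, p in zip(now, prev)]
        if any(diff):
            deltas[irq] = diff
    return deltas


def affinity_mask(cpus: Iterable[int]) -> str:
    """Hex bitmask for /proc/irq/<N>/smp_affinity, e.g. CPUs 0 and 3 -> '9'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def set_affinity(irq: int, cpus: Iterable[int]) -> None:
    """Pin an IRQ to the given CPUs (needs root; may be refused by the kernel)."""
    Path(f"/proc/irq/{irq}/smp_affinity").write_text(affinity_mask(cpus))
```

Taking a snapshot every few seconds with parse_interrupts(Path("/proc/interrupts").read_text()) and printing interrupt_deltas() shows whether the FPGA interrupt stops arriving, or starts arriving far too often, shortly before the freeze.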

Good Luck!

Regards,

Philip Downer

