linux-rt-users.vger.kernel.org archive mirror
* RT kernel v4.14 NULL pointer dereferences
@ 2020-06-16  7:30 Roosen Henri
  2020-08-14 11:58 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Roosen Henri @ 2020-06-16  7:30 UTC (permalink / raw)
  To: linux-rt-users

Hi RT-experts,

Duration tests with RT-enabled v4.14 kernels on our ARM iMX6Q systems
show "Unable to handle kernel NULL pointer dereference at virtual
address" after running for some months. The same tests on non-RT
kernels show no problems and have been running for more than 20 months
now.

I've been updating the tests from time to time with more recent
kernels. Testing is basically done with cyclictest, in parallel with
some load scripts/hackbench. Please have a look at the kernel traces:
4.14.71-rt44  -> https://paste.debian.net/1152206/
4.14.106-rt56 -> https://paste.debian.net/1152204/
4.14.146-rt67 -> https://paste.debian.net/1152202/

I've seen that the v4.14-rt branch got some fixes, so my questions are:
1) is the problem I'm facing known and fixed in v4.14-rt, or
2) is the problem I'm seeing a new issue that is probably still unfixed
in 4.14.183-rt83 (which I'm testing right now)?

Any help is highly appreciated!

Thanks,
Henri


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-06-16  7:30 RT kernel v4.14 NULL pointer dereferences Roosen Henri
@ 2020-08-14 11:58 ` Sebastian Andrzej Siewior
  2020-08-17  8:12   ` Roosen Henri
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-08-14 11:58 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users, Clark Williams

On 2020-06-16 07:30:33 [+0000], Roosen Henri wrote:
> Hi RT-experts,
> 
> Duration tests with RT enabled v4.14 kernels on our ARM iMX6Q systems
> are showing "Unable to handle kernel NULL pointer dereference at
> virtual address" after running for some months. The same tests on NON-
> RT kernels don't show any problem and have been running without any
> problems for more than 20 months now.
> 
> I've been updating the tests from time to time with more recent
> kernels. Testing is basically done with cyclictest, in parallel with
> some load scripts/hackbench. Please have a look at the kernel traces:
> 4.14.71-rt44  -> https://paste.debian.net/1152206/
> 4.14.106-rt56 -> https://paste.debian.net/1152204/
> 4.14.146-rt67 -> https://paste.debian.net/1152202/
> 
> I've seen the v4.14-rt branch got some fixes, so my questions are:
> 1) is problem I'm facing known and fixed in v4.14-rt, or
> 2) is the problem I'm seeing a new issue probably still unfixed in
> 4.14.183-rt83 (which I'm testing right now)?
> 
> Any help is highly appreciated!

So this did not get fixed all by itself?
Is the 4.14 series the only one that is affected?

> Thanks,
> Henri

Sebastian


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-08-14 11:58 ` Sebastian Andrzej Siewior
@ 2020-08-17  8:12   ` Roosen Henri
  2020-08-17 11:13     ` bigeasy
  0 siblings, 1 reply; 11+ messages in thread
From: Roosen Henri @ 2020-08-17  8:12 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users, williams

On Fri, 2020-08-14 at 13:58 +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-16 07:30:33 [+0000], Roosen Henri wrote:
> > Hi RT-experts,
> >
> > Duration tests with RT enabled v4.14 kernels on our ARM iMX6Q
> > systems
> > are showing "Unable to handle kernel NULL pointer dereference at
> > virtual address" after running for some months. The same tests on
> > NON-
> > RT kernels don't show any problem and have been running without any
> > problems for more than 20 months now.
> >
> > I've been updating the tests from time to time with more recent
> > kernels. Testing is basically done with cyclictest, in parallel
> > with
> > some load scripts/hackbench. Please have a look at the kernel
> > traces:
> > 4.14.71-rt44  -> https://paste.debian.net/1152206/
> > 4.14.106-rt56 -> https://paste.debian.net/1152204/
> > 4.14.146-rt67 -> https://paste.debian.net/1152202/
> >
> > I've seen the v4.14-rt branch got some fixes, so my questions are:
> > 1) is problem I'm facing known and fixed in v4.14-rt, or
> > 2) is the problem I'm seeing a new issue probably still unfixed in
> > 4.14.183-rt83 (which I'm testing right now)?
> >
> > Any help is highly appreciated!
>
> So this has not fixed all by itself?
> Is the 4.14 series the only one that is affected?

No, unfortunately not.

I ended up testing v4.14.184-rt84 on two systems and v5.4.45-rt27 on
another system, all iMX6Q. Only one of the v4.14 systems is still
running. The other v4.14 system dumped backtraces until it eventually
reset, and the v5.4 system is endlessly dumping backtraces:

v4.14: https://paste.ubuntu.com/p/ZdFYhs4pjx/
v5.4: https://paste.ubuntu.com/p/wZPCQv8KjX/

If the backtraces don't point to a root cause, are there any kernel
configuration options which are useful to track this issue down?

Thanks!

>
> > Thanks,
> > Henri
>
> Sebastian


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-08-17  8:12   ` Roosen Henri
@ 2020-08-17 11:13     ` bigeasy
       [not found]       ` <05cc3d6085641c7f0425e45358451b05c5e9fc07.camel@ginzinger.com>
  2021-01-19 10:57       ` Roosen Henri
  0 siblings, 2 replies; 11+ messages in thread
From: bigeasy @ 2020-08-17 11:13 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users, williams

On 2020-08-17 08:12:31 [+0000], Roosen Henri wrote:
> > Is the 4.14 series the only one that is affected?
> 
> No, unfortunately not.
> 
> I ended up testing the v4.14.184-rt84 on two systems and the v5.4.45-
> rt27 on another system, all iMX6Q. Only one of the v4.14 systems is
> still running, the other v4.14 system dumping backtraces until
> eventually reset, the v5.4 system endlessly dumping backtraces:
> 
> v4.14: https://paste.ubuntu.com/p/ZdFYhs4pjx/
> v5.4: https://paste.ubuntu.com/p/wZPCQv8KjX/

So the 5.4 had an uptime of 45 days until this started?

> If the backtraces don't point to a root-cause, are there any kernel
> configuration options which are usefull to track this issue down?

The 5.4 trace is not complete: it started earlier, and this might be a
follow-up problem (it seems to originate in die()).
You could send me the config. I have an imx6q somewhere, so I might be
able to take a look. However, I want to sort out another ppc issue first.

> Thanks!

Sebastian


* Re: RT kernel v4.14 NULL pointer dereferences
       [not found]       ` <05cc3d6085641c7f0425e45358451b05c5e9fc07.camel@ginzinger.com>
@ 2020-08-17 11:58         ` Roosen Henri
  2020-08-26  9:56           ` Roosen Henri
  0 siblings, 1 reply; 11+ messages in thread
From: Roosen Henri @ 2020-08-17 11:58 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users, williams

On Mon, 2020-08-17 at 13:43 +0200, Henri Roosen wrote:
> On Mon, 2020-08-17 at 13:13 +0200, bigeasy@linutronix.de wrote:
> > On 2020-08-17 08:12:31 [+0000], Roosen Henri wrote:
> > > > Is the 4.14 series the only one that is affected?
> > > 
> > > No, unfortunately not.
> > > 
> > > I ended up testing the v4.14.184-rt84 on two systems and the
> > > v5.4.45-
> > > rt27 on another system, all iMX6Q. Only one of the v4.14 systems
> > > is
> > > still running, the other v4.14 system dumping backtraces until
> > > eventually reset, the v5.4 system endlessly dumping backtraces:
> > > 
> > > v4.14: https://paste.ubuntu.com/p/ZdFYhs4pjx/
> > > v5.4: https://paste.ubuntu.com/p/wZPCQv8KjX/
> > 
> > So the 5.4 has an uptime of 45 days until this start?
> 
> Correct.
> 
> > > If the backtraces don't point to a root-cause, are there any
> > > kernel
> > > configuration options which are usefull to track this issue down?
> > 
> > The 5.4 is not complete, it started earlier and this might be a
> > follow-up problem (it seems to originate in die()).
> > You could send me the config, I have an imx6q somewhere so I might
> > be
> > able to take a look. However I want sort another ppc issue first.
> 
> Please find the v5.4.45-rt27-config file attached to this email. Let 

To clarify v5.4.45-rt27 before any confusion arises: I used v5.4.44-rt27
merged into our v5.4.45 branch.

> me
> know if you have any suggestions about the config file or useful
> debugging I can switch on; I'll be restarting a couple of systems for
> new duration tests.
> 
> Thanks for your help!
> Henri
> 


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-08-17 11:58         ` Roosen Henri
@ 2020-08-26  9:56           ` Roosen Henri
  2020-08-27 12:52             ` bigeasy
  0 siblings, 1 reply; 11+ messages in thread
From: Roosen Henri @ 2020-08-26  9:56 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users, williams

On Mon, 2020-08-17 at 11:58 +0000, Roosen Henri wrote:
> On Mon, 2020-08-17 at 13:43 +0200, Henri Roosen wrote:
> > On Mon, 2020-08-17 at 13:13 +0200, bigeasy@linutronix.de wrote:
> > > On 2020-08-17 08:12:31 [+0000], Roosen Henri wrote:
> > > > > Is the 4.14 series the only one that is affected?
> > > > 
> > > > No, unfortunately not.
> > > > 
> > > > I ended up testing the v4.14.184-rt84 on two systems and the
> > > > v5.4.45-
> > > > rt27 on another system, all iMX6Q. Only one of the v4.14
> > > > systems
> > > > is
> > > > still running, the other v4.14 system dumping backtraces until
> > > > eventually reset, the v5.4 system endlessly dumping backtraces:
> > > > 
> > > > v4.14: https://paste.ubuntu.com/p/ZdFYhs4pjx/
> > > > v5.4: https://paste.ubuntu.com/p/wZPCQv8KjX/
> > > 
> > > So the 5.4 has an uptime of 45 days until this start?
> > 
> > Correct.
> > 
> > > > If the backtraces don't point to a root-cause, are there any
> > > > kernel
> > > > configuration options which are usefull to track this issue
> > > > down?
> > > 
> > > The 5.4 is not complete, it started earlier and this might be a
> > > follow-up problem (it seems to originate in die()).
> > > You could send me the config, I have an imx6q somewhere so I
> > > might
> > > be
> > > able to take a look. However I want sort another ppc issue first.
> > 
> > Please find the v5.4.45-rt27-config file attached to this email. 

I'm afraid something went wrong with sending attachments to the mailing
list. Please find the v5.4.45-rt27-config here as well: 
https://paste.ubuntu.com/p/Yr9Vf4yNWs/

> > Let 
> 
> To clarify the v5.4.45-rt27 before any confusion: I used the v5.4.44-
> rt27 merged into our v5.4.45 branch.
> 
> > me
> > know if you have any suggestions about the config file or useful
> > debugging I can switch on; I'll be restarting a couple of systems
> > for
> > new duration tests.

Please let me know if you find any irregularities in my config or have
suggestions for any useful debugging or forensic config options.

Thanks!
Henri


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-08-26  9:56           ` Roosen Henri
@ 2020-08-27 12:52             ` bigeasy
  2020-08-27 13:11               ` bigeasy
  0 siblings, 1 reply; 11+ messages in thread
From: bigeasy @ 2020-08-27 12:52 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users, williams

On 2020-08-26 09:56:16 [+0000], Roosen Henri wrote:
> Please let me know if there you find any unregularities in my config or
> suggestions on any usefull debugging-/forensic-config-options.

I have no suggestions. Well, besides memory debugging, lockdep and that
kind of thing.
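
As a rough illustration of the "memory debugging, lockdep" suggestion,
the sketch below checks the text of a kernel .config for a handful of
mainline debugging options. The option list is an assumption chosen for
illustration and was not given in this thread; some of these options may
be unavailable or behave differently on a particular architecture or
with PREEMPT_RT. The helper names are hypothetical.

```python
import re

# Illustrative selection of mainline debugging options (an assumption,
# not a list taken from this thread).
DEBUG_OPTIONS = [
    "CONFIG_PROVE_LOCKING",       # lockdep: lock-ordering validation
    "CONFIG_DEBUG_LOCK_ALLOC",    # lockdep: detect freeing of held locks
    "CONFIG_DEBUG_ATOMIC_SLEEP",  # sleeping while in atomic context
    "CONFIG_SLUB_DEBUG",          # SLUB allocator debugging support
    "CONFIG_DEBUG_PAGEALLOC",     # page allocator debugging
    "CONFIG_PAGE_POISONING",      # poison pages after freeing
    "CONFIG_DEBUG_LIST",          # linked-list corruption checks
]


def enabled_options(config_text):
    """Return the set of options set to 'y' or 'm' in .config text."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(y|m)$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_debug_options(config_text, wanted=DEBUG_OPTIONS):
    """Return the wanted options that are not enabled, in list order."""
    on = enabled_options(config_text)
    return [opt for opt in wanted if opt not in on]
```

Calling missing_debug_options() on the contents of a config file such as
the one posted in this thread would list the options from DEBUG_OPTIONS
that are not switched on.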

> Thanks!
> Henri

Sebastian


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-08-27 12:52             ` bigeasy
@ 2020-08-27 13:11               ` bigeasy
  2020-08-28  8:47                 ` Roosen Henri
  0 siblings, 1 reply; 11+ messages in thread
From: bigeasy @ 2020-08-27 13:11 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users, williams

On 2020-08-27 14:52:58 [+0200], To Roosen Henri wrote:
> On 2020-08-26 09:56:16 [+0000], Roosen Henri wrote:
> > Please let me know if there you find any unregularities in my config or
> > suggestions on any usefull debugging-/forensic-config-options.
> 
> I have no suggestions. Well besides memory debugging, lockdep and these
> kind of things.

You never sent a backtrace, did you? I remember you posted a link to a
pastebin which contained only the "middle" of the trace, not where it
started.

> > Thanks!
> > Henri
> 
Sebastian


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-08-27 13:11               ` bigeasy
@ 2020-08-28  8:47                 ` Roosen Henri
  0 siblings, 0 replies; 11+ messages in thread
From: Roosen Henri @ 2020-08-28  8:47 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users, williams

On Thu, 2020-08-27 at 15:11 +0200, bigeasy@linutronix.de wrote:
> On 2020-08-27 14:52:58 [+0200], To Roosen Henri wrote:
> > On 2020-08-26 09:56:16 [+0000], Roosen Henri wrote:
> > > Please let me know if there you find any unregularities in my
> > > config or
> > > suggestions on any usefull debugging-/forensic-config-options.
> > 
> > I have no suggestions. Well besides memory debugging, lockdep and
> > these
> > kind of things.
> 
> You never sent a backtrace, did you? I remember you posted a link a
> pastebin which contain only the "middle" of the trace, not where it
> started.

Correct, please see 
https://lore.kernel.org/linux-rt-users/25412beffb96ab8c2b8f869bdac5a66c49faa5ca.camel@ginzinger.com/

Unfortunately, for the only v5.4 backtrace I currently have, the kernel
was continuously dumping backtraces, so only the "middle" of the
trace is available (https://paste.ubuntu.com/p/wZPCQv8KjX/).
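
One way to avoid losing the start of a trace is to capture the full
serial console output to a file from boot and extract the first oops
from it afterwards. The sketch below assumes such a capture exists. The
first pattern is the message quoted in this thread; the other patterns
are common oops headers added as an assumption, and first_oops is a
hypothetical helper name.

```python
import re

# Start-of-oops markers. The NULL pointer message is the one quoted in
# this thread; the remaining alternatives are assumed for illustration.
OOPS_START = re.compile(
    r"Unable to handle kernel (NULL pointer dereference|paging request)"
    r"|Internal error: Oops"
)


def first_oops(log_lines, context=5, length=60):
    """Return the lines around the first oops marker, or [] if none.

    'context' lines before the marker are included, followed by the
    marker line and the lines after it, up to 'length' lines counted
    from the marker.
    """
    for index, line in enumerate(log_lines):
        if OOPS_START.search(line):
            start = max(0, index - context)
            return log_lines[start:index + length]
    return []
```

Only the first match is returned on purpose: later backtraces may be
follow-up problems rather than the original fault.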

I have been testing 2 systems with v5.4.58-rt35 for 2 weeks now. It
might take another 6 weeks or more before backtraces become available.

The pastebins of the v4.14 kernels from the first post in this thread
don't work anymore, so I re-posted them here:

4.14.71-rt44  -> https://paste.ubuntu.com/p/rgSNr97w6k/
4.14.106-rt56 -> https://paste.ubuntu.com/p/csVvJcYGrt/
4.14.146-rt67 -> https://paste.ubuntu.com/p/crybp6F6Hb/

The v4.14 logs have a more complete backtrace. All of the v4.14 logs
seem to show backtraces of SyS_read/SyS_write syscalls on Unix stream
sockets, but that might just be a red herring.

Thanks!
Henri


* Re: RT kernel v4.14 NULL pointer dereferences
  2020-08-17 11:13     ` bigeasy
       [not found]       ` <05cc3d6085641c7f0425e45358451b05c5e9fc07.camel@ginzinger.com>
@ 2021-01-19 10:57       ` Roosen Henri
  2021-01-21 15:29         ` bigeasy
  1 sibling, 1 reply; 11+ messages in thread
From: Roosen Henri @ 2021-01-19 10:57 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users, williams

On Mon, 2020-08-17 at 13:13 +0200, bigeasy@linutronix.de wrote:
> On 2020-08-17 08:12:31 [+0000], Roosen Henri wrote:
> > > Is the 4.14 series the only one that is affected?
> > 
> > No, unfortunately not.
> > 
> > I ended up testing the v4.14.184-rt84 on two systems and the
> > v5.4.45-
> > rt27 on another system, all iMX6Q. Only one of the v4.14 systems is
> > still running, the other v4.14 system dumping backtraces until
> > eventually reset, the v5.4 system endlessly dumping backtraces:
> > 
> > v4.14: https://paste.ubuntu.com/p/ZdFYhs4pjx/
> > v5.4: https://paste.ubuntu.com/p/wZPCQv8KjX/
> 
> So the 5.4 has an uptime of 45 days until this start?
> 
> > If the backtraces don't point to a root-cause, are there any kernel
> > configuration options which are usefull to track this issue down?
> 
> The 5.4 is not complete, it started earlier and this might be a
> follow-up problem (it seems to originate in die()).
> You could send me the config, I have an imx6q somewhere so I might be
> able to take a look. However I want sort another ppc issue first.

Sebastian, did you have a chance to bring up a duration test on your
imx6q board? Are there any results from testing?

Thanks,
Henri


* Re: RT kernel v4.14 NULL pointer dereferences
  2021-01-19 10:57       ` Roosen Henri
@ 2021-01-21 15:29         ` bigeasy
  0 siblings, 0 replies; 11+ messages in thread
From: bigeasy @ 2021-01-21 15:29 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users, williams

On 2021-01-19 10:57:36 [+0000], Roosen Henri wrote:
> Sebastian, did you have a chance to bring up a duration test on your
> imx6q board? Are there any results from testing?

I just took your config from the thread and booted v5.4.84-rt47.
For some reason I have no network, so I started hackbench in the
background, redirecting its output to /dev/null, and I am recording the
serial console, which shows cyclictest.
I took the v5.4 tree as reported.
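
A setup along these lines could be sketched as follows. This is not the
exact invocation used here: the loop count, priority and interval are
placeholder values, the function names are hypothetical, and hackbench
and cyclictest from rt-tests are assumed to be on the PATH.

```python
import subprocess


def hackbench_cmd(loops=1000000):
    # -l sets the number of loops; the default here is a placeholder.
    return ["hackbench", "-l", str(loops)]


def cyclictest_cmd(priority=90, interval_us=1000):
    # -m locks memory, -S selects standard SMP testing (pinned threads,
    # same priority), -p sets the real-time priority and -i the
    # interval in microseconds.
    return ["cyclictest", "-m", "-S", "-p", str(priority),
            "-i", str(interval_us)]


def run_duration_test():
    """Start hackbench in the background, then run cyclictest."""
    load = subprocess.Popen(
        hackbench_cmd(),
        stdout=subprocess.DEVNULL,   # discard load output, as above
        stderr=subprocess.DEVNULL,
    )
    try:
        # cyclictest output goes to the console so it can be recorded.
        subprocess.run(cyclictest_cmd(), check=False)
    finally:
        load.terminate()
        load.wait()
```

For a test lasting months, the load would have to be restarted each time
hackbench finishes; that loop is left out to keep the sketch short.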

> Thanks,
> Henri

Sebastian
