All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Koehrer Mathias (ETAS/ESW5)" <mathias.koehrer@etas.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Subject: RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
Date: Fri, 27 Jan 2017 14:04:35 +0000	[thread overview]
Message-ID: <61f913336df14059bc409249f0e70ea3@FE-MBX1012.de.bosch.com> (raw)
In-Reply-To: <3c9395f9c6b944978c1f420276bde0ee@FE-MBX1012.de.bosch.com>

[-- Attachment #1: Type: text/plain, Size: 6243 bytes --]

Hi Sebastian,

> > > > > > > The kernel version 4.1.37-rt43 is working fine, the kernel
> > > > > > > versions
> > > > > > > 4.4.39-rt50 and
> > > > > > 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > > > > > > Something on the way between 4.1.37-rt43 and 4.4.39-rt50
> > > > > > > seems to cause the
> > > > > > trouble.
> > > > > >
> > > > > > can you check if one of the earlier v4.4-RT releases (maybe
> > > > > > start with
> > > > > > -rt2 or -rt1) also shows this behaviour?
> > > > > > If not, would you have a testcase?
> > > > > I tested with 4.4.0-rt2.
> > > > > Here this issue occurs very rarely. I ran the gdb for about 30
> > > > > times, and here I got one hit.
> > > > > With the other kernel versions I got the issue in 20-40% of the cases.
> > > > >
> > > > > I will try out the other 4.4.0 releases...
> > > >
> > > > I ran more tests. The kernel 4.4.1-rt4 is working fairly fine
> > > > (only about one hit every 30 runs) but the kernel 4.4.1-rt5 is
> > > > causing the issue very
> > > often.
> > >
> > > and 4.1.37-rt43 shows not SIGSTOPS in 100 runs?
> > That's right.
> > I used 4.1.37-rt43 and ran the test for 200 times. There was no single issue.
I have extended my test executable to use the kernel tracing feature to get more details.

Here is the output of my gdb session:
------------------ BEGIN -------------------------
pid: 17969
[New Thread 0x7ffff7616700 (LWP 17973)]
[New Thread 0x7ffff6e15700 (LWP 17974)]
[New Thread 0x7ffff6614700 (LWP 17975)]
[New Thread 0x7ffff5e13700 (LWP 17976)]
End of thread 0
End of thread 2
End of thread 1
Thread 0 joined
Thread 1 joined
Thread 2 joined
[Thread 0x7ffff6614700 (LWP 17975) exited]
[Thread 0x7ffff6e15700 (LWP 17974) exited]
[Thread 0x7ffff7616700 (LWP 17973) exited]
^C
Thread 1 "gdb-issue" received signal SIGINT, Interrupt.
0x00007ffff79bd6bd in pthread_join (threadid=140737318565632, thread_return=0x0) at pthread_join.c:90
90      pthread_join.c: No such file or directory.
(gdb) info thr
  Id   Target Id         Frame
* 1    Thread 0x7ffff7fe0700 (LWP 17969) "gdb-issue" 0x00007ffff79bd6bd in pthread_join (threadid=140737318565632, thread_return=0x0)
    at pthread_join.c:90
  5    Thread 0x7ffff5e13700 (LWP 17976) "gdb-issue" 0x00007ffff79c5b7b in __waitpid (pid=18160, stat_loc=0x7ffff5e12ea0, options=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:29
(gdb) thr 5
[Switching to thread 5 (Thread 0x7ffff5e13700 (LWP 17976))]
#0  0x00007ffff79c5b7b in __waitpid (pid=18160, stat_loc=0x7ffff5e12ea0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
29      ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
(gdb) bt
#0  0x00007ffff79c5b7b in __waitpid (pid=18160, stat_loc=0x7ffff5e12ea0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x00005555555551ce in thread_func (arg=0x555555756130 <thread_data+48>) at gdb-issue.c:107
#2  0x00007ffff79bc464 in start_thread (arg=0x7ffff5e13700) at pthread_create.c:333
#3  0x00007ffff76ff9df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb)
------------------ END -------------------------

The program hangs as it cannot join thread 3 (LWP 17976).
This thread has forked (pid=18160) to call "lspci -xxx" but it waits forever for the termination of pid 18160.

I have attached an extract of the kernel trace where this situation is reflected.

ERROR:
One interesting thing can be observed. The forked process 18160 is listed with the following lines
in the trace:
           <...>-18160 [003] d...112 19076.865212: sched_waking: comm=gdb-issue pid=17969 prio=120 target_cpu=000
           <...>-18160 [003] d...212 19076.865213: sched_wakeup: comm=gdb-issue pid=17969 prio=120 target_cpu=000
           <...>-18160 [003] d...2.. 19076.874033: sched_switch: prev_comm=gdb-issue prev_pid=18160 prev_prio=39 prev_state=T|K ==> next_comm=gdb-issue next_pid=17976 next_prio=39
           <...>-18160 [003] d...2.. 19076.874033: tlb_flush: pages:-1 reason:flush on task switch (0)
           <...>-18160 [003] d...2.. 19076.874034: x86_fpu_regs_deactivated: x86/fpu: ffff8801fcfaf600 fpregs_active: 0 fpstate_active: 1 counter: 4 xfeatures: 6 xcomp_bv: 0
           <...>-18160 [003] d...2.. 19076.874034: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 113 xfeatures: 6 xcomp_bv: 0
           <...>-18160 [003] d...2.. 19076.874034: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 113 xfeatures: 6 xcomp_bv: 0
           <...>-18160 [003] d...2.. 19076.874034: write_msr: c0000100, value 7ffff5e13700


SUCCESS:
Within the previous loop the thread 17976 also forked to pid=18150 to do the very same.
In this case it has been successfully.
The lines for pid=18150 in the trace looks like this:
           <...>-18150 [003] d...2.. 19076.846386: sched_switch: prev_comm=sh prev_pid=18150 prev_prio=39 prev_state=S ==> next_comm=gdb-issue next_pid=17976 next_prio=39
           <...>-18150 [003] d...2.. 19076.846387: tlb_flush: pages:-1 reason:flush on task switch (0)
           <...>-18150 [003] d...2.. 19076.846387: x86_fpu_regs_deactivated: x86/fpu: ffff8801fcfa8a00 fpregs_active: 0 fpstate_active: 1 counter: 0 xfeatures: 2 xcomp_bv: 0
           <...>-18150 [003] d...2.. 19076.846388: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 105 xfeatures: 6 xcomp_bv: 0
           <...>-18150 [003] d...2.. 19076.846388: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 105 xfeatures: 6 xcomp_bv: 0
           <...>-18150 [003] d...2.. 19076.846388: write_msr: c0000100, value 7ffff5e13700

In all the successful loops the lines in the trace look like the lines in the successful (2nd) example below.
In the error case there is an additional sched_waking and sched_wakeup in the trace.
Also there is always "prev_state=S" within the forked process.
In the error case, there is "prev_state=T|K".

Please find the trace extract and the test executable in the attachment.

Any idea or feedback on the issue is highly welcome!

Thanks

Best regards

Mathias



[-- Attachment #2: trace-extract.gz --]
[-- Type: application/x-gzip, Size: 11327 bytes --]

[-- Attachment #3: gdb-issue.tgz --]
[-- Type: application/x-compressed, Size: 1727 bytes --]

  reply	other threads:[~2017-01-27 14:15 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-23 13:43 Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals Koehrer Mathias (ETAS/ESW5)
2017-01-24  9:15 ` Koehrer Mathias (ETAS/ESW5)
2017-01-25  9:56   ` Sebastian Andrzej Siewior
2017-01-25 11:28     ` Koehrer Mathias (ETAS/ESW5)
2017-01-25 12:55       ` Koehrer Mathias (ETAS/ESW5)
2017-01-25 13:36         ` Koehrer Mathias (ETAS/ESW5)
2017-01-25 13:40         ` Sebastian Andrzej Siewior
2017-01-25 14:00           ` Koehrer Mathias (ETAS/ESW5)
2017-01-26  9:26             ` Koehrer Mathias (ETAS/ESW5)
2017-01-27 14:04               ` Koehrer Mathias (ETAS/ESW5) [this message]
2017-01-27 15:33                 ` Sebastian Andrzej Siewior
2017-01-30  7:24                   ` Koehrer Mathias (ETAS/ESW5)
2017-03-02 17:51                 ` Sebastian Andrzej Siewior
2017-03-07 13:39                   ` Koehrer Mathias (ETAS/ESW3)
2017-03-07 23:21                     ` Sebastian Andrzej Siewior
2017-04-24 19:49                       ` David Hauck
2017-04-25  6:06                         ` Koehrer Mathias (ETAS/ESW3)
     [not found] <6a05f9f4-9299-4b36-7f11-5e334768880a@windriver.com>
     [not found] ` <85216e5b-7c2d-9ff7-c118-9279023a1726@windriver.com>
2017-06-29  6:08   ` Zhou, Li
2017-06-29  7:45     ` Koehrer Mathias (ETAS/EHE1)
2017-06-29  8:42       ` Zhou, Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=61f913336df14059bc409249f0e70ea3@FE-MBX1012.de.bosch.com \
    --to=mathias.koehrer@etas.com \
    --cc=bigeasy@linutronix.de \
    --cc=linux-rt-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.