Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals


* Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
@ 2017-01-23 13:43 Koehrer Mathias (ETAS/ESW5)
  2017-01-24  9:15 ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-23 13:43 UTC (permalink / raw)
  To: linux-rt-users

Hi all,

With the 4.9.0-rt1 and also the 4.9.4-rt2 kernel (x86, 64-bit) I observe a strange issue when running my multithreaded real-time executable within gdb.

Fairly often (in about 40% of all runs) gdb stops execution with the message:
  Thread 8 "MDL07-Acknowled" received signal SIGSTOP, Stopped (signal).
  [Switching to Thread 0x7fffef141700 (LWP 9770)]
  0x00007ffff535b24d in read () at ../sysdeps/unix/syscall-template.S:84

Often it helps to enter "continue" a couple of times to resume debugging.
However, sometimes it ends up in an endless loop of "continue" followed by a SIGSTOP, "continue", etc.
When this issue occurs, the SIGSTOP is reported for all threads in the executable.
Different system calls are affected: read, select, pselect, clone (via pthread_create), ...
Debugging is more or less impossible due to this issue.
The application itself does not send any SIGSTOP; the root cause of this signal is not clear to me.

When I reconfigure the kernel to use "Preemptible Kernel (Basic RT)" I do not observe this issue.
Also with kernel 3.18.42-rt44 (Fully Preemptible) it works fine. I never see this issue there.

Any hints on how to solve the issue are highly appreciated. 

Thanks

Best regards

Mathias

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-23 13:43 Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals Koehrer Mathias (ETAS/ESW5)
@ 2017-01-24  9:15 ` Koehrer Mathias (ETAS/ESW5)
  2017-01-25  9:56   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-24  9:15 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5), linux-rt-users

Hi,

> with the 4.9.0-rt1 and also the 4.9.4-rt2 kernel (x86, 64bit) I observe a strange issue
> when running my multithreaded real time executable within gdb.
> 
> Fairly often (in about 40% of all runs) the gdb stops executing with the message:
>   Thread 8 "MDL07-Acknowled" received signal SIGSTOP, Stopped (signal).
>   [Switching to Thread 0x7fffef141700 (LWP 9770)]
>   0x00007ffff535b24d in read () at ../sysdeps/unix/syscall-template.S:84
> 
> Often it helps to enter "continue" a couple of times to continue the debugging.
> However, sometimes it ends up in an endless loop of "continue" followed by a
> SIGSTOP, "continue", etc.
> If this issue occurs, the SIGSTOP is complained for all threads in the executable.
> Different system calls are affected: read, select, pselect, clone (via pthread_create),
> ...
> Debugging is more or less impossible due to this issue.
> No SIGSTOP is sent out by the application, the root cause for this signal is not clear
> to me.
> 
> When I reconfigured the kernel to use "Preemptible Kernel (Basic RT)" I do not
> observer this issue.
> Also with kernel 3.18.42-rt44 (Fully Preemptible) it works fine. I never see this issue
> there.
> 
> Any hints on how to solve the issue are highly appreciated.
I did an additional analysis.
The kernel version 4.1.37-rt43 is working fine, the kernel versions 4.4.39-rt50 and 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems to cause the trouble.

Regards

Mathias


* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-24  9:15 ` Koehrer Mathias (ETAS/ESW5)
@ 2017-01-25  9:56   ` Sebastian Andrzej Siewior
  2017-01-25 11:28     ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-01-25  9:56 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: linux-rt-users

On 2017-01-24 09:15:59 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> The kernel version 4.1.37-rt43 is working fine, the kernel versions 4.4.39-rt50 and 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems to cause the trouble.

can you check if one of the earlier v4.4-RT releases (maybe start with
-rt2 or -rt1) also shows this behaviour?
If not, would you have a testcase?

> Regards
> 
> Mathias

Sebastian


* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-25  9:56   ` Sebastian Andrzej Siewior
@ 2017-01-25 11:28     ` Koehrer Mathias (ETAS/ESW5)
  2017-01-25 12:55       ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-25 11:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

> 
> On 2017-01-24 09:15:59 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> > The kernel version 4.1.37-rt43 is working fine, the kernel versions 4.4.39-rt50 and
> 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems to cause the
> trouble.
> 
> can you check if one of the earlier v4.4-RT releases (maybe start with
> -rt2 or -rt1) also shows this behaviour?
> If not, would you have a testcase?
I tested with 4.4.0-rt2.
Here the issue occurs very rarely. I ran gdb about 30 times and got one hit.
With the other kernel versions I got the issue in 20-40% of the cases.

I will try out the other 4.4.0 releases...

I also tried to create a simple test case, without success so far.
The issue occurs with our (large) application.

Some observations I have made:
- Whenever the issue occurs, at least one thread is in the system call "clone", called either by pthread_create()
or by system(). I got this information by calling "info thr" inside gdb.
- I removed the calls to pthread_setschedparam() so that the threads no longer run with SCHED_FIFO scheduling.
Even then I got the issue. It does not seem to be related to threads running at real-time priority.

My tooling (Debian testing/stretch):
gdb 7.12
gcc 6.2.1

Regards

Mathias


* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-25 11:28     ` Koehrer Mathias (ETAS/ESW5)
@ 2017-01-25 12:55       ` Koehrer Mathias (ETAS/ESW5)
  2017-01-25 13:36         ` Koehrer Mathias (ETAS/ESW5)
  2017-01-25 13:40         ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-25 12:55 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5), Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,
> > > The kernel version 4.1.37-rt43 is working fine, the kernel versions
> > > 4.4.39-rt50 and
> > 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > > Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems to
> > > cause the
> > trouble.
> >
> > can you check if one of the earlier v4.4-RT releases (maybe start with
> > -rt2 or -rt1) also shows this behaviour?
> > If not, would you have a testcase?
> I tested with 4.4.0-rt2.
> Here this issue occurs very rarely. I ran the gdb for about 30 times, and here I got
> one hit.
> With the other kernel versions I got the issue in 20-40% of the cases.
> 
> I will try out the other 4.4.0 releases...

I ran more tests. The kernel 4.4.1-rt4 works fairly well (only about one hit every 30 runs)
but the kernel 4.4.1-rt5 shows the issue very often.

Here is a typical output of the gdb command "info thr" for kernel 4.4.1-rt5:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3e03700 (LWP 13240)]
[New Thread 0x7ffff3602700 (LWP 13241)]
[New Thread 0x7ffff0933700 (LWP 13266)]
[New Thread 0x7ffff0132700 (LWP 13267)]
[New Thread 0x7fffef931700 (LWP 13268)]
[New Thread 0x7fffef130700 (LWP 13269)]
[New Thread 0x7fffee92f700 (LWP 13270)]
[New Thread 0x7fffee12e700 (LWP 13271)]

Thread 5 "RTPC.bin" received signal SIGSTOP, Stopped (signal).
[Switching to Thread 0x7ffff0132700 (LWP 13267)]
0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) info thr
  Id   Target Id         Frame
  1    Thread 0x7ffff7fd8500 (LWP 13220) "RTPC.bin" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:80
  2    Thread 0x7ffff3e03700 (LWP 13240) "RTPC.bin" 0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7ffff3602700 (LWP 13241) "RTPC.bin" 0x00007ffff4d902b3 in select () at ../sysdeps/unix/syscall-template.S:84
  4    Thread 0x7ffff0933700 (LWP 13266) "RTPC.bin" 0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
* 5    Thread 0x7ffff0132700 (LWP 13267) "RTPC.bin" 0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
  6    Thread 0x7fffef931700 (LWP 13268) "RTPC.bin" 0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
  7    Thread 0x7fffef130700 (LWP 13269) "RTPC.bin" 0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
  8    Thread 0x7fffee92f700 (LWP 13270) "RTPC.bin" 0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
  9    Thread 0x7fffee12e700 (LWP 13271) "RTPC.bin" 0x00007ffff536124d in read () at ../sysdeps/unix/syscall-template.S:84
(gdb)


Regards

Mathias




* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-25 12:55       ` Koehrer Mathias (ETAS/ESW5)
@ 2017-01-25 13:36         ` Koehrer Mathias (ETAS/ESW5)
  2017-01-25 13:40         ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-25 13:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,
> > > > The kernel version 4.1.37-rt43 is working fine, the kernel
> > > > versions
> > > > 4.4.39-rt50 and
> > > 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > > > Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems to
> > > > cause the
> > > trouble.
> > >
> > > can you check if one of the earlier v4.4-RT releases (maybe start
> > > with
> > > -rt2 or -rt1) also shows this behaviour?
> > > If not, would you have a testcase?
> > I tested with 4.4.0-rt2.
> > Here this issue occurs very rarely. I ran the gdb for about 30 times,
> > and here I got one hit.
> > With the other kernel versions I got the issue in 20-40% of the cases.
> >
> > I will try out the other 4.4.0 releases...
> 
> I ran more tests. The kernel 4.4.1-rt4 is working fairly fine (only about one hit every
> 30 runs) but the kernel 4.4.1-rt5 is causing the issue very often.
> 
Update:
I removed the patch "kernel-perf-mark-perf_cpu_context-s-timer-as-irqsafe.patch" from the 4.4.1-rt5 series.
This improved the situation. Now I get the issue in about 5-10% of the runs.


With 4.9.4-rt2 I tried to remove the same assignment (as part of this very patch) in kernel/events/core.c:
timer->irqsafe = 1;
However, this did not show any major improvement.
It looks as if this is not the only cause. However, it might indicate where the issue comes from.

Regards

Mathias


* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-25 12:55       ` Koehrer Mathias (ETAS/ESW5)
  2017-01-25 13:36         ` Koehrer Mathias (ETAS/ESW5)
@ 2017-01-25 13:40         ` Sebastian Andrzej Siewior
  2017-01-25 14:00           ` Koehrer Mathias (ETAS/ESW5)
  1 sibling, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-01-25 13:40 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: linux-rt-users

On 2017-01-25 12:55:30 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Sebastian,
Hi Mathias,

> > > > The kernel version 4.1.37-rt43 is working fine, the kernel versions
> > > > 4.4.39-rt50 and
> > > 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > > > Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems to
> > > > cause the
> > > trouble.
> > >
> > > can you check if one of the earlier v4.4-RT releases (maybe start with
> > > -rt2 or -rt1) also shows this behaviour?
> > > If not, would you have a testcase?
> > I tested with 4.4.0-rt2.
> > Here this issue occurs very rarely. I ran the gdb for about 30 times, and here I got
> > one hit.
> > With the other kernel versions I got the issue in 20-40% of the cases.
> > 
> > I will try out the other 4.4.0 releases...
> 
> I ran more tests. The kernel 4.4.1-rt4 is working fairly fine (only about one hit every 30 runs)
> but the kernel 4.4.1-rt5 is causing the issue very often.

and 4.1.37-rt43 shows no SIGSTOPs in 100 runs?
The reason I asked for testing early 4.4 kernels was to bisect the
failure if we had a version which did not show the failure.
Now we went from rarely hitting the problem to hitting it more often. Going
through the history of rt4…rt5 shows only fixes. One thing is fixing
lazy-preempt and another migrate-disable. And I believe one problem was
that we did not utilize more than one CPU under certain circumstances.
So if you bisect between those two it might be that the fix uncovers a
bug or makes it show up more reliably.
But in the end, I have no idea where the extra SIGSTOP signals come
from.

Sebastian


* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-25 13:40         ` Sebastian Andrzej Siewior
@ 2017-01-25 14:00           ` Koehrer Mathias (ETAS/ESW5)
  2017-01-26  9:26             ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-25 14:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

> > > > > The kernel version 4.1.37-rt43 is working fine, the kernel
> > > > > versions
> > > > > 4.4.39-rt50 and
> > > > 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > > > > Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems
> > > > > to cause the
> > > > trouble.
> > > >
> > > > can you check if one of the earlier v4.4-RT releases (maybe start
> > > > with
> > > > -rt2 or -rt1) also shows this behaviour?
> > > > If not, would you have a testcase?
> > > I tested with 4.4.0-rt2.
> > > Here this issue occurs very rarely. I ran the gdb for about 30
> > > times, and here I got one hit.
> > > With the other kernel versions I got the issue in 20-40% of the cases.
> > >
> > > I will try out the other 4.4.0 releases...
> >
> > I ran more tests. The kernel 4.4.1-rt4 is working fairly fine (only
> > about one hit every 30 runs) but the kernel 4.4.1-rt5 is causing the issue very
> often.
> 
> and 4.1.37-rt43 shows not SIGSTOPS in 100 runs?
That's right.
I used 4.1.37-rt43 and ran the test 200 times. The issue did not occur a single time.
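
A note on how many runs are needed to call a kernel "good" while bisecting: the following is a minimal sketch, assuming every gdb run hits the issue independently with the same probability. The function name prob_zero_hits is made up for this illustration. It computes the chance of a completely clean series on a kernel that is in fact affected.

```c
#include <assert.h>

/*
 * Probability of seeing zero hits in `runs` independent runs when each
 * run hits the issue with probability `p` (assumption: runs are
 * independent and p is constant).
 */
double prob_zero_hits(double p, unsigned runs)
{
    assert(p >= 0.0 && p <= 1.0);

    double q = 1.0;
    for (unsigned i = 0; i < runs; i++)
        q *= 1.0 - p;   /* each run must be clean */
    return q;
}
```

Under this assumption, a kernel with a hit rate of one in 30 still produces a clean series of 30 runs in roughly 36% of the cases, so 30 runs alone cannot clear a kernel. With a hit rate of 20%, a clean series of 200 runs is practically impossible, which supports treating 4.1.37-rt43 as a good starting point.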

Regards

Mathias




* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-25 14:00           ` Koehrer Mathias (ETAS/ESW5)
@ 2017-01-26  9:26             ` Koehrer Mathias (ETAS/ESW5)
  2017-01-27 14:04               ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-26  9:26 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5), Sebastian Andrzej Siewior; +Cc: linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 5807 bytes --]

Hi Sebastian,

 
> > > > > > The kernel version 4.1.37-rt43 is working fine, the kernel
> > > > > > versions
> > > > > > 4.4.39-rt50 and
> > > > > 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > > > > > Something on the way between 4.1.37-rt43 and 4.4.39-rt50 seems
> > > > > > to cause the
> > > > > trouble.
> > > > >
> > > > > can you check if one of the earlier v4.4-RT releases (maybe
> > > > > start with
> > > > > -rt2 or -rt1) also shows this behaviour?
> > > > > If not, would you have a testcase?
> > > > I tested with 4.4.0-rt2.
> > > > Here this issue occurs very rarely. I ran the gdb for about 30
> > > > times, and here I got one hit.
> > > > With the other kernel versions I got the issue in 20-40% of the cases.
> > > >
> > > > I will try out the other 4.4.0 releases...
> > >
> > > I ran more tests. The kernel 4.4.1-rt4 is working fairly fine (only
> > > about one hit every 30 runs) but the kernel 4.4.1-rt5 is causing the
> > > issue very
> > often.
> >
> > and 4.1.37-rt43 shows not SIGSTOPS in 100 runs?
> That's right.
> I used 4.1.37-rt43 and ran the test for 200 times. There was no single issue.

I now have a simple executable with which I can reproduce a very similar issue.
It might be related to the one I see, as it also affects gdb debugging.

The simple executable (see below or in the attachment) runs perfectly without gdb.
However, with kernel version 4.4.1-rt6 and also with 4.9.4-rt2 it very often does not run properly
under gdb.
With kernel version 4.1.37-rt43 it always runs properly in gdb.

Whenever the issue occurs, gdb does not reach the end of the executable.
I have to interrupt gdb with CTRL-C.
Here is the gdb output of kernel 4.9.4-rt7
(gdb) info thr
shows
  Id   Target Id         Frame
* 1    Thread 0x7ffff7fdf700 (LWP 5383) "gdb-issue" 0x00007ffff79bd6bd in pthread_join (threadid=140737343743744,
    thread_return=0x0) at pthread_join.c:90
  2    Thread 0x7ffff7616700 (LWP 5387) "gdb-issue" 0x00007ffff76cee4b in __GI___waitpid (pid=5540,
    stat_loc=stat_loc@entry=0x7ffff7615d50, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
(gdb) thr 2
(gdb) bt
shows
#0  0x00007ffff76cee4b in __GI___waitpid (pid=5540, stat_loc=stat_loc@entry=0x7ffff7615d50, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x00007ffff765608b in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x0000555555554df8 in thread_func (arg=0x5555557560c0 <thread_data>) at gdb-issue.c:54
#3  0x00007ffff79bc464 in start_thread (arg=0x7ffff7616700) at pthread_create.c:333
#4  0x00007ffff76ff9df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Sometimes it works fine, however in about 20-30% of the runs, the gdb session does not work properly.
Does this give you any hint?

Thanks for the support.

Regards

Mathias

--------------- BEGIN of C CODE -------------
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/eventfd.h>
#include <sys/mman.h>
#include <time.h>

static volatile bool terminated;

typedef struct
{
    pthread_t thread;
    int id;
    int trigger_fd;
}
thread_data_t;

void *thread_func(void *arg)
{
    int rc;
    thread_data_t *thr = arg;
    int loop;
    {
        struct sched_param param;
        memset(&param, 0, sizeof(param));
        param.sched_priority = 60;
        rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
        if (rc) {
            /* pthread functions return the error code; errno (%m) is not set */
            fprintf(stderr, "pthread_setschedparam: %s\n", strerror(rc));
            exit(1);
        }

        /* Run each thread on a separate CPU core */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(thr->id, &set);
        rc = sched_setaffinity(0, sizeof(set), &set);
        if (rc) {
            fprintf(stderr, "sched_setaffinity %m\n");
            exit(1);
        }
    }

    for (loop=0; !terminated ; loop++) {
        uint64_t u64;
        /* separate variable: do not shadow the outer rc; read() returns ssize_t */
        ssize_t n = read(thr->trigger_fd, &u64, sizeof(u64));
        if (n > 0) {
            /* printf("thread %i loop %i\n", thr->id, loop); */
            system("lspci -xxx > /dev/null");
        }
        else {
            fprintf(stderr, "read %m\n");
        }
    }
    printf("End of thread %i\n", thr->id);
    return NULL;
}

#define NO_THREADS 4
static thread_data_t thread_data[NO_THREADS];

int main(int argc, char *argv[])
{
    int i;
    int rc;
    /* Init */
    mlockall(MCL_CURRENT | MCL_FUTURE);
    for (i=0; i<NO_THREADS; i++) {
        thread_data[i].id = i;
        rc = thread_data[i].trigger_fd = eventfd(0,0);
        if (rc < 0) {
            fprintf(stderr, "Error creating eventfd: %m\n"); 
            return 1;
        }
    }

    for (i=0; i<NO_THREADS; i++) {
        rc = pthread_create(&thread_data[i].thread, NULL, thread_func, &thread_data[i]);
        if (rc) {
            /* pthread_create returns the error code; errno (%m) is not set */
            fprintf(stderr, "Error creating thread: %s\n", strerror(rc));
            return 1;
        }
    }

    /* Loop */
    int j;
    int max = 100;
    for (j=0; j<=max; j++) {
        struct timespec ts = { 0, 10000000 };
        clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);

        if (j == max) {
            terminated = true;
        }

        for (i=0; i<NO_THREADS; i++) {
            uint64_t u64 = 1;
            write(thread_data[i].trigger_fd, &u64, 8);
        }
    }
    
    /* Cleanup */
    for (i=0; i<NO_THREADS; i++) {
        pthread_join(thread_data[i].thread, NULL);
        printf("Thread %i joined\n", i);
        close(thread_data[i].trigger_fd);
    }
    return 0;
}

--------------- END of C CODE -------------



[-- Attachment #2: gdb-issue.tgz --]
[-- Type: application/x-compressed, Size: 1273 bytes --]


* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-26  9:26             ` Koehrer Mathias (ETAS/ESW5)
@ 2017-01-27 14:04               ` Koehrer Mathias (ETAS/ESW5)
  2017-01-27 15:33                 ` Sebastian Andrzej Siewior
  2017-03-02 17:51                 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-27 14:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 6243 bytes --]

Hi Sebastian,

> > > > > > > The kernel version 4.1.37-rt43 is working fine, the kernel
> > > > > > > versions
> > > > > > > 4.4.39-rt50 and
> > > > > > 4.8.11-rt7 show the same strange behavior as 4.9.0-rt1.
> > > > > > > Something on the way between 4.1.37-rt43 and 4.4.39-rt50
> > > > > > > seems to cause the
> > > > > > trouble.
> > > > > >
> > > > > > can you check if one of the earlier v4.4-RT releases (maybe
> > > > > > start with
> > > > > > -rt2 or -rt1) also shows this behaviour?
> > > > > > If not, would you have a testcase?
> > > > > I tested with 4.4.0-rt2.
> > > > > Here this issue occurs very rarely. I ran the gdb for about 30
> > > > > times, and here I got one hit.
> > > > > With the other kernel versions I got the issue in 20-40% of the cases.
> > > > >
> > > > > I will try out the other 4.4.0 releases...
> > > >
> > > > I ran more tests. The kernel 4.4.1-rt4 is working fairly fine
> > > > (only about one hit every 30 runs) but the kernel 4.4.1-rt5 is
> > > > causing the issue very
> > > often.
> > >
> > > and 4.1.37-rt43 shows not SIGSTOPS in 100 runs?
> > That's right.
> > I used 4.1.37-rt43 and ran the test for 200 times. There was no single issue.
I have extended my test executable to use the kernel tracing feature to get more details.

Here is the output of my gdb session:
------------------ BEGIN -------------------------
pid: 17969
[New Thread 0x7ffff7616700 (LWP 17973)]
[New Thread 0x7ffff6e15700 (LWP 17974)]
[New Thread 0x7ffff6614700 (LWP 17975)]
[New Thread 0x7ffff5e13700 (LWP 17976)]
End of thread 0
End of thread 2
End of thread 1
Thread 0 joined
Thread 1 joined
Thread 2 joined
[Thread 0x7ffff6614700 (LWP 17975) exited]
[Thread 0x7ffff6e15700 (LWP 17974) exited]
[Thread 0x7ffff7616700 (LWP 17973) exited]
^C
Thread 1 "gdb-issue" received signal SIGINT, Interrupt.
0x00007ffff79bd6bd in pthread_join (threadid=140737318565632, thread_return=0x0) at pthread_join.c:90
90      pthread_join.c: No such file or directory.
(gdb) info thr
  Id   Target Id         Frame
* 1    Thread 0x7ffff7fe0700 (LWP 17969) "gdb-issue" 0x00007ffff79bd6bd in pthread_join (threadid=140737318565632, thread_return=0x0)
    at pthread_join.c:90
  5    Thread 0x7ffff5e13700 (LWP 17976) "gdb-issue" 0x00007ffff79c5b7b in __waitpid (pid=18160, stat_loc=0x7ffff5e12ea0, options=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:29
(gdb) thr 5
[Switching to thread 5 (Thread 0x7ffff5e13700 (LWP 17976))]
#0  0x00007ffff79c5b7b in __waitpid (pid=18160, stat_loc=0x7ffff5e12ea0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
29      ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
(gdb) bt
#0  0x00007ffff79c5b7b in __waitpid (pid=18160, stat_loc=0x7ffff5e12ea0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x00005555555551ce in thread_func (arg=0x555555756130 <thread_data+48>) at gdb-issue.c:107
#2  0x00007ffff79bc464 in start_thread (arg=0x7ffff5e13700) at pthread_create.c:333
#3  0x00007ffff76ff9df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb)
------------------ END -------------------------

The program hangs because it cannot join thread 3 (LWP 17976).
This thread has forked (pid=18160) to call "lspci -xxx", but it waits forever for pid 18160 to terminate.

I have attached an extract of the kernel trace where this situation is reflected.

ERROR:
One interesting thing can be observed. The forked process 18160 is listed with the following lines
in the trace:
           <...>-18160 [003] d...112 19076.865212: sched_waking: comm=gdb-issue pid=17969 prio=120 target_cpu=000
           <...>-18160 [003] d...212 19076.865213: sched_wakeup: comm=gdb-issue pid=17969 prio=120 target_cpu=000
           <...>-18160 [003] d...2.. 19076.874033: sched_switch: prev_comm=gdb-issue prev_pid=18160 prev_prio=39 prev_state=T|K ==> next_comm=gdb-issue next_pid=17976 next_prio=39
           <...>-18160 [003] d...2.. 19076.874033: tlb_flush: pages:-1 reason:flush on task switch (0)
           <...>-18160 [003] d...2.. 19076.874034: x86_fpu_regs_deactivated: x86/fpu: ffff8801fcfaf600 fpregs_active: 0 fpstate_active: 1 counter: 4 xfeatures: 6 xcomp_bv: 0
           <...>-18160 [003] d...2.. 19076.874034: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 113 xfeatures: 6 xcomp_bv: 0
           <...>-18160 [003] d...2.. 19076.874034: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 113 xfeatures: 6 xcomp_bv: 0
           <...>-18160 [003] d...2.. 19076.874034: write_msr: c0000100, value 7ffff5e13700


SUCCESS:
In the previous loop iteration the thread 17976 also forked (pid=18150) to do the very same.
In this case it succeeded.
The lines for pid=18150 in the trace look like this:
           <...>-18150 [003] d...2.. 19076.846386: sched_switch: prev_comm=sh prev_pid=18150 prev_prio=39 prev_state=S ==> next_comm=gdb-issue next_pid=17976 next_prio=39
           <...>-18150 [003] d...2.. 19076.846387: tlb_flush: pages:-1 reason:flush on task switch (0)
           <...>-18150 [003] d...2.. 19076.846387: x86_fpu_regs_deactivated: x86/fpu: ffff8801fcfa8a00 fpregs_active: 0 fpstate_active: 1 counter: 0 xfeatures: 2 xcomp_bv: 0
           <...>-18150 [003] d...2.. 19076.846388: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 105 xfeatures: 6 xcomp_bv: 0
           <...>-18150 [003] d...2.. 19076.846388: x86_fpu_regs_activated: x86/fpu: ffff88040b027600 fpregs_active: 1 fpstate_active: 1 counter: 105 xfeatures: 6 xcomp_bv: 0
           <...>-18150 [003] d...2.. 19076.846388: write_msr: c0000100, value 7ffff5e13700

In all the successful loops the lines in the trace look like the lines in the successful (2nd) example above.
In the error case there is an additional sched_waking and sched_wakeup in the trace.
Also, in the successful case there is always "prev_state=S" for the forked process.
In the error case, there is "prev_state=T|K".
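
For reading the prev_state field: the following is a minimal sketch that spells out the state letters as printed by the sched_switch tracepoint in kernels of this era. Only the letters relevant to this thread are covered, and the function names are made up for this illustration. "T|K" is a job-control stop (as caused by SIGSTOP), whereas "t|K" would be a stop on behalf of a tracer such as gdb.

```c
#include <assert.h>
#include <string.h>

/* Human-readable meaning of one sched_switch prev_state letter. */
const char *sched_state_name(char c)
{
    switch (c) {
    case 'R': return "running or runnable";
    case 'S': return "interruptible sleep";
    case 'D': return "uninterruptible sleep";
    case 'T': return "stopped (job control, e.g. SIGSTOP)";
    case 't': return "stopped by a tracer (ptrace)";
    case 'K': return "wakekill modifier (fatal signals still wake the task)";
    default:  return "not covered by this sketch";
    }
}

/* Return 1 if the '|'-separated state string contains `letter`. */
int sched_state_has(const char *prev_state, char letter)
{
    assert(prev_state != NULL);
    return strchr(prev_state, letter) != NULL && letter != '\0';
}
```

With this, sched_state_has("T|K", 'T') is true while sched_state_has("T|K", 't') is not, which is the distinction between the failing trace above and an ordinary ptrace stop.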

Please find the trace extract and the test executable in the attachment.

Any idea or feedback on the issue is highly welcome!

Thanks

Best regards

Mathias



[-- Attachment #2: trace-extract.gz --]
[-- Type: application/x-gzip, Size: 11327 bytes --]

[-- Attachment #3: gdb-issue.tgz --]
[-- Type: application/x-compressed, Size: 1727 bytes --]


* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-27 14:04               ` Koehrer Mathias (ETAS/ESW5)
@ 2017-01-27 15:33                 ` Sebastian Andrzej Siewior
  2017-01-30  7:24                   ` Koehrer Mathias (ETAS/ESW5)
  2017-03-02 17:51                 ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-01-27 15:33 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: linux-rt-users

On 2017-01-27 14:04:35 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Sebastian,
Hi Mathias,

> In all the successful loops the lines in the trace look like the lines in the successful (2nd) example below.
> In the error case there is an additional sched_waking and sched_wakeup in the trace.
> Also there is always "prev_state=S" within the forked process.
> In the error case, there is "prev_state=T|K".

That T|K is probably some kind of debug state the program remains in until
the debugger puts it back on track. I think the interesting
part is to figure out what decided to send this signal or put the program
into stop-state. Usually a breakpoint, invalid opcode, … causes this
kind of action but my understanding is that you use none of those.

> Thanks
> 
> Best regards
> 
> Mathias

Sebastian


* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-27 15:33                 ` Sebastian Andrzej Siewior
@ 2017-01-30  7:24                   ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 0 replies; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2017-01-30  7:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

> > In all the successful loops the lines in the trace look like the lines in the successful
> (2nd) example below.
> > In the error case there is an additional sched_waking and sched_wakeup in the
> trace.
> > Also there is always "prev_state=S" within the forked process.
> > In the error case, there is "prev_state=T|K".
> 
> That T|K is probably some kind of debug state the program remains until the
> debugger puts it back on track. I think the interesting part ist figure out what decided
> to send this signal or put the program into stop-state. Usually a breakpoint, invalid
> opcode, … causes this kind of action but my understanding is that you use none of
> those.
Perhaps I was not precise enough in my previous mail.
The effect that I described there was not a SIGSTOP but a hanging thread.
I had to press CTRL-C to interrupt the execution and to get to the gdb prompt.
This hang occurs in 10-20% of the runs.
Here, too, the hang always occurs when one of the threads does a kind of "clone".
And this effect only appears if the kernel is configured as "fully preemptible".

So there are two effects I have noticed:
- The strange SIGSTOP where I cannot provide a simple test executable
- The strange issue with the hanging thread (please use the example from my previous mail)
In both issues, a "clone" is executed in one of the threads.

Regards

Mathias

* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-01-27 14:04               ` Koehrer Mathias (ETAS/ESW5)
  2017-01-27 15:33                 ` Sebastian Andrzej Siewior
@ 2017-03-02 17:51                 ` Sebastian Andrzej Siewior
  2017-03-07 13:39                   ` Koehrer Mathias (ETAS/ESW3)
  1 sibling, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-03-02 17:51 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: linux-rt-users

On 2017-01-27 14:04:35 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Sebastian,
Hi Mathias,

> In the error case, there is "prev_state=T|K".

The t|K should be okay.

        gdb-issue-8145  [001] .......  9315.877956: sched_process_fork: comm=gdb-issue pid=8145 child_comm=gdb-issue child_pid=8473
        gdb-issue-8145  [001] d...2..  9315.877958: sched_wakeup_new: comm=gdb-issue pid=8473 prio=39 target_cpu=001
        gdb-issue-8145  [001] d...212  9315.877964: sched_wakeup: comm=gdb pid=8138 prio=120 target_cpu=004
        gdb-issue-8145  [001] .....12  9315.877964: signal_generate: sig=17 errno=0 code=262148 comm=gdb pid=8138 grp=1 res=0
        gdb-issue-8145  [001] d...2..  9315.877966: sched_switch: prev_comm=gdb-issue prev_pid=8145 prev_prio=39 prev_state=t|K ==> next_comm =gdb-issue next_pid=8473 next_prio=39
           <idle>-0     [004] d...2..  9315.877990: sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=gdb  next_pid=8138 next_prio=120
              gdb-8138  [004] .......  9315.877992: sys_exit: NR 7 = -516
              gdb-8138  [004] .....11  9315.877993: signal_deliver: sig=17 errno=0 code=262148 sa_handler=5623f7c281e0 sa_flags=14000000
…
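
For readers following the trace, here is a minimal sketch for decoding the
prev_state and signal_generate fields shown above. The letter table is an
assumption based on the 4.9-era sched_switch output and lists only the letters
relevant to this thread. Splitting "code" into a class part and a low part
assumes the kernel-internal si_code encoding of that era, under which 262148
(0x40004) would correspond to CLD_TRAPPED for SIGCHLD (sig=17 on x86 Linux).
The function names are hypothetical.

```python
# Assumed meaning of sched_switch state letters on 4.9-era kernels.
# Only the letters that matter for this thread are listed.
STATE_LETTERS = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped (job control, e.g. SIGSTOP)",
    "t": "traced (stopped under ptrace)",
    "K": "wakekill (fatal signals still wake the task)",
}

# SIGCHLD si_code values from the Linux siginfo definitions.
CLD_CODES = {
    1: "CLD_EXITED",
    2: "CLD_KILLED",
    3: "CLD_DUMPED",
    4: "CLD_TRAPPED",
    5: "CLD_STOPPED",
    6: "CLD_CONTINUED",
}


def decode_state(field):
    """Split a prev_state value such as 't|K' into readable names."""
    return [STATE_LETTERS.get(letter, "unknown: " + letter)
            for letter in field.split("|")]


def decode_sigchld_code(code):
    """Split the traced code into (class bits, name of the low 16 bits).

    Assumption: the upper 16 bits are a kernel-internal class marker.
    """
    low = code & 0xFFFF
    return code >> 16, CLD_CODES.get(low, "unknown: %d" % low)


print(decode_state("t|K"))          # the state Sebastian calls okay
print(decode_state("T|K"))          # the state Mathias saw in the error case
print(decode_sigchld_code(262148))  # code from the signal_generate line
```

Under these assumptions, "t|K" is a ptrace stop and "T|K" is a job-control
stop, which is the distinction the two mails above are discussing.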

The gdb task deals with each new child and releases it later:

              gdb-8138  [004] .......  9315.878159: sys_enter: NR 101 (11, 2119, 0, 0, 10, 30)
              gdb-8138  [004] d...2..  9315.878160: sched_wait_task: comm=gdb-issue pid=8473 prio=39
              gdb-8138  [004] d...212  9315.878163: sched_wakeup: comm=gdb-issue pid=8473 prio=39 target_cpu=001
              gdb-8138  [004] .......  9315.878164: sys_exit: NR 101 = 0

later it is gone. So that looks fine. When it hangs

        gdb-issue-8424  [000] d...212  9315.738204: sched_wakeup: comm=gdb pid=8138 prio=120 target_cpu=004
(8424 will sched out due to ptrace)
              gdb-8138  [004] .......  9315.738207: sys_exit: NR 61 = 8424
              gdb-8138  [004] .......  9315.738214: sys_enter: NR 15 (c, 5623f7fff84b, 1, 0, 8, 30)
              gdb-8138  [004] .......  9315.738215: sys_exit: NR -1 = 8424
              gdb-8138  [004] .......  9315.738216: sys_enter: NR 101 (11, 20e8, 0, 0, 10, 30)
              gdb-8138  [004] d...2..  9315.738217: sched_wait_task: comm=gdb-issue pid=8424 prio=39
              gdb-8138  [004] d...112  9315.738218: sched_waking: comm=gdb-issue pid=8424 prio=39 target_cpu=000
              gdb-8138  [004] d...212  9315.738220: sched_wakeup: comm=gdb-issue pid=8424 prio=39 target_cpu=000
              gdb-8138  [004] .......  9315.738221: sys_exit: NR 101 = 0
        gdb-issue-8144  [000] .......  9315.738246: sys_exit: NR 56 = 8424
        gdb-issue-8144  [000] .......  9315.738261: sched_process_wait: comm=gdb-issue pid=8424 prio=39

and 8144 blocks in wait for 8424 which never completes without
additional help (like a manual SIGCONT). Right now it looks like the
PTRACE_DETACH (syscall 101, 11) which should remove the task from ptrace
and wakeup did not work but I see a wakeup…
The wakeup worked (most likely) but since it is stuck I guess that there
was a second signal it is processing and waiting for gdb.
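
To make the "(syscall 101, 11)" reading reproducible, here is a small sketch
that decodes the sys_enter lines from the trace. It assumes x86-64 syscall
numbers and that the arguments are printed in hex (so 11 is 0x11 and 20e8 is
pid 8424). The tables are deliberately partial and the function name is
hypothetical.

```python
import re

# Assumed x86-64 syscall numbers for the NR values seen in this trace.
SYSCALLS = {7: "poll", 15: "rt_sigreturn", 56: "clone", 61: "wait4",
            101: "ptrace"}

# A few ptrace request numbers from <sys/ptrace.h>.
PTRACE_REQUESTS = {0x7: "PTRACE_CONT", 0x10: "PTRACE_ATTACH",
                   0x11: "PTRACE_DETACH"}

SYS_ENTER = re.compile(r"sys_enter: NR (\d+) \(([^)]*)\)")


def decode_sys_enter(line):
    """Turn one sys_enter trace line into a readable call, or None."""
    match = SYS_ENTER.search(line)
    if match is None:
        return None
    nr = int(match.group(1))
    # Arguments are assumed to be hex without a 0x prefix.
    args = [int(a.strip(), 16) for a in match.group(2).split(",") if a.strip()]
    name = SYSCALLS.get(nr, "syscall_%d" % nr)
    if name == "ptrace" and len(args) >= 2:
        request = PTRACE_REQUESTS.get(args[0], hex(args[0]))
        return "ptrace(%s, pid=%d)" % (request, args[1])
    return "%s(%s)" % (name, ", ".join(hex(a) for a in args))


hang = "gdb-8138  [004] .......  9315.738216: sys_enter: NR 101 (11, 20e8, 0, 0, 10, 30)"
good = "gdb-8138  [004] .......  9315.878159: sys_enter: NR 101 (11, 2119, 0, 0, 10, 30)"
print(decode_sys_enter(hang))  # ptrace(PTRACE_DETACH, pid=8424)
print(decode_sys_enter(good))  # ptrace(PTRACE_DETACH, pid=8473)
```

Both the working and the hanging case decode to the same detach request, which
matches the observation that the call itself and the wakeup look normal.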

> Please find the trace extract and the test executable in the attachment.
> 
> Any idea or feedback on the issue is highly welcome!
> 
> Thanks
> 
> Best regards
> 
> Mathias

Sebastian

* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-03-02 17:51                 ` Sebastian Andrzej Siewior
@ 2017-03-07 13:39                   ` Koehrer Mathias (ETAS/ESW3)
  2017-03-07 23:21                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW3) @ 2017-03-07 13:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

thanks for the feedback.
> > In the error case, there is "prev_state=T|K".
> 
> The t|K should be okay.
> 
>         gdb-issue-8145  [001] .......  9315.877956: sched_process_fork: comm=gdb-issue pid=8145 child_comm=gdb-issue child_pid=8473
>         gdb-issue-8145  [001] d...2..  9315.877958: sched_wakeup_new: comm=gdb-issue pid=8473 prio=39 target_cpu=001
>         gdb-issue-8145  [001] d...212  9315.877964: sched_wakeup: comm=gdb pid=8138 prio=120 target_cpu=004
>         gdb-issue-8145  [001] .....12  9315.877964: signal_generate: sig=17 errno=0 code=262148 comm=gdb pid=8138 grp=1 res=0
>         gdb-issue-8145  [001] d...2..  9315.877966: sched_switch: prev_comm=gdb-issue prev_pid=8145 prev_prio=39 prev_state=t|K ==> next_comm =gdb-issue next_pid=8473 next_prio=39
>            <idle>-0     [004] d...2..  9315.877990: sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=gdb  next_pid=8138 next_prio=120
>               gdb-8138  [004] .......  9315.877992: sys_exit: NR 7 = -516
>               gdb-8138  [004] .....11  9315.877993: signal_deliver: sig=17 errno=0 code=262148 sa_handler=5623f7c281e0 sa_flags=14000000
> …
> 
> The gdb task deals with each new child and releases it later:
> 
>               gdb-8138  [004] .......  9315.878159: sys_enter: NR 101 (11, 2119, 0, 0, 10, 30)
>               gdb-8138  [004] d...2..  9315.878160: sched_wait_task: comm=gdb-issue pid=8473 prio=39
>               gdb-8138  [004] d...212  9315.878163: sched_wakeup: comm=gdb-issue pid=8473 prio=39 target_cpu=001
>               gdb-8138  [004] .......  9315.878164: sys_exit: NR 101 = 0
> 
> later it is gone. So that looks fine. When it hangs
> 
>         gdb-issue-8424  [000] d...212  9315.738204: sched_wakeup: comm=gdb pid=8138 prio=120 target_cpu=004
> (8424 will sched out due to ptrace)
>               gdb-8138  [004] .......  9315.738207: sys_exit: NR 61 = 8424
>               gdb-8138  [004] .......  9315.738214: sys_enter: NR 15 (c, 5623f7fff84b, 1, 0, 8, 30)
>               gdb-8138  [004] .......  9315.738215: sys_exit: NR -1 = 8424
>               gdb-8138  [004] .......  9315.738216: sys_enter: NR 101 (11, 20e8, 0, 0, 10, 30)
>               gdb-8138  [004] d...2..  9315.738217: sched_wait_task: comm=gdb-issue pid=8424 prio=39
>               gdb-8138  [004] d...112  9315.738218: sched_waking: comm=gdb-issue pid=8424 prio=39 target_cpu=000
>               gdb-8138  [004] d...212  9315.738220: sched_wakeup: comm=gdb-issue pid=8424 prio=39 target_cpu=000
>               gdb-8138  [004] .......  9315.738221: sys_exit: NR 101 = 0
>         gdb-issue-8144  [000] .......  9315.738246: sys_exit: NR 56 = 8424
>         gdb-issue-8144  [000] .......  9315.738261: sched_process_wait: comm=gdb-issue pid=8424 prio=39
> 
> and 8144 blocks in wait for 8424 which never completes without additional help
> (like a manual SIGCONT). Right now it looks like the PTRACE_DETACH (syscall
> 101, 11) which should remove the task from ptrace and wakeup did not work but I
> see a wakeup… The wakeup worked (most likely) but since it is stuck I guess that
> there was a second signal it is processing and waiting for gdb.
I do not understand, what that means.
When I run the very same test on a non-rt kernel (or even on a RT kernel with a 
preemption model that is not configured as "Fully Preemptible Kernel"), I never see 
this issue.
Also - as mentioned - previously, with older RT preempted kernel versions it is working fine
as well. For me it looks as if the RT preempting on newer kernels somehow has impact 
on the way the debugging with gdb works.
My expectation (and so far I had the experience) is that a code that is running fine 
with a non RT kernel should also run fine with the fully preempted kernel (of course 
the timing behavior is different). 

Regards

Mathias



* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-03-07 13:39                   ` Koehrer Mathias (ETAS/ESW3)
@ 2017-03-07 23:21                     ` Sebastian Andrzej Siewior
  2017-04-24 19:49                       ` David Hauck
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-03-07 23:21 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW3); +Cc: linux-rt-users

On 2017-03-07 13:39:36 [+0000], Koehrer Mathias (ETAS/ESW3) wrote:
> Hi Sebastian,
Hi Mathias,

> > and 8144 blocks in wait for 8424 which never completes without additional help
> > (like a manual SIGCONT). Right now it looks like the PTRACE_DETACH (syscall
> > 101, 11) which should remove the task from ptrace and wakeup did not work but I
> > see a wakeup… The wakeup worked (most likely) but since it is stuck I guess that
> > there was a second signal it is processing and waiting for gdb.
> I do not understand, what that means.
> When I run the very same test on a non-rt kernel (or even on a RT kernel with a 
> preemption model that is not configured as "Fully Preemptible Kernel"), I never see 
> this issue.

I can reproduce the issue on a -RT kernel. I don't really know what the
root issue is but it is a problem. We had a ptrace issue if you look
into the queue for
  "ptrace: fix ptrace vs tasklist_lock race"

> Also - as mentioned - previously, with older RT preempted kernel versions it is working fine
> as well. For me it looks as if the RT preempting on newer kernels somehow has impact 
> on the way the debugging with gdb works.
> My expectation (and so far I had the experience) is that a code that is running fine 
> with a non RT kernel should also run fine with the fully preempted kernel (of course 
> the timing behavior is different). 

That is correct - it should work. I've been tracing it for a while now
and have no real clue what goes wrong. There are a few different ways how it
seems to go wrong and the outcome is always that the kernel puts the
task in "stop" mode while gdb expects it to run and waits for it.
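
One way to confirm from user space that a thread really sits in "stop" mode is
to read the State: line of /proc/<pid>/task/<tid>/status. The sketch below
assumes the usual procfs wording ("T (stopped)", "t (tracing stop)"). The
helper names and the sample text are made up for illustration and are not
taken from the test program in this thread.

```python
def task_state(status_text):
    """Return the State: value from /proc/.../status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return None


def is_stopped(state):
    """True for a job-control stop ('T') or a ptrace stop ('t')."""
    return state is not None and state[:1] in ("T", "t")


def read_task_status(pid, tid):
    """Read the status file of one thread (Linux procfs only)."""
    with open("/proc/%d/task/%d/status" % (pid, tid)) as status_file:
        return status_file.read()


# Illustrative sample; on a live system use read_task_status(pid, tid).
sample = "Name:\tgdb-issue\nState:\tt (tracing stop)\nTgid:\t8144\n"
state = task_state(sample)
print(state, is_stopped(state))
```

If a thread shows up as stopped while gdb believes it is running, that matches
the outcome described above.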

> Regards
> 
> Mathias

Sebastian

* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-03-07 23:21                     ` Sebastian Andrzej Siewior
@ 2017-04-24 19:49                       ` David Hauck
  2017-04-25  6:06                         ` Koehrer Mathias (ETAS/ESW3)
  0 siblings, 1 reply; 20+ messages in thread
From: David Hauck @ 2017-04-24 19:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Koehrer Mathias (ETAS/ESW3); +Cc: linux-rt-users

Hi Sebastian and Mathias,
 
I wanted to follow-up on this thread since we, too, are seeing issues with GDB and v4.9-rt kernels. We've been working on upgrading our systems/application from the v3.18-rt kernel series (specifically v3.18.29-rt30) to v4.9-rt (specifically 4.9.13-rt11) and immediately started experiencing various GDB related issues. The first (inability to pause at breakpoints and single-step) was eliminated by reverting to using GDB v7.11.1 (from GDB v7.12.1). However, the second (the SIGSTOP issue originally reported by Mathias) isn't something we've been able to work around. Have either of you made any of your own progress on this issue? Are others also seeing this?

Toolchain (v3.18-rt i386 targets/application)
	GCC v4.9.3
	GLibC v2.19
	GDB v7.11.1
Toolchain (v4.9-rt i386 targets/application)
	GCC v4.9.4
	GLibC v2.25
	GDB v7.11.1
	
Thanks,
-David Hauck
NetAcquire Corp.
 
On Tuesday, March 07, 2017 3:22 PM, linux-rt-users-owner@vger.kernel.org wrote:
> On 2017-03-07 13:39:36 [+0000], Koehrer Mathias (ETAS/ESW3) wrote:
>> Hi Sebastian,
> Hi Mathias,
> 
>>> and 8144 blocks in wait for 8424 which never completes without
>>> additional help (like a manual SIGCONT). Right now it looks like the
>>> PTRACE_DETACH (syscall 101, 11) which should remove the task from
>>> ptrace and wakeup did not work but I see a wakeup… The wakeup worked
>>> (most likely) but since it is stuck I guess that there was a second signal it is processing and waiting for
> gdb.
>> I do not understand, what that means.
>> When I run the very same test on a non-rt kernel (or even on a RT
>> kernel with a preemption model that is not configured as "Fully
>> Preemptible Kernel"), I never see this issue.
> 
> I can reproduce the issue on a -RT kernel. I don't really know what the root issue is but it is a problem.
> We had a ptrace issue if you look into the queue for
>   "ptrace: fix ptrace vs tasklist_lock race"
>> Also - as mentioned - previously, with older RT preempted kernel
>> versions it is working fine as well. For me it looks as if the RT
>> preempting on newer kernels somehow has impact on the way the
>> debugging with gdb works. My expectation (and so far I had the
>> experience) is that a code that is running fine with a non RT kernel
>> should also run fine with the fully preempted kernel (of course the timing behavior is different).
> 
> That is correct - it should work. I've been tracing it for a while now and have no real clue what goes
> wrong. There are a few different ways how it seems to go wrong and the outcome is always that the kernel
> puts the task in "stop" mode while gdb expects it to run and waits for it.
> 
>> Regards
>> 
>> Mathias
> 
> Sebastian

* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-04-24 19:49                       ` David Hauck
@ 2017-04-25  6:06                         ` Koehrer Mathias (ETAS/ESW3)
  0 siblings, 0 replies; 20+ messages in thread
From: Koehrer Mathias (ETAS/ESW3) @ 2017-04-25  6:06 UTC (permalink / raw)
  To: David Hauck, Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi David
> I wanted to follow-up on this thread since we, too, are seeing issues with GDB and
> v4.9-rt kernels. We've been working on upgrading our systems/application from the
> v3.18-rt kernel series (specifically v3.18.29-rt30) to v4.9-rt (specifically 4.9.13-rt11)
> and immediately started experiencing various GDB related issues. The first (inability
> to pause at breakpoints and single-step) was eliminated by reverting to using GDB
> v7.11.1 (from GDB v7.12.1). However, the second (the SIGSTOP issue originally
> reported by Mathias) isn't something we've been able to work around. Have either of
> you made any of your own progress on this issue? Are others also seeing this?
No progress from our side... We are staying on the 3.18 kernel (which is not really fine 
as it is now EOL....)

Best regards

Mathias 

* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-06-29  7:45     ` Koehrer Mathias (ETAS/EHE1)
@ 2017-06-29  8:42       ` Zhou, Li
  0 siblings, 0 replies; 20+ messages in thread
From: Zhou, Li @ 2017-06-29  8:42 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/EHE1), linux-rt-users

Hi, Mathias:

         Thank you very much for your help. The related changes can 
solve my issue.


On 06/29/2017 03:45 PM, Koehrer Mathias (ETAS/EHE1) wrote:
> Hi,
>> Issue with gdb and unexpected SIGSTOP signals" in this mail list, I wonder if there is
>> any update for this.
>> I want to follow this thread too, because I meet almost the same issue.
>> My kernel version is based on rt kernel 4.8.x.
> According to the announcements this should be fixed with v4.9.33-rt23.
> Please read the announcements for this version and the one for 4.9.30-rt21.
>
> Regards
>
> Mathias
>

-- 
Best Regards!
Zhou Li
Phone number: 86-10-84778511


* RE: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
  2017-06-29  6:08   ` Zhou, Li
@ 2017-06-29  7:45     ` Koehrer Mathias (ETAS/EHE1)
  2017-06-29  8:42       ` Zhou, Li
  0 siblings, 1 reply; 20+ messages in thread
From: Koehrer Mathias (ETAS/EHE1) @ 2017-06-29  7:45 UTC (permalink / raw)
  To: Zhou, Li, linux-rt-users

Hi,
> Issue with gdb and unexpected SIGSTOP signals" in this mail list, I wonder if there is
> any update for this.
> I want to follow this thread too, because I meet almost the same issue.
> My kernel version is based on rt kernel 4.8.x.
According to the announcements this should be fixed with v4.9.33-rt23.
Please read the announcements for this version and the one for 4.9.30-rt21.
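
To check whether a given 4.9-rt release string is at or past the version named
above, something like the following sketch could be used. It only compares
version numbers within the 4.9-rt series and returns None for anything else,
since this thread does not say which other series carry the fix. The function
names are hypothetical.

```python
import re


def parse_rt_release(release):
    """Parse '4.9.33-rt23' (optional leading 'v', trailing suffix ignored)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)-rt(\d+)", release)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Version named in this thread as the one where the issue should be fixed.
FIXED_IN = parse_rt_release("v4.9.33-rt23")


def has_fix(release):
    """True/False for 4.9-rt releases, None when the thread gives no answer."""
    parsed = parse_rt_release(release)
    if parsed is None or parsed[:2] != FIXED_IN[:2]:
        return None
    return parsed >= FIXED_IN


for release in ("4.9.4-rt2", "4.9.13-rt11", "4.9.33-rt23", "3.18.42-rt44"):
    print(release, has_fix(release))
```

On a running system the release string would typically come from "uname -r".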

Regards

Mathias


* Re: Kernel 4.9.x-rt Fully Preemptible Kernel: Issue with gdb and unexpected SIGSTOP signals
       [not found] ` <85216e5b-7c2d-9ff7-c118-9279023a1726@windriver.com>
@ 2017-06-29  6:08   ` Zhou, Li
  2017-06-29  7:45     ` Koehrer Mathias (ETAS/EHE1)
  0 siblings, 1 reply; 20+ messages in thread
From: Zhou, Li @ 2017-06-29  6:08 UTC (permalink / raw)
  To: linux-rt-users

Hi, all:
         About the issue "Kernel 4.9.x-rt Fully Preemptible Kernel: 
Issue with gdb and unexpected SIGSTOP signals" in this mail list, I 
wonder if there is any update for this.
I want to follow this thread too, because I meet almost the same issue.  
My kernel version is based on rt kernel 4.8.x.
Thanks.

-- 

Best Regards!
Zhou Li
Phone number: 86-10-84778511

