* rt_dev_send() stalls periodic task
@ 2019-04-15 17:28 C Smith
From: C Smith @ 2019-04-15 17:28 UTC (permalink / raw)
  To: Sumitabh Ghosh via Xenomai

My Xenomai periodic routine normally runs for days at a time on most
motherboards, but it spontaneously gets stuck forever in rt_dev_write().
This is a write to a serial port driven by xeno_16550A.

I must use one particular brand of motherboard. On it, the first serial
port (rtser0, 0x3f8, IRQ 4) has no problem, but the other two serial ports
show the stall (rtser1, 0x2f8, IRQ 5; rtser2, 0x2e8, IRQ 3). Three
motherboards of this brand have been tried, with the same results. No
interrupts are shared in this setup.

The serial device is set up this way:

#include <fcntl.h>          // O_RDWR, O_NONBLOCK
#include <rtdm/rtserial.h>  // struct rtser_config, RTSER_* (Xenomai 2.x)

struct rtser_config serial_config = {
        .config_mask       = 0xFFFF,                // apply every field below
        .baud_rate         = 115200,
        .parity            = RTSER_NO_PARITY,
        .data_bits         = RTSER_8_BITS,
        .stop_bits         = RTSER_1_STOPB,
        .handshake         = RTSER_NO_HAND,
        .fifo_depth        = RTSER_DEF_FIFO_DEPTH,  // alternative: RTSER_FIFO_DEPTH_8
        .reserved          = 0,
        .rx_timeout        = 500000,                // 500 us (timeouts are in ns)
        .tx_timeout        = RTSER_DEF_TIMEOUT,     // infinite
        .event_timeout     = 5000000,               // 5 ms
        .timestamp_history = RTSER_RX_TIMESTAMP_HISTORY,
        .event_mask        = RTSER_EVENT_RXPEND,
};
fd_tty[0] = rt_dev_open("rtser1", O_RDWR | O_NONBLOCK);
sret = rt_dev_ioctl(fd_tty[0], RTSER_RTIOC_SET_CONFIG, &serial_config);

The application repeatedly transmits a packet of about 75 bytes from a
Xenomai periodic task that runs at 125 Hz (an 8 ms period). A small RX
packet also arrives, so transmission and reception overlap (full duplex).
On rtser0 this works fine; on the other serial ports the stall happens
after a few hours and my periodic Xenomai task stops. There is no Xenomai
watchdog message in dmesg. The code repeatedly checks the serial port
status via ioctl, and it reports no errors such as framing errors.
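To put these numbers in perspective, the wire time of one packet can be estimated. A minimal sketch, assuming 8N1 framing (10 bit times per character, matching the RTSER_8_BITS / RTSER_NO_PARITY / RTSER_1_STOPB settings above) and ignoring FIFO refill and interrupt latency; wire_time_ns() and line_busy_percent() are hypothetical helpers, not part of Xenomai:

```c
#include <stdint.h>

/* Assumed framing: 8N1 = 1 start bit + 8 data bits + 1 stop bit, no parity,
 * i.e. 10 bit times per character on the wire. */
#define BITS_PER_CHAR_8N1 10u

/* Hypothetical helper: nanoseconds needed to shift `nbytes` characters out
 * at `baud`, ignoring inter-character gaps, FIFO refill and interrupt
 * latency. The result is truncated toward zero. */
static uint64_t wire_time_ns(uint32_t nbytes, uint32_t baud, uint32_t bits_per_char)
{
    return (uint64_t)nbytes * bits_per_char * 1000000000ULL / baud;
}

/* Hypothetical helper: share of one task period, in whole percent, during
 * which the line is busy transmitting. */
static uint32_t line_busy_percent(uint64_t tx_ns, uint64_t period_ns)
{
    return (uint32_t)(tx_ns * 100u / period_ns);
}
```

Under these assumptions a 75-byte packet needs about 6.51 ms on the wire against the 8 ms period of a 125 Hz task, so the line is busy roughly 81% of the time and there is little slack if draining of the transmit path is delayed.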

The periodic task is just a typical xenomai while() loop:
    next += period_ns + adjust_ns;
    rt_task_sleep_until(next);
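For readers without a Xenomai target, the absolute-deadline pattern of this loop can be sketched with plain POSIX calls. This is a stand-in, not the native-skin code: clock_nanosleep() with TIMER_ABSTIME takes the place of rt_task_sleep_until(), and timespec_add_ns() and run_periodic() are hypothetical helpers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add a (possibly negative) nanosecond offset to an absolute time and keep
 * tv_nsec in [0, NSEC_PER_SEC). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_sec  += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    } else if (t->tv_nsec < 0) {
        t->tv_nsec += NSEC_PER_SEC;
        t->tv_sec--;
    }
}

/* Run `cycles` iterations of a fixed-period loop. Each deadline is the
 * previous deadline plus the period (like `next += period_ns + adjust_ns`),
 * so time spent in `work` does not make the schedule drift.
 * Returns 0 on success or a negative errno value. */
static int run_periodic(int cycles, long period_ns, void (*work)(void))
{
    struct timespec next;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -errno;

    for (int i = 0; i < cycles; i++) {
        int err;

        timespec_add_ns(&next, period_ns);
        /* Sleep until the absolute deadline; restart if a signal interrupts. */
        do {
            err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (err == EINTR);
        if (err != 0)
            return -err;

        if (work != NULL)
            work();
    }
    return 0;
}
```

Deriving each deadline from the previous one rather than from the current time keeps the loop at its nominal rate on average, even when individual wake-ups are late.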

When my periodic task stops, the kernel stack trace for the task is:
[root@oyx ~]# cd /proc/1066/task/1075/
[root@oyx 1075]# cat stack
[<c112d058>] xnpod_suspend_thread+0x3d8/0x650
[<c1132f09>] xnsynch_sleep_on+0x139/0x320
[<c11a7f14>] rtdm_event_timedwait+0x2e4/0x390
[<e858ed3b>] rt_16550_write+0x35b/0x540 [xeno_16550A]
[<c11a1e23>] __rt_dev_write+0x63/0x110
[<c11a9374>] sys_rtdm_write+0x24/0x30
[<c113c2dc>] hisyscall_event+0x1ec/0x380
[<c10eb31a>] ipipe_syscall_hook+0x3a/0x50
[<c10ea220>] __ipipe_notify_syscall+0xb0/0x160
[<c16a73bb>] pipeline_syscall+0x7/0x18
[<ffffffff>] 0xffffffff

I can attach a debugger, and doing so seems to get the task out of the
stall, so I can actually single-step the code for a little while. I can't
see any suspicious variable values; the only oddity is that the serial port
had transmitted only 40 of my 75 bytes. But I can only single-step until my
task sleeps once more: at the next wakeup, if I step into rt_dev_write(),
the task stalls forever and I can no longer debug.

(gdb) thread 2
[Switching to thread 2 (Thread 0xb7797b40 (LWP 1336))]
#0  0xb77caa92 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0xb77caa92 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb775d872 in rt_dev_write (fd=12, buf=0xa8eda001, nbyte=72) at core.c:72
#2  0x08056515 in Process_serial (comm_p=0x810e644 <comm_object+4>, portnum=1 '\001') at periodic_app.cpp:5404
#3  0x0804e0e4 in Periodic_routine (cookie=0x0) at periodic_app.cpp:1654
#4  0xb7764acd in rt_task_trampoline (cookie=0x0) at task.c:113
#5  0xb777a313 in start_thread () from /lib/libpthread.so.0
#6  0xb7528f2e in clone () from /lib/libc.so.6

I'm using an Intel Core i5 CPU, a 32-bit 3.18.20 kernel and Xenomai 2.6.5.
I must stay on this Xenomai/kernel version to support tens of thousands of
lines of legacy code. I diffed the driver sources, and the 16550A driver
did not change functionally between Xenomai 2.6.5 and Xenomai 3.0.8.

I looked at the rt_dev_write() source code but don't see an obvious
infinite loop (though the assembly code is a bit beyond my understanding).
I'd like to detect the problem early and continue without stalling. The
physical serial ports do seem to be misbehaving, but what would make
rt_dev_write() stall forever?
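The stack trace shows the task blocked in rtdm_event_timedwait() called from rt_16550_write(), and the configuration above sets .tx_timeout = RTSER_DEF_TIMEOUT, which a later message in this thread notes is infinite. The portable sketch below uses POSIX threads rather than RTDM to illustrate why that combination can hang: if the wake-up (for the driver, presumably a transmit interrupt) never arrives, an unbounded wait never returns, while a finite timeout turns the hang into an error the caller can detect. struct tx_event, tx_event_signal() and tx_event_wait() are hypothetical; this is not the driver's code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a driver's "room in the TX buffer" event. */
struct tx_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signalled;
};

/* Called by whoever frees TX space (in a real driver: the interrupt path). */
static void tx_event_signal(struct tx_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signalled = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Wait for the event and consume it. timeout_ns == 0 means "wait forever"
 * (the assumed meaning of RTSER_DEF_TIMEOUT); a positive value bounds the
 * wait. Returns 0 if signalled, -ETIMEDOUT if the deadline passed first. */
static int tx_event_wait(struct tx_event *ev, long long timeout_ns)
{
    struct timespec deadline;
    int err = 0;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += (time_t)(timeout_ns / 1000000000LL);
    deadline.tv_nsec += (long)(timeout_ns % 1000000000LL);
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_nsec -= 1000000000L;
        deadline.tv_sec++;
    }

    pthread_mutex_lock(&ev->lock);
    while (!ev->signalled && err == 0) {
        if (timeout_ns == 0)
            err = pthread_cond_wait(&ev->cond, &ev->lock);  /* unbounded: can hang */
        else
            err = pthread_cond_timedwait(&ev->cond, &ev->lock, &deadline);
    }
    if (ev->signalled) {
        ev->signalled = false;  /* a late signal still counts as success */
        err = 0;
    }
    pthread_mutex_unlock(&ev->lock);
    return -err;
}
```

In the application, the analogous change would presumably be a finite .tx_timeout, so that rt_dev_write() returns an error instead of blocking; whether that is enough depends on how rt_16550_write() combines its timeouts, which is what the reply below questions.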

thanks,
C Smith

* Re: rt_dev_send() stalls periodic task
@ 2019-04-24 13:05 Jeff Webb
From: Jeff Webb @ 2019-04-24 13:05 UTC (permalink / raw)
  To: xenomai

> The only difference in the serial configuration between that cross-link.c
> app and my app was:
> struct rtser_config:
>          .rx_timeout        = RTSER_DEF_TIMEOUT  // infinite, no stall for many hours in cross-link.c
> versus:
>          .rx_timeout        = 500000   // 500us, stalls within an hour in my app
> I don't know why an RX setting affects TX behavior. I also can't use
> RTSER_DEF_TIMEOUT in my application or it dies when it starts up; no clue
> why. But I did try setting
>    .rx_timeout        = 5000000   // 5 ms, my app doesn't stall for several hours

This may not be related, but it jogged my memory about a problem I had years ago where the RX and TX timeouts affected each other's operation. I just took a look at rt_16550_write(), and I see:

  rtdm_toseq_init(&timeout_seq, ctx->config.rx_timeout);

  /* Make write operation atomic. */
  ret = rtdm_mutex_timedlock(&ctx->out_lock, ctx->config.rx_timeout,
                             &timeout_seq);

This is the same code as in *_read(). I am wondering if this is a cut-and-paste error: it seems like these two lines should use tx_timeout instead, but I haven't looked into the code in detail. Maybe this is related, or maybe it is another bug?
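The effect of seeding the sequence with rx_timeout is easier to see with a model of what a timeout sequence does. As the RTDM documentation describes it, rtdm_toseq_init() establishes one overall timeout for a service call, so that later blocking calls given the same sequence do not each restart the full timeout. The sketch below is a hypothetical, simplified model of that idea in plain C (toseq_model and its functions are not RTDM code, and special values such as infinite or non-blocking timeouts are not modelled).

```c
#include <stdint.h>

/* Hypothetical model of an RTDM-style timeout sequence: one absolute
 * deadline is fixed when the sequence is initialised, and every later wait
 * in the same call only gets the time that is left. */
typedef struct {
    uint64_t deadline_ns;
} toseq_model;

/* Fix the overall deadline at the start of the call. */
static void toseq_model_init(toseq_model *seq, uint64_t now_ns, uint64_t timeout_ns)
{
    seq->deadline_ns = now_ns + timeout_ns;
}

/* Remaining budget at time now_ns; 0 means the sequence has expired. */
static uint64_t toseq_model_remaining(const toseq_model *seq, uint64_t now_ns)
{
    return now_ns >= seq->deadline_ns ? 0 : seq->deadline_ns - now_ns;
}
```

In this model, a sequence seeded from the 500us rx_timeout is used up 500us after the write starts, however many blocking steps follow; at 115200 baud with 8N1 framing that is time for only about 5 characters. If the driver behaves like this, an RX setting would indeed bound part of a TX call, as suspected above.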

Thanks,

-Jeff Webb




Thread overview: 21+ messages
2019-04-15 17:28 rt_dev_send() stalls periodic task C Smith
2019-04-16  8:03 ` Jan Kiszka
2019-04-18  6:42   ` C Smith
2019-04-18  8:36     ` Jan Kiszka
2019-04-21  4:33       ` C Smith
2019-04-21 20:10         ` Steve Freyder
2019-04-22  6:40           ` C Smith
2019-04-22  6:45             ` Jan Kiszka
2019-04-22 19:51               ` Steve Freyder
2019-04-22 20:58                 ` Steve Freyder
2019-04-22 22:56                   ` C Smith
2019-04-22 23:44                     ` Steve Freyder
2019-04-23 12:15               ` Jan Kiszka
2019-04-24  6:53                 ` C Smith
2019-04-25  7:15                 ` C Smith
2019-04-25  8:23                   ` Jan Kiszka
2019-04-26  0:59                     ` C Smith
2019-04-26 16:38                       ` Jan Kiszka
2019-04-24 13:05 Jeff Webb
2019-04-24 14:36 ` Jan Kiszka
2019-04-26  0:41   ` Jeff Webb
