Subject: Re: rt_dev_send() stalls periodic task
From: Jan Kiszka
Date: Tue, 16 Apr 2019 10:03:50 +0200
List-Id: Discussions about the Xenomai project
To: C Smith, Sumitabh Ghosh via Xenomai

On 15.04.19 19:28, C Smith via Xenomai wrote:
> My Xenomai periodic routine normally runs for days at a time on most
> motherboards, but it is spontaneously getting stuck forever in
> rt_dev_write(). This is a write to a xeno_16550A driver serial port.
>
> I must use this brand of motherboard, where the first serial port (rtser0
> 0x3f8 irq 4) does not have a problem, but the other two serial ports have
> the stalling problem (rtser1 0x2f8 irq 5, rtser2 0x2e8 irq 3). Three
> motherboards of this brand have been tried with the same results. There are
> no shared interrupts in this scenario.
>
> The serial device is set up this way:
>
> struct rtser_config serial_config = {
>     .config_mask = 0xFFFF,
>     .baud_rate = 115200,
>     .parity = RTSER_NO_PARITY,
>     .data_bits = RTSER_8_BITS,
>     .stop_bits = RTSER_1_STOPB,
>     .handshake = RTSER_NO_HAND,
>     .fifo_depth = RTSER_DEF_FIFO_DEPTH, //RTSER_FIFO_DEPTH_8,
>     .reserved = 0,
>     .rx_timeout = 500000,
>     .tx_timeout = RTSER_DEF_TIMEOUT,
>     .event_timeout = 5000000,
>     .timestamp_history = RTSER_RX_TIMESTAMP_HISTORY,
>     .event_mask = RTSER_EVENT_RXPEND,
> };
> fd_tty[0] = rt_dev_open("rtser1", O_RDWR | O_NONBLOCK);
> sret = rt_dev_ioctl(fd_tty[0], RTSER_RTIOC_SET_CONFIG, &serial_config);
>
> The application transmits a packet of about 75 bytes repeatedly from a
> xenomai periodic task that wakes up at 125Hz repeatedly. Note that there is
> also a small RX serial packet arriving so there is some full-duplex
> overlap.
> On rtser0 this works fine, on the other serial ports the stall
> happens after a few hours and my periodic xenomai task stops. There is no
> xenomai watchdog message in dmesg. The code is repeatedly checking the
> serial port status ioctl and there are no errors like framing errors etc.
>
> The periodic task is just a typical xenomai while() loop:
>     next += period_ns + adjust_ns;
>     rt_task_sleep_until(next);
>
> When my periodic task stops the kernel says the stack trace is:
> [root@oyx ~]# cd /proc/1066/task/1075/
> [root@oyx 1075]# cat stack
> [] xnpod_suspend_thread+0x3d8/0x650
> [] xnsynch_sleep_on+0x139/0x320
> [] rtdm_event_timedwait+0x2e4/0x390
> [] rt_16550_write+0x35b/0x540 [xeno_16550A]

This means the driver is stuck while writing because there are no more free
entries in the hardware TX FIFO. Do you have hardware flow control enabled?
Are you sure that the receiving side is playing nicely?

Jan

> [] __rt_dev_write+0x63/0x110
> [] sys_rtdm_write+0x24/0x30
> [] hisyscall_event+0x1ec/0x380
> [] ipipe_syscall_hook+0x3a/0x50
> [] __ipipe_notify_syscall+0xb0/0x160
> [] pipeline_syscall+0x7/0x18
> [] 0xffffffff
>
> I can attach with a debugger, and when I do I think the debugger gets us
> out of the stall, so can actually single step the code for a little while.
> I can't see any suspicious variable values, only that the serial port
> transmitted 40 of my 75 bytes, which is unusual. But I can only single step
> until my task sleeps one more time. At the next wakeup if I step into the
> rt_dev_write() the task stalls forever and I can no longer debug.
>
> (gdb) thread 2
> [Switching to thread 2 (Thread 0xb7797b40 (LWP 1336))]
> #0 0xb77caa92 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> (gdb) where
> #0 0xb77caa92 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> #1 0xb775d872 in rt_dev_write (fd=12, buf=0xa8eda001, nbyte=72) at
> core.c:72
> #2 0x08056515 in Process_serial (comm_p=0x810e644 ,
> portnum=1 '\001') at periodic_app.cpp:5404
> #3 0x0804e0e4 in Periodic_routine (cookie=0x0) at periodic_app.cpp:1654
> #4 0xb7764acd in rt_task_trampoline (cookie=0x0) at task.c:113
> #5 0xb777a313 in start_thread () from /lib/libpthread.so.0
> #6 0xb7528f2e in clone () from /lib/libc.so.6
>
> I'm using an Intel I5 CPU, 32 bit kernel 3.18.20, Xenomai 2.6.5. I must be
> on this Xenomai/kernel version to support tens of thousands of lines of
> legacy code. I diffed the driver sources and the rtl_16550 driver did not
> functionally change between Xenomai 2.6.5 and Xenomai 3.0.8.
>
> I looked at the rt_dev_write() source code, but I don't see an obvious
> infinite loop (though the assembly code is a bit beyond my understanding).
> I'd like to detect the problem early and continue without stalling.
> It seems the physical serial ports are misbehaving, sure. But what would
> make rt_dev_write() stall forever?
>
> thanks,
> C Smith
>

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux