All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@siemens.com>
To: C Smith <csmithquestions@gmail.com>, Steve Freyder <steve@freyder.net>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: rt_dev_send() stalls periodic task
Date: Tue, 23 Apr 2019 14:15:58 +0200	[thread overview]
Message-ID: <dccf6d29-e996-5c78-93fe-a74879270a44@siemens.com> (raw)
In-Reply-To: <ebb4d71b-3d5b-9d66-270f-ca35e7785bce@web.de>

On 22.04.19 08:45, Jan Kiszka via Xenomai wrote:
> On 22.04.19 08:40, C Smith via Xenomai wrote:
>> Thanks for your insight, Steve. I didn't realize rt_dev_write() doesnt
>> actually stall until it is called many times and the 4K TX buffer gets
>> full. (is that right Jan?)
>> It that is the case, sure I could find a way to check the TX buffer fill
>> level to prevent my app from stalling.
>>
>> I rewrote the xeno_16550A driver RTSER_RTIOC_GET_STATUS ioctl to return to
>> userspace the contents of the IIR and the IER too.
>> I'm getting IIR = 0b 0001 0100, so the source of the latest interrupt is a
>> RX (not surprising, as I'm doing full duplex) and there is no THRE
>> interrupt pending.
>> So regardless of the ultimate cause, this state will never empty the TX
>> buffer.
>>
>> I think my only choice is to try something I had to do once before on a
>> similarly misbehaving serial port: I'll rewrite the xeno_16550A interrupt
>> handlers to redundantly check for data pending in the TX buffer whenever
>> any interrupt like an RX interrupt happens. I do have bidirectional traffic
>> after all, so the driver will wake up frequently and keep the TX data
>> transmitting.
>>
>> Interesting enough, the stall problem did not occur when I used the sample
>> serial code provided by xenomai: cross-link.c . I also rewrote cross-link.c
>> to send a 72 byte packet and receive on the same port (I installed a
>> physical loopback device on the serial port). No stalls for 12+ hours with
>> packets streaming at 100 Hz.
>> The only difference in the serial configuration between that cross-link.c
>> app and my app was :
>> struct rtser_config :
>>          .rx_timeout        = RTSER_DEF_TIMEOUT  // infinite ,  no stall for
>> many hours in cross-link.c
>> versus:
>>          .rx_timeout        = 500000   // 500us, stalls within an hour in my
>> app
>> I don't know why an RX setting affects TX behavior. I also can't use
>> RTSER_DEF_TIMEOUT in my application or it dies when it starts up - no clue
>> why.  But I did try setting
>>    .rx_timeout        = 5000000   // 5 ms. my app doesnt stall for several
>> hours
>> and though that did not cause the serial to stall in my app for several
>> hours of testing, it is just open-loop finger-crossing, and not a real
>> solution.
>> I need the TX interrupts to fire reliably. So I think I must rewrite that
>> interrupt handler, as above.
>>
> 
> I think we have a race between rt_16550_write filling the software queue that
> the tx interrupt is supposed to write out and the latter already firing,
> consuming that event without seeing the queue filled. I'll think about a better
> algorithm tomorrow, one that can possibly get rid of some interrupt events as well.
> 

Could you try this diff?

diff --git a/kernel/drivers/serial/16550A.c b/kernel/drivers/serial/16550A.c
index 24415cf67c..369d65be66 100644
--- a/kernel/drivers/serial/16550A.c
+++ b/kernel/drivers/serial/16550A.c
@@ -190,7 +190,7 @@ static inline int rt_16550_rx_interrupt(struct rt_16550_context *ctx,
 	return rbytes;
 }
 
-static inline void rt_16550_tx_interrupt(struct rt_16550_context *ctx)
+static void rt_16550_tx_fill(struct rt_16550_context *ctx)
 {
 	int c;
 	int count;
@@ -248,7 +248,7 @@ static int rt_16550_interrupt(rtdm_irq_t * irq_context)
 		} else if (iir == IIR_STAT)
 			rt_16550_stat_interrupt(ctx);
 		else if (iir == IIR_TX)
-			rt_16550_tx_interrupt(ctx);
+			rt_16550_tx_fill(ctx);
 		else if (iir == IIR_MODEM) {
 			modem = rt_16550_reg_in(mode, base, MSR);
 			if (modem & (modem << 4))
@@ -951,6 +951,7 @@ ssize_t rt_16550_write(struct rtdm_fd *fd, const void *buf, size_t nbyte)
 	int block;
 	int subblock;
 	int out_pos;
+	int lsr;
 	char *in_pos = (char *)buf;
 	rtdm_toseq_t timeout_seq;
 	ssize_t ret;
@@ -1027,11 +1028,18 @@ ssize_t rt_16550_write(struct rtdm_fd *fd, const void *buf, size_t nbyte)
 			    (ctx->out_tail + block) & (OUT_BUFFER_SIZE - 1);
 			ctx->out_npend += block;
 
-			/* unmask tx interrupt */
-			ctx->ier_status |= IER_TX;
-			rt_16550_reg_out(rt_16550_io_mode_from_ctx(ctx),
-					 ctx->base_addr, IER,
-					 ctx->ier_status);
+			lsr = rt_16550_reg_in(rt_16550_io_mode_from_ctx(ctx),
+					      ctx->base_addr, LSR);
+			if (lsr & RTSER_LSR_THR_EMTPY)
+				rt_16550_tx_fill(ctx);
+
+			if (ctx->out_npend > 0) {
+				/* unmask tx interrupt */
+				ctx->ier_status |= IER_TX;
+				rt_16550_reg_out(rt_16550_io_mode_from_ctx(ctx),
+						ctx->base_addr, IER,
+						ctx->ier_status);
+			}
 
 			rtdm_lock_put_irqrestore(&ctx->lock, lock_ctx);
 			continue;

Only compile-tested so far. In theory, it should plug that race and
avoid enabling the TX interrupt in case the FIFO can take the complete
write directly.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


  parent reply	other threads:[~2019-04-23 12:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-15 17:28 rt_dev_send() stalls periodic task C Smith
2019-04-16  8:03 ` Jan Kiszka
2019-04-18  6:42   ` C Smith
2019-04-18  8:36     ` Jan Kiszka
2019-04-21  4:33       ` C Smith
2019-04-21 20:10         ` Steve Freyder
2019-04-22  6:40           ` C Smith
2019-04-22  6:45             ` Jan Kiszka
2019-04-22 19:51               ` Steve Freyder
2019-04-22 20:58                 ` Steve Freyder
2019-04-22 22:56                   ` C Smith
2019-04-22 23:44                     ` Steve Freyder
2019-04-23 12:15               ` Jan Kiszka [this message]
2019-04-24  6:53                 ` C Smith
2019-04-25  7:15                 ` C Smith
2019-04-25  8:23                   ` Jan Kiszka
2019-04-26  0:59                     ` C Smith
2019-04-26 16:38                       ` Jan Kiszka
2019-04-24 13:05 Jeff Webb
2019-04-24 14:36 ` Jan Kiszka
2019-04-26  0:41   ` Jeff Webb

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dccf6d29-e996-5c78-93fe-a74879270a44@siemens.com \
    --to=jan.kiszka@siemens.com \
    --cc=csmithquestions@gmail.com \
    --cc=steve@freyder.net \
    --cc=xenomai@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.