* [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
@ 2009-03-12 17:57 ` Markus Armbruster
0 siblings, 0 replies; 10+ messages in thread
From: Markus Armbruster @ 2009-03-12 17:57 UTC (permalink / raw)
To: linux-kernel
Cc: linux-serial, xen-devel, Ian Jackson, Anders Kaseorg,
Jeremy Fitzhardinge, Andrew Morton
From: Ian Jackson <ian.jackson@eu.citrix.com>
Do not read IIR in serial8250_start_tx when UART_BUG_TXEN
Reading the IIR clears some outstanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
Ian Jackson recently debugged a problem reported by Anders Kaseorg, and
posted a fix (see http://lkml.org/lkml/2009/2/11/240). As far as I can
see, the patch has been dropped on the floor.
I also experienced the problem, and Ian's patch fixes it for me.
The bug bites Xen HVM guests. Output to the serial console stalls until
some input happens. As Ian's analysis quoted below shows, it is a race
condition that could theoretically bite elsewhere as well.
Ian Jackson explained:
> The bugs in detail (this discussion applies to 2.6.20 and also to
> 2.6.28.4):
>
> 1. The hunk of serial8250_startup I quote below attempts to discover
> whether writing the IER re-asserts the THRI (transmit ready)
> interrupt. However the spinlock that it has taken out,
> port->lock, is not the one that the IRQ service routine takes
> before reading the IIR (i->lock). As a result, on an SMP system
> the generated interrupt races with the straight-line code in
> serial8250_startup.
>
> If serial8250_startup loses the race (perhaps because the system
> is a VM and its VCPU got preempted), UART_BUG_TXEN is spuriously
> added to bugs. This is quite unlikely in a normal system but in
> certain Xen configurations, particularly ones where there is CPU
> pressure, we may lose the race every time.
>
> It is not exactly clear to me how this ought to be resolved. One
> possibility is that the UART_BUG_TXEN problem might be worked
> around perfectly well by the new and very similar workaround
> UART_BUG_THRE[1] in 2.6.21ish in which case it could just be
> removed.
> 2. UART_BUG_TXEN's workaround appears to be intended to be
> harmless.
> However what it actually does is to read the IIR, thus clearing
> any actual interrupt (including incidentally non-THRI), and then
> only perform the intended servicing if the interrupt was _not_
> asserted. That is, it breaks on any serial port with the bug.
>
> As far as I can see there is not much use in UART_BUG_TXEN reading
> IIR at all, so a suitable change if we want to keep UART_BUG_TXEN
> might be the first patch I enclose below (again, not compiled
> or tested).
>
> If UART_BUG_TXEN is retained something along these lines should be
> done at the very least.
>
> Ian.
>
> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
--- ../linux-2.6.28.4/drivers/serial/8250.c~ 2009-02-06 21:47:45.000000000 +0000
+++ ../linux-2.6.28.4/drivers/serial/8250.c 2009-02-11 15:55:24.000000000 +0000
@@ -1257,14 +1257,12 @@
 		serial_out(up, UART_IER, up->ier);
 
 		if (up->bugs & UART_BUG_TXEN) {
-			unsigned char lsr, iir;
+			unsigned char lsr;
 			lsr = serial_in(up, UART_LSR);
 			up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
-			iir = serial_in(up, UART_IIR) & 0x0f;
 			if ((up->port.type == PORT_RM9000) ?
-				(lsr & UART_LSR_THRE &&
-				(iir == UART_IIR_NO_INT || iir == UART_IIR_THRI)) :
-				(lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT))
+				(lsr & UART_LSR_THRE) :
+				(lsr & UART_LSR_TEMT))
 				transmit_chars(up);
 		}
 	}
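A minimal sketch of the decision this hunk changes, pulled out into two standalone predicates so the difference can be checked in isolation. The function names old_should_kick and new_should_kick are hypothetical and do not exist in the driver; the constants mirror the usual 16550 bit values. Note that the sketch only models the decision: in the real code the harm comes from the IIR read itself, which this model does not perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as commonly defined for 16550-style UARTs. */
#define LSR_THRE   0x20 /* transmit holding register empty */
#define LSR_TEMT   0x40 /* transmitter (shift register) empty */
#define IIR_NO_INT 0x01 /* bit 0 set means no interrupt pending */
#define IIR_THRI   0x02 /* THR-empty interrupt identified */

/* Hypothetical helper: the condition before the patch (needs an IIR value). */
static bool old_should_kick(bool is_rm9000, uint8_t lsr, uint8_t iir)
{
    iir &= 0x0f;
    if (is_rm9000)
        return (lsr & LSR_THRE) &&
               (iir == IIR_NO_INT || iir == IIR_THRI);
    return (lsr & LSR_TEMT) && (iir & IIR_NO_INT);
}

/* Hypothetical helper: the condition after the patch (LSR only). */
static bool new_should_kick(bool is_rm9000, uint8_t lsr)
{
    return is_rm9000 ? (lsr & LSR_THRE) != 0
                     : (lsr & LSR_TEMT) != 0;
}
```

With an idle transmitter (lsr = 0x60) and a THRI interrupt actually pending (low nibble of iir = 0x02), old_should_kick(false, 0x60, 0xc2) declines to transmit, while new_should_kick(false, 0x60) transmits.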
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
2009-03-12 17:57 ` Markus Armbruster
@ 2009-03-12 18:09 ` Alan Cox
-1 siblings, 0 replies; 10+ messages in thread
From: Alan Cox @ 2009-03-12 18:09 UTC (permalink / raw)
To: Markus Armbruster
Cc: linux-kernel, linux-serial, xen-devel, Ian Jackson,
Anders Kaseorg, Jeremy Fitzhardinge, Andrew Morton
> Ian Jackson recently debugged a problem reported by Anders Kaseorg, and
> posted a fix (see http://lkml.org/lkml/2009/2/11/240). As far as I can
> see, the patch has been dropped on the floor.
It was proposed as a band aid fix not a real fix. The real fix involves
finishing sorting out any locking issues Ian identified. I've not seen a
patch for that yet.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
2009-03-12 18:09 ` Alan Cox
@ 2009-03-12 19:30 ` Ian Jackson
-1 siblings, 0 replies; 10+ messages in thread
From: Ian Jackson @ 2009-03-12 19:30 UTC (permalink / raw)
To: Alan Cox
Cc: Markus Armbruster, linux-kernel, linux-serial, xen-devel,
Anders Kaseorg, Jeremy Fitzhardinge, Andrew Morton
Alan Cox writes ("Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c"):
> > Ian Jackson recently debugged a problem reported by Anders Kaseorg, and
> > posted a fix (see http://lkml.org/lkml/2009/2/11/240). As far as I can
> > see, the patch has been dropped on the floor.
>
> It was proposed as a band aid fix not a real fix. The real fix involves
> finishing sorting out any locking issues Ian identified. I've not seen a
> patch for that yet.
The patch I have provided and which Markus has reposted has these
properties:
1. It is definitely correct and does not introduce any new bugs.
2. It makes the existing bug definitely go away. That is, after my
patch is applied the code drives the UART correctly - that is,
according to the specification and in a manner guaranteed to be
reliable - even though the code may still mistakenly set a bug
flag;
3. Without it the code is definitely wrong;
4. Any fix which does not involve completely removing UART_BUG_TXEN
will need my change.
Or to put it another way
                 UART_BUG_TXEN            UART_BUG_TXEN
                 incorrectly set          behaviour

 Current code    Sometimes in VMs         Breaks correct systems
                 Rarely in baremetal?     (observed in production)

 With my patch   Sometimes in VMs         Always correct.
                 (behaviour unchanged)    Minor possible
                                          performance problem.

 Perfect code    Perhaps feature          If retained, should
                 should be removed?       be correct.  Minor
                                          performance impact
                                          is acceptable.
My patch is therefore a strict improvement and also a likely component
of many of the possible ultimate fixes; the other ultimate fixes
involve removing entirely the code which I am now proposing to patch.
Please do not block this bugfix just because I haven't been able to
conclusively determine whether the buggy-without-my-patch workaround
should be entirely removed, and just because I do not fix every other
bug in the same area.
In particular, the fact that the detection of UART_BUG_TXEN remains
buggy is not a reason to block a patch which makes UART_BUG_TXEN's
effects correct.
As you can see `perfect code' is not yet attainable because the people
who originally implemented UART_BUG_TXEN don't seem to be around right
now to ask, and we don't have a 100% clear description of the buggy
behaviour it is trying to detect, and so we don't know whether it can
safely be removed.
Ian.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Xen-devel] Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
2009-03-12 19:30 ` Ian Jackson
@ 2009-03-12 21:38 ` Alan Cox
-1 siblings, 0 replies; 10+ messages in thread
From: Alan Cox @ 2009-03-12 21:38 UTC (permalink / raw)
To: Ian Jackson
Cc: Jeremy Fitzhardinge, xen-devel, linux-kernel, Markus Armbruster,
Anders Kaseorg, linux-serial, Andrew Morton
> 1. It is definitely correct and does not introduce any new bugs.
Right ;)
> 2. It makes the existing bug definitely go away. That is, after my
> patch is applied the code drives the UART correctly - that is,
> according to the specification and in a manner guaranteed to be
The UART hardware and specifications diverge - often quite spectacularly.
> 4. Any fix which does not involve completely removing UART_BUG_TXEN
> will need my change.
Or the locking fix
It was described as a band aid so I treated it as such. Regardless of the
right thing to do long term it's not a 2.6.29 candidate at this point but
certainly something that wants looking into in 2.6.30
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Xen-devel] Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
2009-03-12 21:38 ` Alan Cox
(?)
@ 2009-06-19 1:39 ` Robert Evans
-1 siblings, 0 replies; 10+ messages in thread
From: Robert Evans @ 2009-06-19 1:39 UTC (permalink / raw)
To: Alan Cox
Cc: Ian Jackson, Jeremy Fitzhardinge, xen-devel, Markus Armbruster,
Anders Kaseorg, linux-serial, Andrew Morton
On Thu, Mar 12, 2009 at 5:38 PM, Alan Cox<alan@lxorguk.ukuu.org.uk> wrote:
...
>> 4. Any fix which does not involve completely removing UART_BUG_TXEN
>> will need my change.
>
> Or the locking fix
>
> It was described as a band aid so I treated it as such. Regardless of the
> right thing to do long term it's not a 2.6.29 candidate at this point but
> certainly something that wants looking into in 2.6.30
Stratus has experienced this same problem on bare metal. An analysis
and proposed fixes were posted to this list on 2008-08-13. See
http://marc.info/?l=linux-serial&m=121863945506976&w=2
The first proposed fix takes the port's irq lock when starting up the
UART to provide mutual exclusion between the "quick test" in the
startup code and the interrupt service routine. Stratus has quite a
bit of runtime with this locking fix on kernels of the 2.6.18 vintage
with good success.
Is this a "right thing" fix that could be considered for inclusion upstream?
--- linux-2.6.18-92-old/drivers/serial/8250.c 2008-08-13 14:07:08.000000000 +0000
+++ linux-2.6.18-92-new/drivers/serial/8250.c 2008-08-13 14:04:12.000000000 +0000
@@ -1749,65 +1749,73 @@
timeout = timeout > 6 ? (timeout / 2 - 2) : 1;
up->timer.data = (unsigned long)up;
mod_timer(&up->timer, jiffies + timeout);
} else {
retval = serial_link_irq_chain(up);
if (retval)
return retval;
}
/*
* Now, initialize the UART
*/
serial_outp(up, UART_LCR, UART_LCR_WLEN8);
- spin_lock_irqsave(&up->port.lock, flags);
+ if (is_real_interrupt(up->port.irq)) {
+ spin_lock_irqsave(&irq_lists[up->port.irq].lock, flags);
+ spin_lock(&up->port.lock);
+ } else
+ spin_lock_irqsave(&up->port.lock, flags);
if (up->port.flags & UPF_FOURPORT) {
if (!is_real_interrupt(up->port.irq))
up->port.mctrl |= TIOCM_OUT1;
} else
/*
* Most PC uarts need OUT2 raised to enable interrupts.
*/
if (is_real_interrupt(up->port.irq))
up->port.mctrl |= TIOCM_OUT2;
serial8250_set_mctrl(&up->port, up->port.mctrl);
/*
* Do a quick test to see if we receive an
* interrupt when we enable the TX irq.
*/
serial_outp(up, UART_IER, UART_IER_THRI);
lsr = serial_in(up, UART_LSR);
iir = serial_in(up, UART_IIR);
serial_outp(up, UART_IER, 0);
if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
if (!(up->bugs & UART_BUG_TXEN)) {
up->bugs |= UART_BUG_TXEN;
pr_debug("ttyS%d - enabling bad tx status workarounds\n",
port->line);
}
} else {
up->bugs &= ~UART_BUG_TXEN;
}
- spin_unlock_irqrestore(&up->port.lock, flags);
+ if (is_real_interrupt(up->port.irq)) {
+ spin_unlock(&up->port.lock);
+ spin_unlock_irqrestore(&irq_lists[up->port.irq].lock, flags);
+ } else
+ spin_unlock_irqrestore(&up->port.lock, flags);
/*
* Finally, enable interrupts. Note: Modem status interrupts
* are set via set_termios(), which will be occurring imminently
* anyway, so we don't enable them here.
*/
up->ier = UART_IER_RLSI | UART_IER_RDI;
serial_outp(up, UART_IER, up->ier);
if (up->port.flags & UPF_FOURPORT) {
unsigned int icp;
/*
* Enable interrupts on the AST Fourport board
*/
icp = (up->port.iobase & 0xfe0) | 0x01f;
outb_p(0x80, icp);
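The effect of the lock ordering in the patch above can be modelled in a few lines. This is a single-threaded sketch under stated assumptions, not kernel code: toy_lock, toy_port, toy_isr and toy_probe are hypothetical names, a failed toy_trylock stands in for the ISR spinning on another CPU, and the interrupt is injected by hand in the middle of the probe.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock: records only whether it is held. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l) { l->held = false; }

struct toy_port {
    struct toy_lock irq_lock;  /* plays the role of irq_lists[irq].lock */
    struct toy_lock port_lock; /* plays the role of up->port.lock */
    bool thr_ipending;         /* THRI raised and not yet consumed */
};

/* The ISR takes the irq lock before touching the UART. Returns true if it ran. */
static bool toy_isr(struct toy_port *p)
{
    if (!toy_trylock(&p->irq_lock))
        return false;          /* a real ISR would spin here instead */
    p->thr_ipending = false;   /* its IIR read consumes the interrupt */
    toy_unlock(&p->irq_lock);
    return true;
}

/* The startup "quick test". Returns true if the probe saw its own interrupt. */
static bool toy_probe(struct toy_port *p, bool take_irq_lock)
{
    bool seen;

    if (take_irq_lock)
        (void)toy_trylock(&p->irq_lock); /* outer lock, as in the patch */
    (void)toy_trylock(&p->port_lock);    /* inner lock */

    p->thr_ipending = true;              /* enabling THRI raises the IRQ */
    (void)toy_isr(p);                    /* IRQ delivered on another CPU */
    seen = p->thr_ipending;              /* the probe's own IIR read */
    p->thr_ipending = false;

    toy_unlock(&p->port_lock);
    if (take_irq_lock)
        toy_unlock(&p->irq_lock);
    return seen;
}
```

Calling toy_probe(&p, false) models the original code: the ISR gets in first and the probe misses the interrupt. Calling toy_probe(&p, true) models the patched code: the ISR is held off and the probe sees the interrupt it raised.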
Robert N. Evans
Software Engineer
STRATUS TECHNOLOGIES
111 Powdermill Road,
Maynard, MA 01754-3409 U.S.A.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Serial console hangs with Linux 2.6.20 HVM guest
@ 2009-02-05 2:23 Anders Kaseorg
2009-02-05 17:04 ` Ian Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Anders Kaseorg @ 2009-02-05 2:23 UTC (permalink / raw)
To: xen-devel
I am seeing a problem with the Xen emulated serial console. When
running a Linux 2.6.20 HVM guest that has CONFIG_HOTPLUG_CPU=n, the
guest blocks on output to the console until it receives input keypresses
from `xm console`. This prevents the guest from booting up without
banging on some keys, and makes interactive use of the console
difficult.
By bisecting Linux kernel commits, I found that the bug goes away in
commit 40b36daad0ac704e6d5c1b75789f371ef5b053c1 (v2.6.21-rc1~261), which
is a workaround for buggy UARTs on certain HP machines.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
“The patch below works around a minor bug found in the UART of the
remote management card used in many HP ia64 and parisc servers (aka the
Diva UARTs). The problem is that the UART does not reassert the THRE
interrupt if it has been previously cleared and the IIR THRI bit is
re-enabled. This can produce a very annoying failure mode when used as
a serial console, allowing a boot/reboot to hang indefinitely until an
RX interrupt kicks it into working again (ie. an unattended reboot could
stall).”
That matches my symptoms exactly, which suggests that the Xen UART
probably has a similar bug.
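The hardware behaviour described in that commit message can be sketched with a toy register model. Everything here is an assumption made for illustration: toy_uart, toy_write_ier and toy_irq_line are hypothetical names, and the reasserts_on_enable flag simply selects between a UART that re-raises THRE when the enable bit is set again and one that does not.

```c
#include <stdbool.h>
#include <stdint.h>

#define IER_THRI 0x02 /* enable bit for the THR-empty interrupt */

/* Hypothetical model of the transmit-interrupt side of a UART. */
struct toy_uart {
    uint8_t ier;
    bool thr_empty;           /* transmit holding register is empty */
    bool thr_ipending;        /* THRE interrupt latched */
    bool reasserts_on_enable; /* false models the buggy behaviour */
};

static void toy_write_ier(struct toy_uart *u, uint8_t val)
{
    bool enabling = !(u->ier & IER_THRI) && (val & IER_THRI);

    u->ier = val;
    /* A well-behaved part raises THRE again when the bit is re-enabled. */
    if (enabling && u->thr_empty && u->reasserts_on_enable)
        u->thr_ipending = true;
}

/* True when the UART would be driving its interrupt line for THRE. */
static bool toy_irq_line(const struct toy_uart *u)
{
    return (u->ier & IER_THRI) && u->thr_ipending;
}
```

On the model with reasserts_on_enable set to false, re-enabling THRI with an empty holding register leaves the interrupt line low, so a driver that waits for that interrupt would stall until some other interrupt arrives.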
I’ve seen this in Xen 3.2.1 and 3.3.1. (My host is running Debian Lenny
amd64, with the Xen dom0 kernel 2.6.24-23-xen from Ubuntu Hardy, on a
server with two quad-core Xeons.)
Anders
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 2:23 Serial console hangs with Linux 2.6.20 HVM guest Anders Kaseorg
@ 2009-02-05 17:04 ` Ian Jackson
2009-02-05 19:34 ` Anders Kaseorg
0 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2009-02-05 17:04 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: xen-devel
Anders Kaseorg writes ("[Xen-devel] Serial console hangs with Linux 2.6.20 HVM guest"):
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
Thanks for this. However looking at the code I'm not sure that's
what's actually going on. Very old versions of qemu did have the bug
that patch describes, but it was fixed in qemu upstream in 2004 in
this commit:
commit 60e336dbb837ef4d5053433f9ee391feb102be36
Author: bellard <bellard@c046a42c-6fe2-441c-8c8c-71466251a162>
Date: Tue Aug 24 21:55:28 2004 +0000
serial interrupt fix (Hampa Hug)
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@1049 c046a42c-6fe2-441c-8c8c-71466251a162
As far as I can see this fix is included in all relevant versions of
qemu. The upstream Linux changeset seems like it would fix any
general IRQ loss problem with the UART.
If you can reproduce this problem, I would like to try to debug it.
Probably the best way is to attach to the running qemu with gdb:
* Add
CFLAGS += -g -O0
to xen-unstable.hg/.config (creating it if necessary) and rebuild
and install the resulting qemu-dm binary where it will be built.
* Start the VM in the usual way.
* cd to the qemu-xen-unstable tree (probably tools/ioemu-remote);
use ps to find the pid of qemu-dm, and say
gdb i386-dm/qemu-dm <pid>
and then at the gdb prompt say
handle SIGUSR2 nostop noprint
break serial_ioport_write if (addr&7)==1
cont
* do whatever it is that makes the VM stuck
* when it next stops it will be in serial_ioport_write setting
the IER. So
print val
print *s
and email the output to this list.
NB I haven't tested this recipe. If you get syntax errors or
something doesn't work I'll actually do so and let you know what to
type.
Ian.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 17:04 ` Ian Jackson
@ 2009-02-05 19:34 ` Anders Kaseorg
2009-02-05 21:52 ` Anders Kaseorg
0 siblings, 1 reply; 10+ messages in thread
From: Anders Kaseorg @ 2009-02-05 19:34 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
On Thu, 2009-02-05 at 17:04 +0000, Ian Jackson wrote:
> handle SIGUSR2 nostop noprint
> break serial_ioport_write if (addr&7)==1
> cont
> * do whatever it is that makes the VM stuck
> * when it next stops it will be in serial_ioport_write setting
> the IER. So
> print val
> print *s
This breakpoint is triggered for all messages printed by the kernel,
which always showed up with no delay; but it is only occasionally
triggered for strings printed by userspace, even after forcing those
strings to show up by sending keystrokes.
Here is one of the latter cases. (I am sitting at a
“root@andersk-intrepid:~# ” prompt, repeatedly pressing Enter. Each
keypress causes the previous prompt to show up, followed by a newline,
and the current prompt is stalled.)
Breakpoint 1, serial_ioport_write (opaque=0xb342e0, addr=1, val=5)
at /home/andersk/xen-3-3.3.1/debian/build/build-utils_amd64/tools/ioemu-dir/hw/serial.c:413
413 {
(gdb) print val
$5 = 5
(gdb) print *s
$6 = {divider = 1, rbr = 0 '\0', thr = 32 ' ', tsr = 32 ' ', ier = 5 '\005', iir = 193 '\301',
lcr = 19 '\023', mcr = 11 '\v', lsr = 96 '`', msr = 176 '\260', scr = 0 '\0', fcr = 129 '\201',
thr_ipending = 1, irq = 0xb1d610, chr = 0xb122a0, last_break_enable = 0, base = 0,
it_shift = 0, baudbase = 115200, tsr_retry = 0, last_xmit_ts = 380482341502, recv_fifo = {
data = '\r' <repeats 16 times>, count = 0 '\0', itl = 8 '\b', tail = 0 '\0',
head = 0 '\0'}, xmit_fifo = {data = "repid:~# rsk-int", count = 0 '\0', itl = 0 '\0',
tail = 9 '\t', head = 9 '\t'}, fifo_timeout_timer = 0xb31ad0, timeout_ipending = 0,
transmit_timer = 0xb31b00, char_transmit_time = 78120, poll_msl = -1,
modem_status_poll = 0xb327e0}
Anders
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 19:34 ` Anders Kaseorg
@ 2009-02-05 21:52 ` Anders Kaseorg
2009-02-10 15:34 ` Ian Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Anders Kaseorg @ 2009-02-05 21:52 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
In case this is more interesting, here is a log of all the breakpoints
hit during a Linux boot, up to the point where it hangs:
http://web.mit.edu/andersk/Public/xen-serial-log
Anders
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-05 21:52 ` Anders Kaseorg
@ 2009-02-10 15:34 ` Ian Jackson
2009-02-10 18:20 ` Anders Kaseorg
0 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2009-02-10 15:34 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: xen-devel
Anders Kaseorg writes ("Re: [Xen-devel] Serial console hangs with Linux 2.6.20 HVM guest"):
> In case this is more interesting, here is a log of all the breakpoints
> hit during a Linux boot, up to the point where it hangs:
>
> http://web.mit.edu/andersk/Public/xen-serial-log
Thanks, that's great. But I think it's a kernel bug. The last thing
the kernel does is read the IIR (Interrupt Identification Register)
twice at a time when the transmit FIFO is empty. Reading the IIR is
(sadly) not a side-effect-free operation; specifically, it cancels any
outstanding transmit fifo/buffer empty interrupt[1]. So the first
time it gets told `Transmit Holding Register Empty interrupt', but
that has the effect of clearing the interrupt so the second time it
reads the IIR it gets `no interrupt pending'.
[1] I'm getting this out of the National Semiconductor datasheet for
the PC16550D, document number TL/C/8652, June 1995. See for example
section 8.11 `FIFO Interrupt Mode Operation' item A:
The transmit holding register interrupt (02) occurs when
the XMIT FIFO is empty; it is cleared as soon as the transmitter
holding register is written to ([...]) or the IIR is read.
As far as I can see qemu-dm is emulating this accurately.
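The read-to-clear behaviour quoted from the datasheet can be shown with a small model. This is a sketch under assumptions, not qemu's or the kernel's code: toy_uart, toy_read_iir and toy_write_thr are hypothetical names, and only the THR-empty interrupt is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define IIR_NO_INT 0x01 /* bit 0 set means no interrupt pending */
#define IIR_THRI   0x02 /* THR-empty interrupt identified */

/* Hypothetical model: just enough state for the THR-empty interrupt. */
struct toy_uart {
    bool thri_enabled; /* IER has the THRI bit set */
    bool thr_ipending; /* THR-empty interrupt latched */
};

/* Reading the IIR reports a pending THRI and clears it as a side effect. */
static uint8_t toy_read_iir(struct toy_uart *u)
{
    if (u->thri_enabled && u->thr_ipending) {
        u->thr_ipending = false;
        return IIR_THRI;
    }
    return IIR_NO_INT;
}

/* Writing the transmit holding register also clears the interrupt. */
static void toy_write_thr(struct toy_uart *u, uint8_t ch)
{
    (void)ch; /* the byte itself does not matter for this model */
    u->thr_ipending = false;
}
```

Two back-to-back calls to toy_read_iir on a UART with a pending THRI return IIR_THRI and then IIR_NO_INT, which is the sequence described above.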
Can you point me at the exact kernel source code you're using ?
Thanks,
Ian.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Serial console hangs with Linux 2.6.20 HVM guest
2009-02-10 15:34 ` Ian Jackson
@ 2009-02-10 18:20 ` Anders Kaseorg
2009-02-11 16:08 ` [PATCH] IRQ handling race and spurious IIR read in serial/8250.c Ian Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Anders Kaseorg @ 2009-02-10 18:20 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel
On Tue, 10 Feb 2009, Ian Jackson wrote:
> Can you point me at the exact kernel source code you're using ?
Yes, I took Linux v2.6.20 on amd64, ran `make defconfig`, then ran `make
menuconfig` and turned off CONFIG_HOTPLUG_CPU (Processor type and features
→ Support for hot-pluggable CPUs).
Anders
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
2009-02-10 18:20 ` Anders Kaseorg
@ 2009-02-11 16:08 ` Ian Jackson
2009-02-19 17:52 ` Ian Jackson
0 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2009-02-11 16:08 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Anders Kaseorg, xen-devel
Anders Kaseorg writes ("Re: Serial console hangs with Linux 2.6.20 HVM guest"):
> Yes, I took Linux v2.6.20 on amd64, ran `make defconfig`, then ran `make
> menuconfig` and turned off CONFIG_HOTPLUG_CPU (Processor type and features
> → Support for hot-pluggable CPUs).
Thanks. I think I have tracked down the bug. In
drivers/serial/8250.c in Linux there are two bugs:
1. UART_BUG_TXEN can be spuriously set, due to an IRQ race
2. The workaround then applied by the kernel is itself buggy
Anders: can you try two tests for me ? Firstly, in
serial8250_startup, delete the section which sets UART_BUG_TXEN (see
2nd patch below); I think this will fix the symptoms for you.
Secondly, in serial8250_start_tx delete the read from the IIR and the
relevant branch of the text (see 3rd patch below); I think this will
also in itself fix your symptoms. I haven't compiled either patch (so
you may find that eg I missed deleting some variables).
The bugs in detail (this discussion applies to 2.6.20 and also to
2.6.28.4):
1. The hunk of serial8250_startup I quote below attempts to discover
whether writing the IER re-asserts the THRI (transmit ready)
interrupt. However the spinlock that it has taken out,
port->lock, is not the one that the IRQ service routine takes
before reading the IIR (i->lock). As a result, on an SMP system
the generated interrupt races with the straight-line code in
serial8250_startup.
If serial8250_startup loses the race (perhaps because the system
is a VM and its VCPU got preempted), UART_BUG_TXEN is spuriously
added to bugs. This is quite unlikely in a normal system but in
certain Xen configurations, particularly ones where there is CPU
pressure, we may lose the race every time.
It is not exactly clear to me how this ought to be resolved. One
possibility is that the UART_BUG_TXEN problem might be worked
around perfectly well by the new and very similar workaround
UART_BUG_THRE[1] in 2.6.21ish in which case it could just be
removed.
2. UART_BUG_TXEN's workaround appears to be intended to be harmless.
However what it actually does is to read the IIR, thus clearing
any actual interrupt (including incidentally non-THRI), and then
only perform the intended servicing if the interrupt was _not_
asserted. That is, it breaks on any serial port with the bug.
As far as I can see there is not much use in UART_BUG_TXEN reading
IIR at all, so a suitable change if we want to keep UART_BUG_TXEN
might be the first patch I enclose below (again, not compiled
or tested).
If UART_BUG_TXEN is retained something along these lines should be
done at the very least.
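The race in item 1 can be replayed deterministically. This is a sketch under assumptions: probe_flags_txen and toy_read_iir are hypothetical names, the interleaving is chosen by a parameter rather than by real concurrency, and the UART is assumed to be healthy (it does assert THRI when the enable bit is set).

```c
#include <stdbool.h>
#include <stdint.h>

#define IER_THRI   0x02 /* enable bit for the THR-empty interrupt */
#define IIR_NO_INT 0x01 /* bit 0 set means no interrupt pending */
#define IIR_THRI   0x02 /* THR-empty interrupt identified */
#define LSR_TEMT   0x40 /* transmitter empty */

struct toy_uart {
    uint8_t ier;
    bool thr_ipending;
};

/* Reading the IIR acknowledges a pending THRI. */
static uint8_t toy_read_iir(struct toy_uart *u)
{
    if ((u->ier & IER_THRI) && u->thr_ipending) {
        u->thr_ipending = false;
        return IIR_THRI;
    }
    return IIR_NO_INT;
}

/*
 * Replays the startup probe. Returns true when the probe would set the
 * TXEN bug flag. isr_runs_first selects the interleaving in which the
 * interrupt handler on another CPU reads the IIR before the probe does.
 */
static bool probe_flags_txen(bool isr_runs_first)
{
    struct toy_uart u = { 0, false };
    uint8_t lsr = LSR_TEMT; /* transmitter is idle during startup */
    uint8_t iir;

    u.ier = IER_THRI;       /* probe enables the TX interrupt */
    u.thr_ipending = true;  /* a healthy UART asserts THRI */

    if (isr_runs_first)
        (void)toy_read_iir(&u); /* handler consumes the interrupt */

    iir = toy_read_iir(&u);     /* the probe's own IIR read */
    u.ier = 0;

    return (lsr & LSR_TEMT) && (iir & IIR_NO_INT);
}
```

probe_flags_txen(false) leaves the flag clear, as it should for a healthy UART; probe_flags_txen(true) sets it even though the modelled hardware behaved correctly.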
Ian.
[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1
Proposed initial band-aid fix (against 2.6.28.4):
Do not read IIR in serial8250_start_tx when UART_BUG_TXEN
Reading the IIR clears some outstanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
--- ../linux-2.6.28.4/drivers/serial/8250.c~ 2009-02-06 21:47:45.000000000 +0000
+++ ../linux-2.6.28.4/drivers/serial/8250.c 2009-02-11 15:55:24.000000000 +0000
@@ -1257,14 +1257,12 @@
serial_out(up, UART_IER, up->ier);
if (up->bugs & UART_BUG_TXEN) {
- unsigned char lsr, iir;
+ unsigned char lsr;
lsr = serial_in(up, UART_LSR);
up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
- iir = serial_in(up, UART_IIR) & 0x0f;
if ((up->port.type == PORT_RM9000) ?
- (lsr & UART_LSR_THRE &&
- (iir == UART_IIR_NO_INT || iir == UART_IIR_THRI)) :
- (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT))
+ (lsr & UART_LSR_THRE) :
+ (lsr & UART_LSR_TEMT))
transmit_chars(up);
}
}
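For readers who want to see why dropping the IIR read matters, the
following user-space sketch in C contrasts the old and new conditions for
the non-RM9000 branch. It is an illustration under assumptions, not the
driver: struct sim_uart and the sim_* functions are hypothetical, the
register constants are assumed to match serial_reg.h, and the model assumes
that reading the IIR acknowledges a pending THRI before the interrupt
handler gets to see it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit values assumed to match the kernel's serial_reg.h (illustration only). */
#define UART_IIR_NO_INT 0x01  /* no interrupt pending */
#define UART_IIR_THRI   0x02  /* THRE interrupt is the pending source */
#define UART_LSR_THRE   0x20  /* transmit holding register empty */
#define UART_LSR_TEMT   0x40  /* transmitter completely empty */

/* Hypothetical model: one pending-interrupt flag and a transmit counter. */
struct sim_uart {
    bool thri_pending;
    int chars_sent;
};

static unsigned char sim_read_iir(struct sim_uart *u)
{
    assert(u != NULL);
    if (u->thri_pending) {
        u->thri_pending = false;  /* reading IIR acknowledges THRI */
        return UART_IIR_THRI;
    }
    return UART_IIR_NO_INT;
}

static unsigned char sim_read_lsr(const struct sim_uart *u)
{
    assert(u != NULL);
    return UART_LSR_TEMT | UART_LSR_THRE;  /* transmitter is idle */
}

static void sim_transmit_chars(struct sim_uart *u)
{
    assert(u != NULL);
    u->chars_sent++;
    u->thri_pending = false;  /* writing the THR also clears THRI */
}

/* Workaround before the patch: consults IIR and thereby clears it. */
void sim_start_tx_old(struct sim_uart *u)
{
    unsigned char lsr = sim_read_lsr(u);
    unsigned char iir = sim_read_iir(u);

    if ((lsr & UART_LSR_TEMT) && (iir & UART_IIR_NO_INT))
        sim_transmit_chars(u);
}

/* Workaround after the patch: LSR alone decides. */
void sim_start_tx_new(struct sim_uart *u)
{
    unsigned char lsr = sim_read_lsr(u);

    if (lsr & UART_LSR_TEMT)
        sim_transmit_chars(u);
}

/*
 * Run one start_tx call, then let the interrupt handler run if the
 * interrupt is still pending. Returns the number of transmit passes.
 */
int sim_run(bool use_patched, bool thri_pending_on_entry)
{
    struct sim_uart u = { thri_pending_on_entry, 0 };

    if (use_patched)
        sim_start_tx_new(&u);
    else
        sim_start_tx_old(&u);

    if (u.thri_pending) {
        (void)sim_read_iir(&u);
        sim_transmit_chars(&u);
    }
    return u.chars_sent;
}
```

In this model sim_run(false, true) returns 0: the old code swallows the
interrupt and then declines to transmit because the IIR did not read as
NO_INT, so output stalls. sim_run(true, true) returns 1, as do both
variants when no interrupt is pending on entry.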
Anders - first patch to try (against 2.6.20):
--- drivers/serial/8250.c~ 2007-02-04 18:44:54.000000000 +0000
+++ drivers/serial/8250.c 2009-02-11 15:39:43.000000000 +0000
@@ -1645,25 +1645,6 @@
serial8250_set_mctrl(&up->port, up->port.mctrl);
- /*
- * Do a quick test to see if we receive an
- * interrupt when we enable the TX irq.
- */
- serial_outp(up, UART_IER, UART_IER_THRI);
- lsr = serial_in(up, UART_LSR);
- iir = serial_in(up, UART_IIR);
- serial_outp(up, UART_IER, 0);
-
- if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
- if (!(up->bugs & UART_BUG_TXEN)) {
- up->bugs |= UART_BUG_TXEN;
- pr_debug("ttyS%d - enabling bad tx status workarounds\n",
- port->line);
- }
- } else {
- up->bugs &= ~UART_BUG_TXEN;
- }
-
spin_unlock_irqrestore(&up->port.lock, flags);
/*
Anders - second patch to try (against 2.6.20):
Fix should be suitable for distribution IMO.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
--- drivers/serial/8250.c~ 2007-02-04 18:44:54.000000000 +0000
+++ drivers/serial/8250.c 2009-02-11 15:41:51.000000000 +0000
@@ -1136,10 +1136,9 @@
serial_out(up, UART_IER, up->ier);
if (up->bugs & UART_BUG_TXEN) {
- unsigned char lsr, iir;
+ unsigned char lsr;
lsr = serial_in(up, UART_LSR);
- iir = serial_in(up, UART_IIR);
- if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT)
+ if (lsr & UART_LSR_TEMT)
transmit_chars(up);
}
}
* Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
2009-02-11 16:08 ` [PATCH] IRQ handling race and spurious IIR read in serial/8250.c Ian Jackson
@ 2009-02-19 17:52 ` Ian Jackson
2009-02-19 18:37 ` [Xen-devel] " Markus Armbruster
0 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2009-02-19 17:52 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Anders Kaseorg, xen-devel, Markus Armbruster
I wrote:
> In drivers/serial/8250.c in Linux there are two bugs:
> 1. UART_BUG_TXEN can be spuriously set, due to an IRQ race
> 2. The workaround then applied by the kernel is itself buggy
Markus Armbruster has also experienced this problem in a Xen
environment and has confirmed that my patch fixes it.
I think at the very least this change:
> Proposed initial band-aid fix (against 2.6.28.4):
>
> Do not read IIR in serial8250_start_tx when UART_BUG_TXEN
>
> Reading the IIR clears some outstanding interrupts so it is not safe.
> Instead, simply transmit immediately if the buffer is empty without
> regard to IIR.
>
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
should be made right away.
Ian.
* Re: [Xen-devel] Re: [PATCH] IRQ handling race and spurious IIR read in serial/8250.c
2009-02-19 17:52 ` Ian Jackson
@ 2009-02-19 18:37 ` Markus Armbruster
0 siblings, 0 replies; 10+ messages in thread
From: Markus Armbruster @ 2009-02-19 18:37 UTC (permalink / raw)
To: Ian Jackson; +Cc: Linux Kernel Mailing List, xen-devel, Anders Kaseorg
Ian Jackson <Ian.Jackson@eu.citrix.com> writes:
> I wrote:
>> In drivers/serial/8250.c in Linux there are two bugs:
>> 1. UART_BUG_TXEN can be spuriously set, due to an IRQ race
>> 2. The workaround then applied by the kernel is itself buggy
>
> Markus Armbruster has also experienced this problem in a Xen
> environment and has confirmed that my patch fixes it.
Correct.
> I think at the very least this change:
>
>> Proposed initial band-aid fix (against 2.6.28.4):
>>
>> Do not read IIR in serial8250_start_tx when UART_BUG_TXEN
>>
>> Reading the IIR clears some outstanding interrupts so it is not safe.
>> Instead, simply transmit immediately if the buffer is empty without
>> regard to IIR.
>>
>> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
>
> should be made right away.
>
> Ian.
Patch makes sense to me.
Acked-by: Markus Armbruster <armbru@redhat.com>
Thread overview: 10+ messages
2009-03-12 17:57 [PATCH] IRQ handling race and spurious IIR read in serial/8250.c Markus Armbruster
2009-03-12 17:57 ` Markus Armbruster
2009-03-12 18:09 ` Alan Cox
2009-03-12 18:09 ` Alan Cox
2009-03-12 19:30 ` Ian Jackson
2009-03-12 19:30 ` Ian Jackson
2009-03-12 21:38 ` [Xen-devel] " Alan Cox
2009-03-12 21:38 ` Alan Cox
2009-06-19 1:39 ` [Xen-devel] " Robert Evans
-- strict thread matches above, loose matches on Subject: below --
2009-02-05 2:23 Serial console hangs with Linux 2.6.20 HVM guest Anders Kaseorg
2009-02-05 17:04 ` Ian Jackson
2009-02-05 19:34 ` Anders Kaseorg
2009-02-05 21:52 ` Anders Kaseorg
2009-02-10 15:34 ` Ian Jackson
2009-02-10 18:20 ` Anders Kaseorg
2009-02-11 16:08 ` [PATCH] IRQ handling race and spurious IIR read in serial/8250.c Ian Jackson
2009-02-19 17:52 ` Ian Jackson
2009-02-19 18:37 ` [Xen-devel] " Markus Armbruster