All of lore.kernel.org
 help / color / mirror / Atom feed
* hvc_console change results in corrupt oops output
@ 2011-07-04 10:57 Anton Blanchard
  2011-07-04 13:56 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Blanchard @ 2011-07-04 10:57 UTC (permalink / raw)
  To: benh, brueckner, borntraeger, linuxppc-dev


Hi,

We've been struggling to debug a hang on a large ppc64 box. Every time
we collect oops output there are pieces of the oops output missing and
in some cases entire CPUs are missing.

Eventually I realised the hvc_console driver is dropping characters.
The commit that caused this is:


commit 3feebbb5492e9e463467cefb633e23a3dfcec132
Author: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Date:   Mon Oct 13 23:12:50 2008 +0000

    hvc_console: Fix loop if put_char() returns 0
    
    If put_char() routine of a hvc console backend returns 0, then the
    hvc console starts looping in the following scenarios:
    
    1. hvc_console_print()
        If put_char() returns 0 then the while loop may loop forever.
        I have added the missing check for 0 to throw away console
        messages.


The hypervisor gives us a busy return, so we could retry a number of
times instead of dropping it on the floor. We'd need to do it in the
hvc_console driver - the tty drivers share the same backend
functions so we can't hide it in the pseries put_chars function.

Anton

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-04 10:57 hvc_console change results in corrupt oops output Anton Blanchard
@ 2011-07-04 13:56 ` Benjamin Herrenschmidt
  2011-07-04 14:24   ` Hendrik Brueckner
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2011-07-04 13:56 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: borntraeger, brueckner, linuxppc-dev

On Mon, 2011-07-04 at 20:57 +1000, Anton Blanchard wrote:

 .../...

> The hypervisor gives us a busy return, so we could retry a number of
> times instead of dropping it on the floor. We'd need to do it in the
> hvc_console driver - the tty drivers share the same backend
> functions so we can't hide it in the pseries put_chars function.

For kernel console, I don't see why not wait forever ... If the
underlying backend really lost the connection it can always return a
negative error no ?

Or we could have "well defined" return codes for "wait forever" vs.
"drop this" ... maybe return -EAGAIN.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-04 13:56 ` Benjamin Herrenschmidt
@ 2011-07-04 14:24   ` Hendrik Brueckner
  2011-07-05  4:17     ` Tabi Timur-B04825
  0 siblings, 1 reply; 10+ messages in thread
From: Hendrik Brueckner @ 2011-07-04 14:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: borntraeger, brueckner, linuxppc-dev, Anton Blanchard

On Mon, Jul 04, 2011 at 11:56:27PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2011-07-04 at 20:57 +1000, Anton Blanchard wrote:
> 
>  .../...
> 
> > The hypervisor gives us a busy return, so we could retry a number of
> > times instead of dropping it on the floor. We'd need to do it in the
> > hvc_console driver - the tty drivers share the same backend
> > functions so we can't hide it in the pseries put_chars function.
> 
> For kernel console, I don't see why not wait forever ... If the
> underlying backend really lost the connection it can always return a
> negative error no ?
> 
> Or we could have "well defined" return codes for "wait forever" vs.
> "drop this" ... maybe return -EAGAIN.

I will check this again for my hvc_iucv back-end.  Meanwhile a found
an old thread discussing the same issue.  It covers some differences
between console and ttys which actually does not matter for hvc-backend
because of the shared put_chars() routine.

You can read the thread on lkml.org: http://lkml.org/lkml/2009/10/15/149


Kind regards,
  Hendrik

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-04 14:24   ` Hendrik Brueckner
@ 2011-07-05  4:17     ` Tabi Timur-B04825
  2011-07-05  6:22       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Tabi Timur-B04825 @ 2011-07-05  4:17 UTC (permalink / raw)
  To: Hendrik Brueckner; +Cc: linuxppc-dev, Anton Blanchard, borntraeger

On Mon, Jul 4, 2011 at 9:24 AM, Hendrik Brueckner
<brueckner@linux.vnet.ibm.com> wrote:

> I will check this again for my hvc_iucv back-end. =A0Meanwhile a found
> an old thread discussing the same issue. =A0It covers some differences
> between console and ttys which actually does not matter for hvc-backend
> because of the shared put_chars() routine.
>
> You can read the thread on lkml.org: http://lkml.org/lkml/2009/10/15/149

I started that thread.  After much soul searching, we came to the
conclusion that HVC is not compatible with hypervisors that return
BUSY on writes.  So I threw out my HVC driver and rewrote it as a
standard console and TTY driver.  That driver is waiting to be
included in the 3.1 kernel.

--=20
Timur Tabi
Linux kernel developer at Freescale=

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-05  4:17     ` Tabi Timur-B04825
@ 2011-07-05  6:22       ` Benjamin Herrenschmidt
  2011-07-05 13:51         ` Tabi Timur-B04825
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  0 siblings, 2 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2011-07-05  6:22 UTC (permalink / raw)
  To: Tabi Timur-B04825
  Cc: borntraeger, Hendrik Brueckner, linuxppc-dev, Anton Blanchard

On Tue, 2011-07-05 at 04:17 +0000, Tabi Timur-B04825 wrote:
> On Mon, Jul 4, 2011 at 9:24 AM, Hendrik Brueckner
> <brueckner@linux.vnet.ibm.com> wrote:
> 
> > I will check this again for my hvc_iucv back-end.  Meanwhile a found
> > an old thread discussing the same issue.  It covers some differences
> > between console and ttys which actually does not matter for hvc-backend
> > because of the shared put_chars() routine.
> >
> > You can read the thread on lkml.org: http://lkml.org/lkml/2009/10/15/149
> 
> I started that thread.  After much soul searching, we came to the
> conclusion that HVC is not compatible with hypervisors that return
> BUSY on writes. 

That is a fun conclusion considering that hvc has been written for the
pseries hypervisor which ... can return BUSY on writes :-)

> So I threw out my HVC driver and rewrote it as a
> standard console and TTY driver.  That driver is waiting to be
> included in the 3.1 kernel.

Sucktastic.

We just need to fix HVC properly.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: hvc_console change results in corrupt oops output
  2011-07-05  6:22       ` Benjamin Herrenschmidt
@ 2011-07-05 13:51         ` Tabi Timur-B04825
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  1 sibling, 0 replies; 10+ messages in thread
From: Tabi Timur-B04825 @ 2011-07-05 13:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: borntraeger, Hendrik Brueckner, linuxppc-dev, Anton Blanchard

Benjamin Herrenschmidt wrote:

> That is a fun conclusion considering that hvc has been written for the
> pseries hypervisor which ... can return BUSY on writes :-)

Go read the original thread.  The problem is that tty writes and console=20
writes are treated the same by the hvc client driver.  If a client driver=20
detects that the hypervisor is busy, it has the choice of either spinning=20
or returning right away.  Spinning is not acceptable for tty output, so=20
all drivers return right away.  hvc then drops the unwritten characters.

According to Hendrik, this is still happening.

> We just need to fix HVC properly.

Where were you two years ago?  I complained about the problem, and even=20
posted a hackish "fix".  The response I got was tepid -- some=20
acknowledgement that the problem exists, but no real desire to fix it by=20
anyone.

So I had no choice but to abandon hvc.  And frankly, I still don't=20
understand why it exists.  Since then, I wrote a very nice console/tty=20
driver, and I have no plans to return to hvc even if the problem is fixed.

--=20
Timur Tabi
Linux kernel developer at Freescale=

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC] [PATCH] hvc_console: improve tty/console put_chars handling
  2011-07-05  6:22       ` Benjamin Herrenschmidt
  2011-07-05 13:51         ` Tabi Timur-B04825
@ 2011-07-05 14:28         ` Hendrik Brueckner
  2011-07-06  7:49           ` Anton Blanchard
                             ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Hendrik Brueckner @ 2011-07-05 14:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: borntraeger, Hendrik Brueckner, Tabi Timur-B04825, linuxppc-dev,
	Anton Blanchard

Hi folks,

On Tue, Jul 05, 2011 at 04:22:43PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-07-05 at 04:17 +0000, Tabi Timur-B04825 wrote:
> > On Mon, Jul 4, 2011 at 9:24 AM, Hendrik Brueckner
> > <brueckner@linux.vnet.ibm.com> wrote:
> > 
> > I started that thread.  After much soul searching, we came to the
> > conclusion that HVC is not compatible with hypervisors that return
> > BUSY on writes. 
> 
> That is a fun conclusion considering that hvc has been written for the
> pseries hypervisor which ... can return BUSY on writes :-)
> 
> We just need to fix HVC properly.

So I took initiative and looked again into this issue.  Below you can
find a patch that is based on Ben's -EAGAIN idea.  The hvc console layer
takes care of retrying depending on the backend's return code.

However, the main issue is that from a backend perspective, there is no
difference between tty and console output.  Because consoles, especially
the preferred console, behave different than a simple tty.  Blocked write
to a preferred console can stop the system.

So with the patch below, the backend can now indirectly control the way
console output is handled for it.  I still have to think if this solution
is ok or if it is better to introduce a new callback to console output only
(and might provide a default implemenatation similar to the patch below).

NOTE: I did not yet test this patch but will do.. I just want to share it
      early to get feedback from you.

-->8---------------------------------------------------------------------

Currently, the hvc_console_print() function drops console output if the
hvc backend's put_chars() returns 0.  This patch changes this behavior
to allow a retry through returning -EAGAIN.

This change also affects the hvc_push() function.  Both functions are
changed to handle -EAGAIN and to retry the put_chars() operation.

If a hvc backend returns -EAGAIN, the retry handling differs:

  - hvc_console_print() spins to write the complete console output.
  - hvc_push() behaves the same way as for returning 0.

Now hvc backends can indirectly control the way how console output is
handled through the hvc console layer.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
---
 drivers/tty/hvc/hvc_console.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -163,8 +163,10 @@ static void hvc_console_print(struct con
 		} else {
 			r = cons_ops[index]->put_chars(vtermnos[index], c, i);
 			if (r <= 0) {
-				/* throw away chars on error */
-				i = 0;
+				/* throw away characters on error
+				 * but spin in case of -EAGAIN */
+				if (r != -EAGAIN)
+					i = 0;
 			} else if (r > 0) {
 				i -= r;
 				if (i > 0)
@@ -448,7 +450,7 @@ static int hvc_push(struct hvc_struct *h
 
 	n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
 	if (n <= 0) {
-		if (n == 0) {
+		if (n == 0 || n == -EAGAIN) {
 			hp->do_wakeup = 1;
 			return 0;
 		}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] [PATCH] hvc_console: improve tty/console put_chars handling
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
@ 2011-07-06  7:49           ` Anton Blanchard
  2011-07-06  7:50           ` [PATCH 1/2]: " Anton Blanchard
  2011-07-06  7:51           ` [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output Anton Blanchard
  2 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2011-07-06  7:49 UTC (permalink / raw)
  To: Hendrik Brueckner; +Cc: Tabi Timur-B04825, linuxppc-dev, borntraeger


Hi Hendrik,

> So with the patch below, the backend can now indirectly control the
> way console output is handled for it.  I still have to think if this
> solution is ok or if it is better to introduce a new callback to
> console output only (and might provide a default implemenatation
> similar to the patch below).
> 
> NOTE: I did not yet test this patch but will do.. I just want to
> share it early to get feedback from you.

Thanks a lot for this. I added the pseries bits and tested it, patches
to follow.

Anton

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2]: hvc_console: improve tty/console put_chars handling
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  2011-07-06  7:49           ` Anton Blanchard
@ 2011-07-06  7:50           ` Anton Blanchard
  2011-07-06  7:51           ` [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output Anton Blanchard
  2 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2011-07-06  7:50 UTC (permalink / raw)
  To: Hendrik Brueckner; +Cc: Tabi Timur-B04825, linuxppc-dev, borntraeger

From: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>

Currently, the hvc_console_print() function drops console output if the
hvc backend's put_chars() returns 0.  This patch changes this behavior
to allow a retry through returning -EAGAIN.

This change also affects the hvc_push() function.  Both functions are
changed to handle -EAGAIN and to retry the put_chars() operation.

If a hvc backend returns -EAGAIN, the retry handling differs:

  - hvc_console_print() spins to write the complete console output.
  - hvc_push() behaves the same way as for returning 0.

Now hvc backends can indirectly control the way how console output is
handled through the hvc console layer.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
---

Index: linux-powerpc/drivers/tty/hvc/hvc_console.c
===================================================================
--- linux-powerpc.orig/drivers/tty/hvc/hvc_console.c	2011-07-06 17:10:51.288360541 +1000
+++ linux-powerpc/drivers/tty/hvc/hvc_console.c	2011-07-06 17:11:16.398794913 +1000
@@ -163,8 +163,10 @@ static void hvc_console_print(struct con
 		} else {
 			r = cons_ops[index]->put_chars(vtermnos[index], c, i);
 			if (r <= 0) {
-				/* throw away chars on error */
-				i = 0;
+				/* throw away characters on error
+				 * but spin in case of -EAGAIN */
+				if (r != -EAGAIN)
+					i = 0;
 			} else if (r > 0) {
 				i -= r;
 				if (i > 0)
@@ -448,7 +450,7 @@ static int hvc_push(struct hvc_struct *h
 
 	n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
 	if (n <= 0) {
-		if (n == 0) {
+		if (n == 0 || n == -EAGAIN) {
 			hp->do_wakeup = 1;
 			return 0;
 		}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output
  2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
  2011-07-06  7:49           ` Anton Blanchard
  2011-07-06  7:50           ` [PATCH 1/2]: " Anton Blanchard
@ 2011-07-06  7:51           ` Anton Blanchard
  2 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2011-07-06  7:51 UTC (permalink / raw)
  To: Hendrik Brueckner; +Cc: Tabi Timur-B04825, linuxppc-dev, borntraeger


Return -EAGAIN when we get H_BUSY back from the hypervisor. This
makes the hvc console driver retry, avoiding dropped printks.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
---

Index: linux-powerpc/arch/powerpc/platforms/pseries/hvconsole.c
===================================================================
--- linux-powerpc.orig/arch/powerpc/platforms/pseries/hvconsole.c	2011-07-06 17:11:33.799095935 +1000
+++ linux-powerpc/arch/powerpc/platforms/pseries/hvconsole.c	2011-07-06 17:11:47.499332962 +1000
@@ -73,7 +73,7 @@ int hvc_put_chars(uint32_t vtermno, cons
 	if (ret == H_SUCCESS)
 		return count;
 	if (ret == H_BUSY)
-		return 0;
+		return -EAGAIN;
 	return -EIO;
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-07-06  7:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-04 10:57 hvc_console change results in corrupt oops output Anton Blanchard
2011-07-04 13:56 ` Benjamin Herrenschmidt
2011-07-04 14:24   ` Hendrik Brueckner
2011-07-05  4:17     ` Tabi Timur-B04825
2011-07-05  6:22       ` Benjamin Herrenschmidt
2011-07-05 13:51         ` Tabi Timur-B04825
2011-07-05 14:28         ` [RFC] [PATCH] hvc_console: improve tty/console put_chars handling Hendrik Brueckner
2011-07-06  7:49           ` Anton Blanchard
2011-07-06  7:50           ` [PATCH 1/2]: " Anton Blanchard
2011-07-06  7:51           ` [PATCH 2/2]: powerpc/pseries/hvconsole: Fix dropped console output Anton Blanchard

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.