qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
@ 2019-07-30 17:45 Guenter Roeck
  2019-07-31 11:08 ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2019-07-30 17:45 UTC
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, qemu-devel, Guenter Roeck, Marc-André Lureau

The following assert is seen once in a while while resetting the
Linux kernel.

qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
	Assertion `dev != NULL' failed.

The call to usb_ep_get() originates from ehci_execute().
Analysis and debugging shows that p->queue->dev can indeed be NULL
in this function. Add check for this condition and return an error
if it is seen.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 hw/usb/hcd-ehci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 62dab05..c759f3e 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1348,6 +1348,11 @@ static int ehci_execute(EHCIPacket *p, const char *action)
         return -1;
     }
 
+    if (p->queue->dev == NULL) {
+        ehci_trace_guest_bug(p->queue->ehci, "No device attached to queue\n");
+        return -1;
+    }
+
     if (get_field(p->qtd.token, QTD_TOKEN_TBYTES) > BUFF_SIZE) {
         ehci_trace_guest_bug(p->queue->ehci,
                              "guest requested more bytes than allowed");
-- 
2.7.4
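
For readers following the thread, the guard added by the patch above can be
illustrated outside QEMU with a minimal, self-contained C sketch. The Mock*
types and mock_* functions are hypothetical stand-ins, not QEMU's structures
or API; only the shape of the check (test p->queue->dev before the endpoint
lookup and return -1 if it is NULL) is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for QEMU's USBDevice, EHCIQueue and EHCIPacket. */
typedef struct MockDevice { int addr; } MockDevice;
typedef struct MockQueue { MockDevice *dev; int guest_bugs; } MockQueue;
typedef struct MockPacket { MockQueue *queue; } MockPacket;

/* Mirrors the assertion reported above: the lookup refuses a NULL device. */
int mock_usb_ep_get(MockDevice *dev)
{
    assert(dev != NULL);
    return dev->addr;
}

/* Guarded execute step: record the condition and fail instead of asserting. */
int mock_execute(MockPacket *p)
{
    if (p->queue->dev == NULL) {
        p->queue->guest_bugs++;
        fprintf(stderr, "guest bug: no device attached to queue\n");
        return -1;
    }
    return mock_usb_ep_get(p->queue->dev);
}
```

With this guard in place, calling mock_execute() on a packet whose queue has
no device returns -1 rather than tripping the assertion.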




* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-07-30 17:45 [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get Guenter Roeck
@ 2019-07-31 11:08 ` Philippe Mathieu-Daudé
  2019-07-31 21:11   ` Guenter Roeck
  2019-08-02 14:11   ` Gerd Hoffmann
  0 siblings, 2 replies; 11+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-07-31 11:08 UTC
  To: Guenter Roeck, Gerd Hoffmann
  Cc: Paolo Bonzini, qemu-devel, Marc-André Lureau

On 7/30/19 7:45 PM, Guenter Roeck wrote:
> The following assert is seen once in a while while resetting the
> Linux kernel.
> 
> qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
> 	Assertion `dev != NULL' failed.
> 
> The call to usb_ep_get() originates from ehci_execute().
> Analysis and debugging shows that p->queue->dev can indeed be NULL
> in this function. Add check for this condition and return an error
> if it is seen.

Your patch is not wrong as it corrects your case, but I wonder why we
get there. This assert seems to have caught a bug.

Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
ehci_finalize()? Then we shouldn't need this patch.

> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  hw/usb/hcd-ehci.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> index 62dab05..c759f3e 100644
> --- a/hw/usb/hcd-ehci.c
> +++ b/hw/usb/hcd-ehci.c
> @@ -1348,6 +1348,11 @@ static int ehci_execute(EHCIPacket *p, const char *action)
>          return -1;
>      }
>  
> +    if (p->queue->dev == NULL) {
> +        ehci_trace_guest_bug(p->queue->ehci, "No device attached to queue\n");
> +        return -1;
> +    }
> +
>      if (get_field(p->qtd.token, QTD_TOKEN_TBYTES) > BUFF_SIZE) {
>          ehci_trace_guest_bug(p->queue->ehci,
>                               "guest requested more bytes than allowed");
> 



* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-07-31 11:08 ` Philippe Mathieu-Daudé
@ 2019-07-31 21:11   ` Guenter Roeck
  2019-08-02 14:11   ` Gerd Hoffmann
  1 sibling, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2019-07-31 21:11 UTC
  To: Philippe Mathieu-Daudé
  Cc: Paolo Bonzini, Gerd Hoffmann, Marc-André Lureau, qemu-devel

On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
> On 7/30/19 7:45 PM, Guenter Roeck wrote:
> > The following assert is seen once in a while while resetting the
> > Linux kernel.
> > 
> > qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
> > 	Assertion `dev != NULL' failed.
> > 
> > The call to usb_ep_get() originates from ehci_execute().
> > Analysis and debugging shows that p->queue->dev can indeed be NULL
> > in this function. Add check for this condition and return an error
> > if it is seen.
> 
> Your patch is not wrong as it corrects your case, but I wonder why we
> get there. This assert seems to have caught a bug.
> 
> Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
> ehci_finalize()? Then we shouldn't need this patch.
> 


If you send me an alternate patch, I'll be happy to insert it
into my build to give it some test coverage.

Thanks,
Guenter

> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > ---
> >  hw/usb/hcd-ehci.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> > index 62dab05..c759f3e 100644
> > --- a/hw/usb/hcd-ehci.c
> > +++ b/hw/usb/hcd-ehci.c
> > @@ -1348,6 +1348,11 @@ static int ehci_execute(EHCIPacket *p, const char *action)
> >          return -1;
> >      }
> >  
> > +    if (p->queue->dev == NULL) {
> > +        ehci_trace_guest_bug(p->queue->ehci, "No device attached to queue\n");
> > +        return -1;
> > +    }
> > +
> >      if (get_field(p->qtd.token, QTD_TOKEN_TBYTES) > BUFF_SIZE) {
> >          ehci_trace_guest_bug(p->queue->ehci,
> >                               "guest requested more bytes than allowed");
> > 



* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-07-31 11:08 ` Philippe Mathieu-Daudé
  2019-07-31 21:11   ` Guenter Roeck
@ 2019-08-02 14:11   ` Gerd Hoffmann
  2019-08-02 16:46     ` Guenter Roeck
  2019-08-06 13:23     ` Guenter Roeck
  1 sibling, 2 replies; 11+ messages in thread
From: Gerd Hoffmann @ 2019-08-02 14:11 UTC
  To: Philippe Mathieu-Daudé
  Cc: Paolo Bonzini, qemu-devel, Guenter Roeck, Marc-André Lureau

On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
> On 7/30/19 7:45 PM, Guenter Roeck wrote:
> > The following assert is seen once in a while while resetting the
> > Linux kernel.
> > 
> > qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
> > 	Assertion `dev != NULL' failed.
> > 
> > The call to usb_ep_get() originates from ehci_execute().
> > Analysis and debugging shows that p->queue->dev can indeed be NULL
> > in this function. Add check for this condition and return an error
> > if it is seen.
> 
> Your patch is not wrong as it corrects your case, but I wonder why we
> get there. This assert seems to have caught a bug.

https://bugzilla.redhat.com//show_bug.cgi?id=1715801 maybe.

> Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
> ehci_finalize()? Then we shouldn't need this patch.

The two ehci_queues_rip_all() calls in ehci_reset() should clean up everything
properly.

Can you try the patch below to see whether an ehci_find_device failure is the
root cause?

thanks,
  Gerd

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 62dab0592fa2..2b0a57772ed5 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1644,6 +1644,10 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
         q->dev = ehci_find_device(q->ehci,
                                   get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
     }
+    if (q->dev == NULL) {
+        fprintf(stderr, "%s: device %d not found\n", __func__,
+                get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
+    }
 
     if (async && (q->qh.epchar & QH_EPCHAR_H)) {
 



* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-08-02 14:11   ` Gerd Hoffmann
@ 2019-08-02 16:46     ` Guenter Roeck
  2019-08-02 17:28       ` Guenter Roeck
  2019-08-06 13:23     ` Guenter Roeck
  1 sibling, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2019-08-02 16:46 UTC
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Marc-André Lureau

On Fri, Aug 02, 2019 at 04:11:49PM +0200, Gerd Hoffmann wrote:
> On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
> > On 7/30/19 7:45 PM, Guenter Roeck wrote:
> > > The following assert is seen once in a while while resetting the
> > > Linux kernel.
> > > 
> > > qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
> > > 	Assertion `dev != NULL' failed.
> > > 
> > > The call to usb_ep_get() originates from ehci_execute().
> > > Analysis and debugging shows that p->queue->dev can indeed be NULL
> > > in this function. Add check for this condition and return an error
> > > if it is seen.
> > 
> > Your patch is not wrong as it corrects your case, but I wonder why we
> > get there. This assert seems to have caught a bug.
> 
> https://bugzilla.redhat.com//show_bug.cgi?id=1715801 maybe.
> 
> > Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
> > ehci_finalize()? Then we shouldn't need this patch.
> 
> The two ehci_queues_rip_all() calls in ehci_reset() should clean up everything
> properly.
> 
> Can you try the patch below to see whether an ehci_find_device failure is the
> root cause?
> 
> thanks,
>   Gerd
> 
> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> index 62dab0592fa2..2b0a57772ed5 100644
> --- a/hw/usb/hcd-ehci.c
> +++ b/hw/usb/hcd-ehci.c
> @@ -1644,6 +1644,10 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
>          q->dev = ehci_find_device(q->ehci,
>                                    get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
>      }
> +    if (q->dev == NULL) {
> +        fprintf(stderr, "%s: device %d not found\n", __func__,
> +                get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> +    }

I had tried that, but this does happen as standard behavior for some
architectures (I didn't write down where exactly since I thought it
must be normal). But, sure, I'll add a log message.

Guenter

>  
>      if (async && (q->qh.epchar & QH_EPCHAR_H)) {
>  



* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-08-02 16:46     ` Guenter Roeck
@ 2019-08-02 17:28       ` Guenter Roeck
  0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2019-08-02 17:28 UTC
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Marc-André Lureau

On Fri, Aug 02, 2019 at 09:46:26AM -0700, Guenter Roeck wrote:
> On Fri, Aug 02, 2019 at 04:11:49PM +0200, Gerd Hoffmann wrote:
> > On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
> > > On 7/30/19 7:45 PM, Guenter Roeck wrote:
> > > > The following assert is seen once in a while while resetting the
> > > > Linux kernel.
> > > > 
> > > > qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
> > > > 	Assertion `dev != NULL' failed.
> > > > 
> > > > The call to usb_ep_get() originates from ehci_execute().
> > > > Analysis and debugging shows that p->queue->dev can indeed be NULL
> > > > in this function. Add check for this condition and return an error
> > > > if it is seen.
> > > 
> > > Your patch is not wrong as it corrects your case, but I wonder why we
> > > get there. This assert seems to have caught a bug.
> > 
> > https://bugzilla.redhat.com//show_bug.cgi?id=1715801 maybe.
> > 
> > > Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
> > > ehci_finalize()? Then we shouldn't need this patch.
> > 
> > The two ehci_queues_rip_all() calls in ehci_reset() should clean up everything
> > properly.
> > 
> > Can you try the patch below to see whether an ehci_find_device failure is the
> > root cause?
> > 
> > thanks,
> >   Gerd
> > 
> > diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> > index 62dab0592fa2..2b0a57772ed5 100644
> > --- a/hw/usb/hcd-ehci.c
> > +++ b/hw/usb/hcd-ehci.c
> > @@ -1644,6 +1644,10 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
> >          q->dev = ehci_find_device(q->ehci,
> >                                    get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> >      }
> > +    if (q->dev == NULL) {
> > +        fprintf(stderr, "%s: device %d not found\n", __func__,
> > +                get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> > +    }
> 
> I had tried that, but this does happen as standard behavior for some
> architectures (I didn't write down where exactly since I thought it
> must be normal). But, sure, I'll add a log message.
> 

With the log message added, I see it a lot when booting riscv64 images
from usb-ehci.

ehci_state_fetchqh: device 0 not found

It looks like this happens for each usb access (a whopping 800+ times
for a simple boot test). So it is definitely a very common condition.
The relevant qemu command line is something like

	-usb -device usb-ehci,id=ehci -device usb-storage,bus=ehci.0,drive=d0 \
	-drive file=rootfs.ext2,if=none,id=d0,format=raw

The image works fine otherwise, so I thought that the condition is normal.

Guenter



* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-08-02 14:11   ` Gerd Hoffmann
  2019-08-02 16:46     ` Guenter Roeck
@ 2019-08-06 13:23     ` Guenter Roeck
  2019-08-13 11:42       ` Gerd Hoffmann
  1 sibling, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2019-08-06 13:23 UTC
  To: Gerd Hoffmann, Philippe Mathieu-Daudé
  Cc: Paolo Bonzini, qemu-devel, Marc-André Lureau

On 8/2/19 7:11 AM, Gerd Hoffmann wrote:
> On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
>> On 7/30/19 7:45 PM, Guenter Roeck wrote:
>>> The following assert is seen once in a while while resetting the
>>> Linux kernel.
>>>
>>> qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
>>> 	Assertion `dev != NULL' failed.
>>>
>>> The call to usb_ep_get() originates from ehci_execute().
>>> Analysis and debugging shows that p->queue->dev can indeed be NULL
>>> in this function. Add check for this condition and return an error
>>> if it is seen.
>>
>> Your patch is not wrong as it corrects your case, but I wonder why we
>> get there. This assert seems to have caught a bug.
> 
> https://bugzilla.redhat.com//show_bug.cgi?id=1715801 maybe.
> 
>> Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
>> ehci_finalize()? Then we shouldn't need this patch.
> 
> The two ehci_queues_rip_all() calls in ehci_reset() should clean up everything
> properly.
> 
> Can you try the patch below to see whether an ehci_find_device failure is the
> root cause?
> 
> thanks,
>    Gerd
> 
> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> index 62dab0592fa2..2b0a57772ed5 100644
> --- a/hw/usb/hcd-ehci.c
> +++ b/hw/usb/hcd-ehci.c
> @@ -1644,6 +1644,10 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
>           q->dev = ehci_find_device(q->ehci,
>                                     get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
>       }
> +    if (q->dev == NULL) {
> +        fprintf(stderr, "%s: device %d not found\n", __func__,
> +                get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> +    }
>   
Turns out I end up seeing that message hundreds of times early on each boot,
no matter which architecture. It is quite obviously a normal operating condition.

Guenter



* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-08-06 13:23     ` Guenter Roeck
@ 2019-08-13 11:42       ` Gerd Hoffmann
  2019-08-14 14:41         ` Guenter Roeck
  2019-08-20 15:38         ` Guenter Roeck
  0 siblings, 2 replies; 11+ messages in thread
From: Gerd Hoffmann @ 2019-08-13 11:42 UTC
  To: Guenter Roeck
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Marc-André Lureau

On Tue, Aug 06, 2019 at 06:23:38AM -0700, Guenter Roeck wrote:
> On 8/2/19 7:11 AM, Gerd Hoffmann wrote:
> > On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
> > > On 7/30/19 7:45 PM, Guenter Roeck wrote:
> > > > The following assert is seen once in a while while resetting the
> > > > Linux kernel.
> > > > 
> > > > qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
> > > > 	Assertion `dev != NULL' failed.
> > > > 
> > > > The call to usb_ep_get() originates from ehci_execute().
> > > > Analysis and debugging shows that p->queue->dev can indeed be NULL
> > > > in this function. Add check for this condition and return an error
> > > > if it is seen.
> > > 
> > > Your patch is not wrong as it corrects your case, but I wonder why we
> > > get there. This assert seems to have caught a bug.
> > 
> > https://bugzilla.redhat.com//show_bug.cgi?id=1715801 maybe.
> > 
> > > Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
> > > ehci_finalize()? Then we shouldn't need this patch.
> > 
> > The two ehci_queues_rip_all() calls in ehci_reset() should clean up everything
> > properly.
> > 
> > Can you try the patch below to see whether an ehci_find_device failure is the
> > root cause?
> > 
> > thanks,
> >    Gerd
> > 
> > diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> > index 62dab0592fa2..2b0a57772ed5 100644
> > --- a/hw/usb/hcd-ehci.c
> > +++ b/hw/usb/hcd-ehci.c
> > @@ -1644,6 +1644,10 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
> >           q->dev = ehci_find_device(q->ehci,
> >                                     get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> >       }
> > +    if (q->dev == NULL) {
> > +        fprintf(stderr, "%s: device %d not found\n", __func__,
> > +                get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> > +    }
> Turns out I end up seeing that message hundreds of times early on each boot,
> no matter which architecture. It is quite obviously a normal operating condition.

Yep, as long as the queue is not active this is completely harmless.
So we need to check a bit later.  In execute() looks a bit too late
though, we don't have a good backup plan then.

Does the patch below solve the problem without bad side effects?

thanks,
  Gerd

From 5980eaad23f675a2d509d0c55e288793619761bc Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue, 13 Aug 2019 13:37:09 +0200
Subject: [PATCH] ehci: try fix queue->dev null ptr dereference

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/usb/hcd-ehci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 62dab0592fa2..5f089f30054b 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1834,6 +1834,9 @@ static int ehci_state_fetchqtd(EHCIQueue *q)
             ehci_set_state(q->ehci, q->async, EST_EXECUTING);
             break;
         }
+    } else if (q->dev == NULL) {
+        ehci_trace_guest_bug(q->ehci, "no device attached to queue");
+        ehci_set_state(q->ehci, q->async, EST_HORIZONTALQH);
     } else {
         p = ehci_alloc_packet(q);
         p->qtdaddr = q->qtdaddr;
-- 
2.18.1
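
The idea in the patch above, checking for a missing device while fetching the
qTD and moving on to the next queue head instead of reaching the execute
step, can be sketched as a small state step. This is only an illustration
under stated assumptions: the Mock* types, the ST_* states and the mock_*
functions are hypothetical and far simpler than QEMU's EHCI state machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of states; the real driver has more. */
typedef enum { ST_FETCHQTD, ST_EXECUTE, ST_HORIZONTALQH } MockState;

typedef struct MockDevice { int addr; } MockDevice;

typedef struct MockQueue {
    MockDevice *dev;     /* NULL when no device matches the queue's address */
    MockState state;
    int guest_bugs;      /* counts "no device attached to queue" events */
} MockQueue;

/* Fetch step: only advance to execute when a device is attached. */
void mock_fetchqtd(MockQueue *q)
{
    if (q->dev == NULL) {
        q->guest_bugs++;
        q->state = ST_HORIZONTALQH;   /* skip this queue, go to the next one */
        return;
    }
    q->state = ST_EXECUTE;
}

/* Execute step: may now rely on the device being present. */
int mock_execute(MockQueue *q)
{
    assert(q->state == ST_EXECUTE && q->dev != NULL);
    return q->dev->addr;
}
```

Because the fetch step filters out queues without a device, the execute step
never sees a NULL device and needs no fallback of its own.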




* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-08-13 11:42       ` Gerd Hoffmann
@ 2019-08-14 14:41         ` Guenter Roeck
  2019-08-20 15:38         ` Guenter Roeck
  1 sibling, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2019-08-14 14:41 UTC
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Marc-André Lureau

On Tue, Aug 13, 2019 at 01:42:03PM +0200, Gerd Hoffmann wrote:
> On Tue, Aug 06, 2019 at 06:23:38AM -0700, Guenter Roeck wrote:
> > On 8/2/19 7:11 AM, Gerd Hoffmann wrote:
> > > On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
> > > > On 7/30/19 7:45 PM, Guenter Roeck wrote:
> > > > > The following assert is seen once in a while while resetting the
> > > > > Linux kernel.
> > > > > 
> > > > > qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
> > > > > 	Assertion `dev != NULL' failed.
> > > > > 
> > > > > The call to usb_ep_get() originates from ehci_execute().
> > > > > Analysis and debugging shows that p->queue->dev can indeed be NULL
> > > > > in this function. Add check for this condition and return an error
> > > > > if it is seen.
> > > > 
> > > > Your patch is not wrong as it corrects your case, but I wonder why we
> > > > get there. This assert seems to have caught a bug.
> > > 
> > > https://bugzilla.redhat.com//show_bug.cgi?id=1715801 maybe.
> > > 
> > > > Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
> > > > ehci_finalize()? Then we shouldn't need this patch.
> > > 
> > > The two ehci_queues_rip_all() calls in ehci_reset() should clean up everything
> > > properly.
> > > 
> > > Can you try the patch below to see whether an ehci_find_device failure is the
> > > root cause?
> > > 
> > > thanks,
> > >    Gerd
> > > 
> > > diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> > > index 62dab0592fa2..2b0a57772ed5 100644
> > > --- a/hw/usb/hcd-ehci.c
> > > +++ b/hw/usb/hcd-ehci.c
> > > @@ -1644,6 +1644,10 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
> > >           q->dev = ehci_find_device(q->ehci,
> > >                                     get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> > >       }
> > > +    if (q->dev == NULL) {
> > > +        fprintf(stderr, "%s: device %d not found\n", __func__,
> > > +                get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
> > > +    }
> > Turns out I end up seeing that message hundreds of times early on each boot,
> > no matter which architecture. It is quite obviously a normal operating condition.
> 
> Yep, as long as the queue is not active this is completely harmless.
> So we need to check a bit later.  In execute() looks a bit too late
> though, we don't have a good backup plan then.
> 
> Does the patch below solve the problem without bad side effects?
> 
I reverted my patch and applied the patch below to my builds of qemu, for both
v4.0 and v4.1, and installed it in my test bed. I'll let you know how it goes.

Thanks,
Guenter

> thanks,
>   Gerd
> 
> From 5980eaad23f675a2d509d0c55e288793619761bc Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Tue, 13 Aug 2019 13:37:09 +0200
> Subject: [PATCH] ehci: try fix queue->dev null ptr dereference
> 
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/usb/hcd-ehci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> index 62dab0592fa2..5f089f30054b 100644
> --- a/hw/usb/hcd-ehci.c
> +++ b/hw/usb/hcd-ehci.c
> @@ -1834,6 +1834,9 @@ static int ehci_state_fetchqtd(EHCIQueue *q)
>              ehci_set_state(q->ehci, q->async, EST_EXECUTING);
>              break;
>          }
> +    } else if (q->dev == NULL) {
> +        ehci_trace_guest_bug(q->ehci, "no device attached to queue");
> +        ehci_set_state(q->ehci, q->async, EST_HORIZONTALQH);
>      } else {
>          p = ehci_alloc_packet(q);
>          p->qtdaddr = q->qtdaddr;
> -- 
> 2.18.1
> 



* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-08-13 11:42       ` Gerd Hoffmann
  2019-08-14 14:41         ` Guenter Roeck
@ 2019-08-20 15:38         ` Guenter Roeck
  2019-08-21  8:54           ` Gerd Hoffmann
  1 sibling, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2019-08-20 15:38 UTC
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Marc-André Lureau

On 8/13/19 4:42 AM, Gerd Hoffmann wrote:
> On Tue, Aug 06, 2019 at 06:23:38AM -0700, Guenter Roeck wrote:
>> On 8/2/19 7:11 AM, Gerd Hoffmann wrote:
>>> On Wed, Jul 31, 2019 at 01:08:50PM +0200, Philippe Mathieu-Daudé wrote:
>>>> On 7/30/19 7:45 PM, Guenter Roeck wrote:
>>>>> The following assert is seen once in a while while resetting the
>>>>> Linux kernel.
>>>>>
>>>>> qemu-system-x86_64: hw/usb/core.c:734: usb_ep_get:
>>>>> 	Assertion `dev != NULL' failed.
>>>>>
>>>>> The call to usb_ep_get() originates from ehci_execute().
>>>>> Analysis and debugging shows that p->queue->dev can indeed be NULL
>>>>> in this function. Add check for this condition and return an error
>>>>> if it is seen.
>>>>
>>>> Your patch is not wrong as it corrects your case, but I wonder why we
>>>> get there. This assert seems to have caught a bug.
>>>
>>> https://bugzilla.redhat.com//show_bug.cgi?id=1715801 maybe.
>>>
>>>> Gerd, shouldn't we call usb_packet_cleanup() in ehci_reset() rather than
>>>> ehci_finalize()? Then we shouldn't need this patch.
>>>
>>> The two ehci_queues_rip_all() calls in ehci_reset() should clean up everything
>>> properly.
>>>
>>> Can you try the patch below to see whether an ehci_find_device failure is the
>>> root cause?
>>>
>>> thanks,
>>>     Gerd
>>>
>>> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
>>> index 62dab0592fa2..2b0a57772ed5 100644
>>> --- a/hw/usb/hcd-ehci.c
>>> +++ b/hw/usb/hcd-ehci.c
>>> @@ -1644,6 +1644,10 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
>>>            q->dev = ehci_find_device(q->ehci,
>>>                                      get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
>>>        }
>>> +    if (q->dev == NULL) {
>>> +        fprintf(stderr, "%s: device %d not found\n", __func__,
>>> +                get_field(q->qh.epchar, QH_EPCHAR_DEVADDR));
>>> +    }
>> Turns out I end up seeing that message hundreds of times early on each boot,
>> no matter which architecture. It is quite obviously a normal operating condition.
> 
> Yep, as long as the queue is not active this is completely harmless.
> So we need to check a bit later.  In execute() looks a bit too late
> though, we don't have a good backup plan then.
> 
> Does the patch below solve the problem without bad side effects?
> 
> thanks,
>    Gerd
> 
> From 5980eaad23f675a2d509d0c55e288793619761bc Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Tue, 13 Aug 2019 13:37:09 +0200
> Subject: [PATCH] ehci: try fix queue->dev null ptr dereference
> 
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>   hw/usb/hcd-ehci.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
> index 62dab0592fa2..5f089f30054b 100644
> --- a/hw/usb/hcd-ehci.c
> +++ b/hw/usb/hcd-ehci.c
> @@ -1834,6 +1834,9 @@ static int ehci_state_fetchqtd(EHCIQueue *q)
>               ehci_set_state(q->ehci, q->async, EST_EXECUTING);
>               break;
>           }
> +    } else if (q->dev == NULL) {
> +        ehci_trace_guest_bug(q->ehci, "no device attached to queue");
> +        ehci_set_state(q->ehci, q->async, EST_HORIZONTALQH);
>       } else {
>           p = ehci_alloc_packet(q);
>           p->qtdaddr = q->qtdaddr;
> 

That seems to be working as intended. I have not seen a crash
since I applied it. I tested it on top of v4.0 and v4.1.

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter




* Re: [Qemu-devel] [PATCH] ehci: Ensure that device is not NULL before calling usb_ep_get
  2019-08-20 15:38         ` Guenter Roeck
@ 2019-08-21  8:54           ` Gerd Hoffmann
  0 siblings, 0 replies; 11+ messages in thread
From: Gerd Hoffmann @ 2019-08-21  8:54 UTC
  To: Guenter Roeck
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé,
	qemu-devel, Marc-André Lureau

  Hi,

> > Yep, as long as the queue is not active this is completely harmless.
> > So we need to check a bit later.  In execute() looks a bit too late
> > though, we don't have a good backup plan then.
> > 
> > Does the patch below solve the problem without bad side effects?

> That seems to be working as intended. I have not seen a crash
> since I applied it. I tested it on top of v4.0 and v4.1.

Thanks.  Sent as a formal patch now.

cheers,
  Gerd



