All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
@ 2016-09-01 17:57 Ed Swierk
  2016-09-13 20:59 ` Konrad Rzeszutek Wilk
  2016-09-13 20:59 ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 9+ messages in thread
From: Ed Swierk @ 2016-09-01 17:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: xen-devel

Windows 8, 10 and Server 2012 guests hang intermittently while booting
on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
logo appears and the little dots start spinning.

Running strace on qemu shows its main thread doing the following every
couple of milliseconds:

 ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
        ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
 read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
 write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
 ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
 clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
 clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
 gettimeofday({1472251376, 673434}, NULL) = 0
 clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
 gettimeofday({1472251376, 673679}, NULL) = 0

The event channel (identified by '^' or 94 in this example) is always
the third of the domain's four channels.

Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
issues, but don't help in this case.

The proposed fix from
https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
go away. It's not clear to me why it works, or if it's just papering
over a bug elsewhere, or if there are any possible side effects.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 6eac66d..c891b67 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -293,6 +293,8 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
     uint32_t pending_ints;
     uint32_t mit_delay;
 
+    val &= s->mac_reg[IMS];
+
     s->mac_reg[ICR] = val;
 
     /*
@@ -305,7 +307,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
      */
     s->mac_reg[ICS] = val;
 
-    pending_ints = (s->mac_reg[IMS] & s->mac_reg[ICR]);
+    pending_ints = s->mac_reg[ICR];
     if (!s->mit_irq_level && pending_ints) {
         /*
          * Here we detect a potential raising edge. We postpone raising the

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
  2016-09-01 17:57 [Qemu-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS Ed Swierk
  2016-09-13 20:59 ` Konrad Rzeszutek Wilk
@ 2016-09-13 20:59 ` Konrad Rzeszutek Wilk
  2016-09-15  9:15     ` Denis V. Lunev
  1 sibling, 1 reply; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-09-13 20:59 UTC (permalink / raw)
  To: Ed Swierk, den; +Cc: qemu-devel, xen-devel

On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
> Windows 8, 10 and Server 2012 guests hang intermittently while booting
> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
> logo appears and the little dots start spinning.
> 
> Running strace on qemu shows its main thread doing the following every
> couple of milliseconds:
> 
>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
>  gettimeofday({1472251376, 673434}, NULL) = 0
>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
>  gettimeofday({1472251376, 673679}, NULL) = 0
> 
> The event channel (identified by '^' or 94 in this example) is always
> the third of the domain's four channels.
> 
> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
> issues, but don't help in this case.
> 
> The proposed fix from
> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
> go away. It's not clear to me why it works, or if it's just papering
> over a bug elsewhere, or if there are any possible side effects.

CC-ing Denis.


Is the fix below based on reading the spec or more of instrumenting?

Thanks.
> 
> Suggested-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
> 
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 6eac66d..c891b67 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -293,6 +293,8 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>      uint32_t pending_ints;
>      uint32_t mit_delay;
>  
> +    val &= s->mac_reg[IMS];
> +
>      s->mac_reg[ICR] = val;
>  
>      /*
> @@ -305,7 +307,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>       */
>      s->mac_reg[ICS] = val;
>  
> -    pending_ints = (s->mac_reg[IMS] & s->mac_reg[ICR]);
> +    pending_ints = s->mac_reg[ICR];
>      if (!s->mit_irq_level && pending_ints) {
>          /*
>           * Here we detect a potential raising edge. We postpone raising the
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
  2016-09-01 17:57 [Qemu-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS Ed Swierk
@ 2016-09-13 20:59 ` Konrad Rzeszutek Wilk
  2016-09-13 20:59 ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-09-13 20:59 UTC (permalink / raw)
  To: Ed Swierk, den; +Cc: xen-devel, qemu-devel

On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
> Windows 8, 10 and Server 2012 guests hang intermittently while booting
> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
> logo appears and the little dots start spinning.
> 
> Running strace on qemu shows its main thread doing the following every
> couple of milliseconds:
> 
>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
>  gettimeofday({1472251376, 673434}, NULL) = 0
>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
>  gettimeofday({1472251376, 673679}, NULL) = 0
> 
> The event channel (identified by '^' or 94 in this example) is always
> the third of the domain's four channels.
> 
> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
> issues, but don't help in this case.
> 
> The proposed fix from
> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
> go away. It's not clear to me why it works, or if it's just papering
> over a bug elsewhere, or if there are any possible side effects.

CC-ing Denis.


Is the fix below based on reading the spec or more of instrumenting?

Thanks.
> 
> Suggested-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
> 
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 6eac66d..c891b67 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -293,6 +293,8 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>      uint32_t pending_ints;
>      uint32_t mit_delay;
>  
> +    val &= s->mac_reg[IMS];
> +
>      s->mac_reg[ICR] = val;
>  
>      /*
> @@ -305,7 +307,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
>       */
>      s->mac_reg[ICS] = val;
>  
> -    pending_ints = (s->mac_reg[IMS] & s->mac_reg[ICR]);
> +    pending_ints = s->mac_reg[ICR];
>      if (!s->mit_irq_level && pending_ints) {
>          /*
>           * Here we detect a potential raising edge. We postpone raising the
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
  2016-09-13 20:59 ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
@ 2016-09-15  9:15     ` Denis V. Lunev
  0 siblings, 0 replies; 9+ messages in thread
From: Denis V. Lunev @ 2016-09-15  9:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Ed Swierk; +Cc: qemu-devel, xen-devel

On 09/13/2016 11:59 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
>> Windows 8, 10 and Server 2012 guests hang intermittently while booting
>> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
>> logo appears and the little dots start spinning.
>>
>> Running strace on qemu shows its main thread doing the following every
>> couple of milliseconds:
>>
>>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
>>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
>>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
>>  gettimeofday({1472251376, 673434}, NULL) = 0
>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
>>  gettimeofday({1472251376, 673679}, NULL) = 0
>>
>> The event channel (identified by '^' or 94 in this example) is always
>> the third of the domain's four channels.
>>
>> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
>> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
>> issues, but don't help in this case.
>>
>> The proposed fix from
>> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
>> go away. It's not clear to me why it works, or if it's just papering
>> over a bug elsewhere, or if there are any possible side effects.
> CC-ing Denis.
>
>
> Is the fix below based on reading the spec or more of instrumenting?
>
> Thanks.

hmm. I have looked in our older code (completely separate
from QEMU). It does not have this trick.

2012r2 (as far as I remember) has a bug in the driver
when LSC interrupt was raised unexpectedly during driver
initialization. The original bug was about TXQE. Can you
pls confirm which interrupt causes the storm?

Den

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
@ 2016-09-15  9:15     ` Denis V. Lunev
  0 siblings, 0 replies; 9+ messages in thread
From: Denis V. Lunev @ 2016-09-15  9:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Ed Swierk; +Cc: xen-devel, qemu-devel

On 09/13/2016 11:59 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
>> Windows 8, 10 and Server 2012 guests hang intermittently while booting
>> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
>> logo appears and the little dots start spinning.
>>
>> Running strace on qemu shows its main thread doing the following every
>> couple of milliseconds:
>>
>>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
>>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
>>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
>>  gettimeofday({1472251376, 673434}, NULL) = 0
>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
>>  gettimeofday({1472251376, 673679}, NULL) = 0
>>
>> The event channel (identified by '^' or 94 in this example) is always
>> the third of the domain's four channels.
>>
>> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
>> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
>> issues, but don't help in this case.
>>
>> The proposed fix from
>> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
>> go away. It's not clear to me why it works, or if it's just papering
>> over a bug elsewhere, or if there are any possible side effects.
> CC-ing Denis.
>
>
> Is the fix below based on reading the spec or more of instrumenting?
>
> Thanks.

hmm. I have looked in our older code (completely separate
from QEMU). It does not have this trick.

2012r2 (as far as I remember) has a bug in the driver
when LSC interrupt was raised unexpectedly during driver
initialization. The original bug was about TXQE. Can you
pls confirm which interrupt causes the storm?

Den


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
  2016-09-15  9:15     ` Denis V. Lunev
@ 2016-09-15 12:22       ` Ed Swierk
  -1 siblings, 0 replies; 9+ messages in thread
From: Ed Swierk @ 2016-09-15 12:22 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: Konrad Rzeszutek Wilk, qemu-devel, xen-devel

On Thu, Sep 15, 2016 at 2:15 AM, Denis V. Lunev <den@openvz.org> wrote:
> On 09/13/2016 11:59 PM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
> >> Windows 8, 10 and Server 2012 guests hang intermittently while booting
> >> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
> >> logo appears and the little dots start spinning.
> >>
> >> Running strace on qemu shows its main thread doing the following every
> >> couple of milliseconds:
> >>
> >>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
> >>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
> >>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
> >>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
> >>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
> >>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
> >>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
> >>  gettimeofday({1472251376, 673434}, NULL) = 0
> >>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
> >>  gettimeofday({1472251376, 673679}, NULL) = 0
> >>
> >> The event channel (identified by '^' or 94 in this example) is always
> >> the third of the domain's four channels.
> >>
> >> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
> >> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
> >> issues, but don't help in this case.
> >>
> >> The proposed fix from
> >> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
> >> go away. It's not clear to me why it works, or if it's just papering
> >> over a bug elsewhere, or if there are any possible side effects.
> > CC-ing Denis.
> >
> >
> > Is the fix below based on reading the spec or more of instrumenting?
> >
> > Thanks.
>
> hmm. I have looked in our older code (completely separate
> from QEMU). It does not have this trick.
>
> 2012r2 (as far as I remember) has a bug in the driver
> when LSC interrupt was raised unexpectedly during driver
> initialization. The original bug was about TXQE. Can you
> pls confirm which interrupt causes the storm?

How can I tell which interrupt is firing? Instrument the QEMU e1000 code?

--Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
@ 2016-09-15 12:22       ` Ed Swierk
  0 siblings, 0 replies; 9+ messages in thread
From: Ed Swierk @ 2016-09-15 12:22 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: xen-devel, qemu-devel

On Thu, Sep 15, 2016 at 2:15 AM, Denis V. Lunev <den@openvz.org> wrote:
> On 09/13/2016 11:59 PM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
> >> Windows 8, 10 and Server 2012 guests hang intermittently while booting
> >> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
> >> logo appears and the little dots start spinning.
> >>
> >> Running strace on qemu shows its main thread doing the following every
> >> couple of milliseconds:
> >>
> >>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
> >>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
> >>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
> >>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
> >>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
> >>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
> >>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
> >>  gettimeofday({1472251376, 673434}, NULL) = 0
> >>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
> >>  gettimeofday({1472251376, 673679}, NULL) = 0
> >>
> >> The event channel (identified by '^' or 94 in this example) is always
> >> the third of the domain's four channels.
> >>
> >> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
> >> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
> >> issues, but don't help in this case.
> >>
> >> The proposed fix from
> >> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
> >> go away. It's not clear to me why it works, or if it's just papering
> >> over a bug elsewhere, or if there are any possible side effects.
> > CC-ing Denis.
> >
> >
> > Is the fix below based on reading the spec or more of instrumenting?
> >
> > Thanks.
>
> hmm. I have looked in our older code (completely separate
> from QEMU). It does not have this trick.
>
> 2012r2 (as far as I remember) has a bug in the driver
> when LSC interrupt was raised unexpectedly during driver
> initialization. The original bug was about TXQE. Can you
> pls confirm which interrupt causes the storm?

How can I tell which interrupt is firing? Instrument the QEMU e1000 code?

--Ed

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [Xen-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
  2016-09-15 12:22       ` Ed Swierk
@ 2016-09-15 12:46         ` Denis V. Lunev
  -1 siblings, 0 replies; 9+ messages in thread
From: Denis V. Lunev @ 2016-09-15 12:46 UTC (permalink / raw)
  To: Ed Swierk, Denis V. Lunev; +Cc: xen-devel, qemu-devel, Konrad Rzeszutek Wilk

On 09/15/2016 03:22 PM, Ed Swierk wrote:
> On Thu, Sep 15, 2016 at 2:15 AM, Denis V. Lunev <den@openvz.org> wrote:
>> On 09/13/2016 11:59 PM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
>>>> Windows 8, 10 and Server 2012 guests hang intermittently while booting
>>>> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
>>>> logo appears and the little dots start spinning.
>>>>
>>>> Running strace on qemu shows its main thread doing the following every
>>>> couple of milliseconds:
>>>>
>>>>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
>>>>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
>>>>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>>>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>>>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
>>>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
>>>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
>>>>  gettimeofday({1472251376, 673434}, NULL) = 0
>>>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
>>>>  gettimeofday({1472251376, 673679}, NULL) = 0
>>>>
>>>> The event channel (identified by '^' or 94 in this example) is always
>>>> the third of the domain's four channels.
>>>>
>>>> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
>>>> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
>>>> issues, but don't help in this case.
>>>>
>>>> The proposed fix from
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
>>>> go away. It's not clear to me why it works, or if it's just papering
>>>> over a bug elsewhere, or if there are any possible side effects.
>>> CC-ing Denis.
>>>
>>>
>>> Is the fix below based on reading the spec or more of instrumenting?
>>>
>>> Thanks.
>> hmm. I have looked in our older code (completely separate
>> from QEMU). It does not have this trick.
>>
>> 2012r2 (as far as I remember) has a bug in the driver
>> when LSC interrupt was raised unexpectedly during driver
>> initialization. The original bug was about TXQE. Can you
>> pls confirm which interrupt causes the storm?
> How can I tell which interrupt is firing? Instrument the QEMU e1000 code?
>
> --Ed
>
connect with gdb to qemu, set break into set_interrupt_cause
and dump value of  s->mac_reg[ICS], s->mac_reg[IMS] and s->mac_reg[ICR]

Bits are defined here hw/net/e1000_regs.h

/* Interrupt Cause Read */
#define E1000_ICR_TXDW          0x00000001 /* Transmit desc written back */
#define E1000_ICR_TXQE          0x00000002 /* Transmit Queue empty */

Den

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS
@ 2016-09-15 12:46         ` Denis V. Lunev
  0 siblings, 0 replies; 9+ messages in thread
From: Denis V. Lunev @ 2016-09-15 12:46 UTC (permalink / raw)
  To: Ed Swierk, Denis V. Lunev; +Cc: xen-devel, qemu-devel

On 09/15/2016 03:22 PM, Ed Swierk wrote:
> On Thu, Sep 15, 2016 at 2:15 AM, Denis V. Lunev <den@openvz.org> wrote:
>> On 09/13/2016 11:59 PM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2016 at 10:57:48AM -0700, Ed Swierk wrote:
>>>> Windows 8, 10 and Server 2012 guests hang intermittently while booting
>>>> on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
>>>> logo appears and the little dots start spinning.
>>>>
>>>> Running strace on qemu shows its main thread doing the following every
>>>> couple of milliseconds:
>>>>
>>>>  ppoll([..., {fd=30</dev/xen/evtchn>, events=POLLIN|POLLERR|POLLHUP},
>>>>         ...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
>>>>  read(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>>>  write(30</dev/xen/evtchn>, "^\0\0\0", 4) = 4
>>>>  ioctl(30</dev/xen/evtchn>, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
>>>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
>>>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
>>>>  gettimeofday({1472251376, 673434}, NULL) = 0
>>>>  clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
>>>>  gettimeofday({1472251376, 673679}, NULL) = 0
>>>>
>>>> The event channel (identified by '^' or 94 in this example) is always
>>>> the third of the domain's four channels.
>>>>
>>>> Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
>>>> http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
>>>> issues, but don't help in this case.
>>>>
>>>> The proposed fix from
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
>>>> go away. It's not clear to me why it works, or if it's just papering
>>>> over a bug elsewhere, or if there are any possible side effects.
>>> CC-ing Denis.
>>>
>>>
>>> Is the fix below based on reading the spec or more of instrumenting?
>>>
>>> Thanks.
>> hmm. I have looked in our older code (completely separate
>> from QEMU). It does not have this trick.
>>
>> 2012r2 (as far as I remember) has a bug in the driver
>> when LSC interrupt was raised unexpectedly during driver
>> initialization. The original bug was about TXQE. Can you
>> pls confirm which interrupt causes the storm?
> How can I tell which interrupt is firing? Instrument the QEMU e1000 code?
>
> --Ed
>
connect with gdb to qemu, set break into set_interrupt_cause
and dump value of  s->mac_reg[ICS], s->mac_reg[IMS] and s->mac_reg[ICR]

Bits are defined here hw/net/e1000_regs.h

/* Interrupt Cause Read */
#define E1000_ICR_TXDW          0x00000001 /* Transmit desc written back */
#define E1000_ICR_TXQE          0x00000002 /* Transmit Queue empty */

Den

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-09-15 13:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-01 17:57 [Qemu-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS Ed Swierk
2016-09-13 20:59 ` Konrad Rzeszutek Wilk
2016-09-13 20:59 ` [Qemu-devel] [Xen-devel] " Konrad Rzeszutek Wilk
2016-09-15  9:15   ` Denis V. Lunev
2016-09-15  9:15     ` Denis V. Lunev
2016-09-15 12:22     ` [Qemu-devel] [Xen-devel] " Ed Swierk
2016-09-15 12:22       ` Ed Swierk
2016-09-15 12:46       ` [Qemu-devel] [Xen-devel] " Denis V. Lunev
2016-09-15 12:46         ` [Qemu-devel] " Denis V. Lunev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.