qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition
@ 2018-05-31  0:43 John Snow
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower John Snow
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: John Snow @ 2018-05-31  0:43 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: John Snow

Commit d759c951f changed the main thread lock release/reacquisition,
and in so doing apparently jostled loose a race condition in the AHCI
code.

Patch 2 should be sufficient to fix this, and patches 1 and 3 are just
little trivial fixes.

This might be sufficient to fix the bug as reported at
https://bugs.launchpad.net/qemu/+bug/1769189
but the nature of the timing changes make it difficult to confirm,
so I am posting this patchset for the reporters to help test.

John Snow (3):
  ahci: trim signatures on raise/lower
  ahci: fix PxCI register race
  ahci: don't schedule unnecessary BH

 hw/ide/ahci.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower
  2018-05-31  0:43 [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition John Snow
@ 2018-05-31  0:43 ` John Snow
  2018-05-31 14:56   ` Eric Blake
  2018-05-31 16:00   ` [Qemu-devel] [Qemu-block] " Jeff Cody
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 2/3] ahci: fix PxCI register race John Snow
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: John Snow @ 2018-05-31  0:43 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: John Snow

These functions work on the AHCI device, not the individual
AHCI devices, so trim the AHCIDevice argument.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 hw/ide/ahci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index e22d7be05f..b7a6f68790 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -131,7 +131,7 @@ static uint32_t  ahci_port_read(AHCIState *s, int port, int offset)
     return val;
 }
 
-static void ahci_irq_raise(AHCIState *s, AHCIDevice *dev)
+static void ahci_irq_raise(AHCIState *s)
 {
     DeviceState *dev_state = s->container;
     PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state),
@@ -146,7 +146,7 @@ static void ahci_irq_raise(AHCIState *s, AHCIDevice *dev)
     }
 }
 
-static void ahci_irq_lower(AHCIState *s, AHCIDevice *dev)
+static void ahci_irq_lower(AHCIState *s)
 {
     DeviceState *dev_state = s->container;
     PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state),
@@ -174,9 +174,9 @@ static void ahci_check_irq(AHCIState *s)
     trace_ahci_check_irq(s, old_irq, s->control_regs.irqstatus);
     if (s->control_regs.irqstatus &&
         (s->control_regs.ghc & HOST_CTL_IRQ_EN)) {
-            ahci_irq_raise(s, NULL);
+            ahci_irq_raise(s);
     } else {
-        ahci_irq_lower(s, NULL);
+        ahci_irq_lower(s);
     }
 }
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH 2/3] ahci: fix PxCI register race
  2018-05-31  0:43 [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition John Snow
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower John Snow
@ 2018-05-31  0:43 ` John Snow
  2018-05-31 16:16   ` John Snow
  2018-05-31 16:22   ` [Qemu-devel] [Qemu-block] " Jeff Cody
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 3/3] ahci: don't schedule unnecessary BH John Snow
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: John Snow @ 2018-05-31  0:43 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: John Snow

AHCI presently signals completion prior to the PxCI register being
cleared to indicate completion. If a guest driver attempts to issue
a new command in its IRQ handler, it might be surprised to learn there
is still a command pending.

In the case of Windows 10's boot driver, it will actually poll the IRQ
register hoping to find out when the command is done running -- which
will never happen, as there isn't a command running.

Fix this: clear PxCI in ahci_cmd_done and not in the asynchronous BH.
Because it now runs synchronously, we don't need to check if the command
is actually done by spying on the ATA registers. We know it's done.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 hw/ide/ahci.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index b7a6f68790..a9558e45e7 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -532,13 +532,6 @@ static void ahci_check_cmd_bh(void *opaque)
     qemu_bh_delete(ad->check_bh);
     ad->check_bh = NULL;
 
-    if ((ad->busy_slot != -1) &&
-        !(ad->port.ifs[0].status & (BUSY_STAT|DRQ_STAT))) {
-        /* no longer busy */
-        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
-        ad->busy_slot = -1;
-    }
-
     check_cmd(ad->hba, ad->port_no);
 }
 
@@ -1425,6 +1418,12 @@ static void ahci_cmd_done(IDEDMA *dma)
 
     trace_ahci_cmd_done(ad->hba, ad->port_no);
 
+    /* no longer busy */
+    if (ad->busy_slot != -1) {
+        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
+        ad->busy_slot = -1;
+    }
+
     /* update d2h status */
     ahci_write_fis_d2h(ad);
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH 3/3] ahci: don't schedule unnecessary BH
  2018-05-31  0:43 [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition John Snow
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower John Snow
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 2/3] ahci: fix PxCI register race John Snow
@ 2018-05-31  0:43 ` John Snow
  2018-05-31 16:22   ` [Qemu-devel] [Qemu-block] " Jeff Cody
  2018-05-31  8:01 ` [Qemu-devel] [Qemu-block] [PATCH 0/3] ahci: fix completion race condition Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: John Snow @ 2018-05-31  0:43 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: John Snow

The comment gives us a hint. *Maybe* we still have something to
process. Well, why not check?

Signed-off-by: John Snow <jsnow@redhat.com>
---
 hw/ide/ahci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index a9558e45e7..380366b038 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1427,8 +1427,7 @@ static void ahci_cmd_done(IDEDMA *dma)
     /* update d2h status */
     ahci_write_fis_d2h(ad);
 
-    if (!ad->check_bh) {
-        /* maybe we still have something to process, check later */
+    if (ad->port_regs.cmd_issue && !ad->check_bh) {
         ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
         qemu_bh_schedule(ad->check_bh);
     }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] ahci: fix completion race condition
  2018-05-31  0:43 [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition John Snow
                   ` (2 preceding siblings ...)
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 3/3] ahci: don't schedule unnecessary BH John Snow
@ 2018-05-31  8:01 ` Stefan Hajnoczi
  2018-05-31 12:21 ` [Qemu-devel] " Bruce Rogers
  2018-06-01  0:00 ` John Snow
  5 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2018-05-31  8:01 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-devel, qemu block

On Thu, May 31, 2018 at 1:43 AM, John Snow <jsnow@redhat.com> wrote:
> Commit d759c951f changed the main thread lock release/reacquisition,
> and in so doing apparently jostled loose a race condition in the AHCI
> code.
>
> Patch 2 should be sufficient to fix this, and patches 1 and 3 are just
> little trivial fixes.
>
> This might be sufficient to fix the bug as reported at
> https://bugs.launchpad.net/qemu/+bug/1769189
> but the nature of the timing changes make it difficult to confirm,
> so I am posting this patchset for the reporters to help test.
>
> John Snow (3):
>   ahci: trim signatures on raise/lower
>   ahci: fix PxCI register race
>   ahci: don't schedule unnecessary BH
>
>  hw/ide/ahci.c | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)

Nice find!

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition
  2018-05-31  0:43 [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition John Snow
                   ` (3 preceding siblings ...)
  2018-05-31  8:01 ` [Qemu-devel] [Qemu-block] [PATCH 0/3] ahci: fix completion race condition Stefan Hajnoczi
@ 2018-05-31 12:21 ` Bruce Rogers
  2018-05-31 13:06   ` Philippe Mathieu-Daudé
  2018-06-01  0:00 ` John Snow
  5 siblings, 1 reply; 16+ messages in thread
From: Bruce Rogers @ 2018-05-31 12:21 UTC (permalink / raw)
  To: qemu-block, qemu-devel, jsnow

>>> On 5/30/2018 at 6:43 PM, John Snow <jsnow@redhat.com> wrote:
> Commit d759c951f changed the main thread lock release/reacquisition,
> and in so doing apparently jostled loose a race condition in the AHCI
> code.
> 
> Patch 2 should be sufficient to fix this, and patches 1 and 3 are just
> little trivial fixes.
> 
> This might be sufficient to fix the bug as reported at
> https://bugs.launchpad.net/qemu/+bug/1769189 
> but the nature of the timing changes make it difficult to confirm,
> so I am posting this patchset for the reporters to help test.
> 
> John Snow (3):
>   ahci: trim signatures on raise/lower
>   ahci: fix PxCI register race
>   ahci: don't schedule unnecessary BH
> 
>  hw/ide/ahci.c | 24 +++++++++++‑‑‑‑‑‑‑‑‑‑‑‑‑
>  1 file changed, 11 insertions(+), 13 deletions(‑)
> 
> ‑‑ 
> 2.14.3

In my case, I applied these 3 patches on top of v2.12 qemu under which
I can fairly reliably reproduce Windows10 disk delays. With these patches,
after quite a number of attempts, I no longer can reproduce the failure
case, so from my perspective it solves the issue.

Thanks!

Bruce

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition
  2018-05-31 12:21 ` [Qemu-devel] " Bruce Rogers
@ 2018-05-31 13:06   ` Philippe Mathieu-Daudé
  2018-05-31 13:12     ` Bruce Rogers
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-05-31 13:06 UTC (permalink / raw)
  To: Bruce Rogers, qemu-block, qemu-devel, jsnow

Hi Bruce,

On 05/31/2018 09:21 AM, Bruce Rogers wrote:
>>>> On 5/30/2018 at 6:43 PM, John Snow <jsnow@redhat.com> wrote:
>> Commit d759c951f changed the main thread lock release/reacquisition,
>> and in so doing apparently jostled loose a race condition in the AHCI
>> code.
>>
>> Patch 2 should be sufficient to fix this, and patches 1 and 3 are just
>> little trivial fixes.
>>
>> This might be sufficient to fix the bug as reported at
>> https://bugs.launchpad.net/qemu/+bug/1769189 
>> but the nature of the timing changes make it difficult to confirm,
>> so I am posting this patchset for the reporters to help test.
>>
>> John Snow (3):
>>   ahci: trim signatures on raise/lower
>>   ahci: fix PxCI register race
>>   ahci: don't schedule unnecessary BH
>>
>>  hw/ide/ahci.c | 24 +++++++++++‑‑‑‑‑‑‑‑‑‑‑‑‑
>>  1 file changed, 11 insertions(+), 13 deletions(‑)
>>
>> ‑‑ 
>> 2.14.3
> 
> In my case, I applied these 3 patches on top of v2.12 qemu under which
> I can fairly reliably reproduce Windows10 disk delays. With these patches,
> after quite a number of attempts, I no longer can reproduce the failure
> case, so from my perspective it solves the issue.

Does that implicitly mean John can use your "Tested-by: Bruce Rogers
<brogers@suse.com>" tag?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition
  2018-05-31 13:06   ` Philippe Mathieu-Daudé
@ 2018-05-31 13:12     ` Bruce Rogers
  0 siblings, 0 replies; 16+ messages in thread
From: Bruce Rogers @ 2018-05-31 13:12 UTC (permalink / raw)
  To: f4bug, qemu-block, qemu-devel, jsnow

>>> On 5/31/2018 at 7:06 AM, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> Hi Bruce,
> 
> On 05/31/2018 09:21 AM, Bruce Rogers wrote:
>>>>> On 5/30/2018 at 6:43 PM, John Snow <jsnow@redhat.com> wrote:
>>> Commit d759c951f changed the main thread lock release/reacquisition,
>>> and in so doing apparently jostled loose a race condition in the AHCI
>>> code.
>>>
>>> Patch 2 should be sufficient to fix this, and patches 1 and 3 are just
>>> little trivial fixes.
>>>
>>> This might be sufficient to fix the bug as reported at
>>> https://bugs.launchpad.net/qemu/+bug/1769189 
>>> but the nature of the timing changes make it difficult to confirm,
>>> so I am posting this patchset for the reporters to help test.
>>>
>>> John Snow (3):
>>>   ahci: trim signatures on raise/lower
>>>   ahci: fix PxCI register race
>>>   ahci: don't schedule unnecessary BH
>>>
>>>  hw/ide/ahci.c | 24 +++++++++++‑‑‑‑‑‑‑‑‑‑‑‑‑
>>>  1 file changed, 11 insertions(+), 13 deletions(‑)
>>>
>>> ‑‑ 
>>> 2.14.3
>> 
>> In my case, I applied these 3 patches on top of v2.12 qemu under which
>> I can fairly reliably reproduce Windows10 disk delays. With these patches,
>> after quite a number of attempts, I no longer can reproduce the failure
>> case, so from my perspective it solves the issue.
> 
> Does that implicitly mean John can use your "Tested-by: Bruce Rogers
> <brogers@suse.com>" tag?

Yes. I guess I should have added that.
Bruce

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower John Snow
@ 2018-05-31 14:56   ` Eric Blake
  2018-05-31 15:38     ` John Snow
  2018-05-31 16:00   ` [Qemu-devel] [Qemu-block] " Jeff Cody
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Blake @ 2018-05-31 14:56 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block

On 05/30/2018 07:43 PM, John Snow wrote:
> These functions work on the AHCI device, not the individual

s/device/state/ ?

> AHCI devices, so trim the AHCIDevice argument.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   hw/ide/ahci.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower
  2018-05-31 14:56   ` Eric Blake
@ 2018-05-31 15:38     ` John Snow
  0 siblings, 0 replies; 16+ messages in thread
From: John Snow @ 2018-05-31 15:38 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block



On 05/31/2018 10:56 AM, Eric Blake wrote:
> On 05/30/2018 07:43 PM, John Snow wrote:
>> These functions work on the AHCI device, not the individual
> 
> s/device/state/ ?
> 

It's confusing; the AHCI device itself (the HBA) is represented by the
struct AHCIState. the SATA devices are referred to by "AHCIDevice".

Not great names, admittedly.

--js

>> AHCI devices, so trim the AHCIDevice argument.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   hw/ide/ahci.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] ahci: trim signatures on raise/lower
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower John Snow
  2018-05-31 14:56   ` Eric Blake
@ 2018-05-31 16:00   ` Jeff Cody
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2018-05-31 16:00 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-devel, qemu-block

On Wed, May 30, 2018 at 08:43:21PM -0400, John Snow wrote:
> These functions work on the AHCI device, not the individual
> AHCI devices, so trim the AHCIDevice argument.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  hw/ide/ahci.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index e22d7be05f..b7a6f68790 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -131,7 +131,7 @@ static uint32_t  ahci_port_read(AHCIState *s, int port, int offset)
>      return val;
>  }
>  
> -static void ahci_irq_raise(AHCIState *s, AHCIDevice *dev)
> +static void ahci_irq_raise(AHCIState *s)
>  {
>      DeviceState *dev_state = s->container;
>      PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state),
> @@ -146,7 +146,7 @@ static void ahci_irq_raise(AHCIState *s, AHCIDevice *dev)
>      }
>  }
>  
> -static void ahci_irq_lower(AHCIState *s, AHCIDevice *dev)
> +static void ahci_irq_lower(AHCIState *s)
>  {
>      DeviceState *dev_state = s->container;
>      PCIDevice *pci_dev = (PCIDevice *) object_dynamic_cast(OBJECT(dev_state),
> @@ -174,9 +174,9 @@ static void ahci_check_irq(AHCIState *s)
>      trace_ahci_check_irq(s, old_irq, s->control_regs.irqstatus);
>      if (s->control_regs.irqstatus &&
>          (s->control_regs.ghc & HOST_CTL_IRQ_EN)) {
> -            ahci_irq_raise(s, NULL);
> +            ahci_irq_raise(s);
>      } else {
> -        ahci_irq_lower(s, NULL);
> +        ahci_irq_lower(s);
>      }
>  }
>  
> -- 
> 2.14.3
> 
> 

Reviewed-by: Jeff Cody <jcody@redhat.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] ahci: fix PxCI register race
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 2/3] ahci: fix PxCI register race John Snow
@ 2018-05-31 16:16   ` John Snow
  2018-05-31 16:22   ` [Qemu-devel] [Qemu-block] " Jeff Cody
  1 sibling, 0 replies; 16+ messages in thread
From: John Snow @ 2018-05-31 16:16 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: qemu-stable

>From comments on the cover letter, I am adding:

Fixes: https://bugs.launchpad.net/qemu/+bug/1769189

On 05/30/2018 08:43 PM, John Snow wrote:
> AHCI presently signals completion prior to the PxCI register being
> cleared to indicate completion. If a guest driver attempts to issue
> a new command in its IRQ handler, it might be surprised to learn there
> is still a command pending.
> 
> In the case of Windows 10's boot driver, it will actually poll the IRQ
> register hoping to find out when the command is done running -- which
> will never happen, as there isn't a command running.
> 
> Fix this: clear PxCI in ahci_cmd_done and not in the asynchronous BH.
> Because it now runs synchronously, we don't need to check if the command
> is actually done by spying on the ATA registers. We know it's done.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

and:

CC: qemu-stable <qemu-stable@nongnu.org>
Reported-by: François Guerraz <kubrick@fgv6.net>
Tested-by: Bruce Rogers <brogers@suse.com>

(Stable note: only patch #2 here is strictly required.)

> ---
>  hw/ide/ahci.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index b7a6f68790..a9558e45e7 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -532,13 +532,6 @@ static void ahci_check_cmd_bh(void *opaque)
>      qemu_bh_delete(ad->check_bh);
>      ad->check_bh = NULL;
>  
> -    if ((ad->busy_slot != -1) &&
> -        !(ad->port.ifs[0].status & (BUSY_STAT|DRQ_STAT))) {
> -        /* no longer busy */
> -        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
> -        ad->busy_slot = -1;
> -    }
> -
>      check_cmd(ad->hba, ad->port_no);
>  }
>  
> @@ -1425,6 +1418,12 @@ static void ahci_cmd_done(IDEDMA *dma)
>  
>      trace_ahci_cmd_done(ad->hba, ad->port_no);
>  
> +    /* no longer busy */
> +    if (ad->busy_slot != -1) {
> +        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
> +        ad->busy_slot = -1;
> +    }
> +
>      /* update d2h status */
>      ahci_write_fis_d2h(ad);
>  
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] ahci: fix PxCI register race
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 2/3] ahci: fix PxCI register race John Snow
  2018-05-31 16:16   ` John Snow
@ 2018-05-31 16:22   ` Jeff Cody
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2018-05-31 16:22 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-devel, qemu-block

On Wed, May 30, 2018 at 08:43:22PM -0400, John Snow wrote:
> AHCI presently signals completion prior to the PxCI register being
> cleared to indicate completion. If a guest driver attempts to issue
> a new command in its IRQ handler, it might be surprised to learn there
> is still a command pending.
> 
> In the case of Windows 10's boot driver, it will actually poll the IRQ
> register hoping to find out when the command is done running -- which
> will never happen, as there isn't a command running.
> 
> Fix this: clear PxCI in ahci_cmd_done and not in the asynchronous BH.
> Because it now runs synchronously, we don't need to check if the command
> is actually done by spying on the ATA registers. We know it's done.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  hw/ide/ahci.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index b7a6f68790..a9558e45e7 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -532,13 +532,6 @@ static void ahci_check_cmd_bh(void *opaque)
>      qemu_bh_delete(ad->check_bh);
>      ad->check_bh = NULL;
>  
> -    if ((ad->busy_slot != -1) &&
> -        !(ad->port.ifs[0].status & (BUSY_STAT|DRQ_STAT))) {
> -        /* no longer busy */
> -        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
> -        ad->busy_slot = -1;
> -    }
> -
>      check_cmd(ad->hba, ad->port_no);
>  }
>  
> @@ -1425,6 +1418,12 @@ static void ahci_cmd_done(IDEDMA *dma)
>  
>      trace_ahci_cmd_done(ad->hba, ad->port_no);
>  
> +    /* no longer busy */
> +    if (ad->busy_slot != -1) {
> +        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
> +        ad->busy_slot = -1;
> +    }
> +
>      /* update d2h status */
>      ahci_write_fis_d2h(ad);
>  
> -- 
> 2.14.3
> 
> 

Reviewed-by: Jeff Cody <jcody@redhat.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 3/3] ahci: don't schedule unnecessary BH
  2018-05-31  0:43 ` [Qemu-devel] [PATCH 3/3] ahci: don't schedule unnecessary BH John Snow
@ 2018-05-31 16:22   ` Jeff Cody
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2018-05-31 16:22 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-devel, qemu-block

On Wed, May 30, 2018 at 08:43:23PM -0400, John Snow wrote:
> The comment gives us a hint. *Maybe* we still have something to
> process. Well, why not check?

Heh :)

Reviewed-by: Jeff Cody <jcody@redhat.com>

> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  hw/ide/ahci.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index a9558e45e7..380366b038 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -1427,8 +1427,7 @@ static void ahci_cmd_done(IDEDMA *dma)
>      /* update d2h status */
>      ahci_write_fis_d2h(ad);
>  
> -    if (!ad->check_bh) {
> -        /* maybe we still have something to process, check later */
> +    if (ad->port_regs.cmd_issue && !ad->check_bh) {
>          ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
>          qemu_bh_schedule(ad->check_bh);
>      }
> -- 
> 2.14.3
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition
  2018-05-31  0:43 [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition John Snow
                   ` (4 preceding siblings ...)
  2018-05-31 12:21 ` [Qemu-devel] " Bruce Rogers
@ 2018-06-01  0:00 ` John Snow
  2018-06-04  9:22   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  5 siblings, 1 reply; 16+ messages in thread
From: John Snow @ 2018-06-01  0:00 UTC (permalink / raw)
  To: qemu-devel, qemu-block



On 05/30/2018 08:43 PM, John Snow wrote:
> Commit d759c951f changed the main thread lock release/reacquisition,
> and in so doing apparently jostled loose a race condition in the AHCI
> code.
> 
> Patch 2 should be sufficient to fix this, and patches 1 and 3 are just
> little trivial fixes.
> 
> This might be sufficient to fix the bug as reported at
> https://bugs.launchpad.net/qemu/+bug/1769189
> but the nature of the timing changes make it difficult to confirm,
> so I am posting this patchset for the reporters to help test.
> 
> John Snow (3):
>   ahci: trim signatures on raise/lower
>   ahci: fix PxCI register race
>   ahci: don't schedule unnecessary BH
> 
>  hw/ide/ahci.c | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
> 

Thanks for the testing and reviews, everyone!

Thanks, applied to my IDE tree:

https://github.com/jnsnow/qemu/commits/ide
https://github.com/jnsnow/qemu.git

--js

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] ahci: fix completion race condition
  2018-06-01  0:00 ` John Snow
@ 2018-06-04  9:22   ` Stefan Hajnoczi
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2018-06-04  9:22 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-devel, qemu-block, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

On Thu, May 31, 2018 at 08:00:34PM -0400, John Snow wrote:
> On 05/30/2018 08:43 PM, John Snow wrote:
> > Commit d759c951f changed the main thread lock release/reacquisition,
> > and in so doing apparently jostled loose a race condition in the AHCI
> > code.
> > 
> > Patch 2 should be sufficient to fix this, and patches 1 and 3 are just
> > little trivial fixes.
> > 
> > This might be sufficient to fix the bug as reported at
> > https://bugs.launchpad.net/qemu/+bug/1769189
> > but the nature of the timing changes make it difficult to confirm,
> > so I am posting this patchset for the reporters to help test.
> > 
> > John Snow (3):
> >   ahci: trim signatures on raise/lower
> >   ahci: fix PxCI register race
> >   ahci: don't schedule unnecessary BH
> > 
> >  hw/ide/ahci.c | 24 +++++++++++-------------
> >  1 file changed, 11 insertions(+), 13 deletions(-)
> > 
> 
> Thanks for the testing and reviews, everyone!
> 
> Thanks, applied to my IDE tree:
> 
> https://github.com/jnsnow/qemu/commits/ide
> https://github.com/jnsnow/qemu.git

Also CCing stable.  This should go into QEMU 2.12.1.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-06-04  9:23 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-31  0:43 [Qemu-devel] [PATCH 0/3] ahci: fix completion race condition John Snow
2018-05-31  0:43 ` [Qemu-devel] [PATCH 1/3] ahci: trim signatures on raise/lower John Snow
2018-05-31 14:56   ` Eric Blake
2018-05-31 15:38     ` John Snow
2018-05-31 16:00   ` [Qemu-devel] [Qemu-block] " Jeff Cody
2018-05-31  0:43 ` [Qemu-devel] [PATCH 2/3] ahci: fix PxCI register race John Snow
2018-05-31 16:16   ` John Snow
2018-05-31 16:22   ` [Qemu-devel] [Qemu-block] " Jeff Cody
2018-05-31  0:43 ` [Qemu-devel] [PATCH 3/3] ahci: don't schedule unnecessary BH John Snow
2018-05-31 16:22   ` [Qemu-devel] [Qemu-block] " Jeff Cody
2018-05-31  8:01 ` [Qemu-devel] [Qemu-block] [PATCH 0/3] ahci: fix completion race condition Stefan Hajnoczi
2018-05-31 12:21 ` [Qemu-devel] " Bruce Rogers
2018-05-31 13:06   ` Philippe Mathieu-Daudé
2018-05-31 13:12     ` Bruce Rogers
2018-06-01  0:00 ` John Snow
2018-06-04  9:22   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).