From: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
To: Dexuan Cui <decui@microsoft.com>,
	"'Lorenzo Pieralisi'" <lorenzo.pieralisi@arm.com>,
	"'Bjorn Helgaas'" <bhelgaas@google.com>,
	"'linux-pci@vger.kernel.org'" <linux-pci@vger.kernel.org>,
	KY Srinivasan <kys@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	"'olaf@aepfle.de'" <olaf@aepfle.de>,
	"'apw@canonical.com'" <apw@canonical.com>,
	"'jasowang@redhat.com'" <jasowang@redhat.com>
Cc: "'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	"'driverdev-devel@linuxdriverproject.org'"
	<driverdev-devel@linuxdriverproject.org>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"'vkuznets@redhat.com'" <vkuznets@redhat.com>,
	"'marcelo.cerri@canonical.com'" <marcelo.cerri@canonical.com>
Subject: RE: [PATCH] PCI: hv: Do not wait forever on a device that has disappeared
Date: Tue, 29 May 2018 00:19:16 +0000	[thread overview]
Message-ID: <SN6PR2101MB112020C6F949F7C72F52D6BEDC6D0@SN6PR2101MB1120.namprd21.prod.outlook.com> (raw)
In-Reply-To: <KL1P15301MB0006DF4209AEE3809ABD303FBF6B0@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM>

> 
> Before the guest finishes the device initialization, the device can be
> removed anytime by the host, and after that the host won't respond to
> the guest's request, so the guest should be prepared to handle this
> case.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: K. Y. Srinivasan <kys@microsoft.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 46 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 34 insertions(+), 12 deletions(-)
> 
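
For context, the helper referenced below can be sketched roughly as follows. This is a minimal reconstruction from the discussion in this thread, not the actual diff; the polling interval and the exact field names are assumptions.

/*
 * Sketch only: poll the completion instead of blocking forever,
 * and bail out once the host has rescinded the channel.
 */
static int wait_for_response(struct hv_device *hdev,
			     struct completion *comp)
{
	while (true) {
		if (hdev->channel->rescind)
			return -ENODEV;		/* device is gone */

		if (wait_for_completion_timeout(comp, HZ / 10))
			break;			/* host replied */
	}

	return 0;
}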

While this patch solves the immediate problem of getting hung waiting
for a response from Hyper-V that will never come, there's another scenario
to look at that I think introduces a race.  Suppose the guest VM issues a
vmbus_sendpacket() request in one of the cases covered by this patch,
and suppose that Hyper-V queues a response to the request, and then
immediately follows with a rescind request.  Processing the response will
get queued to a tasklet associated with the channel, while processing the
rescind will get queued to a tasklet associated with the top-level vmbus
connection.  From what I can see, the code doesn't impose any ordering
on processing the two.  If the rescind is processed first, the new
wait_for_response() function may wake up, notice the rescind flag, and
return an error.  Its caller will return an error, and in doing so pop the
completion packet off the stack.  When the response is processed later,
it will try to signal completion via a completion packet that no longer
exists, and memory corruption will likely occur.
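
To make the window concrete, here is a minimal sketch of the pattern.  The
names hv_do_request() and hv_handle_response() and the message details are
illustrative, not the real pci-hyperv code: the completion packet lives on
the requester's stack, the rescind path returns early, and the response
path later writes through a pointer into that dead stack frame.

struct hv_pci_compl {
	struct completion host_event;
	s32 completion_status;
};

static int hv_do_request(struct hv_device *hdev, void *msg, u32 msg_size)
{
	struct hv_pci_compl comp_pkt;	/* lives on this stack frame */
	int ret;

	init_completion(&comp_pkt.host_event);

	ret = vmbus_sendpacket(hdev->channel, msg, msg_size,
			       (unsigned long)&comp_pkt, VM_PKT_DATA_INBAND,
			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
	if (ret)
		return ret;

	/*
	 * The rescind is handled on the top-level vmbus connection tasklet,
	 * the response on the channel tasklet.  If the rescind is seen
	 * first, this returns -ENODEV and comp_pkt goes out of scope...
	 */
	ret = wait_for_response(hdev, &comp_pkt.host_event);
	if (ret)
		return ret;

	return comp_pkt.completion_status;
}

/*
 * ...but the channel tasklet can still deliver the response later, and
 * the request id it hands back points into the dead stack frame.
 */
static void hv_handle_response(u64 requestid)
{
	struct hv_pci_compl *comp_pkt =
		(struct hv_pci_compl *)(unsigned long)requestid;

	comp_pkt->completion_status = 0;	/* writes freed stack memory */
	complete(&comp_pkt->host_event);	/* likely corruption */
}

Nothing serializes the two tasklets, so nothing prevents
hv_handle_response() from running after hv_do_request() has already
returned.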

Am I missing anything that would prevent this scenario from happening?
It is admittedly low probability, and a solution seems non-trivial.  I haven't
looked specifically, but a similar scenario is probably possible with the
drivers for other VMbus devices.  We should work on a generic solution.

Michael

Thread overview: 31+ messages
2018-05-23 21:12 [PATCH] PCI: hv: Do not wait forever on a device that has disappeared Dexuan Cui
2018-05-24 12:41 ` Lorenzo Pieralisi
2018-05-24 23:55   ` Dexuan Cui
2018-05-25 10:29     ` Lorenzo Pieralisi
2018-05-25 11:43 ` Haiyang Zhang
2018-05-25 13:56 ` Lorenzo Pieralisi
2018-05-29  0:19 ` Michael Kelley (EOSG) [this message]
2018-05-29 19:58   ` Dexuan Cui
2018-05-31 16:40     ` Michael Kelley (EOSG)
2018-05-31 21:01       ` Dexuan Cui
2018-05-29 21:20 ` Andy Shevchenko
2018-05-29 21:28   ` Dexuan Cui
