linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] hv: retry infinitely on hypercall transient failures
@ 2017-01-05  2:12 Long Li
  2017-01-05  7:47 ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Long Li @ 2017-01-05  2:12 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang; +Cc: devel, linux-kernel, Long Li

From: Long Li <longli@microsoft.com>

Hyper-v host guarantees that a hypercall will finish in reasonable time.
Retry infinitely on transient failures to avoid returning error to upper layer.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/hv/connection.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6ce8b87..4b3cfde 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -438,46 +438,44 @@ void vmbus_on_event(unsigned long data)
 int vmbus_post_msg(void *buffer, size_t buflen)
 {
 	union hv_connection_id conn_id;
-	int ret = 0;
-	int retries = 0;
+	int ret;
 	u32 usec = 1;
 
 	conn_id.asu32 = 0;
 	conn_id.u.id = VMBUS_MESSAGE_CONNECTION_ID;
 
 	/*
-	 * hv_post_message() can have transient failures because of
-	 * insufficient resources. Retry the operation a couple of
-	 * times before giving up.
+	 * hv_post_message() can have transient failures. We retry infinitely
+	 * on these failures because host guarantees hypercall will finish.
 	 */
-	while (retries < 20) {
+	while (1) {
 		ret = hv_post_message(conn_id, 1, buffer, buflen);
 
 		switch (ret) {
+		/*
+		 * Retry on transient failures:
+		 * 1. HV_STATUS_INVALID_CONNECTION_ID:
+		 *    We send messages too frequently.
+		 *
+		 * 2. HV_STATUS_INSUFFICIENT_MEMORY and
+		 *    HV_STATUS_INSUFFICIENT_BUFFERS:
+		 *    The host is temporariliy running out of resources.
+		 */
 		case HV_STATUS_INVALID_CONNECTION_ID:
-			/*
-			 * We could get this if we send messages too
-			 * frequently.
-			 */
-			ret = -EAGAIN;
-			break;
 		case HV_STATUS_INSUFFICIENT_MEMORY:
 		case HV_STATUS_INSUFFICIENT_BUFFERS:
-			ret = -ENOMEM;
 			break;
 		case HV_STATUS_SUCCESS:
-			return ret;
+			return 0;
 		default:
 			pr_err("hv_post_msg() failed; error code:%d\n", ret);
 			return -EINVAL;
 		}
 
-		retries++;
 		udelay(usec);
 		if (usec < 2048)
 			usec *= 2;
 	}
-	return ret;
 }
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
  2017-01-05  2:12 [PATCH v2] hv: retry infinitely on hypercall transient failures Long Li
@ 2017-01-05  7:47 ` Greg KH
  2017-01-07  7:23   ` Long Li
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2017-01-05  7:47 UTC (permalink / raw)
  To: Long Li; +Cc: K. Y. Srinivasan, Haiyang Zhang, devel, linux-kernel

On Wed, Jan 04, 2017 at 06:12:20PM -0800, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> Hyper-v host guarantees that a hypercall will finish in reasonable time.
> Retry infinitely on transient failures to avoid returning error to upper layer.

Again, never retry "forever", always have a way out, otherwise you will
crash.

And again, why are you making this change?  What problem does it solve?

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v2] hv: retry infinitely on hypercall transient failures
  2017-01-05  7:47 ` Greg KH
@ 2017-01-07  7:23   ` Long Li
  2017-01-07  7:42     ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Long Li @ 2017-01-07  7:23 UTC (permalink / raw)
  To: Greg KH; +Cc: KY Srinivasan, Haiyang Zhang, devel, linux-kernel

> -----Original Message-----
> From: Greg KH [mailto:greg@kroah.com]
> Sent: Wednesday, January 04, 2017 11:48 PM
> To: Long Li <longli@microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
> 
> On Wed, Jan 04, 2017 at 06:12:20PM -0800, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > Hyper-v host guarantees that a hypercall will finish in reasonable time.
> > Retry infinitely on transient failures to avoid returning error to upper layer.
> 
> Again, never retry "forever", always have a way out, otherwise you will crash.
> 
> And again, why are you making this change?  What problem does it solve?

The problem it tries to solve is that in this code we are returning error prematurely on transient failures. The hypercall is used mostly in channel establishment. If we return a transient failure, the VM may not boot or not useful after boot due to some devices missing.

Another approach is to increase the number of retries. But we don't know how many retries is safe, and Windows host side expects the guest retry infinitely and not return error on transient failures.

> 
> greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
  2017-01-07  7:23   ` Long Li
@ 2017-01-07  7:42     ` Greg KH
  2017-01-07  8:06       ` Long Li
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2017-01-07  7:42 UTC (permalink / raw)
  To: Long Li; +Cc: KY Srinivasan, Haiyang Zhang, devel, linux-kernel

On Sat, Jan 07, 2017 at 07:23:14AM +0000, Long Li wrote:
> > -----Original Message-----
> > From: Greg KH [mailto:greg@kroah.com]
> > Sent: Wednesday, January 04, 2017 11:48 PM
> > To: Long Li <longli@microsoft.com>
> > Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> > <haiyangz@microsoft.com>; devel@linuxdriverproject.org; linux-
> > kernel@vger.kernel.org
> > Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
> > 
> > On Wed, Jan 04, 2017 at 06:12:20PM -0800, Long Li wrote:
> > > From: Long Li <longli@microsoft.com>
> > >
> > > Hyper-v host guarantees that a hypercall will finish in reasonable time.
> > > Retry infinitely on transient failures to avoid returning error to upper layer.
> > 
> > Again, never retry "forever", always have a way out, otherwise you will crash.
> > 
> > And again, why are you making this change?  What problem does it solve?
> 
> The problem it tries to solve is that in this code we are returning
> error prematurely on transient failures. The hypercall is used mostly
> in channel establishment. If we return a transient failure, the VM may
> not boot or not useful after boot due to some devices missing.
> 
> Another approach is to increase the number of retries. But we don't
> know how many retries is safe, and Windows host side expects the guest
> retry infinitely and not return error on transient failures.

That implies a lot of trust in the host side, don't you think?

Worse case, make the delay a minute or so, but give the system a way out
incase there's a bug in the host.  As there will be bugs in the host,
just like there are bugs in the client :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v2] hv: retry infinitely on hypercall transient failures
  2017-01-07  7:42     ` Greg KH
@ 2017-01-07  8:06       ` Long Li
  0 siblings, 0 replies; 5+ messages in thread
From: Long Li @ 2017-01-07  8:06 UTC (permalink / raw)
  To: Greg KH; +Cc: KY Srinivasan, Haiyang Zhang, devel, linux-kernel



> -----Original Message-----
> From: Greg KH [mailto:greg@kroah.com]
> Sent: Friday, January 06, 2017 11:43 PM
> To: Long Li <longli@microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
> 
> On Sat, Jan 07, 2017 at 07:23:14AM +0000, Long Li wrote:
> > > -----Original Message-----
> > > From: Greg KH [mailto:greg@kroah.com]
> > > Sent: Wednesday, January 04, 2017 11:48 PM
> > > To: Long Li <longli@microsoft.com>
> > > Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> > > <haiyangz@microsoft.com>; devel@linuxdriverproject.org; linux-
> > > kernel@vger.kernel.org
> > > Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient
> > > failures
> > >
> > > On Wed, Jan 04, 2017 at 06:12:20PM -0800, Long Li wrote:
> > > > From: Long Li <longli@microsoft.com>
> > > >
> > > > Hyper-v host guarantees that a hypercall will finish in reasonable time.
> > > > Retry infinitely on transient failures to avoid returning error to upper
> layer.
> > >
> > > Again, never retry "forever", always have a way out, otherwise you will
> crash.
> > >
> > > And again, why are you making this change?  What problem does it solve?
> >
> > The problem it tries to solve is that in this code we are returning
> > error prematurely on transient failures. The hypercall is used mostly
> > in channel establishment. If we return a transient failure, the VM may
> > not boot or not useful after boot due to some devices missing.
> >
> > Another approach is to increase the number of retries. But we don't
> > know how many retries is safe, and Windows host side expects the guest
> > retry infinitely and not return error on transient failures.
> 
> That implies a lot of trust in the host side, don't you think?
> 
> Worse case, make the delay a minute or so, but give the system a way out
> incase there's a bug in the host.  As there will be bugs in the host, just like
> there are bugs in the client :)

This makes sense. 1 minute is a long time for a hypercall. I will send V3.

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-01-07  8:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-05  2:12 [PATCH v2] hv: retry infinitely on hypercall transient failures Long Li
2017-01-05  7:47 ` Greg KH
2017-01-07  7:23   ` Long Li
2017-01-07  7:42     ` Greg KH
2017-01-07  8:06       ` Long Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).