All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Drivers: hv: vm_bus: Handle vmbus rescind calls after vmbus is suspended
@ 2022-07-03  6:53 Shradha Gupta
  2022-07-05 15:41 ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 2+ messages in thread
From: Shradha Gupta @ 2022-07-03  6:53 UTC (permalink / raw)
  To: linux-hyperv, linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Shradha Gupta

Add a flag to indicate that the vmbus is suspended so we should ignore the
rescind offer message. Add a new work_queue for rescind msg, so we could 
drain it in vmbus_suspend processing. This should help avoid any rescind 
offer msg processing if ignore_offer_rescind_msg flag is set to true.
It was observed in some hibernation related scenario testing that after
KVP_vmbus suspend, we get another rescind offer message for the vmbus. This
led to rescind message processing after vm_suspend and we would end up with
a warning and stack dumps

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
---
 drivers/hv/connection.c   | 11 +++++++++++
 drivers/hv/hyperv_vmbus.h |  7 +++++++
 drivers/hv/vmbus_drv.c    | 24 +++++++++++++++++++++++-
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6218bbf6863a..88a0fd8e80c0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -171,6 +171,14 @@ int vmbus_connect(void)
 		goto cleanup;
 	}
 
+	vmbus_connection.rescind_work_queue =
+		create_workqueue("hv_vmbus_rescind");
+	if (!vmbus_connection.rescind_work_queue) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+	vmbus_connection.ignore_offer_rescind_msg = false;
+
 	vmbus_connection.handle_primary_chan_wq =
 		create_workqueue("hv_pri_chan");
 	if (!vmbus_connection.handle_primary_chan_wq) {
@@ -357,6 +365,9 @@ void vmbus_disconnect(void)
 	if (vmbus_connection.handle_primary_chan_wq)
 		destroy_workqueue(vmbus_connection.handle_primary_chan_wq);
 
+	if (vmbus_connection.rescind_work_queue)
+		destroy_workqueue(vmbus_connection.rescind_work_queue);
+
 	if (vmbus_connection.work_queue)
 		destroy_workqueue(vmbus_connection.work_queue);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 4f5b824b16cf..ff8707284554 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -262,6 +262,13 @@ struct vmbus_connection {
 	struct workqueue_struct *handle_primary_chan_wq;
 	struct workqueue_struct *handle_sub_chan_wq;
 
+	/*
+	 * On suspension of the vmbus, the accumulated rescind message
+	 * must be dropped.
+	 */
+	bool ignore_offer_rescind_msg;
+	struct workqueue_struct *rescind_work_queue;
+
 	/*
 	 * The number of sub-channels and hv_sock channels that should be
 	 * cleaned up upon suspend: sub-channels will be re-created upon
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 547ae334e5cd..46bd867f11ba 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1160,7 +1160,9 @@ void vmbus_on_msg_dpc(unsigned long data)
 			 * work queue: the RESCIND handler can not start to
 			 * run before the OFFER handler finishes.
 			 */
-			schedule_work(&ctx->work);
+			if (vmbus_connection.ignore_offer_rescind_msg)
+				break;
+			queue_work(vmbus_connection.rescind_work_queue, &ctx->work);
 			break;
 
 		case CHANNELMSG_OFFERCHANNEL:
@@ -1186,6 +1188,8 @@ void vmbus_on_msg_dpc(unsigned long data)
 			 * to the CPUs which will execute the offer & rescind
 			 * works by the time these works will start execution.
 			 */
+			if (vmbus_connection.ignore_offer_rescind_msg)
+				break;
 			atomic_inc(&vmbus_connection.offer_in_progress);
 			fallthrough;
 
@@ -2446,8 +2450,20 @@ static int vmbus_acpi_add(struct acpi_device *device)
 #ifdef CONFIG_PM_SLEEP
 static int vmbus_bus_suspend(struct device *dev)
 {
+	struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(
+			hv_context.cpu_context, VMBUS_CONNECT_CPU);
 	struct vmbus_channel *channel, *sc;
 
+	tasklet_disable(&hv_cpu->msg_dpc);
+	vmbus_connection.ignore_offer_rescind_msg = true;
+	tasklet_enable(&hv_cpu->msg_dpc);
+
+	/* Drain all the workqueues as we are in suspend */
+	drain_workqueue(vmbus_connection.rescind_work_queue);
+	drain_workqueue(vmbus_connection.work_queue);
+	drain_workqueue(vmbus_connection.handle_primary_chan_wq);
+	drain_workqueue(vmbus_connection.handle_sub_chan_wq);
+
 	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
 		/*
 		 * We wait here until the completion of any channel
@@ -2527,10 +2543,16 @@ static int vmbus_bus_suspend(struct device *dev)
 
 static int vmbus_bus_resume(struct device *dev)
 {
+	struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(
+			hv_context.cpu_context, VMBUS_CONNECT_CPU);
 	struct vmbus_channel_msginfo *msginfo;
 	size_t msgsize;
 	int ret;
 
+	tasklet_disable(&hv_cpu->msg_dpc);
+	vmbus_connection.ignore_offer_rescind_msg = false;
+	tasklet_enable(&hv_cpu->msg_dpc);
+
 	/*
 	 * We only use the 'vmbus_proto_version', which was in use before
 	 * hibernation, to re-negotiate with the host.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* RE: [PATCH] Drivers: hv: vm_bus: Handle vmbus rescind calls after vmbus is suspended
  2022-07-03  6:53 [PATCH] Drivers: hv: vm_bus: Handle vmbus rescind calls after vmbus is suspended Shradha Gupta
@ 2022-07-05 15:41 ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 2+ messages in thread
From: Michael Kelley (LINUX) @ 2022-07-05 15:41 UTC (permalink / raw)
  To: Shradha Gupta, linux-hyperv, linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Shradha Gupta

From: Shradha Gupta <shradhagupta@linux.microsoft.com> Sent: Saturday, July 2, 2022 11:54 PM
> 
> Add a flag to indicate that the vmbus is suspended so we should ignore the
> rescind offer message. 

From the code, it appears that the flag causes both offer and rescind offer
messages to be ignored.

> Add a new work_queue for rescind msg, so we could
> drain it in vmbus_suspend processing. This should help avoid any rescind
> offer msg processing if ignore_offer_rescind_msg flag is set to true.
> It was observed in some hibernation related scenario testing that after
> KVP_vmbus suspend, we get another rescind offer message for the vmbus. This

I'm not clear on what "KVP_vmbus suspend" is.  

> led to rescind message processing after vm_suspend and we would end up with
> a warning and stack dumps

And what is "vm_suspend"?  There is a function vmbus_bus_suspend() -- is that
perhaps the intention here?

But I think the core issue you observed is that after vmbus_bus_suspend() runs,
we might still process a rescind message for a channel that has already
been closed as part of the suspend operation.

> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> ---
>  drivers/hv/connection.c   | 11 +++++++++++
>  drivers/hv/hyperv_vmbus.h |  7 +++++++
>  drivers/hv/vmbus_drv.c    | 24 +++++++++++++++++++++++-
>  3 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6218bbf6863a..88a0fd8e80c0 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -171,6 +171,14 @@ int vmbus_connect(void)
>  		goto cleanup;
>  	}
> 
> +	vmbus_connection.rescind_work_queue =
> +		create_workqueue("hv_vmbus_rescind");
> +	if (!vmbus_connection.rescind_work_queue) {
> +		ret = -ENOMEM;
> +		goto cleanup;
> +	}
> +	vmbus_connection.ignore_offer_rescind_msg = false;
> +
>  	vmbus_connection.handle_primary_chan_wq =
>  		create_workqueue("hv_pri_chan");
>  	if (!vmbus_connection.handle_primary_chan_wq) {
> @@ -357,6 +365,9 @@ void vmbus_disconnect(void)
>  	if (vmbus_connection.handle_primary_chan_wq)
>  		destroy_workqueue(vmbus_connection.handle_primary_chan_wq);
> 
> +	if (vmbus_connection.rescind_work_queue)
> +		destroy_workqueue(vmbus_connection.rescind_work_queue);
> +
>  	if (vmbus_connection.work_queue)
>  		destroy_workqueue(vmbus_connection.work_queue);
> 
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 4f5b824b16cf..ff8707284554 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -262,6 +262,13 @@ struct vmbus_connection {
>  	struct workqueue_struct *handle_primary_chan_wq;
>  	struct workqueue_struct *handle_sub_chan_wq;
> 
> +	/*
> +	 * On suspension of the vmbus, the accumulated rescind message
> +	 * must be dropped.

As noted regarding the commit message, the flag causes both offer
messages and rescind offer messages to be dropped.  Maybe the flag
should be renamed so that it is clear both message types are
dropped?

> +	 */
> +	bool ignore_offer_rescind_msg;
> +	struct workqueue_struct *rescind_work_queue;
> +
>  	/*
>  	 * The number of sub-channels and hv_sock channels that should be
>  	 * cleaned up upon suspend: sub-channels will be re-created upon
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 547ae334e5cd..46bd867f11ba 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1160,7 +1160,9 @@ void vmbus_on_msg_dpc(unsigned long data)
>  			 * work queue: the RESCIND handler can not start to
>  			 * run before the OFFER handler finishes.
>  			 */
> -			schedule_work(&ctx->work);
> +			if (vmbus_connection.ignore_offer_rescind_msg)
> +				break;
> +			queue_work(vmbus_connection.rescind_work_queue, &ctx->work);
>  			break;
> 
>  		case CHANNELMSG_OFFERCHANNEL:
> @@ -1186,6 +1188,8 @@ void vmbus_on_msg_dpc(unsigned long data)
>  			 * to the CPUs which will execute the offer & rescind
>  			 * works by the time these works will start execution.
>  			 */
> +			if (vmbus_connection.ignore_offer_rescind_msg)
> +				break;
>  			atomic_inc(&vmbus_connection.offer_in_progress);
>  			fallthrough;
> 
> @@ -2446,8 +2450,20 @@ static int vmbus_acpi_add(struct acpi_device *device)
>  #ifdef CONFIG_PM_SLEEP
>  static int vmbus_bus_suspend(struct device *dev)
>  {
> +	struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(
> +			hv_context.cpu_context, VMBUS_CONNECT_CPU);
>  	struct vmbus_channel *channel, *sc;
> 
> +	tasklet_disable(&hv_cpu->msg_dpc);
> +	vmbus_connection.ignore_offer_rescind_msg = true;
> +	tasklet_enable(&hv_cpu->msg_dpc);
> +
> +	/* Drain all the workqueues as we are in suspend */
> +	drain_workqueue(vmbus_connection.rescind_work_queue);
> +	drain_workqueue(vmbus_connection.work_queue);
> +	drain_workqueue(vmbus_connection.handle_primary_chan_wq);
> +	drain_workqueue(vmbus_connection.handle_sub_chan_wq);
> +

The above looks good to me.  The ordering is very important, and you have
the order correct.   Once the tasklet is re-enabled, there may be messages
queued up that will be processed.  But the flag is now set, so those messages
will not result in any work being queued to work_queue or rescind_work_queue.
Note that a memory barrier should be used after setting the flag to true, but
tasklet_enable() provides that memory barrier, which might be worth a comment.
Then the work_queue and rescind_work_queues are drained.  The process
of draining these work queues could put entries into the handle_primary_chan_wq
or handle_sub_chan_wq, so then those two queues are drained.

>  	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {

With your above changes in place, I think the test here for offer_in_progress is
redundant.  There can't be any offers in progress since the workqueues
have been drained, and new entries to the workqueues are prevented.

>  		/*
>  		 * We wait here until the completion of any channel
> @@ -2527,10 +2543,16 @@ static int vmbus_bus_suspend(struct device *dev)
> 
>  static int vmbus_bus_resume(struct device *dev)
>  {
> +	struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(
> +			hv_context.cpu_context, VMBUS_CONNECT_CPU);
>  	struct vmbus_channel_msginfo *msginfo;
>  	size_t msgsize;
>  	int ret;
> 
> +	tasklet_disable(&hv_cpu->msg_dpc);
> +	vmbus_connection.ignore_offer_rescind_msg = false;
> +	tasklet_enable(&hv_cpu->msg_dpc);
> +

Is there a reason for the tasklet_disable()/enable() calls?  I can see that they
are needed when the flag transitions from false to true, but it's not clear to
me why they would be needed for the transition from true to false.

>  	/*
>  	 * We only use the 'vmbus_proto_version', which was in use before
>  	 * hibernation, to re-negotiate with the host.
> --
> 2.17.1

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-07-05 15:42 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-03  6:53 [PATCH] Drivers: hv: vm_bus: Handle vmbus rescind calls after vmbus is suspended Shradha Gupta
2022-07-05 15:41 ` Michael Kelley (LINUX)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.