linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Drivers: hv: Miscellaneous fixes
@ 2015-09-16  0:37 K. Y. Srinivasan
  2015-09-16  0:37 ` [PATCH 1/5] Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc K. Y. Srinivasan
  0 siblings, 1 reply; 19+ messages in thread
From: K. Y. Srinivasan @ 2015-09-16  0:37 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang
  Cc: K. Y. Srinivasan

The Copy-VMFile cmdlet on the host may fail because the guest fcopy
driver state machine gets out of sync. This happens because the ->state
and ->context variables are accessed by the main thread and from
interrupt context. If an interrupt happens between fcopy_respond_to_host
and hv_poll_channel in fcopy_write, then hv_fcopy_onchannelcallback
called from that interrupt sees still state HVUTIL_USERSPACE_RECV. It
updates the context, but fcopy_write will not notice that update and
hv_poll_channel gets called with an empty context.  As a result
hv_fcopy_daemon gets no more data. After a timeout Copy-VMFile fails
with timeout.

In my initial testing for a fix I put a "mb()" after the last .state
change in fcopy_write. But this series implementes read/write memory
barriers as needed. Let me know if this is overdoing things.


Dexuan Cui (1):
  Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc

Olaf Hering (4):
  hv: add helpers to handle hv_util device state
  hv: fcopy: use wrappers to propagate state
  hv: kvp: use wrappers to propaigate state
  hv: vss: use wrappers to propagate state

 drivers/hv/channel_mgmt.c |   17 +++++++++++++++++
 drivers/hv/hv_fcopy.c     |   36 ++++++++++++++++++++----------------
 drivers/hv/hv_kvp.c       |   39 +++++++++++++++++++++------------------
 drivers/hv/hv_snapshot.c  |   37 ++++++++++++++++++++-----------------
 drivers/hv/hyperv_vmbus.h |   14 ++++++++++++++
 5 files changed, 92 insertions(+), 51 deletions(-)

-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/5] Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc
  2015-09-16  0:37 [PATCH 0/5] Drivers: hv: Miscellaneous fixes K. Y. Srinivasan
@ 2015-09-16  0:37 ` K. Y. Srinivasan
  2015-09-16  0:37   ` [PATCH 2/5] hv: add helpers to handle hv_util device state K. Y. Srinivasan
                     ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: K. Y. Srinivasan @ 2015-09-16  0:37 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang
  Cc: Dexuan Cui, K. Y. Srinivasan

From: Dexuan Cui <decui@microsoft.com>

This fixes the recent commit 3b71107d73b16074afa7658f3f0fcf837aabfe24:
Drivers: hv: vmbus: Further improve CPU affiliation logic

Without the fix, reloading hv_netvsc hangs the guest.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/channel_mgmt.c |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 3ab4753..8a4105c 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -204,6 +204,8 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
 		spin_lock_irqsave(&vmbus_connection.channel_lock, flags);
 		list_del(&channel->listentry);
 		spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags);
+
+		primary_channel = channel;
 	} else {
 		primary_channel = channel->primary_channel;
 		spin_lock_irqsave(&primary_channel->lock, flags);
@@ -211,6 +213,14 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
 		primary_channel->num_sc--;
 		spin_unlock_irqrestore(&primary_channel->lock, flags);
 	}
+
+	/*
+	 * We need to free the bit for init_vp_index() to work in the case
+	 * of sub-channel, when we reload drivers like hv_netvsc.
+	 */
+	cpumask_clear_cpu(channel->target_cpu,
+			  &primary_channel->alloced_cpus_in_node);
+
 	free_channel(channel);
 }
 
@@ -457,6 +467,13 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
 			continue;
 		}
 
+		/*
+		 * NOTE: in the case of sub-channel, we clear the sub-channel
+		 * related bit(s) in primary->alloced_cpus_in_node in
+		 * hv_process_channel_removal(), so when we reload drivers
+		 * like hv_netvsc in SMP guest, here we're able to re-allocate
+		 * bit from primary->alloced_cpus_in_node.
+		 */
 		if (!cpumask_test_cpu(cur_cpu,
 				&primary->alloced_cpus_in_node)) {
 			cpumask_set_cpu(cur_cpu,
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-16  0:37 ` [PATCH 1/5] Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc K. Y. Srinivasan
@ 2015-09-16  0:37   ` K. Y. Srinivasan
  2015-09-21  5:25     ` Greg KH
  2015-09-16  0:37   ` [PATCH 3/5] hv: fcopy: use wrappers to propagate state K. Y. Srinivasan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: K. Y. Srinivasan @ 2015-09-16  0:37 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang
  Cc: K. Y. Srinivasan

From: Olaf Hering <olaf@aepfle.de>

The callbacks in kvp, vss and fcopy code are called both from the main thread
as well as from interrupt context. If a state change is done by the main
thread it is not immediately seen by the interrupt. As a result the
state machine gets out of sync.

Force propagation of state changes via get/set helpers with a memory barrier.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hyperv_vmbus.h |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 4b1eb6d..dee5798 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -780,4 +780,18 @@ enum hvutil_device_state {
 	HVUTIL_DEVICE_DYING,     /* driver unload is in progress */
 };
 
+static inline void hvutil_device_set_state(enum hvutil_device_state *p,
+					   enum hvutil_device_state s)
+{
+	*p = s;
+	wmb();
+}
+
+static inline enum hvutil_device_state
+hvutil_device_get_state(enum hvutil_device_state *p)
+{
+	rmb();
+	return *p;
+}
+
 #endif /* _HYPERV_VMBUS_H */
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/5] hv: fcopy: use wrappers to propagate state
  2015-09-16  0:37 ` [PATCH 1/5] Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc K. Y. Srinivasan
  2015-09-16  0:37   ` [PATCH 2/5] hv: add helpers to handle hv_util device state K. Y. Srinivasan
@ 2015-09-16  0:37   ` K. Y. Srinivasan
  2015-09-16  0:37   ` [PATCH 4/5] hv: kvp: use wrappers to propaigate state K. Y. Srinivasan
  2015-09-16  0:37   ` [PATCH 5/5] hv: vss: use wrappers to propagate state K. Y. Srinivasan
  3 siblings, 0 replies; 19+ messages in thread
From: K. Y. Srinivasan @ 2015-09-16  0:37 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang
  Cc: K. Y. Srinivasan

From: Olaf Hering <olaf@aepfle.de>

The "state" is used by several threads of execution.
Propagate the state to make changes visible. Also propagate context
change in hv_fcopy_onchannelcallback.
Without this change fcopy may hang at random points.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_fcopy.c |   36 ++++++++++++++++++++----------------
 1 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index db4b887..47d9c34 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -46,7 +46,7 @@
  */
 
 static struct {
-	int state;   /* hvutil_device_state */
+	enum hvutil_device_state state;
 	int recv_len; /* number of bytes received. */
 	struct hv_fcopy_hdr  *fcopy_msg; /* current message */
 	struct vmbus_channel *recv_channel; /* chn we got the request */
@@ -67,6 +67,9 @@ static struct hvutil_transport *hvt;
  */
 static int dm_reg_value;
 
+#define fcopy_get_state() hvutil_device_get_state(&fcopy_transaction.state)
+#define fcopy_set_state(s) hvutil_device_set_state(&fcopy_transaction.state, s)
+
 static void fcopy_timeout_func(struct work_struct *dummy)
 {
 	/*
@@ -76,8 +79,8 @@ static void fcopy_timeout_func(struct work_struct *dummy)
 	fcopy_respond_to_host(HV_E_FAIL);
 
 	/* Transaction is finished, reset the state. */
-	if (fcopy_transaction.state > HVUTIL_READY)
-		fcopy_transaction.state = HVUTIL_READY;
+	if (fcopy_get_state() > HVUTIL_READY)
+		fcopy_set_state(HVUTIL_READY);
 
 	hv_poll_channel(fcopy_transaction.fcopy_context,
 			hv_fcopy_onchannelcallback);
@@ -108,7 +111,7 @@ static int fcopy_handle_handshake(u32 version)
 		return -EINVAL;
 	}
 	pr_debug("FCP: userspace daemon ver. %d registered\n", version);
-	fcopy_transaction.state = HVUTIL_READY;
+	fcopy_set_state(HVUTIL_READY);
 	hv_poll_channel(fcopy_transaction.fcopy_context,
 			hv_fcopy_onchannelcallback);
 	return 0;
@@ -162,13 +165,13 @@ static void fcopy_send_data(struct work_struct *dummy)
 		break;
 	}
 
-	fcopy_transaction.state = HVUTIL_USERSPACE_REQ;
+	fcopy_set_state(HVUTIL_USERSPACE_REQ);
 	rc = hvutil_transport_send(hvt, out_src, out_len);
 	if (rc) {
 		pr_debug("FCP: failed to communicate to the daemon: %d\n", rc);
 		if (cancel_delayed_work_sync(&fcopy_timeout_work)) {
 			fcopy_respond_to_host(HV_E_FAIL);
-			fcopy_transaction.state = HVUTIL_READY;
+			fcopy_set_state(HVUTIL_READY);
 		}
 	}
 	kfree(smsg_out);
@@ -227,12 +230,13 @@ void hv_fcopy_onchannelcallback(void *context)
 	int util_fw_version;
 	int fcopy_srv_version;
 
-	if (fcopy_transaction.state > HVUTIL_READY) {
+	if (fcopy_get_state() > HVUTIL_READY) {
 		/*
 		 * We will defer processing this callback once
 		 * the current transaction is complete.
 		 */
 		fcopy_transaction.fcopy_context = context;
+		wmb();
 		return;
 	}
 	fcopy_transaction.fcopy_context = NULL;
@@ -264,12 +268,12 @@ void hv_fcopy_onchannelcallback(void *context)
 		fcopy_transaction.recv_req_id = requestid;
 		fcopy_transaction.fcopy_msg = fcopy_msg;
 
-		if (fcopy_transaction.state < HVUTIL_READY) {
+		if (fcopy_get_state() < HVUTIL_READY) {
 			/* Userspace is not registered yet */
 			fcopy_respond_to_host(HV_E_FAIL);
 			return;
 		}
-		fcopy_transaction.state = HVUTIL_HOSTMSG_RECEIVED;
+		fcopy_set_state(HVUTIL_HOSTMSG_RECEIVED);
 
 		/*
 		 * Send the information to the user-level daemon.
@@ -291,10 +295,10 @@ static int fcopy_on_msg(void *msg, int len)
 	if (len != sizeof(int))
 		return -EINVAL;
 
-	if (fcopy_transaction.state == HVUTIL_DEVICE_INIT)
+	if (fcopy_get_state() == HVUTIL_DEVICE_INIT)
 		return fcopy_handle_handshake(*val);
 
-	if (fcopy_transaction.state != HVUTIL_USERSPACE_REQ)
+	if (fcopy_get_state() != HVUTIL_USERSPACE_REQ)
 		return -EINVAL;
 
 	/*
@@ -302,9 +306,9 @@ static int fcopy_on_msg(void *msg, int len)
 	 * to the host. But first, cancel the timeout.
 	 */
 	if (cancel_delayed_work_sync(&fcopy_timeout_work)) {
-		fcopy_transaction.state = HVUTIL_USERSPACE_RECV;
+		fcopy_set_state(HVUTIL_USERSPACE_RECV);
 		fcopy_respond_to_host(*val);
-		fcopy_transaction.state = HVUTIL_READY;
+		fcopy_set_state(HVUTIL_READY);
 		hv_poll_channel(fcopy_transaction.fcopy_context,
 				hv_fcopy_onchannelcallback);
 	}
@@ -317,7 +321,7 @@ static void fcopy_on_reset(void)
 	/*
 	 * The daemon has exited; reset the state.
 	 */
-	fcopy_transaction.state = HVUTIL_DEVICE_INIT;
+	fcopy_set_state(HVUTIL_DEVICE_INIT);
 
 	if (cancel_delayed_work_sync(&fcopy_timeout_work))
 		fcopy_respond_to_host(HV_E_FAIL);
@@ -333,7 +337,7 @@ int hv_fcopy_init(struct hv_util_service *srv)
 	 * Defer processing channel callbacks until the daemon
 	 * has registered.
 	 */
-	fcopy_transaction.state = HVUTIL_DEVICE_INIT;
+	fcopy_set_state(HVUTIL_DEVICE_INIT);
 
 	hvt = hvutil_transport_init(fcopy_devname, 0, 0,
 				    fcopy_on_msg, fcopy_on_reset);
@@ -345,7 +349,7 @@ int hv_fcopy_init(struct hv_util_service *srv)
 
 void hv_fcopy_deinit(void)
 {
-	fcopy_transaction.state = HVUTIL_DEVICE_DYING;
+	fcopy_set_state(HVUTIL_DEVICE_DYING);
 	cancel_delayed_work_sync(&fcopy_timeout_work);
 	hvutil_transport_destroy(hvt);
 }
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/5] hv: kvp: use wrappers to propaigate state
  2015-09-16  0:37 ` [PATCH 1/5] Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc K. Y. Srinivasan
  2015-09-16  0:37   ` [PATCH 2/5] hv: add helpers to handle hv_util device state K. Y. Srinivasan
  2015-09-16  0:37   ` [PATCH 3/5] hv: fcopy: use wrappers to propagate state K. Y. Srinivasan
@ 2015-09-16  0:37   ` K. Y. Srinivasan
  2015-09-21  5:26     ` Greg KH
  2015-09-16  0:37   ` [PATCH 5/5] hv: vss: use wrappers to propagate state K. Y. Srinivasan
  3 siblings, 1 reply; 19+ messages in thread
From: K. Y. Srinivasan @ 2015-09-16  0:37 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang
  Cc: K. Y. Srinivasan

From: Olaf Hering <olaf@aepfle.de>

The "state" is used by several threads of execution.
Propagate the state to make changes visible. Also propagate context
change in kvp_on_msg.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_kvp.c |   39 +++++++++++++++++++++------------------
 1 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 74c38a9..778d353 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -61,7 +61,7 @@
  */
 
 static struct {
-	int state;   /* hvutil_device_state */
+	enum hvutil_device_state state;
 	int recv_len; /* number of bytes received. */
 	struct hv_kvp_msg  *kvp_msg; /* current message */
 	struct vmbus_channel *recv_channel; /* chn we got the request */
@@ -74,6 +74,9 @@ static struct {
  */
 static int dm_reg_value;
 
+#define kvp_get_state() hvutil_device_get_state(&kvp_transaction.state)
+#define kvp_set_state(s) hvutil_device_set_state(&kvp_transaction.state, s)
+
 static void kvp_send_key(struct work_struct *dummy);
 
 
@@ -122,8 +125,8 @@ static void kvp_timeout_func(struct work_struct *dummy)
 	kvp_respond_to_host(NULL, HV_E_FAIL);
 
 	/* Transaction is finished, reset the state. */
-	if (kvp_transaction.state > HVUTIL_READY)
-		kvp_transaction.state = HVUTIL_READY;
+	if (kvp_get_state() > HVUTIL_READY)
+		kvp_set_state(HVUTIL_READY);
 
 	hv_poll_channel(kvp_transaction.kvp_context,
 			hv_kvp_onchannelcallback);
@@ -153,7 +156,7 @@ static int kvp_handle_handshake(struct hv_kvp_msg *msg)
 	pr_debug("KVP: userspace daemon ver. %d registered\n",
 		 KVP_OP_REGISTER);
 	kvp_register(dm_reg_value);
-	kvp_transaction.state = HVUTIL_READY;
+	kvp_set_state(HVUTIL_READY);
 
 	return 0;
 }
@@ -177,15 +180,14 @@ static int kvp_on_msg(void *msg, int len)
 	 * with the daemon; handle that first.
 	 */
 
-	if (kvp_transaction.state < HVUTIL_READY) {
+	if (kvp_get_state() < HVUTIL_READY)
 		return kvp_handle_handshake(message);
-	}
 
 	/* We didn't send anything to userspace so the reply is spurious */
-	if (kvp_transaction.state < HVUTIL_USERSPACE_REQ)
+	if (kvp_get_state() < HVUTIL_USERSPACE_REQ)
 		return -EINVAL;
 
-	kvp_transaction.state = HVUTIL_USERSPACE_RECV;
+	kvp_set_state(HVUTIL_USERSPACE_RECV);
 
 	/*
 	 * Based on the version of the daemon, we propagate errors from the
@@ -218,7 +220,7 @@ static int kvp_on_msg(void *msg, int len)
 	 */
 	if (cancel_delayed_work_sync(&kvp_timeout_work)) {
 		kvp_respond_to_host(message, error);
-		kvp_transaction.state = HVUTIL_READY;
+		kvp_set_state(HVUTIL_READY);
 		hv_poll_channel(kvp_transaction.kvp_context,
 				hv_kvp_onchannelcallback);
 	}
@@ -349,7 +351,7 @@ kvp_send_key(struct work_struct *dummy)
 	int rc;
 
 	/* The transaction state is wrong. */
-	if (kvp_transaction.state != HVUTIL_HOSTMSG_RECEIVED)
+	if (kvp_get_state() != HVUTIL_HOSTMSG_RECEIVED)
 		return;
 
 	message = kzalloc(sizeof(*message), GFP_KERNEL);
@@ -442,13 +444,13 @@ kvp_send_key(struct work_struct *dummy)
 			break;
 	}
 
-	kvp_transaction.state = HVUTIL_USERSPACE_REQ;
+	kvp_set_state(HVUTIL_USERSPACE_REQ);
 	rc = hvutil_transport_send(hvt, message, sizeof(*message));
 	if (rc) {
 		pr_debug("KVP: failed to communicate to the daemon: %d\n", rc);
 		if (cancel_delayed_work_sync(&kvp_timeout_work)) {
 			kvp_respond_to_host(message, HV_E_FAIL);
-			kvp_transaction.state = HVUTIL_READY;
+			kvp_set_state(HVUTIL_READY);
 		}
 	}
 
@@ -596,12 +598,13 @@ void hv_kvp_onchannelcallback(void *context)
 	int util_fw_version;
 	int kvp_srv_version;
 
-	if (kvp_transaction.state > HVUTIL_READY) {
+	if (kvp_get_state() > HVUTIL_READY) {
 		/*
 		 * We will defer processing this callback once
 		 * the current transaction is complete.
 		 */
 		kvp_transaction.kvp_context = context;
+		wmb();
 		return;
 	}
 	kvp_transaction.kvp_context = NULL;
@@ -651,12 +654,12 @@ void hv_kvp_onchannelcallback(void *context)
 			kvp_transaction.recv_req_id = requestid;
 			kvp_transaction.kvp_msg = kvp_msg;
 
-			if (kvp_transaction.state < HVUTIL_READY) {
+			if (kvp_get_state() < HVUTIL_READY) {
 				/* Userspace is not registered yet */
 				kvp_respond_to_host(NULL, HV_E_FAIL);
 				return;
 			}
-			kvp_transaction.state = HVUTIL_HOSTMSG_RECEIVED;
+			kvp_set_state(HVUTIL_HOSTMSG_RECEIVED);
 
 			/*
 			 * Get the information from the
@@ -688,7 +691,7 @@ static void kvp_on_reset(void)
 {
 	if (cancel_delayed_work_sync(&kvp_timeout_work))
 		kvp_respond_to_host(NULL, HV_E_FAIL);
-	kvp_transaction.state = HVUTIL_DEVICE_INIT;
+	kvp_set_state(HVUTIL_DEVICE_INIT);
 }
 
 int
@@ -702,7 +705,7 @@ hv_kvp_init(struct hv_util_service *srv)
 	 * Defer processing channel callbacks until the daemon
 	 * has registered.
 	 */
-	kvp_transaction.state = HVUTIL_DEVICE_INIT;
+	kvp_set_state(HVUTIL_DEVICE_INIT);
 
 	hvt = hvutil_transport_init(kvp_devname, CN_KVP_IDX, CN_KVP_VAL,
 				    kvp_on_msg, kvp_on_reset);
@@ -714,7 +717,7 @@ hv_kvp_init(struct hv_util_service *srv)
 
 void hv_kvp_deinit(void)
 {
-	kvp_transaction.state = HVUTIL_DEVICE_DYING;
+	kvp_set_state(HVUTIL_DEVICE_DYING);
 	cancel_delayed_work_sync(&kvp_timeout_work);
 	cancel_work_sync(&kvp_sendkey_work);
 	hvutil_transport_destroy(hvt);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] hv: vss: use wrappers to propagate state
  2015-09-16  0:37 ` [PATCH 1/5] Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc K. Y. Srinivasan
                     ` (2 preceding siblings ...)
  2015-09-16  0:37   ` [PATCH 4/5] hv: kvp: use wrappers to propaigate state K. Y. Srinivasan
@ 2015-09-16  0:37   ` K. Y. Srinivasan
  3 siblings, 0 replies; 19+ messages in thread
From: K. Y. Srinivasan @ 2015-09-16  0:37 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, olaf, apw, vkuznets, jasowang
  Cc: K. Y. Srinivasan

From: Olaf Hering <olaf@aepfle.de>

The "state" is used by several threads of execution.
Propagate the state to make changes visible. Also propagate context
change in vss_on_msg.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/hv/hv_snapshot.c |   37 ++++++++++++++++++++-----------------
 1 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 815405f..f3cb822 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -48,7 +48,7 @@
  */
 
 static struct {
-	int state;   /* hvutil_device_state */
+	enum hvutil_device_state state;
 	int recv_len; /* number of bytes received. */
 	struct vmbus_channel *recv_channel; /* chn we got the request */
 	u64 recv_req_id; /* request ID. */
@@ -64,6 +64,9 @@ static void vss_respond_to_host(int error);
  */
 static int dm_reg_value;
 
+#define vss_get_state() hvutil_device_get_state(&vss_transaction.state)
+#define vss_set_state(s) hvutil_device_set_state(&vss_transaction.state, s)
+
 static const char vss_devname[] = "vmbus/hv_vss";
 static __u8 *recv_buffer;
 static struct hvutil_transport *hvt;
@@ -87,8 +90,8 @@ static void vss_timeout_func(struct work_struct *dummy)
 	vss_respond_to_host(HV_E_FAIL);
 
 	/* Transaction is finished, reset the state. */
-	if (vss_transaction.state > HVUTIL_READY)
-		vss_transaction.state = HVUTIL_READY;
+	if (vss_get_state() > HVUTIL_READY)
+		vss_set_state(HVUTIL_READY);
 
 	hv_poll_channel(vss_transaction.vss_context,
 			hv_vss_onchannelcallback);
@@ -112,7 +115,7 @@ static int vss_handle_handshake(struct hv_vss_msg *vss_msg)
 	default:
 		return -EINVAL;
 	}
-	vss_transaction.state = HVUTIL_READY;
+	vss_set_state(HVUTIL_READY);
 	pr_debug("VSS: userspace daemon ver. %d registered\n", dm_reg_value);
 	return 0;
 }
@@ -130,15 +133,15 @@ static int vss_on_msg(void *msg, int len)
 		 * Don't process registration messages if we're in the middle
 		 * of a transaction processing.
 		 */
-		if (vss_transaction.state > HVUTIL_READY)
+		if (vss_get_state() > HVUTIL_READY)
 			return -EINVAL;
 		return vss_handle_handshake(vss_msg);
-	} else if (vss_transaction.state == HVUTIL_USERSPACE_REQ) {
-		vss_transaction.state = HVUTIL_USERSPACE_RECV;
+	} else if (vss_get_state() == HVUTIL_USERSPACE_REQ) {
+		vss_set_state(HVUTIL_USERSPACE_RECV);
 		if (cancel_delayed_work_sync(&vss_timeout_work)) {
 			vss_respond_to_host(vss_msg->error);
 			/* Transaction is finished, reset the state. */
-			vss_transaction.state = HVUTIL_READY;
+			vss_set_state(HVUTIL_READY);
 			hv_poll_channel(vss_transaction.vss_context,
 					hv_vss_onchannelcallback);
 		}
@@ -158,7 +161,7 @@ static void vss_send_op(struct work_struct *dummy)
 	struct hv_vss_msg *vss_msg;
 
 	/* The transaction state is wrong. */
-	if (vss_transaction.state != HVUTIL_HOSTMSG_RECEIVED)
+	if (vss_get_state() != HVUTIL_HOSTMSG_RECEIVED)
 		return;
 
 	vss_msg = kzalloc(sizeof(*vss_msg), GFP_KERNEL);
@@ -167,13 +170,13 @@ static void vss_send_op(struct work_struct *dummy)
 
 	vss_msg->vss_hdr.operation = op;
 
-	vss_transaction.state = HVUTIL_USERSPACE_REQ;
+	vss_set_state(HVUTIL_USERSPACE_REQ);
 	rc = hvutil_transport_send(hvt, vss_msg, sizeof(*vss_msg));
 	if (rc) {
 		pr_warn("VSS: failed to communicate to the daemon: %d\n", rc);
 		if (cancel_delayed_work_sync(&vss_timeout_work)) {
 			vss_respond_to_host(HV_E_FAIL);
-			vss_transaction.state = HVUTIL_READY;
+			vss_set_state(HVUTIL_READY);
 		}
 	}
 
@@ -238,7 +241,7 @@ void hv_vss_onchannelcallback(void *context)
 	struct icmsg_hdr *icmsghdrp;
 	struct icmsg_negotiate *negop = NULL;
 
-	if (vss_transaction.state > HVUTIL_READY) {
+	if (vss_get_state() > HVUTIL_READY) {
 		/*
 		 * We will defer processing this callback once
 		 * the current transaction is complete.
@@ -288,12 +291,12 @@ void hv_vss_onchannelcallback(void *context)
 				 */
 			case VSS_OP_FREEZE:
 			case VSS_OP_THAW:
-				if (vss_transaction.state < HVUTIL_READY) {
+				if (vss_get_state() < HVUTIL_READY) {
 					/* Userspace is not registered yet */
 					vss_respond_to_host(HV_E_FAIL);
 					return;
 				}
-				vss_transaction.state = HVUTIL_HOSTMSG_RECEIVED;
+				vss_set_state(HVUTIL_HOSTMSG_RECEIVED);
 				schedule_work(&vss_send_op_work);
 				schedule_delayed_work(&vss_timeout_work,
 						      VSS_USERSPACE_TIMEOUT);
@@ -332,7 +335,7 @@ static void vss_on_reset(void)
 {
 	if (cancel_delayed_work_sync(&vss_timeout_work))
 		vss_respond_to_host(HV_E_FAIL);
-	vss_transaction.state = HVUTIL_DEVICE_INIT;
+	vss_set_state(HVUTIL_DEVICE_INIT);
 }
 
 int
@@ -346,7 +349,7 @@ hv_vss_init(struct hv_util_service *srv)
 	 * Defer processing channel callbacks until the daemon
 	 * has registered.
 	 */
-	vss_transaction.state = HVUTIL_DEVICE_INIT;
+	vss_set_state(HVUTIL_DEVICE_INIT);
 
 	hvt = hvutil_transport_init(vss_devname, CN_VSS_IDX, CN_VSS_VAL,
 				    vss_on_msg, vss_on_reset);
@@ -358,7 +361,7 @@ hv_vss_init(struct hv_util_service *srv)
 
 void hv_vss_deinit(void)
 {
-	vss_transaction.state = HVUTIL_DEVICE_DYING;
+	vss_set_state(HVUTIL_DEVICE_DYING);
 	cancel_delayed_work_sync(&vss_timeout_work);
 	cancel_work_sync(&vss_send_op_work);
 	hvutil_transport_destroy(hvt);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-16  0:37   ` [PATCH 2/5] hv: add helpers to handle hv_util device state K. Y. Srinivasan
@ 2015-09-21  5:25     ` Greg KH
  2015-09-21 10:26       ` Olaf Hering
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2015-09-21  5:25 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: linux-kernel, devel, olaf, apw, vkuznets, jasowang

On Tue, Sep 15, 2015 at 05:37:51PM -0700, K. Y. Srinivasan wrote:
> From: Olaf Hering <olaf@aepfle.de>
> 
> The callbacks in kvp, vss and fcopy code are called both from the main thread
> as well as from interrupt context. If a state change is done by the main
> thread it is not immediately seen by the interrupt. As a result the
> state machine gets out of sync.
> 
> Force propagation of state changes via get/set helpers with a memory barrier.
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> ---
>  drivers/hv/hyperv_vmbus.h |   14 ++++++++++++++
>  1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 4b1eb6d..dee5798 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -780,4 +780,18 @@ enum hvutil_device_state {
>  	HVUTIL_DEVICE_DYING,     /* driver unload is in progress */
>  };
>  
> +static inline void hvutil_device_set_state(enum hvutil_device_state *p,
> +					   enum hvutil_device_state s)
> +{
> +	*p = s;
> +	wmb();
> +}
> +
> +static inline enum hvutil_device_state
> +hvutil_device_get_state(enum hvutil_device_state *p)
> +{
> +	rmb();
> +	return *p;
> +}
> +
>  #endif /* _HYPERV_VMBUS_H */

This is crazy.  If you need to know the state of something (pun
intended) then you had better be using a lock, and not relying on a
random pointer to contain a random value and be able to do something
based on that.

This shows the code is broken, don't paper over things by throwing in
random read/write barriers, that is a HUGE flag that something bad is
happening here.

Just use a lock, that's what it is there for.

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/5] hv: kvp: use wrappers to propaigate state
  2015-09-16  0:37   ` [PATCH 4/5] hv: kvp: use wrappers to propaigate state K. Y. Srinivasan
@ 2015-09-21  5:26     ` Greg KH
  2015-09-21 10:18       ` Olaf Hering
  2015-09-21 16:16       ` KY Srinivasan
  0 siblings, 2 replies; 19+ messages in thread
From: Greg KH @ 2015-09-21  5:26 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: linux-kernel, devel, olaf, apw, vkuznets, jasowang

On Tue, Sep 15, 2015 at 05:37:53PM -0700, K. Y. Srinivasan wrote:
> From: Olaf Hering <olaf@aepfle.de>
> 
> The "state" is used by several threads of execution.
> Propagate the state to make changes visible. Also propagate context
> change in kvp_on_msg.
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> ---
>  drivers/hv/hv_kvp.c |   39 +++++++++++++++++++++------------------
>  1 files changed, 21 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
> index 74c38a9..778d353 100644
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -61,7 +61,7 @@
>   */
>  
>  static struct {
> -	int state;   /* hvutil_device_state */
> +	enum hvutil_device_state state;
>  	int recv_len; /* number of bytes received. */
>  	struct hv_kvp_msg  *kvp_msg; /* current message */
>  	struct vmbus_channel *recv_channel; /* chn we got the request */
> @@ -74,6 +74,9 @@ static struct {
>   */
>  static int dm_reg_value;
>  
> +#define kvp_get_state() hvutil_device_get_state(&kvp_transaction.state)
> +#define kvp_set_state(s) hvutil_device_set_state(&kvp_transaction.state, s)
> +
>  static void kvp_send_key(struct work_struct *dummy);
>  
>  
> @@ -122,8 +125,8 @@ static void kvp_timeout_func(struct work_struct *dummy)
>  	kvp_respond_to_host(NULL, HV_E_FAIL);
>  
>  	/* Transaction is finished, reset the state. */
> -	if (kvp_transaction.state > HVUTIL_READY)
> -		kvp_transaction.state = HVUTIL_READY;
> +	if (kvp_get_state() > HVUTIL_READY)
> +		kvp_set_state(HVUTIL_READY);
>  

And what if the state changed the line after this?  Oops, your code is
hosed.  See, you need a lock, do this correctly.

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/5] hv: kvp: use wrappers to propaigate state
  2015-09-21  5:26     ` Greg KH
@ 2015-09-21 10:18       ` Olaf Hering
  2015-09-21 16:31         ` KY Srinivasan
  2015-09-21 16:16       ` KY Srinivasan
  1 sibling, 1 reply; 19+ messages in thread
From: Olaf Hering @ 2015-09-21 10:18 UTC (permalink / raw)
  To: K. Y. Srinivasan, Greg KH; +Cc: linux-kernel, devel, apw, vkuznets, jasowang

On Sun, Sep 20, Greg KH wrote:

> On Tue, Sep 15, 2015 at 05:37:53PM -0700, K. Y. Srinivasan wrote:
> > @@ -122,8 +125,8 @@ static void kvp_timeout_func(struct work_struct *dummy)

> > +	if (kvp_get_state() > HVUTIL_READY)
> > +		kvp_set_state(HVUTIL_READY);

> And what if the state changed the line after this?  Oops, your code is
> hosed.  See, you need a lock, do this correctly.

KY,  can this happen?

Olaf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21  5:25     ` Greg KH
@ 2015-09-21 10:26       ` Olaf Hering
  2015-09-21 11:25         ` Vitaly Kuznetsov
  2015-09-21 16:34         ` KY Srinivasan
  0 siblings, 2 replies; 19+ messages in thread
From: Olaf Hering @ 2015-09-21 10:26 UTC (permalink / raw)
  To: K. Y. Srinivasan, Greg KH; +Cc: linux-kernel, devel, apw, vkuznets, jasowang

On Sun, Sep 20, Greg KH wrote:

> Just use a lock, that's what it is there for.

How would that help? It might help because it enforces ordering. But
that requires that all three utils get refactored to deal with the
introduced locking. I will let KY comment on this.

The issue I see with fcopy is that after or while fcopy_respond_to_host
runs an interrupt triggers which also calls into
hv_fcopy_onchannelcallback.  It was most likely caused by a logic change
in "recent" vmbus updates because this did not happen before. At least,
the fcopy hang was not seen earler. Maybe the bug did just not trigger
up to now for other reasons...

Olaf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21 10:26       ` Olaf Hering
@ 2015-09-21 11:25         ` Vitaly Kuznetsov
  2015-09-21 12:17           ` Olaf Hering
  2015-09-21 16:34         ` KY Srinivasan
  1 sibling, 1 reply; 19+ messages in thread
From: Vitaly Kuznetsov @ 2015-09-21 11:25 UTC (permalink / raw)
  To: Olaf Hering; +Cc: K. Y. Srinivasan, Greg KH, linux-kernel, devel, apw, jasowang

Olaf Hering <olaf@aepfle.de> writes:

> On Sun, Sep 20, Greg KH wrote:
>
>> Just use a lock, that's what it is there for.
>
> How would that help? It might help because it enforces ordering. But
> that requires that all three utils get refactored to deal with the
> introduced locking. I will let KY comment on this.
>
> The issue I see with fcopy is that after or while fcopy_respond_to_host
> runs an interrupt triggers which also calls into
> hv_fcopy_onchannelcallback.  It was most likely caused by a logic change
> in "recent" vmbus updates because this did not happen before. At least,
> the fcopy hang was not seen earler. Maybe the bug did just not trigger
> up to now for other reasons...

I'd like to see a trace from the hang, it is not obvious to me how it
happened and what caused it. (or if you have such hang scenario in your
head, can you please reveal it?) AFAICS barriers you introduced don't
give you guarantees in an SMP environment.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21 11:25         ` Vitaly Kuznetsov
@ 2015-09-21 12:17           ` Olaf Hering
  2015-09-21 13:37             ` Vitaly Kuznetsov
  2015-09-21 16:45             ` KY Srinivasan
  0 siblings, 2 replies; 19+ messages in thread
From: Olaf Hering @ 2015-09-21 12:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: K. Y. Srinivasan, Greg KH, linux-kernel, devel, apw, jasowang

On Mon, Sep 21, Vitaly Kuznetsov wrote:

> I'd like to see a trace from the hang, it is not obvious to me how it
> happened and what caused it. (or if you have such hang scenario in your
> head, can you please reveal it?)

There is no trace. I think fcopy_respond_to_host notifies the host,
which in turn triggers an interrupt right away which is processed while
fcopy_on_msg is executing somewhere between the return from
fcopy_respond_to_host and the call into hv_fcopy_onchannelcallback.

> AFAICS barriers you introduced don't give you guarantees in an SMP environment.

Happens to work on x86, and for this purpose. I will see how to add
locking around access to  state and context.

Olaf

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21 12:17           ` Olaf Hering
@ 2015-09-21 13:37             ` Vitaly Kuznetsov
  2015-09-21 16:45             ` KY Srinivasan
  1 sibling, 0 replies; 19+ messages in thread
From: Vitaly Kuznetsov @ 2015-09-21 13:37 UTC (permalink / raw)
  To: Olaf Hering; +Cc: K. Y. Srinivasan, Greg KH, linux-kernel, devel, apw, jasowang

Olaf Hering <olaf@aepfle.de> writes:

> On Mon, Sep 21, Vitaly Kuznetsov wrote:
>
>> I'd like to see a trace from the hang, it is not obvious to me how it
>> happened and what caused it. (or if you have such hang scenario in your
>> head, can you please reveal it?)
>
> There is no trace. I think fcopy_respond_to_host notifies the host,
> which in turn triggers an interrupt right away which is processed while
> fcopy_on_msg is executing somewhere between the return from
> fcopy_respond_to_host and the call into hv_fcopy_onchannelcallback.
>

I think it is fcopy_transaction.fcopy_context which gets out of sync.

When we're done processing some request we have the following code:

		fcopy_transaction.state = HVUTIL_USERSPACE_RECV;
		fcopy_respond_to_host(*val);
		fcopy_transaction.state = HVUTIL_READY;
		hv_poll_channel(fcopy_transaction.fcopy_context,
				hv_fcopy_onchannelcallback);

If interrupt happens after we did fcopy_respond_to_host()
fcopy_transaction.state will still be HVUTIL_USERSPACE_RECV or even its
previous HVUTIL_USERSPACE_REQ but it's OK as we have the following in
hv_fcopy_onchannelcallback()

	if (fcopy_transaction.state > HVUTIL_READY) {
		/*
		 * We will defer processing this callback once
		 * the current transaction is complete.
		 */
		fcopy_transaction.fcopy_context = context;
		return;
	}

And we're supposed to process the work with hv_poll_channel(). The
problem is (I guess) that fcopy_transaction.fcopy_context gets out of
sync and it still has its previous value (possibly NULL). We call
hv_poll_channel() with NULL and everything gets stuck as we'll never
process the request.

AFAICS proper locking is requred here (and probably in all three
drivers), we need to protect not only .state but the whole transaction.

[...]

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 4/5] hv: kvp: use wrappers to propaigate state
  2015-09-21  5:26     ` Greg KH
  2015-09-21 10:18       ` Olaf Hering
@ 2015-09-21 16:16       ` KY Srinivasan
  1 sibling, 0 replies; 19+ messages in thread
From: KY Srinivasan @ 2015-09-21 16:16 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, devel, olaf, apw, vkuznets, jasowang



> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Sunday, September 20, 2015 10:26 PM
> To: KY Srinivasan <kys@microsoft.com>
> Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> olaf@aepfle.de; apw@canonical.com; vkuznets@redhat.com;
> jasowang@redhat.com
> Subject: Re: [PATCH 4/5] hv: kvp: use wrappers to propaigate state
> 
> On Tue, Sep 15, 2015 at 05:37:53PM -0700, K. Y. Srinivasan wrote:
> > From: Olaf Hering <olaf@aepfle.de>
> >
> > The "state" is used by several threads of execution.
> > Propagate the state to make changes visible. Also propagate context
> > change in kvp_on_msg.
> >
> > Signed-off-by: Olaf Hering <olaf@aepfle.de>
> > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > ---
> >  drivers/hv/hv_kvp.c |   39 +++++++++++++++++++++------------------
> >  1 files changed, 21 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
> > index 74c38a9..778d353 100644
> > --- a/drivers/hv/hv_kvp.c
> > +++ b/drivers/hv/hv_kvp.c
> > @@ -61,7 +61,7 @@
> >   */
> >
> >  static struct {
> > -	int state;   /* hvutil_device_state */
> > +	enum hvutil_device_state state;
> >  	int recv_len; /* number of bytes received. */
> >  	struct hv_kvp_msg  *kvp_msg; /* current message */
> >  	struct vmbus_channel *recv_channel; /* chn we got the request */
> > @@ -74,6 +74,9 @@ static struct {
> >   */
> >  static int dm_reg_value;
> >
> > +#define kvp_get_state()
> hvutil_device_get_state(&kvp_transaction.state)
> > +#define kvp_set_state(s)
> hvutil_device_set_state(&kvp_transaction.state, s)
> > +
> >  static void kvp_send_key(struct work_struct *dummy);
> >
> >
> > @@ -122,8 +125,8 @@ static void kvp_timeout_func(struct work_struct
> *dummy)
> >  	kvp_respond_to_host(NULL, HV_E_FAIL);
> >
> >  	/* Transaction is finished, reset the state. */
> > -	if (kvp_transaction.state > HVUTIL_READY)
> > -		kvp_transaction.state = HVUTIL_READY;
> > +	if (kvp_get_state() > HVUTIL_READY)
> > +		kvp_set_state(HVUTIL_READY);
> >
> 
> And what if the state changed the line after this?  Oops, your code is
> hosed.  See, you need a lock, do this correctly.

This code path is an exception path - request has already been sent to the guest user space and we
are protecting against the guest user space not responding in a reasonable time. Consequently,
the state here has to be >  HVUTIL_READY (we should probably ASSERT this here). When the timeout
triggers, this will be the only code responding back to the host. So there is no issue here and 
I don't think you need a lock here.

The channels for the util driver are all bound to CPU 0. Given this, the simpler solution maybe to 
ensure that we execute all of the various contexts on CPU 0 and have implicit locking.

Regards,

K. Y   











> 
> greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 4/5] hv: kvp: use wrappers to propaigate state
  2015-09-21 10:18       ` Olaf Hering
@ 2015-09-21 16:31         ` KY Srinivasan
  0 siblings, 0 replies; 19+ messages in thread
From: KY Srinivasan @ 2015-09-21 16:31 UTC (permalink / raw)
  To: Olaf Hering, Greg KH; +Cc: linux-kernel, devel, apw, vkuznets, jasowang

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1175 bytes --]



> -----Original Message-----
> From: Olaf Hering [mailto:olaf@aepfle.de]
> Sent: Monday, September 21, 2015 3:18 AM
> To: KY Srinivasan <kys@microsoft.com>; Greg KH
> <gregkh@linuxfoundation.org>
> Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> apw@canonical.com; vkuznets@redhat.com; jasowang@redhat.com
> Subject: Re: [PATCH 4/5] hv: kvp: use wrappers to propaigate state
> 
> On Sun, Sep 20, Greg KH wrote:
> 
> > On Tue, Sep 15, 2015 at 05:37:53PM -0700, K. Y. Srinivasan wrote:
> > > @@ -122,8 +125,8 @@ static void kvp_timeout_func(struct work_struct
> *dummy)
> 
> > > +	if (kvp_get_state() > HVUTIL_READY)
> > > +		kvp_set_state(HVUTIL_READY);
> 
> > And what if the state changed the line after this?  Oops, your code is
> > hosed.  See, you need a lock, do this correctly.
> 
> KY,  can this happen?

No - the state has to be > HVUTIL_READY. Also, if the timeout fires,
it wins since other contexts will attempt to first cancel the timeout before
proceeding further.

K. Y 
> 
> Olaf
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21 10:26       ` Olaf Hering
  2015-09-21 11:25         ` Vitaly Kuznetsov
@ 2015-09-21 16:34         ` KY Srinivasan
  2015-09-21 16:43           ` Greg KH
  1 sibling, 1 reply; 19+ messages in thread
From: KY Srinivasan @ 2015-09-21 16:34 UTC (permalink / raw)
  To: Olaf Hering, Greg KH; +Cc: linux-kernel, devel, apw, vkuznets, jasowang

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1567 bytes --]



> -----Original Message-----
> From: Olaf Hering [mailto:olaf@aepfle.de]
> Sent: Monday, September 21, 2015 3:26 AM
> To: KY Srinivasan <kys@microsoft.com>; Greg KH
> <gregkh@linuxfoundation.org>
> Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> apw@canonical.com; vkuznets@redhat.com; jasowang@redhat.com
> Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
> 
> On Sun, Sep 20, Greg KH wrote:
> 
> > Just use a lock, that's what it is there for.
> 
> How would that help? It might help because it enforces ordering. But
> that requires that all three utils get refactored to deal with the
> introduced locking. I will let KY comment on this.
> 
> The issue I see with fcopy is that after or while fcopy_respond_to_host
> runs an interrupt triggers which also calls into
> hv_fcopy_onchannelcallback.  It was most likely caused by a logic change
> in "recent" vmbus updates because this did not happen before. At least,
> the fcopy hang was not seen earler. Maybe the bug did just not trigger
> up to now for other reasons...

All util channels are bound to CPU 0. Just forcing all activity on CPU 0 may be the 
simplest solution here. Besides, these are not performance critical services anyway.

The problem you may have run into could be related to the fact that we could potentially
run the polling function on a CPU other than CPU 0.

Regards,

K. Y
> 
> Olaf
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21 16:34         ` KY Srinivasan
@ 2015-09-21 16:43           ` Greg KH
  2015-09-21 17:00             ` KY Srinivasan
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2015-09-21 16:43 UTC (permalink / raw)
  To: KY Srinivasan; +Cc: Olaf Hering, linux-kernel, devel, apw, vkuznets, jasowang

On Mon, Sep 21, 2015 at 04:34:56PM +0000, KY Srinivasan wrote:
> 
> 
> > -----Original Message-----
> > From: Olaf Hering [mailto:olaf@aepfle.de]
> > Sent: Monday, September 21, 2015 3:26 AM
> > To: KY Srinivasan <kys@microsoft.com>; Greg KH
> > <gregkh@linuxfoundation.org>
> > Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> > apw@canonical.com; vkuznets@redhat.com; jasowang@redhat.com
> > Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
> > 
> > On Sun, Sep 20, Greg KH wrote:
> > 
> > > Just use a lock, that's what it is there for.
> > 
> > How would that help? It might help because it enforces ordering. But
> > that requires that all three utils get refactored to deal with the
> > introduced locking. I will let KY comment on this.
> > 
> > The issue I see with fcopy is that after or while fcopy_respond_to_host
> > runs an interrupt triggers which also calls into
> > hv_fcopy_onchannelcallback.  It was most likely caused by a logic change
> > in "recent" vmbus updates because this did not happen before. At least,
> > the fcopy hang was not seen earler. Maybe the bug did just not trigger
> > up to now for other reasons...
> 
> All util channels are bound to CPU 0. Just forcing all activity on CPU 0 may be the 
> simplest solution here. Besides, these are not performance critical services anyway.
> 
> The problem you may have run into could be related to the fact that we could potentially
> run the polling function on a CPU other than CPU 0.

Again, this sounds like a locking issue, you have multiple
threads/processes accessing the same data.  Even if you bind it all to
one cpu, this shows a real design problem.

Use a lock to fix this properly.  That way, when you stop using only one
CPU, the code will "just work", and if you are really only on one CPU
today, there will not be any lock contention.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21 12:17           ` Olaf Hering
  2015-09-21 13:37             ` Vitaly Kuznetsov
@ 2015-09-21 16:45             ` KY Srinivasan
  1 sibling, 0 replies; 19+ messages in thread
From: KY Srinivasan @ 2015-09-21 16:45 UTC (permalink / raw)
  To: Olaf Hering, Vitaly Kuznetsov; +Cc: Greg KH, linux-kernel, devel, apw, jasowang

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1687 bytes --]



> -----Original Message-----
> From: Olaf Hering [mailto:olaf@aepfle.de]
> Sent: Monday, September 21, 2015 5:17 AM
> To: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Greg KH
> <gregkh@linuxfoundation.org>; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; apw@canonical.com; jasowang@redhat.com
> Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
> 
> On Mon, Sep 21, Vitaly Kuznetsov wrote:
> 
> > I'd like to see a trace from the hang, it is not obvious to me how it
> > happened and what caused it. (or if you have such hang scenario in your
> > head, can you please reveal it?)
> 
> There is no trace. I think fcopy_respond_to_host notifies the host,
> which in turn triggers an interrupt right away which is processed while
> fcopy_on_msg is executing somewhere between the return from
> fcopy_respond_to_host and the call into hv_fcopy_onchannelcallback.
> 
> > AFAICS barriers you introduced don't give you guarantees in an SMP
> environment.
> 
> Happens to work on x86, and for this purpose. I will see how to add
> locking around access to  state and context.

All activity starts with the interrupt handler - this is the start of each new
transaction. Given that we have only one outstanding transaction at a time
we have naturally serialized the operations.

If we force all activity onto the correct CPU (the CPU the channel is bound to) 
(currently we force polling to occur on the correct CPU) we should be fine.

Regards,

K. Y 
> 
> Olaf
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 2/5] hv: add helpers to handle hv_util device state
  2015-09-21 16:43           ` Greg KH
@ 2015-09-21 17:00             ` KY Srinivasan
  0 siblings, 0 replies; 19+ messages in thread
From: KY Srinivasan @ 2015-09-21 17:00 UTC (permalink / raw)
  To: Greg KH; +Cc: Olaf Hering, linux-kernel, devel, apw, vkuznets, jasowang



> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Monday, September 21, 2015 9:44 AM
> To: KY Srinivasan <kys@microsoft.com>
> Cc: Olaf Hering <olaf@aepfle.de>; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; apw@canonical.com; vkuznets@redhat.com;
> jasowang@redhat.com
> Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
> 
> On Mon, Sep 21, 2015 at 04:34:56PM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Olaf Hering [mailto:olaf@aepfle.de]
> > > Sent: Monday, September 21, 2015 3:26 AM
> > > To: KY Srinivasan <kys@microsoft.com>; Greg KH
> > > <gregkh@linuxfoundation.org>
> > > Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org;
> > > apw@canonical.com; vkuznets@redhat.com; jasowang@redhat.com
> > > Subject: Re: [PATCH 2/5] hv: add helpers to handle hv_util device state
> > >
> > > On Sun, Sep 20, Greg KH wrote:
> > >
> > > > Just use a lock, that's what it is there for.
> > >
> > > How would that help? It might help because it enforces ordering. But
> > > that requires that all three utils get refactored to deal with the
> > > introduced locking. I will let KY comment on this.
> > >
> > > The issue I see with fcopy is that after or while fcopy_respond_to_host
> > > runs an interrupt triggers which also calls into
> > > hv_fcopy_onchannelcallback.  It was most likely caused by a logic change
> > > in "recent" vmbus updates because this did not happen before. At least,
> > > the fcopy hang was not seen earler. Maybe the bug did just not trigger
> > > up to now for other reasons...
> >
> > All util channels are bound to CPU 0. Just forcing all activity on CPU 0 may be
> the
> > simplest solution here. Besides, these are not performance critical services
> anyway.
> >
> > The problem you may have run into could be related to the fact that we
> could potentially
> > run the polling function on a CPU other than CPU 0.
> 
> Again, this sounds like a locking issue, you have multiple
> threads/processes accessing the same data.  Even if you bind it all to
> one cpu, this shows a real design problem.
> 
> Use a lock to fix this properly.  That way, when you stop using only one
> CPU, the code will "just work", and if you are really only on one CPU
> today, there will not be any lock contention.
> 
> thanks,

Thanks Greg; will do.

Regards,

K. Y
> 
> greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-09-21 17:01 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-16  0:37 [PATCH 0/5] Drivers: hv: Miscellaneous fixes K. Y. Srinivasan
2015-09-16  0:37 ` [PATCH 1/5] Drivers: hv: vmbus: fix init_vp_index() for reloading hv_netvsc K. Y. Srinivasan
2015-09-16  0:37   ` [PATCH 2/5] hv: add helpers to handle hv_util device state K. Y. Srinivasan
2015-09-21  5:25     ` Greg KH
2015-09-21 10:26       ` Olaf Hering
2015-09-21 11:25         ` Vitaly Kuznetsov
2015-09-21 12:17           ` Olaf Hering
2015-09-21 13:37             ` Vitaly Kuznetsov
2015-09-21 16:45             ` KY Srinivasan
2015-09-21 16:34         ` KY Srinivasan
2015-09-21 16:43           ` Greg KH
2015-09-21 17:00             ` KY Srinivasan
2015-09-16  0:37   ` [PATCH 3/5] hv: fcopy: use wrappers to propagate state K. Y. Srinivasan
2015-09-16  0:37   ` [PATCH 4/5] hv: kvp: use wrappers to propaigate state K. Y. Srinivasan
2015-09-21  5:26     ` Greg KH
2015-09-21 10:18       ` Olaf Hering
2015-09-21 16:31         ` KY Srinivasan
2015-09-21 16:16       ` KY Srinivasan
2015-09-16  0:37   ` [PATCH 5/5] hv: vss: use wrappers to propagate state K. Y. Srinivasan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).