linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel()
@ 2015-04-21 14:27 Vitaly Kuznetsov
  2015-04-21 14:27 ` [PATCH 1/6] Drivers: hv: vmbus: unify calls to percpu_channel_enq() Vitaly Kuznetsov
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-21 14:27 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel, Dexuan Cui

This series is a continuation of the "Drivers: hv: vmbus: Use a round-robin
algorithm for picking the outgoing channel" work. It is supposed to bring two
significant changes:
1) Subchannels for a channel are distributed evenly across all vcpus we have.
   Currently we try to distribute all channels (including subchannels) across
   all vcpus, this approach doesn't guarantee that the particular channel's
   subchannels will be distributed in the same way as we process all offer
   requests in some random order. (Patch 05)
2) Channel picking based on the current vcpu is dropped from
   vmbus_get_outgoing_channel() in favor of a fair round robin. (Patch 06)

Patches 01 - 04 are cleanup/refactoring.

It would be also possible to make the change the other way around: always pick
the subchannel based on the current vcpu (and fallback to a round robin in case
there is no subchannel for the current vcpu). Please let me know if you think
it is preferable.

Vitaly Kuznetsov (6):
  Drivers: hv: vmbus: unify calls to percpu_channel_enq()
  Drivers: hv: vmbus: briefly comment num_sc and next_oc
  Drivers: hv: vmbus: decrease num_sc on subchannel removal
  Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer()
  Drivers: hv: vmbus: distribute subchannels among all vcpus
  Drivers: hv: vmbus: do a fair round robin when selecting an outgoing
    channel

 drivers/hv/channel_mgmt.c | 248 +++++++++++++++++++++++-----------------------
 include/linux/hyperv.h    |  12 ++-
 2 files changed, 135 insertions(+), 125 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] Drivers: hv: vmbus: unify calls to percpu_channel_enq()
  2015-04-21 14:27 [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel() Vitaly Kuznetsov
@ 2015-04-21 14:27 ` Vitaly Kuznetsov
  2015-04-21 14:27 ` [PATCH 2/6] Drivers: hv: vmbus: briefly comment num_sc and next_oc Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-21 14:27 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel, Dexuan Cui

Remove some code duplication, no functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/channel_mgmt.c | 51 +++++++++++++++++------------------------------
 1 file changed, 18 insertions(+), 33 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 4b9d89a..b28cbdf 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -233,7 +233,6 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 {
 	struct vmbus_channel *channel;
 	bool fnew = true;
-	bool enq = false;
 	unsigned long flags;
 
 	/* Make sure this is a new offer */
@@ -249,25 +248,12 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 		}
 	}
 
-	if (fnew) {
+	if (fnew)
 		list_add_tail(&newchannel->listentry,
 			      &vmbus_connection.chn_list);
-		enq = true;
-	}
 
 	spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags);
 
-	if (enq) {
-		if (newchannel->target_cpu != get_cpu()) {
-			put_cpu();
-			smp_call_function_single(newchannel->target_cpu,
-						 percpu_channel_enq,
-						 newchannel, true);
-		} else {
-			percpu_channel_enq(newchannel);
-			put_cpu();
-		}
-	}
 	if (!fnew) {
 		/*
 		 * Check to see if this is a sub-channel.
@@ -280,26 +266,19 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 			spin_lock_irqsave(&channel->lock, flags);
 			list_add_tail(&newchannel->sc_list, &channel->sc_list);
 			spin_unlock_irqrestore(&channel->lock, flags);
-
-			if (newchannel->target_cpu != get_cpu()) {
-				put_cpu();
-				smp_call_function_single(newchannel->target_cpu,
-							 percpu_channel_enq,
-							 newchannel, true);
-			} else {
-				percpu_channel_enq(newchannel);
-				put_cpu();
-			}
-
-			newchannel->state = CHANNEL_OPEN_STATE;
 			channel->num_sc++;
-			if (channel->sc_creation_callback != NULL)
-				channel->sc_creation_callback(newchannel);
-
-			return;
-		}
+		} else
+			goto err_free_chan;
+	}
 
-		goto err_free_chan;
+	if (newchannel->target_cpu != get_cpu()) {
+		put_cpu();
+		smp_call_function_single(newchannel->target_cpu,
+					 percpu_channel_enq,
+					 newchannel, true);
+	} else {
+		percpu_channel_enq(newchannel);
+		put_cpu();
 	}
 
 	/*
@@ -309,6 +288,12 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 	 */
 	newchannel->state = CHANNEL_OPEN_STATE;
 
+	if (!fnew) {
+		if (channel->sc_creation_callback != NULL)
+			channel->sc_creation_callback(newchannel);
+		return;
+	}
+
 	/*
 	 * Start the process of binding this offer to the driver
 	 * We need to set the DeviceObject field before calling
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] Drivers: hv: vmbus: briefly comment num_sc and next_oc
  2015-04-21 14:27 [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel() Vitaly Kuznetsov
  2015-04-21 14:27 ` [PATCH 1/6] Drivers: hv: vmbus: unify calls to percpu_channel_enq() Vitaly Kuznetsov
@ 2015-04-21 14:27 ` Vitaly Kuznetsov
  2015-04-21 14:27 ` [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel removal Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-21 14:27 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel, Dexuan Cui

next_oc and num_sc fields of struct vmbus_channel deserve a description. Move
them closer to sc_list as these fields are related to it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 include/linux/hyperv.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index c739cba..cce7f66 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -727,6 +727,15 @@ struct vmbus_channel {
 	 */
 	struct list_head sc_list;
 	/*
+	 * Current number of sub-channels.
+	 */
+	int num_sc;
+	/*
+	 * Number of a sub-channel (position within sc_list) which is supposed
+	 * to be used as the next outgoing channel.
+	 */
+	int next_oc;
+	/*
 	 * The primary channel this sub-channel belongs to.
 	 * This will be NULL for the primary channel.
 	 */
@@ -740,9 +749,6 @@ struct vmbus_channel {
 	 * link up channels based on their CPU affinity.
 	 */
 	struct list_head percpu_list;
-
-	int num_sc;
-	int next_oc;
 };
 
 static inline void set_channel_read_state(struct vmbus_channel *c, bool state)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel removal
  2015-04-21 14:27 [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel() Vitaly Kuznetsov
  2015-04-21 14:27 ` [PATCH 1/6] Drivers: hv: vmbus: unify calls to percpu_channel_enq() Vitaly Kuznetsov
  2015-04-21 14:27 ` [PATCH 2/6] Drivers: hv: vmbus: briefly comment num_sc and next_oc Vitaly Kuznetsov
@ 2015-04-21 14:27 ` Vitaly Kuznetsov
  2015-04-24  7:02   ` Dexuan Cui
  2015-04-21 14:27 ` [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer() Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-21 14:27 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel, Dexuan Cui

It is unlikely that that host will ask us to close only one subchannel for a
device but let's be consistent. Do both num_sc++ and num_sc-- with
channel->lock to be on the safe side.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/channel_mgmt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index b28cbdf..8b4b561 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -205,6 +205,7 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
 		primary_channel = channel->primary_channel;
 		spin_lock_irqsave(&primary_channel->lock, flags);
 		list_del(&channel->sc_list);
+		channel->num_sc--;
 		spin_unlock_irqrestore(&primary_channel->lock, flags);
 	}
 	free_channel(channel);
@@ -265,8 +266,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 			newchannel->primary_channel = channel;
 			spin_lock_irqsave(&channel->lock, flags);
 			list_add_tail(&newchannel->sc_list, &channel->sc_list);
-			spin_unlock_irqrestore(&channel->lock, flags);
 			channel->num_sc++;
+			spin_unlock_irqrestore(&channel->lock, flags);
 		} else
 			goto err_free_chan;
 	}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer()
  2015-04-21 14:27 [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel() Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2015-04-21 14:27 ` [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel removal Vitaly Kuznetsov
@ 2015-04-21 14:27 ` Vitaly Kuznetsov
  2015-04-24  8:38   ` Dexuan Cui
  2015-04-21 14:27 ` [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus Vitaly Kuznetsov
  2015-04-21 14:27 ` [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when selecting an outgoing channel Vitaly Kuznetsov
  5 siblings, 1 reply; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-21 14:27 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel, Dexuan Cui

We need to call init_vp_index() after we added the channel to the appropriate
list (global or subchannel) to be able to use this information when assigning
the channel to the particular vcpu. To do so we need to move a couple of
functions around. The only real change is the init_vp_index() call. This is a
small refactoring without a functional change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/channel_mgmt.c | 142 +++++++++++++++++++++++-----------------------
 1 file changed, 71 insertions(+), 71 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 8b4b561..8f2761f 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -226,6 +226,75 @@ void vmbus_free_channels(void)
 	}
 }
 
+enum {
+	IDE = 0,
+	SCSI,
+	NIC,
+	MAX_PERF_CHN,
+};
+
+/*
+ * This is an array of device_ids (device types) that are performance critical.
+ * We attempt to distribute the interrupt load for these devices across
+ * all available CPUs.
+ */
+static const struct hv_vmbus_device_id hp_devs[] = {
+	/* IDE */
+	{ HV_IDE_GUID, },
+	/* Storage - SCSI */
+	{ HV_SCSI_GUID, },
+	/* Network */
+	{ HV_NIC_GUID, },
+	/* NetworkDirect Guest RDMA */
+	{ HV_ND_GUID, },
+};
+
+/*
+ * We use this state to statically distribute the channel interrupt load.
+ */
+static u32  next_vp;
+
+/*
+ * Starting with Win8, we can statically distribute the incoming
+ * channel interrupt load by binding a channel to VCPU. We
+ * implement here a simple round robin scheme for distributing
+ * the interrupt load.
+ * We will bind channels that are not performance critical to cpu 0 and
+ * performance critical channels (IDE, SCSI and Network) will be uniformly
+ * distributed across all available CPUs.
+ */
+static void init_vp_index(struct vmbus_channel *channel,
+			  const uuid_le *type_guid)
+{
+	u32 cur_cpu;
+	int i;
+	bool perf_chn = false;
+	u32 max_cpus = num_online_cpus();
+
+	for (i = IDE; i < MAX_PERF_CHN; i++) {
+		if (!memcmp(type_guid->b, hp_devs[i].guid,
+				 sizeof(uuid_le))) {
+			perf_chn = true;
+			break;
+		}
+	}
+	if ((vmbus_proto_version == VERSION_WS2008) ||
+	    (vmbus_proto_version == VERSION_WIN7) || (!perf_chn)) {
+		/*
+		 * Prior to win8, all channel interrupts are
+		 * delivered on cpu 0.
+		 * Also if the channel is not a performance critical
+		 * channel, bind it to cpu 0.
+		 */
+		channel->target_cpu = 0;
+		channel->target_vp = 0;
+		return;
+	}
+	cur_cpu = (++next_vp % max_cpus);
+	channel->target_cpu = cur_cpu;
+	channel->target_vp = hv_context.vp_index[cur_cpu];
+}
+
 /*
  * vmbus_process_offer - Process the offer by creating a channel/device
  * associated with this offer
@@ -272,6 +341,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 			goto err_free_chan;
 	}
 
+	init_vp_index(newchannel, &newchannel->offermsg.offer.if_type);
+
 	if (newchannel->target_cpu != get_cpu()) {
 		put_cpu();
 		smp_call_function_single(newchannel->target_cpu,
@@ -338,75 +409,6 @@ err_free_chan:
 	free_channel(newchannel);
 }
 
-enum {
-	IDE = 0,
-	SCSI,
-	NIC,
-	MAX_PERF_CHN,
-};
-
-/*
- * This is an array of device_ids (device types) that are performance critical.
- * We attempt to distribute the interrupt load for these devices across
- * all available CPUs.
- */
-static const struct hv_vmbus_device_id hp_devs[] = {
-	/* IDE */
-	{ HV_IDE_GUID, },
-	/* Storage - SCSI */
-	{ HV_SCSI_GUID, },
-	/* Network */
-	{ HV_NIC_GUID, },
-	/* NetworkDirect Guest RDMA */
-	{ HV_ND_GUID, },
-};
-
-
-/*
- * We use this state to statically distribute the channel interrupt load.
- */
-static u32  next_vp;
-
-/*
- * Starting with Win8, we can statically distribute the incoming
- * channel interrupt load by binding a channel to VCPU. We
- * implement here a simple round robin scheme for distributing
- * the interrupt load.
- * We will bind channels that are not performance critical to cpu 0 and
- * performance critical channels (IDE, SCSI and Network) will be uniformly
- * distributed across all available CPUs.
- */
-static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_guid)
-{
-	u32 cur_cpu;
-	int i;
-	bool perf_chn = false;
-	u32 max_cpus = num_online_cpus();
-
-	for (i = IDE; i < MAX_PERF_CHN; i++) {
-		if (!memcmp(type_guid->b, hp_devs[i].guid,
-				 sizeof(uuid_le))) {
-			perf_chn = true;
-			break;
-		}
-	}
-	if ((vmbus_proto_version == VERSION_WS2008) ||
-	    (vmbus_proto_version == VERSION_WIN7) || (!perf_chn)) {
-		/*
-		 * Prior to win8, all channel interrupts are
-		 * delivered on cpu 0.
-		 * Also if the channel is not a performance critical
-		 * channel, bind it to cpu 0.
-		 */
-		channel->target_cpu = 0;
-		channel->target_vp = 0;
-		return;
-	}
-	cur_cpu = (++next_vp % max_cpus);
-	channel->target_cpu = cur_cpu;
-	channel->target_vp = hv_context.vp_index[cur_cpu];
-}
-
 /*
  * vmbus_unload_response - Handler for the unload response.
  */
@@ -476,8 +478,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 				offer->connection_id;
 	}
 
-	init_vp_index(newchannel, &offer->offer.if_type);
-
 	memcpy(&newchannel->offermsg, offer,
 	       sizeof(struct vmbus_channel_offer_channel));
 	newchannel->monitor_grp = (u8)offer->monitorid / 32;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus
  2015-04-21 14:27 [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel() Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2015-04-21 14:27 ` [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer() Vitaly Kuznetsov
@ 2015-04-21 14:27 ` Vitaly Kuznetsov
  2015-04-24  8:40   ` Dexuan Cui
  2015-04-21 14:27 ` [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when selecting an outgoing channel Vitaly Kuznetsov
  5 siblings, 1 reply; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-21 14:27 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel, Dexuan Cui

Primary channels are distributed evenly across all vcpus we have. When the host
asks us to create subchannels it usually makes us num_cpus-1 offers and we are
supposed to distribute the work evenly among the channel itself and all its
subchannels. Make sure they are all assigned to different vcpus.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/channel_mgmt.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 8f2761f..daa6417 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -270,6 +270,8 @@ static void init_vp_index(struct vmbus_channel *channel,
 	int i;
 	bool perf_chn = false;
 	u32 max_cpus = num_online_cpus();
+	struct vmbus_channel *primary = channel->primary_channel, *prev;
+	unsigned long flags;
 
 	for (i = IDE; i < MAX_PERF_CHN; i++) {
 		if (!memcmp(type_guid->b, hp_devs[i].guid,
@@ -290,7 +292,32 @@ static void init_vp_index(struct vmbus_channel *channel,
 		channel->target_vp = 0;
 		return;
 	}
-	cur_cpu = (++next_vp % max_cpus);
+
+	/*
+	 * Primary channels are distributed evenly across all vcpus we have.
+	 * When the host asks us to create subchannels it usually makes us
+	 * num_cpus-1 offers and we are supposed to distribute the work evenly
+	 * among the channel itself and all its subchannels. Make sure they are
+	 * all assigned to different vcpus.
+	 */
+	if (!primary)
+		cur_cpu = (++next_vp % max_cpus);
+	else {
+		/*
+		 * Let's assign the first subchannel of a channel to the
+		 * primary->target_cpu+1 and all the subsequent channels to
+		 * the prev->target_cpu+1.
+		 */
+		spin_lock_irqsave(&primary->lock, flags);
+		if (primary->num_sc == 1)
+			cur_cpu = (primary->target_cpu + 1) % max_cpus;
+		else {
+			prev = list_prev_entry(channel, sc_list);
+			cur_cpu = (prev->target_cpu + 1) % max_cpus;
+		}
+		spin_unlock_irqrestore(&primary->lock, flags);
+	}
+
 	channel->target_cpu = cur_cpu;
 	channel->target_vp = hv_context.vp_index[cur_cpu];
 }
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when selecting an outgoing channel
  2015-04-21 14:27 [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel() Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2015-04-21 14:27 ` [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus Vitaly Kuznetsov
@ 2015-04-21 14:27 ` Vitaly Kuznetsov
  2015-04-24  8:42   ` Dexuan Cui
  5 siblings, 1 reply; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-21 14:27 UTC (permalink / raw)
  To: K. Y. Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel, Dexuan Cui

vmbus_get_outgoing_channel() implements the following algorithm for selecting
an outgoing channel (despite the comment before the function saying it
distributes the load equally):
1) If we have no subchannels return the primary channel;
2) If primary->next_oc is grater than primary->num_sc reset the primary->next_oc
   to 0 and return the primary channel;
3) Aim for the primary->next_oc subchannel, increment primary->next_oc;
4) Loop through all opened subchannels. If we see a channel which has
 target_cpu == current_cpu return it. If we reached the primary->next_oc'th
open subchannel return it;
5) Return the primary channel.
The implementation also skips the subchannel No. 0 unless it matches the current
cpu as we assign i to 1 in the initialization.

This is not a fair round robin as subchannels in the beginning of the list are
more likely to be returned and checking for current cpu aslo creates additional
complexity. Simplify the vmbus_get_outgoing_channel() function, make it do what
the comment before it says.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/channel_mgmt.c | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index daa6417..df82442 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -827,39 +827,30 @@ cleanup:
 struct vmbus_channel *vmbus_get_outgoing_channel(struct vmbus_channel *primary)
 {
 	struct list_head *cur, *tmp;
-	int cur_cpu;
 	struct vmbus_channel *cur_channel;
-	struct vmbus_channel *outgoing_channel = primary;
-	int next_channel;
-	int i = 1;
+	int i = 0;
 
 	if (list_empty(&primary->sc_list))
-		return outgoing_channel;
+		return primary;
 
-	next_channel = primary->next_oc++;
-
-	if (next_channel > (primary->num_sc)) {
+	if (primary->next_oc > primary->num_sc) {
 		primary->next_oc = 0;
-		return outgoing_channel;
+		return primary;
 	}
 
-	cur_cpu = hv_context.vp_index[get_cpu()];
-	put_cpu();
 	list_for_each_safe(cur, tmp, &primary->sc_list) {
+		i++;
 		cur_channel = list_entry(cur, struct vmbus_channel, sc_list);
 		if (cur_channel->state != CHANNEL_OPENED_STATE)
 			continue;
 
-		if (cur_channel->target_vp == cur_cpu)
-			return cur_channel;
-
-		if (i == next_channel)
+		if (i > primary->next_oc) {
+			primary->next_oc = i;
 			return cur_channel;
-
-		i++;
+		}
 	}
 
-	return outgoing_channel;
+	return primary;
 }
 EXPORT_SYMBOL_GPL(vmbus_get_outgoing_channel);
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel removal
  2015-04-21 14:27 ` [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel removal Vitaly Kuznetsov
@ 2015-04-24  7:02   ` Dexuan Cui
  2015-04-24  8:40     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 18+ messages in thread
From: Dexuan Cui @ 2015-04-24  7:02 UTC (permalink / raw)
  To: Vitaly Kuznetsov, KY Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Tuesday, April 21, 2015 22:28
> To: KY Srinivasan
> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; Dexuan Cui
> Subject: [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel
> removal
> 
> It is unlikely that that host will ask us to close only one subchannel for a
> device but let's be consistent. Do both num_sc++ and num_sc-- with
> channel->lock to be on the safe side.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/hv/channel_mgmt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index b28cbdf..8b4b561 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -205,6 +205,7 @@ void hv_process_channel_removal(struct
> vmbus_channel *channel, u32 relid)
>  		primary_channel = channel->primary_channel;
>  		spin_lock_irqsave(&primary_channel->lock, flags);
>  		list_del(&channel->sc_list);
> +		channel->num_sc--;

Hi Vitaly,
Here it should be
		primary_channel->num_sc--;

Thanks,
-- Dexuan

>  		spin_unlock_irqrestore(&primary_channel->lock, flags);
>  	}
>  	free_channel(channel);
> @@ -265,8 +266,8 @@ static void vmbus_process_offer(struct
> vmbus_channel *newchannel)
>  			newchannel->primary_channel = channel;
>  			spin_lock_irqsave(&channel->lock, flags);
>  			list_add_tail(&newchannel->sc_list, &channel-
> >sc_list);
> -			spin_unlock_irqrestore(&channel->lock, flags);
>  			channel->num_sc++;
> +			spin_unlock_irqrestore(&channel->lock, flags);
>  		} else
>  			goto err_free_chan;
>  	}
> --
> 1.9.3


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer()
  2015-04-21 14:27 ` [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer() Vitaly Kuznetsov
@ 2015-04-24  8:38   ` Dexuan Cui
  2015-04-24  8:46     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 18+ messages in thread
From: Dexuan Cui @ 2015-04-24  8:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov, KY Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Tuesday, April 21, 2015 22:28
> To: KY Srinivasan
> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; Dexuan Cui
> Subject: [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to
> vmbus_process_offer()
> 
> We need to call init_vp_index() after we added the channel to the
> appropriate
> list (global or subchannel) to be able to use this information when assigning
> the channel to the particular vcpu. To do so we need to move a couple of
> functions around. The only real change is the init_vp_index() call. This is a
> small refactoring without a functional change.

IMO we don't have to move such a big block of code -- we only need to add a
forward function declaration of init_vp_index()?

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel removal
  2015-04-24  7:02   ` Dexuan Cui
@ 2015-04-24  8:40     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-24  8:40 UTC (permalink / raw)
  To: Dexuan Cui; +Cc: KY Srinivasan, Haiyang Zhang, devel, linux-kernel

Dexuan Cui <decui@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Tuesday, April 21, 2015 22:28
>> To: KY Srinivasan
>> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
>> kernel@vger.kernel.org; Dexuan Cui
>> Subject: [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel
>> removal
>> 
>> It is unlikely that that host will ask us to close only one subchannel for a
>> device but let's be consistent. Do both num_sc++ and num_sc-- with
>> channel->lock to be on the safe side.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  drivers/hv/channel_mgmt.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b28cbdf..8b4b561 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -205,6 +205,7 @@ void hv_process_channel_removal(struct
>> vmbus_channel *channel, u32 relid)
>>  		primary_channel = channel->primary_channel;
>>  		spin_lock_irqsave(&primary_channel->lock, flags);
>>  		list_del(&channel->sc_list);
>> +		channel->num_sc--;
>
> Hi Vitaly,
> Here it should be
> 		primary_channel->num_sc--;

Ah, of course. I'll fix and resend, thanks!

>
> Thanks,
> -- Dexuan
>
>>  		spin_unlock_irqrestore(&primary_channel->lock, flags);
>>  	}
>>  	free_channel(channel);
>> @@ -265,8 +266,8 @@ static void vmbus_process_offer(struct
>> vmbus_channel *newchannel)
>>  			newchannel->primary_channel = channel;
>>  			spin_lock_irqsave(&channel->lock, flags);
>>  			list_add_tail(&newchannel->sc_list, &channel-
>> >sc_list);
>> -			spin_unlock_irqrestore(&channel->lock, flags);
>>  			channel->num_sc++;
>> +			spin_unlock_irqrestore(&channel->lock, flags);
>>  		} else
>>  			goto err_free_chan;
>>  	}
>> --
>> 1.9.3

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus
  2015-04-21 14:27 ` [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus Vitaly Kuznetsov
@ 2015-04-24  8:40   ` Dexuan Cui
  2015-04-24  9:05     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 18+ messages in thread
From: Dexuan Cui @ 2015-04-24  8:40 UTC (permalink / raw)
  To: Vitaly Kuznetsov, KY Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Tuesday, April 21, 2015 22:28
> To: KY Srinivasan
> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; Dexuan Cui
> Subject: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all
> vcpus
>
> Primary channels are distributed evenly across all vcpus we have. When the
> host asks us to create subchannels it usually makes us num_cpus-1 offers

Hi Vitaly,
AFAIK, in the VSP of storvsc, the number of subchannel is
 (the_number_of_vcpus - 1) / 4.

This means for a 8-vCPU guest, there is only 1 subchannel.

Your new algorithm tends to make the vCPUs with small-number busier:
e.g., in the 8-vCPU case, assuming we have 4 SCSI controllers:
vCPU0: scsi0's PrimaryChannel (P)
vCPU1: scsi0's SubChannel (S) + scsi1's P
vCPU2: scsi1's S + scsi2's P
vCPU3: scsi2's S + scsi3's P
vCPU4: scsi3's S
vCPU5, 6 and 7 are idle.

In this special case, the existing algorithm is better. :-)

However, I do like this idea in your patch, that is, making sure a device's
primary/sub channels are assigned to differents vCPUs.

I'm just wondering if we should use an even better (and complex) algorithm :-)

PS, yeah, for netvsc(HV_NIC_GUID), the number of SC is indeed
the_number_vcpus -1. I'm not sure about the upcoming HV_ND_GUID --
maybe it's the same as HV_NIC_GUID.

Thanks,
-- Dexuan

> and we are supposed to distribute the work evenly among the channel
>  itself and all its  subchannels. Make sure they are all assigned to
>  different vcpus.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/hv/channel_mgmt.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 8f2761f..daa6417 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -270,6 +270,8 @@ static void init_vp_index(struct vmbus_channel
> *channel,
>       int i;
>       bool perf_chn = false;
>       u32 max_cpus = num_online_cpus();
> +     struct vmbus_channel *primary = channel->primary_channel, *prev;
> +     unsigned long flags;
>
>       for (i = IDE; i < MAX_PERF_CHN; i++) {
>               if (!memcmp(type_guid->b, hp_devs[i].guid,
> @@ -290,7 +292,32 @@ static void init_vp_index(struct vmbus_channel
> *channel,
>               channel->target_vp = 0;
>               return;
>       }
> -     cur_cpu = (++next_vp % max_cpus);
> +
> +     /*
> +      * Primary channels are distributed evenly across all vcpus we have.
> +      * When the host asks us to create subchannels it usually makes us
> +      * num_cpus-1 offers and we are supposed to distribute the work
> evenly
> +      * among the channel itself and all its subchannels. Make sure they
> are
> +      * all assigned to different vcpus.
> +      */
> +     if (!primary)
> +             cur_cpu = (++next_vp % max_cpus);
> +     else {
> +             /*
> +              * Let's assign the first subchannel of a channel to the
> +              * primary->target_cpu+1 and all the subsequent channels
> to
> +              * the prev->target_cpu+1.
> +              */
> +             spin_lock_irqsave(&primary->lock, flags);
> +             if (primary->num_sc == 1)
> +                     cur_cpu = (primary->target_cpu + 1) % max_cpus;
> +             else {
> +                     prev = list_prev_entry(channel, sc_list);
> +                     cur_cpu = (prev->target_cpu + 1) % max_cpus;
> +             }
> +             spin_unlock_irqrestore(&primary->lock, flags);
> +     }
> +
>       channel->target_cpu = cur_cpu;
>       channel->target_vp = hv_context.vp_index[cur_cpu];
>  }
> --
> 1.9.3


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when selecting an outgoing channel
  2015-04-21 14:27 ` [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when selecting an outgoing channel Vitaly Kuznetsov
@ 2015-04-24  8:42   ` Dexuan Cui
  2015-04-24  8:59     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 18+ messages in thread
From: Dexuan Cui @ 2015-04-24  8:42 UTC (permalink / raw)
  To: Vitaly Kuznetsov, KY Srinivasan; +Cc: Haiyang Zhang, devel, linux-kernel

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Tuesday, April 21, 2015 22:28
> To: KY Srinivasan
> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; Dexuan Cui
> Subject: [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when
> selecting an outgoing channel
> 
> vmbus_get_outgoing_channel() implements the following algorithm for
> selecting
> an outgoing channel (despite the comment before the function saying it
> distributes the load equally):

Yeah, I also found the issue. 

> 1) If we have no subchannels return the primary channel;
> 2) If primary->next_oc is grater than primary->num_sc reset the primary-
> >next_oc
>    to 0 and return the primary channel;
> 3) Aim for the primary->next_oc subchannel, increment primary->next_oc;
> 4) Loop through all opened subchannels. If we see a channel which has
>  target_cpu == current_cpu return it. If we reached the primary->next_oc'th
> open subchannel return it;
> 5) Return the primary channel.
> The implementation also skips the subchannel No. 0 unless it matches the
> current
> cpu as we assign i to 1 in the initialization.
> 
> This is not a fair round robin as subchannels in the beginning of the list are
> more likely to be returned and checking for current cpu aslo creates

I suppose the current algorithm is trying to make use of cache locality?
KY may share more information.


> additional
> complexity. Simplify the vmbus_get_outgoing_channel() function, make it
> do what the comment before it says.

Hi Vitaly,
It looks your algorithm also has an issue:
Assuming primary->num_sc == 3 (SC1, SC2, SC3)
1st time: we choose SC1 and primary->next_oc is set to 1.
2nd time: we choose SC2 and primary->next_oc is set to 2.
3rd time: we choose SC3 and primary->next_oc is set to 3.
4th time and later:  since i varies among 1~3 and can't be bigger than 3,
we always choose the primary channel.


BTW, IMO it's not easy to achieve complete fairness because 
vmbus_get_outgoing_channel() can run simultaneously on different
CPUs (so primary->next_oc can be modified at the same time by multiple
CPUs), and we believe it should be lockless.
Maybe atomic_t can help(?)

Thanks,
-- Dexuan

> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/hv/channel_mgmt.c | 27 +++++++++------------------
>  1 file changed, 9 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index daa6417..df82442 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -827,39 +827,30 @@ cleanup:
>  struct vmbus_channel *vmbus_get_outgoing_channel(struct
> vmbus_channel *primary)
>  {
>  	struct list_head *cur, *tmp;
> -	int cur_cpu;
>  	struct vmbus_channel *cur_channel;
> -	struct vmbus_channel *outgoing_channel = primary;
> -	int next_channel;
> -	int i = 1;
> +	int i = 0;
> 
>  	if (list_empty(&primary->sc_list))
> -		return outgoing_channel;
> +		return primary;
> 
> -	next_channel = primary->next_oc++;
> -
> -	if (next_channel > (primary->num_sc)) {
> +	if (primary->next_oc > primary->num_sc) {
>  		primary->next_oc = 0;
> -		return outgoing_channel;
> +		return primary;
>  	}
> 
> -	cur_cpu = hv_context.vp_index[get_cpu()];
> -	put_cpu();
>  	list_for_each_safe(cur, tmp, &primary->sc_list) {
> +		i++;
>  		cur_channel = list_entry(cur, struct vmbus_channel, sc_list);
>  		if (cur_channel->state != CHANNEL_OPENED_STATE)
>  			continue;
> 
> -		if (cur_channel->target_vp == cur_cpu)
> -			return cur_channel;
> -
> -		if (i == next_channel)
> +		if (i > primary->next_oc) {
> +			primary->next_oc = i;
>  			return cur_channel;
> -
> -		i++;
> +		}
>  	}
> 
> -	return outgoing_channel;
> +	return primary;
>  }
>  EXPORT_SYMBOL_GPL(vmbus_get_outgoing_channel);
> 
> --
> 1.9.3


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer()
  2015-04-24  8:38   ` Dexuan Cui
@ 2015-04-24  8:46     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-24  8:46 UTC (permalink / raw)
  To: Dexuan Cui; +Cc: KY Srinivasan, Haiyang Zhang, devel, linux-kernel

Dexuan Cui <decui@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Tuesday, April 21, 2015 22:28
>> To: KY Srinivasan
>> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
>> kernel@vger.kernel.org; Dexuan Cui
>> Subject: [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to
>> vmbus_process_offer()
>> 
>> We need to call init_vp_index() after we added the channel to the
>> appropriate
>> list (global or subchannel) to be able to use this information when assigning
>> the channel to the particular vcpu. To do so we need to move a couple of
>> functions around. The only real change is the init_vp_index() call. This is a
>> small refactoring without a functional change.
>
> IMO we don't have to move such a big block of code -- we only need to add a
> forward function declaration of init_vp_index()?

Ok, let's make the change smaller. I'll fix and resend, thanks!

>
> Thanks,
> -- Dexuan

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when selecting an outgoing channel
  2015-04-24  8:42   ` Dexuan Cui
@ 2015-04-24  8:59     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-24  8:59 UTC (permalink / raw)
  To: Dexuan Cui; +Cc: KY Srinivasan, Haiyang Zhang, devel, linux-kernel

Dexuan Cui <decui@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Tuesday, April 21, 2015 22:28
>> To: KY Srinivasan
>> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
>> kernel@vger.kernel.org; Dexuan Cui
>> Subject: [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when
>> selecting an outgoing channel
>> 
>> vmbus_get_outgoing_channel() implements the following algorithm for
>> selecting
>> an outgoing channel (despite the comment before the function saying it
>> distributes the load equally):
>
> Yeah, I also found the issue. 
>
>> 1) If we have no subchannels return the primary channel;
>> 2) If primary->next_oc is grater than primary->num_sc reset the primary-
>> >next_oc
>>    to 0 and return the primary channel;
>> 3) Aim for the primary->next_oc subchannel, increment primary->next_oc;
>> 4) Loop through all opened subchannels. If we see a channel which has
>>  target_cpu == current_cpu return it. If we reached the primary->next_oc'th
>> open subchannel return it;
>> 5) Return the primary channel.
>> The implementation also skips the subchannel No. 0 unless it matches the
>> current
>> cpu as we assign i to 1 in the initialization.
>> 
>> This is not a fair round robin as subchannels in the beginning of the list are
>> more likely to be returned and checking for current cpu aslo creates
>
> I suppose the current algorithm is trying to make use of cache locality?
> KY may share more information.
>
>> additional
>> complexity. Simplify the vmbus_get_outgoing_channel() function, make it
>> do what the comment before it says.
>
> Hi Vitaly,
> It looks your algorithm also has an issue:
> Assuming primary->num_sc == 3 (SC1, SC2, SC3)
> 1st time: we choose SC1 and primary->next_oc is set to 1.
> 2nd time: we choose SC2 and primary->next_oc is set to 2.
> 3rd time: we choose SC3 and primary->next_oc is set to 3.
> 4th time and later:  since i varies among 1~3 and can't be bigger than 3,
> we always choose the primary channel.

You're right, it seems 'if (primary->next_oc > primary->num_sc)' is
off-by-one, it should be >=

>
> BTW, IMO it's not easy to achieve complete fairness because 
> vmbus_get_outgoing_channel() can run simultaneously on different
> CPUs (so primary->next_oc can be modified at the same time by multiple
> CPUs), and we believe it should be lockless.
> Maybe atomic_t can help(?)

This thing bothered me a bit but then I realized that we don't actually
care - doing a mistake once is better than suffering from the slowness
of locks/atomics. I'd suggest we keep it this way (with the fix
mentioned above).

Thanks!

>
> Thanks,
> -- Dexuan
>
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  drivers/hv/channel_mgmt.c | 27 +++++++++------------------
>>  1 file changed, 9 insertions(+), 18 deletions(-)
>> 
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index daa6417..df82442 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -827,39 +827,30 @@ cleanup:
>>  struct vmbus_channel *vmbus_get_outgoing_channel(struct
>> vmbus_channel *primary)
>>  {
>>  	struct list_head *cur, *tmp;
>> -	int cur_cpu;
>>  	struct vmbus_channel *cur_channel;
>> -	struct vmbus_channel *outgoing_channel = primary;
>> -	int next_channel;
>> -	int i = 1;
>> +	int i = 0;
>> 
>>  	if (list_empty(&primary->sc_list))
>> -		return outgoing_channel;
>> +		return primary;
>> 
>> -	next_channel = primary->next_oc++;
>> -
>> -	if (next_channel > (primary->num_sc)) {
>> +	if (primary->next_oc > primary->num_sc) {
>>  		primary->next_oc = 0;
>> -		return outgoing_channel;
>> +		return primary;
>>  	}
>> 
>> -	cur_cpu = hv_context.vp_index[get_cpu()];
>> -	put_cpu();
>>  	list_for_each_safe(cur, tmp, &primary->sc_list) {
>> +		i++;
>>  		cur_channel = list_entry(cur, struct vmbus_channel, sc_list);
>>  		if (cur_channel->state != CHANNEL_OPENED_STATE)
>>  			continue;
>> 
>> -		if (cur_channel->target_vp == cur_cpu)
>> -			return cur_channel;
>> -
>> -		if (i == next_channel)
>> +		if (i > primary->next_oc) {
>> +			primary->next_oc = i;
>>  			return cur_channel;
>> -
>> -		i++;
>> +		}
>>  	}
>> 
>> -	return outgoing_channel;
>> +	return primary;
>>  }
>>  EXPORT_SYMBOL_GPL(vmbus_get_outgoing_channel);
>> 
>> --
>> 1.9.3

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus
  2015-04-24  8:40   ` Dexuan Cui
@ 2015-04-24  9:05     ` Vitaly Kuznetsov
  2015-04-24 16:46       ` KY Srinivasan
  0 siblings, 1 reply; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-24  9:05 UTC (permalink / raw)
  To: Dexuan Cui; +Cc: KY Srinivasan, Haiyang Zhang, devel, linux-kernel

Dexuan Cui <decui@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Tuesday, April 21, 2015 22:28
>> To: KY Srinivasan
>> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
>> kernel@vger.kernel.org; Dexuan Cui
>> Subject: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all
>> vcpus
>>
>> Primary channels are distributed evenly across all vcpus we have. When the
>> host asks us to create subchannels it usually makes us num_cpus-1 offers
>
> Hi Vitaly,
> AFAIK, in the VSP of storvsc, the number of subchannel is
>  (the_number_of_vcpus - 1) / 4.
>
> This means for a 8-vCPU guest, there is only 1 subchannel.
>
> Your new algorithm tends to make the vCPUs with small-number busier:
> e.g., in the 8-vCPU case, assuming we have 4 SCSI controllers:
> vCPU0: scsi0's PrimaryChannel (P)
> vCPU1: scsi0's SubChannel (S) + scsi1's P
> vCPU2: scsi1's S + scsi2's P
> vCPU3: scsi2's S + scsi3's P
> vCPU4: scsi3's S
> vCPU5, 6 and 7 are idle.
>
> In this special case, the existing algorithm is better. :-)
>
> However, I do like this idea in your patch, that is, making sure a device's
> primary/sub channels are assigned to differents vCPUs.

Under special circumstances with the current code we can end up with
having all subchannels on the same vCPU with the primary channel I guess
:-) This is not something common, but possible.

>
> I'm just wondering if we should use an even better (and complex)
> algorithm :-)

The question here is - does sticking to the current vCPU help? If it
does, I can suggest the following (I think I even mentioned that in my
PATCH 00): first we try to find a (sub)channel with target_cpu ==
current_vcpu and only when we fail we do the round robin. I'd like to
hear K.Y.'s opinion here as he's the original author :-)

>
> PS, yeah, for netvsc(HV_NIC_GUID), the number of SC is indeed
> the_number_vcpus -1. I'm not sure about the upcoming HV_ND_GUID --
> maybe it's the same as HV_NIC_GUID.
>
> Thanks,
> -- Dexuan
>
>> and we are supposed to distribute the work evenly among the channel
>>  itself and all its  subchannels. Make sure they are all assigned to
>>  different vcpus.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  drivers/hv/channel_mgmt.c | 29 ++++++++++++++++++++++++++++-
>>  1 file changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index 8f2761f..daa6417 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -270,6 +270,8 @@ static void init_vp_index(struct vmbus_channel
>> *channel,
>>       int i;
>>       bool perf_chn = false;
>>       u32 max_cpus = num_online_cpus();
>> +     struct vmbus_channel *primary = channel->primary_channel, *prev;
>> +     unsigned long flags;
>>
>>       for (i = IDE; i < MAX_PERF_CHN; i++) {
>>               if (!memcmp(type_guid->b, hp_devs[i].guid,
>> @@ -290,7 +292,32 @@ static void init_vp_index(struct vmbus_channel
>> *channel,
>>               channel->target_vp = 0;
>>               return;
>>       }
>> -     cur_cpu = (++next_vp % max_cpus);
>> +
>> +     /*
>> +      * Primary channels are distributed evenly across all vcpus we have.
>> +      * When the host asks us to create subchannels it usually makes us
>> +      * num_cpus-1 offers and we are supposed to distribute the work
>> evenly
>> +      * among the channel itself and all its subchannels. Make sure they
>> are
>> +      * all assigned to different vcpus.
>> +      */
>> +     if (!primary)
>> +             cur_cpu = (++next_vp % max_cpus);
>> +     else {
>> +             /*
>> +              * Let's assign the first subchannel of a channel to the
>> +              * primary->target_cpu+1 and all the subsequent channels
>> to
>> +              * the prev->target_cpu+1.
>> +              */
>> +             spin_lock_irqsave(&primary->lock, flags);
>> +             if (primary->num_sc == 1)
>> +                     cur_cpu = (primary->target_cpu + 1) % max_cpus;
>> +             else {
>> +                     prev = list_prev_entry(channel, sc_list);
>> +                     cur_cpu = (prev->target_cpu + 1) % max_cpus;
>> +             }
>> +             spin_unlock_irqrestore(&primary->lock, flags);
>> +     }
>> +
>>       channel->target_cpu = cur_cpu;
>>       channel->target_vp = hv_context.vp_index[cur_cpu];
>>  }
>> --
>> 1.9.3

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus
  2015-04-24  9:05     ` Vitaly Kuznetsov
@ 2015-04-24 16:46       ` KY Srinivasan
  2015-04-27 13:30         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 18+ messages in thread
From: KY Srinivasan @ 2015-04-24 16:46 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Dexuan Cui; +Cc: Haiyang Zhang, devel, linux-kernel



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 24, 2015 2:05 AM
> To: Dexuan Cui
> Cc: KY Srinivasan; Haiyang Zhang; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among
> all vcpus
> 
> Dexuan Cui <decui@microsoft.com> writes:
> 
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Tuesday, April 21, 2015 22:28
> >> To: KY Srinivasan
> >> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
> >> kernel@vger.kernel.org; Dexuan Cui
> >> Subject: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all
> >> vcpus
> >>
> >> Primary channels are distributed evenly across all vcpus we have. When
> the
> >> host asks us to create subchannels it usually makes us num_cpus-1 offers
> >
> > Hi Vitaly,
> > AFAIK, in the VSP of storvsc, the number of subchannel is
> >  (the_number_of_vcpus - 1) / 4.
> >
> > This means for a 8-vCPU guest, there is only 1 subchannel.
> >
> > Your new algorithm tends to make the vCPUs with small-number busier:
> > e.g., in the 8-vCPU case, assuming we have 4 SCSI controllers:
> > vCPU0: scsi0's PrimaryChannel (P)
> > vCPU1: scsi0's SubChannel (S) + scsi1's P
> > vCPU2: scsi1's S + scsi2's P
> > vCPU3: scsi2's S + scsi3's P
> > vCPU4: scsi3's S
> > vCPU5, 6 and 7 are idle.
> >
> > In this special case, the existing algorithm is better. :-)
> >
> > However, I do like this idea in your patch, that is, making sure a device's
> > primary/sub channels are assigned to differents vCPUs.
> 
> Under special circumstances with the current code we can end up with
> having all subchannels on the same vCPU with the primary channel I guess
> :-) This is not something common, but possible.
> 
> >
> > I'm just wondering if we should use an even better (and complex)
> > algorithm :-)
> 
> The question here is - does sticking to the current vCPU help? If it
> does, I can suggest the following (I think I even mentioned that in my
> PATCH 00): first we try to find a (sub)channel with target_cpu ==
> current_vcpu and only when we fail we do the round robin. I'd like to
> hear K.Y.'s opinion here as he's the original author :-)

Sorry for the delayed response. Initially I had implemented a scheme that would 
pick an outgoing CPU that was closest to the CPU on which the request came (to maintain
cache locality especially on NUMA systems). I changed this algorithm to spread the load
more uniformly as we were trying to improve Linux IOPS on Azure XIO
(premium storage). We are currently testing
this code on our Converged Offering - CPS and I am finding that the perf as measured by IOS has regressed.
I have not narrowed the reason for this regression and it may very well be the change in the 
algorithm for selecting the outgoing channel. In general, I don't think the logic here needs to be 
exact and locality (being on the same CPU or within the same NUMA node) is important. Any change
to this algorithm will have to be validated on different MSFT environments (Azure XIO, CPS etc.).

Regards,

K. Y

 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus
  2015-04-24 16:46       ` KY Srinivasan
@ 2015-04-27 13:30         ` Vitaly Kuznetsov
  2015-04-27 18:09           ` KY Srinivasan
  0 siblings, 1 reply; 18+ messages in thread
From: Vitaly Kuznetsov @ 2015-04-27 13:30 UTC (permalink / raw)
  To: KY Srinivasan; +Cc: Dexuan Cui, Haiyang Zhang, devel, linux-kernel

KY Srinivasan <kys@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, April 24, 2015 2:05 AM
>> To: Dexuan Cui
>> Cc: KY Srinivasan; Haiyang Zhang; devel@linuxdriverproject.org; linux-
>> kernel@vger.kernel.org
>> Subject: Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among
>> all vcpus
>> 
>> Dexuan Cui <decui@microsoft.com> writes:
>> 
>> >> -----Original Message-----
>> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> >> Sent: Tuesday, April 21, 2015 22:28
>> >> To: KY Srinivasan
>> >> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
>> >> kernel@vger.kernel.org; Dexuan Cui
>> >> Subject: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all
>> >> vcpus
>> >>
>> >> Primary channels are distributed evenly across all vcpus we have. When
>> the
>> >> host asks us to create subchannels it usually makes us num_cpus-1 offers
>> >
>> > Hi Vitaly,
>> > AFAIK, in the VSP of storvsc, the number of subchannel is
>> >  (the_number_of_vcpus - 1) / 4.
>> >
>> > This means for a 8-vCPU guest, there is only 1 subchannel.
>> >
>> > Your new algorithm tends to make the vCPUs with small-number busier:
>> > e.g., in the 8-vCPU case, assuming we have 4 SCSI controllers:
>> > vCPU0: scsi0's PrimaryChannel (P)
>> > vCPU1: scsi0's SubChannel (S) + scsi1's P
>> > vCPU2: scsi1's S + scsi2's P
>> > vCPU3: scsi2's S + scsi3's P
>> > vCPU4: scsi3's S
>> > vCPU5, 6 and 7 are idle.
>> >
>> > In this special case, the existing algorithm is better. :-)
>> >
>> > However, I do like this idea in your patch, that is, making sure a device's
>> > primary/sub channels are assigned to differents vCPUs.
>> 
>> Under special circumstances with the current code we can end up with
>> having all subchannels on the same vCPU with the primary channel I guess
>> :-) This is not something common, but possible.
>> 
>> >
>> > I'm just wondering if we should use an even better (and complex)
>> > algorithm :-)
>> 
>> The question here is - does sticking to the current vCPU help? If it
>> does, I can suggest the following (I think I even mentioned that in my
>> PATCH 00): first we try to find a (sub)channel with target_cpu ==
>> current_vcpu and only when we fail we do the round robin. I'd like to
>> hear K.Y.'s opinion here as he's the original author :-)
>
> Sorry for the delayed response. Initially I had implemented a scheme that would 
> pick an outgoing CPU that was closest to the CPU on which the request came (to maintain
> cache locality especially on NUMA systems). I changed this algorithm to spread the load
> more uniformly as we were trying to improve Linux IOPS on Azure XIO
> (premium storage). We are currently testing
> this code on our Converged Offering - CPS and I am finding that the perf as measured by IOS has regressed.
> I have not narrowed the reason for this regression and it may very well be the change in the 
> algorithm for selecting the outgoing channel. In general, I don't think the logic here needs to be 
> exact and locality (being on the same CPU or within the same NUMA node) is important. Any change
> to this algorithm will have to be validated on different MSFT
> environments (Azure XIO, CPS etc.).

Thanks, can you please compare two algorythms here:
1) Simple round robin (the one my patch series implement but with issues
fixed, I'll send v2).
2) Try to find a (sub)channel with matching VCPU and round-robin when we
fail (I can actually include it in v2).
We can later decide something based on these testing results.

>
> Regards,
>
> K. Y

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus
  2015-04-27 13:30         ` Vitaly Kuznetsov
@ 2015-04-27 18:09           ` KY Srinivasan
  0 siblings, 0 replies; 18+ messages in thread
From: KY Srinivasan @ 2015-04-27 18:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: Dexuan Cui, Haiyang Zhang, devel, linux-kernel



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Monday, April 27, 2015 6:30 AM
> To: KY Srinivasan
> Cc: Dexuan Cui; Haiyang Zhang; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among
> all vcpus
> 
> KY Srinivasan <kys@microsoft.com> writes:
> 
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Friday, April 24, 2015 2:05 AM
> >> To: Dexuan Cui
> >> Cc: KY Srinivasan; Haiyang Zhang; devel@linuxdriverproject.org; linux-
> >> kernel@vger.kernel.org
> >> Subject: Re: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels
> among
> >> all vcpus
> >>
> >> Dexuan Cui <decui@microsoft.com> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> >> Sent: Tuesday, April 21, 2015 22:28
> >> >> To: KY Srinivasan
> >> >> Cc: Haiyang Zhang; devel@linuxdriverproject.org; linux-
> >> >> kernel@vger.kernel.org; Dexuan Cui
> >> >> Subject: [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels
> among all
> >> >> vcpus
> >> >>
> >> >> Primary channels are distributed evenly across all vcpus we have.
> When
> >> the
> >> >> host asks us to create subchannels it usually makes us num_cpus-1
> offers
> >> >
> >> > Hi Vitaly,
> >> > AFAIK, in the VSP of storvsc, the number of subchannel is
> >> >  (the_number_of_vcpus - 1) / 4.
> >> >
> >> > This means for a 8-vCPU guest, there is only 1 subchannel.
> >> >
> >> > Your new algorithm tends to make the vCPUs with small-number busier:
> >> > e.g., in the 8-vCPU case, assuming we have 4 SCSI controllers:
> >> > vCPU0: scsi0's PrimaryChannel (P)
> >> > vCPU1: scsi0's SubChannel (S) + scsi1's P
> >> > vCPU2: scsi1's S + scsi2's P
> >> > vCPU3: scsi2's S + scsi3's P
> >> > vCPU4: scsi3's S
> >> > vCPU5, 6 and 7 are idle.
> >> >
> >> > In this special case, the existing algorithm is better. :-)
> >> >
> >> > However, I do like this idea in your patch, that is, making sure a device's
> >> > primary/sub channels are assigned to differents vCPUs.
> >>
> >> Under special circumstances with the current code we can end up with
> >> having all subchannels on the same vCPU with the primary channel I guess
> >> :-) This is not something common, but possible.
> >>
> >> >
> >> > I'm just wondering if we should use an even better (and complex)
> >> > algorithm :-)
> >>
> >> The question here is - does sticking to the current vCPU help? If it
> >> does, I can suggest the following (I think I even mentioned that in my
> >> PATCH 00): first we try to find a (sub)channel with target_cpu ==
> >> current_vcpu and only when we fail we do the round robin. I'd like to
> >> hear K.Y.'s opinion here as he's the original author :-)
> >
> > Sorry for the delayed response. Initially I had implemented a scheme that
> would
> > pick an outgoing CPU that was closest to the CPU on which the request
> came (to maintain
> > cache locality especially on NUMA systems). I changed this algorithm to
> spread the load
> > more uniformly as we were trying to improve Linux IOPS on Azure XIO
> > (premium storage). We are currently testing
> > this code on our Converged Offering - CPS and I am finding that the perf as
> measured by IOS has regressed.
> > I have not narrowed the reason for this regression and it may very well be
> the change in the
> > algorithm for selecting the outgoing channel. In general, I don't think the
> logic here needs to be
> > exact and locality (being on the same CPU or within the same NUMA node)
> is important. Any change
> > to this algorithm will have to be validated on different MSFT
> > environments (Azure XIO, CPS etc.).
> 
> Thanks, can you please compare two algorythms here:
> 1) Simple round robin (the one my patch series implement but with issues
> fixed, I'll send v2).
> 2) Try to find a (sub)channel with matching VCPU and round-robin when we
> fail (I can actually include it in v2).
> We can later decide something based on these testing results.

We will do some testing.

K. Y
> 
> >
> > Regards,
> >
> > K. Y
> 
> --
>   Vitaly

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-04-27 18:09 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-21 14:27 [PATCH 0/6] Drivers: hv: vmbus: fair round robin algorithm for vmbus_get_outgoing_channel() Vitaly Kuznetsov
2015-04-21 14:27 ` [PATCH 1/6] Drivers: hv: vmbus: unify calls to percpu_channel_enq() Vitaly Kuznetsov
2015-04-21 14:27 ` [PATCH 2/6] Drivers: hv: vmbus: briefly comment num_sc and next_oc Vitaly Kuznetsov
2015-04-21 14:27 ` [PATCH 3/6] Drivers: hv: vmbus: decrease num_sc on subchannel removal Vitaly Kuznetsov
2015-04-24  7:02   ` Dexuan Cui
2015-04-24  8:40     ` Vitaly Kuznetsov
2015-04-21 14:27 ` [PATCH 4/6] Drivers: hv: vmbus: move init_vp_index() call to vmbus_process_offer() Vitaly Kuznetsov
2015-04-24  8:38   ` Dexuan Cui
2015-04-24  8:46     ` Vitaly Kuznetsov
2015-04-21 14:27 ` [PATCH 5/6] Drivers: hv: vmbus: distribute subchannels among all vcpus Vitaly Kuznetsov
2015-04-24  8:40   ` Dexuan Cui
2015-04-24  9:05     ` Vitaly Kuznetsov
2015-04-24 16:46       ` KY Srinivasan
2015-04-27 13:30         ` Vitaly Kuznetsov
2015-04-27 18:09           ` KY Srinivasan
2015-04-21 14:27 ` [PATCH 6/6] Drivers: hv: vmbus: do a fair round robin when selecting an outgoing channel Vitaly Kuznetsov
2015-04-24  8:42   ` Dexuan Cui
2015-04-24  8:59     ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).