linux-hyperv.vger.kernel.org archive mirror
* [PATCH 00/11] VMBus channel interrupt reassignment
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft),
	David S. Miller, Lorenzo Pieralisi, Andrew Murray, Bjorn Helgaas,
	James E.J. Bottomley, Martin K. Petersen

Hi all,

This is a follow-up on the RFC submission [1].  The series introduces
changes in the VMBus drivers to reassign the CPU that a VMBus channel
will interrupt.  This feature can be used for load balancing or other
purposes (e.g. CPU offlining).  The submission integrates feedback on
the RFC to amend the handling of the 'array of channels' (patch #3).

Thanks,
  Andrea

[1] https://lkml.kernel.org/r/20200325225505.23998-1-parri.andrea@gmail.com

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>

Andrea Parri (Microsoft) (11):
  Drivers: hv: vmbus: Always handle the VMBus messages on CPU0
  Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific
    CPU
  Drivers: hv: vmbus: Replace the per-CPU channel lists with a global
    array of channels
  hv_netvsc: Disable NAPI before closing the VMBus channel
  hv_utils: Always execute the fcopy and vss callbacks in a tasklet
  Drivers: hv: vmbus: Use a spin lock for synchronizing channel
    scheduling vs. channel removal
  PCI: hv: Prepare hv_compose_msi_msg() for the
    VMBus-channel-interrupt-to-vCPU reassignment functionality
  Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity
    logic
  Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
  Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message
    type
  scsi: storvsc: Re-init stor_chns when a channel interrupt is
    re-assigned

 drivers/hv/channel.c                |  58 +++--
 drivers/hv/channel_mgmt.c           | 347 +++++++++++++++-------------
 drivers/hv/connection.c             |  58 +----
 drivers/hv/hv.c                     |  16 +-
 drivers/hv/hv_fcopy.c               |   2 +-
 drivers/hv/hv_snapshot.c            |   2 +-
 drivers/hv/hv_trace.h               |  19 ++
 drivers/hv/hyperv_vmbus.h           |  32 ++-
 drivers/hv/vmbus_drv.c              | 241 +++++++++++++++----
 drivers/net/hyperv/netvsc.c         |   7 +-
 drivers/pci/controller/pci-hyperv.c |  44 ++--
 drivers/scsi/storvsc_drv.c          |  95 +++++++-
 include/linux/hyperv.h              |  51 ++--
 13 files changed, 620 insertions(+), 352 deletions(-)

-- 
2.24.0



* [PATCH 01/11] Drivers: hv: vmbus: Always handle the VMBus messages on CPU0
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

A Linux guest has to pick a "connect CPU" to communicate with the
Hyper-V host.  This CPU cannot be taken offline because Hyper-V does
not provide a way to change that CPU assignment.

Current code sets the connect CPU to whatever CPU ends up running the
function vmbus_negotiate_version(), and this will generate problems if
that CPU is taken offline.

Establish CPU0 as the connect CPU, and add logic to prevent the
connect CPU from being taken offline.  We could pick some other CPU,
and we could pick that "other CPU" dynamically if there was a reason to
do so at some point in the future.  But for now, #defining the connect
CPU to 0 is the most straightforward and least complex solution.

While at it, add inline comments explaining "why" offer and rescind
messages should not be handled by the same serialized work queue.
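
A condensed sketch of the mechanism (the complete change is in the
diff below; hv_synic_cleanup() is the CPU-offline teardown callback
that vmbus_bus_init() registers with the CPU hotplug core, and a
nonzero return value from a teardown callback makes the offline
request fail):

	#define VMBUS_CONNECT_CPU	0

	int hv_synic_cleanup(unsigned int cpu)
	{
		/*
		 * Hyper-V provides no way to re-target the connect
		 * CPU, so veto any attempt to take it offline.
		 */
		if (cpu == VMBUS_CONNECT_CPU)
			return -EBUSY;
		/* ... regular per-CPU SynIC teardown ... */
		return 0;
	}

With this in place, "echo 0 > /sys/devices/system/cpu/cpu0/online" is
expected to fail with -EBUSY.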

Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/connection.c   | 20 +-------------------
 drivers/hv/hv.c           |  7 +++++++
 drivers/hv/hyperv_vmbus.h | 11 ++++++-----
 drivers/hv/vmbus_drv.c    | 20 +++++++++++++++++---
 4 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 74e77de89b4f3..f4bd306d2cef9 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -69,7 +69,6 @@ MODULE_PARM_DESC(max_version,
 int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 {
 	int ret = 0;
-	unsigned int cur_cpu;
 	struct vmbus_channel_initiate_contact *msg;
 	unsigned long flags;
 
@@ -102,24 +101,7 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
 	msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
 	msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
-	/*
-	 * We want all channel messages to be delivered on CPU 0.
-	 * This has been the behavior pre-win8. This is not
-	 * perf issue and having all channel messages delivered on CPU 0
-	 * would be ok.
-	 * For post win8 hosts, we support receiving channel messagges on
-	 * all the CPUs. This is needed for kexec to work correctly where
-	 * the CPU attempting to connect may not be CPU 0.
-	 */
-	if (version >= VERSION_WIN8_1) {
-		cur_cpu = get_cpu();
-		msg->target_vcpu = hv_cpu_number_to_vp_number(cur_cpu);
-		vmbus_connection.connect_cpu = cur_cpu;
-		put_cpu();
-	} else {
-		msg->target_vcpu = 0;
-		vmbus_connection.connect_cpu = 0;
-	}
+	msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
 	/*
 	 * Add to list before we send the request since we may
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 6098e0cbdb4b0..e2b3310454640 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -249,6 +249,13 @@ int hv_synic_cleanup(unsigned int cpu)
 	bool channel_found = false;
 	unsigned long flags;
 
+	/*
+	 * Hyper-V does not provide a way to change the connect CPU once
+	 * it is set; we must prevent the connect CPU from going offline.
+	 */
+	if (cpu == VMBUS_CONNECT_CPU)
+		return -EBUSY;
+
 	/*
 	 * Search for channels which are bound to the CPU we're about to
 	 * cleanup. In case we find one and vmbus is still connected we need to
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index f5fa3b3c9baf7..ce19a4c583884 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -212,12 +212,13 @@ enum vmbus_connect_state {
 
 #define MAX_SIZE_CHANNEL_MESSAGE	HV_MESSAGE_PAYLOAD_BYTE_COUNT
 
-struct vmbus_connection {
-	/*
-	 * CPU on which the initial host contact was made.
-	 */
-	int connect_cpu;
+/*
+ * The CPU that Hyper-V will interrupt for VMBUS messages, such as
+ * CHANNELMSG_OFFERCHANNEL and CHANNELMSG_RESCIND_CHANNELOFFER.
+ */
+#define VMBUS_CONNECT_CPU	0
 
+struct vmbus_connection {
 	u32 msg_conn_id;
 
 	atomic_t offer_in_progress;
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 172ceae69abb7..a491d44362f9f 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1083,14 +1083,28 @@ void vmbus_on_msg_dpc(unsigned long data)
 			/*
 			 * If we are handling the rescind message;
 			 * schedule the work on the global work queue.
+			 *
+			 * The OFFER message and the RESCIND message should
+			 * not be handled by the same serialized work queue,
+			 * because the OFFER handler may call vmbus_open(),
+			 * which tries to open the channel by sending an
+			 * OPEN_CHANNEL message to the host and waits for
+			 * the host's response; however, if the host has
+			 * rescinded the channel before it receives the
+			 * OPEN_CHANNEL message, the host just silently
+			 * ignores the OPEN_CHANNEL message; as a result,
+			 * the guest's OFFER handler hangs for ever, if we
+			 * handle the RESCIND message in the same serialized
+			 * work queue: the RESCIND handler can not start to
+			 * run before the OFFER handler finishes.
 			 */
-			schedule_work_on(vmbus_connection.connect_cpu,
+			schedule_work_on(VMBUS_CONNECT_CPU,
 					 &ctx->work);
 			break;
 
 		case CHANNELMSG_OFFERCHANNEL:
 			atomic_inc(&vmbus_connection.offer_in_progress);
-			queue_work_on(vmbus_connection.connect_cpu,
+			queue_work_on(VMBUS_CONNECT_CPU,
 				      vmbus_connection.work_queue,
 				      &ctx->work);
 			break;
@@ -1137,7 +1151,7 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 
 	INIT_WORK(&ctx->work, vmbus_onmessage_work);
 
-	queue_work_on(vmbus_connection.connect_cpu,
+	queue_work_on(VMBUS_CONNECT_CPU,
 		      vmbus_connection.work_queue,
 		      &ctx->work);
 }
-- 
2.24.0



* [PATCH 02/11] Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

The offer and rescind works are currently scheduled on the so-called
"connect CPU".  However, this is not really needed: we can synchronize
the works by relying on the offer_in_progress counter and on the
channel_mutex mutex.  This synchronization is already in place.  So,
remove this unnecessary "bind to the connect CPU" constraint and
update the inline comments accordingly.
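
A condensed sketch of the two paths involved (identifiers as in the
existing code; the sleep interval is illustrative):

	/* offer path, cf. vmbus_process_offer() */
	mutex_lock(&vmbus_connection.channel_mutex);
	atomic_dec(&vmbus_connection.offer_in_progress);
	list_add_tail(&newchannel->listentry, &vmbus_connection.chn_list);
	mutex_unlock(&vmbus_connection.channel_mutex);

	/* rescind path, cf. vmbus_onoffer_rescind() */
	while (atomic_read(&vmbus_connection.offer_in_progress) != 0)
		msleep(1);	/* wait for in-flight offers */
	mutex_lock(&vmbus_connection.channel_mutex);
	channel = relid2channel(rescind->child_relid);
	/* ... handle the rescind ... */
	mutex_unlock(&vmbus_connection.channel_mutex);

Whichever CPUs end up running the two works, the rescind path cannot
miss a channel whose offer processing has already made the channel
discoverable.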

Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/channel_mgmt.c | 21 ++++++++++++++++-----
 drivers/hv/vmbus_drv.c    | 39 ++++++++++++++++++++++++++++-----------
 2 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 501c43c5851dc..aed8db59a5ff8 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -1028,11 +1028,22 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
 	 * offer comes in first and then the rescind.
 	 * Since we process these events in work elements,
 	 * and with preemption, we may end up processing
-	 * the events out of order. Given that we handle these
-	 * work elements on the same CPU, this is possible only
-	 * in the case of preemption. In any case wait here
-	 * until the offer processing has moved beyond the
-	 * point where the channel is discoverable.
+	 * the events out of order.  We rely on the synchronization
+	 * provided by offer_in_progress and by channel_mutex for
+	 * ordering these events:
+	 *
+	 * { Initially: offer_in_progress = 1 }
+	 *
+	 * CPU1				CPU2
+	 *
+	 * [vmbus_process_offer()]	[vmbus_onoffer_rescind()]
+	 *
+	 * LOCK channel_mutex		WAIT_ON offer_in_progress == 0
+	 * DECREMENT offer_in_progress	LOCK channel_mutex
+	 * INSERT chn_list		SEARCH chn_list
+	 * UNLOCK channel_mutex		UNLOCK channel_mutex
+	 *
+	 * Forbids: CPU2's SEARCH from *not* seeing CPU1's INSERT
 	 */
 
 	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index a491d44362f9f..41ec0c95df33f 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1075,8 +1075,9 @@ void vmbus_on_msg_dpc(unsigned long data)
 		/*
 		 * The host can generate a rescind message while we
 		 * may still be handling the original offer. We deal with
-		 * this condition by ensuring the processing is done on the
-		 * same CPU.
+		 * this condition by relying on the synchronization provided
+		 * by offer_in_progress and by channel_mutex.  See also the
+		 * inline comments in vmbus_onoffer_rescind().
 		 */
 		switch (hdr->msgtype) {
 		case CHANNELMSG_RESCIND_CHANNELOFFER:
@@ -1098,16 +1099,34 @@ void vmbus_on_msg_dpc(unsigned long data)
 			 * work queue: the RESCIND handler can not start to
 			 * run before the OFFER handler finishes.
 			 */
-			schedule_work_on(VMBUS_CONNECT_CPU,
-					 &ctx->work);
+			schedule_work(&ctx->work);
 			break;
 
 		case CHANNELMSG_OFFERCHANNEL:
+			/*
+			 * The host sends the offer message of a given channel
+			 * before sending the rescind message of the same
+			 * channel.  These messages are sent to the guest's
+			 * connect CPU; the guest then starts processing them
+			 * in the tasklet handler on this CPU:
+			 *
+			 * VMBUS_CONNECT_CPU
+			 *
+			 * [vmbus_on_msg_dpc()]
+			 * atomic_inc()  // CHANNELMSG_OFFERCHANNEL
+			 * queue_work()
+			 * ...
+			 * [vmbus_on_msg_dpc()]
+			 * schedule_work()  // CHANNELMSG_RESCIND_CHANNELOFFER
+			 *
+			 * We rely on the memory-ordering properties of the
+			 * queue_work() and schedule_work() primitives, which
+			 * guarantee that the atomic increment will be visible
+			 * to the CPUs which will execute the offer & rescind
+			 * works by the time these works will start execution.
+			 */
 			atomic_inc(&vmbus_connection.offer_in_progress);
-			queue_work_on(VMBUS_CONNECT_CPU,
-				      vmbus_connection.work_queue,
-				      &ctx->work);
-			break;
+			fallthrough;
 
 		default:
 			queue_work(vmbus_connection.work_queue, &ctx->work);
@@ -1151,9 +1170,7 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 
 	INIT_WORK(&ctx->work, vmbus_onmessage_work);
 
-	queue_work_on(VMBUS_CONNECT_CPU,
-		      vmbus_connection.work_queue,
-		      &ctx->work);
+	queue_work(vmbus_connection.work_queue, &ctx->work);
 }
 #endif /* CONFIG_PM_SLEEP */
 
-- 
2.24.0



* [PATCH 03/11] Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

When Hyper-V sends an interrupt to the guest, the guest has to figure
out which channel the interrupt is associated with.  Hyper-V sets a bit
in a memory page that is shared with the guest, indicating a particular
"relid" that the interrupt is associated with.  The current Linux code
then uses a set of per-CPU linked lists to map a given "relid" to a
pointer to a channel structure.

This design introduces a synchronization problem if the CPU that Hyper-V
will interrupt for a certain channel is changed.  If the interrupt comes
on the "old CPU" and the channel was already moved to the per-CPU list
of the "new CPU", then the relid -> channel mapping will fail and the
interrupt is dropped.  Similarly, if the interrupt comes on the new CPU
but the channel was not moved to the per-CPU list of the new CPU, then
the mapping will fail and the interrupt is dropped.

Relids are integers ranging from 0 to 2047.  The mapping from relids to
channel structures can be done by setting up an array with 2048 entries,
each entry being a pointer to a channel structure (hence total size ~16K
bytes, which is not a problem).  The array is global, so there are no
per-CPU linked lists to update.  The array can be searched and updated
by loading from/storing to the array at the specified index.  With no
per-CPU data structures, the above-mentioned synchronization problem is
avoided and the relid2channel() function gets simpler.
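
The shape of the new mapping, condensed from the diff below (writers
run under channel_mutex; the reader runs in the interrupt path with
no locks):

	/* kcalloc'd in vmbus_connect(); one slot per relid */
	struct vmbus_channel **channels;

	/* map, cf. vmbus_channel_map_relid(): store + full barrier */
	smp_store_mb(vmbus_connection.channels[relid], channel);

	/* unmap, cf. vmbus_channel_unmap_relid() */
	WRITE_ONCE(vmbus_connection.channels[relid], NULL);

	/* lookup, cf. relid2channel() */
	struct vmbus_channel *relid2channel(u32 relid)
	{
		if (WARN_ON(relid >= MAX_CHANNEL_RELIDS))
			return NULL;
		return READ_ONCE(vmbus_connection.channels[relid]);
	}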

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
NOTE.  The inline comment in vmbus_channel_map_relid() follows the argument
discussed in the RFC [1].  An attempt has been made to make this argument
more 'explicit' (and, hopefully, more 'robust' against future changes) by
adding a full memory barrier in vmbus_channel_map_relid(): the barrier takes
the role of the 'implicit' full memory barriers from queue_work() and from
check_ready_for_resume_event() referred to in the RFC.

[1] https://lkml.kernel.org/r/20200403133826.GA25401@andrea
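
In barrier terms, the intended pairing is roughly (a sketch of the
argument, not new code):

	/* CPU A, map side */
	smp_store_mb(vmbus_connection.channels[relid], channel);
	/* ... vmbus_open(): OPENCHANNEL sent, host may now interrupt ... */

	/* CPU B, interrupt side, vmbus_chan_sched() */
	if (sync_test_and_clear_bit(relid, recv_int_page))	/* full barrier */
		channel = relid2channel(relid);	/* must see CPU A's store */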

 drivers/hv/channel_mgmt.c | 179 ++++++++++++++++++++++++--------------
 drivers/hv/connection.c   |  38 +++-----
 drivers/hv/hv.c           |   2 -
 drivers/hv/hyperv_vmbus.h |  14 +--
 drivers/hv/vmbus_drv.c    |  48 ++++++----
 include/linux/hyperv.h    |   5 --
 6 files changed, 160 insertions(+), 126 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index aed8db59a5ff8..8c179f6866ef7 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -319,7 +319,6 @@ static struct vmbus_channel *alloc_channel(void)
 	init_completion(&channel->rescind_event);
 
 	INIT_LIST_HEAD(&channel->sc_list);
-	INIT_LIST_HEAD(&channel->percpu_list);
 
 	tasklet_init(&channel->callback_event,
 		     vmbus_on_event, (unsigned long)channel);
@@ -340,23 +339,49 @@ static void free_channel(struct vmbus_channel *channel)
 	kobject_put(&channel->kobj);
 }
 
-static void percpu_channel_enq(void *arg)
+void vmbus_channel_map_relid(struct vmbus_channel *channel)
 {
-	struct vmbus_channel *channel = arg;
-	struct hv_per_cpu_context *hv_cpu
-		= this_cpu_ptr(hv_context.cpu_context);
-
-	list_add_tail_rcu(&channel->percpu_list, &hv_cpu->chan_list);
+	if (WARN_ON(channel->offermsg.child_relid >= MAX_CHANNEL_RELIDS))
+		return;
+	/*
+	 * The mapping of the channel's relid is visible from the CPUs that
+	 * execute vmbus_chan_sched() by the time that vmbus_chan_sched() will
+	 * execute:
+	 *
+	 *  (a) In the "normal (i.e., not resuming from hibernation)" path,
+	 *      the full barrier in smp_store_mb() guarantees that the store
+	 *      is propagated to all CPUs before the add_channel_work work
+	 *      is queued.  In turn, add_channel_work is queued before the
+	 *      channel's ring buffer is allocated/initialized and the
+	 *      OPENCHANNEL message for the channel is sent in vmbus_open().
+	 *      Hyper-V won't start sending the interrupts for the channel
+	 *      before the OPENCHANNEL message is acked.  The memory barrier
+	 *      in vmbus_chan_sched() -> sync_test_and_clear_bit() ensures
+	 *      that vmbus_chan_sched() must find the channel's relid in
+	 *      recv_int_page before retrieving the channel pointer from the
+	 *      array of channels.
+	 *
+	 *  (b) In the "resuming from hibernation" path, the smp_store_mb()
+	 *      guarantees that the store is propagated to all CPUs before
+	 *      the VMBus connection is marked as ready for the resume event
+	 *      (cf. check_ready_for_resume_event()).  The interrupt handler
+	 *      of the VMBus driver and vmbus_chan_sched() can not run before
+	 *      vmbus_bus_resume() has completed execution (cf. resume_noirq).
+	 */
+	smp_store_mb(
+		vmbus_connection.channels[channel->offermsg.child_relid],
+		channel);
 }
 
-static void percpu_channel_deq(void *arg)
+void vmbus_channel_unmap_relid(struct vmbus_channel *channel)
 {
-	struct vmbus_channel *channel = arg;
-
-	list_del_rcu(&channel->percpu_list);
+	if (WARN_ON(channel->offermsg.child_relid >= MAX_CHANNEL_RELIDS))
+		return;
+	WRITE_ONCE(
+		vmbus_connection.channels[channel->offermsg.child_relid],
+		NULL);
 }
 
-
 static void vmbus_release_relid(u32 relid)
 {
 	struct vmbus_channel_relid_released msg;
@@ -376,17 +401,25 @@ void hv_process_channel_removal(struct vmbus_channel *channel)
 	struct vmbus_channel *primary_channel;
 	unsigned long flags;
 
-	BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
+	lockdep_assert_held(&vmbus_connection.channel_mutex);
 	BUG_ON(!channel->rescind);
 
-	if (channel->target_cpu != get_cpu()) {
-		put_cpu();
-		smp_call_function_single(channel->target_cpu,
-					 percpu_channel_deq, channel, true);
-	} else {
-		percpu_channel_deq(channel);
-		put_cpu();
-	}
+	/*
+	 * hv_process_channel_removal() could find INVALID_RELID only for
+	 * hv_sock channels.  See the inline comments in vmbus_onoffer().
+	 */
+	WARN_ON(channel->offermsg.child_relid == INVALID_RELID &&
+		!is_hvsock_channel(channel));
+
+	/*
+	 * Upon suspend, an in-use hv_sock channel is removed from the array of
+	 * channels and the relid is invalidated.  After hibernation, when the
+	 * user-space appplication destroys the channel, it's unnecessary and
+	 * unsafe to remove the channel from the array of channels.  See also
+	 * the inline comments before the call of vmbus_release_relid() below.
+	 */
+	if (channel->offermsg.child_relid != INVALID_RELID)
+		vmbus_channel_unmap_relid(channel);
 
 	if (channel->primary_channel == NULL) {
 		list_del(&channel->listentry);
@@ -447,16 +480,6 @@ static void vmbus_add_channel_work(struct work_struct *work)
 
 	init_vp_index(newchannel, dev_type);
 
-	if (newchannel->target_cpu != get_cpu()) {
-		put_cpu();
-		smp_call_function_single(newchannel->target_cpu,
-					 percpu_channel_enq,
-					 newchannel, true);
-	} else {
-		percpu_channel_enq(newchannel);
-		put_cpu();
-	}
-
 	/*
 	 * This state is used to indicate a successful open
 	 * so that when we do close the channel normally, we
@@ -523,17 +546,10 @@ static void vmbus_add_channel_work(struct work_struct *work)
 		spin_unlock_irqrestore(&primary_channel->lock, flags);
 	}
 
-	mutex_unlock(&vmbus_connection.channel_mutex);
+	/* vmbus_process_offer() has mapped the channel. */
+	vmbus_channel_unmap_relid(newchannel);
 
-	if (newchannel->target_cpu != get_cpu()) {
-		put_cpu();
-		smp_call_function_single(newchannel->target_cpu,
-					 percpu_channel_deq,
-					 newchannel, true);
-	} else {
-		percpu_channel_deq(newchannel);
-		put_cpu();
-	}
+	mutex_unlock(&vmbus_connection.channel_mutex);
 
 	vmbus_release_relid(newchannel->offermsg.child_relid);
 
@@ -599,6 +615,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 		spin_unlock_irqrestore(&channel->lock, flags);
 	}
 
+	vmbus_channel_map_relid(newchannel);
+
 	mutex_unlock(&vmbus_connection.channel_mutex);
 
 	/*
@@ -940,8 +958,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 	oldchannel = find_primary_channel_by_offer(offer);
 
 	if (oldchannel != NULL) {
-		atomic_dec(&vmbus_connection.offer_in_progress);
-
 		/*
 		 * We're resuming from hibernation: all the sub-channel and
 		 * hv_sock channels we had before the hibernation should have
@@ -949,36 +965,65 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 		 * primary channel that we had before the hibernation.
 		 */
 
+		/*
+		 * { Initially: channel relid = INVALID_RELID,
+		 *		channels[valid_relid] = NULL }
+		 *
+		 * CPU1					CPU2
+		 *
+		 * [vmbus_onoffer()]			[vmbus_device_release()]
+		 *
+		 * LOCK channel_mutex			LOCK channel_mutex
+		 * STORE channel relid = valid_relid	LOAD r1 = channel relid
+		 * MAP_RELID channel			if (r1 != INVALID_RELID)
+		 * UNLOCK channel_mutex			  UNMAP_RELID channel
+		 *					UNLOCK channel_mutex
+		 *
+		 * Forbids: r1 == valid_relid &&
+		 * 		channels[valid_relid] == channel
+		 *
+		 * Note.  r1 can be INVALID_RELID only for an hv_sock channel.
+		 * None of the hv_sock channels which were present before the
+		 * suspend are re-offered upon the resume.  See the WARN_ON()
+		 * in hv_process_channel_removal().
+		 */
+		mutex_lock(&vmbus_connection.channel_mutex);
+
+		atomic_dec(&vmbus_connection.offer_in_progress);
+
 		WARN_ON(oldchannel->offermsg.child_relid != INVALID_RELID);
 		/* Fix up the relid. */
 		oldchannel->offermsg.child_relid = offer->child_relid;
 
 		offer_sz = sizeof(*offer);
-		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0) {
-			check_ready_for_resume_event();
-			return;
+		if (memcmp(offer, &oldchannel->offermsg, offer_sz) != 0) {
+			/*
+			 * This is not an error, since the host can also change
+			 * the other field(s) of the offer, e.g. on WS RS5
+			 * (Build 17763), the offer->connection_id of the
+			 * Mellanox VF vmbus device can change when the host
+			 * reoffers the device upon resume.
+			 */
+			pr_debug("vmbus offer changed: relid=%d\n",
+				 offer->child_relid);
+
+			print_hex_dump_debug("Old vmbus offer: ",
+					     DUMP_PREFIX_OFFSET, 16, 4,
+					     &oldchannel->offermsg, offer_sz,
+					     false);
+			print_hex_dump_debug("New vmbus offer: ",
+					     DUMP_PREFIX_OFFSET, 16, 4,
+					     offer, offer_sz, false);
+
+			/* Fix up the old channel. */
+			vmbus_setup_channel_state(oldchannel, offer);
 		}
 
-		/*
-		 * This is not an error, since the host can also change the
-		 * other field(s) of the offer, e.g. on WS RS5 (Build 17763),
-		 * the offer->connection_id of the Mellanox VF vmbus device
-		 * can change when the host reoffers the device upon resume.
-		 */
-		pr_debug("vmbus offer changed: relid=%d\n",
-			 offer->child_relid);
-
-		print_hex_dump_debug("Old vmbus offer: ", DUMP_PREFIX_OFFSET,
-				     16, 4, &oldchannel->offermsg, offer_sz,
-				     false);
-		print_hex_dump_debug("New vmbus offer: ", DUMP_PREFIX_OFFSET,
-				     16, 4, offer, offer_sz, false);
-
-		/* Fix up the old channel. */
-		vmbus_setup_channel_state(oldchannel, offer);
-
+		/* Add the channel back to the array of channels. */
+		vmbus_channel_map_relid(oldchannel);
 		check_ready_for_resume_event();
 
+		mutex_unlock(&vmbus_connection.channel_mutex);
 		return;
 	}
 
@@ -1036,14 +1081,14 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
 	 *
 	 * CPU1				CPU2
 	 *
-	 * [vmbus_process_offer()]	[vmbus_onoffer_rescind()]
+	 * [vmbus_onoffer()]		[vmbus_onoffer_rescind()]
 	 *
 	 * LOCK channel_mutex		WAIT_ON offer_in_progress == 0
 	 * DECREMENT offer_in_progress	LOCK channel_mutex
-	 * INSERT chn_list		SEARCH chn_list
+	 * STORE channels[]		LOAD channels[]
 	 * UNLOCK channel_mutex		UNLOCK channel_mutex
 	 *
-	 * Forbids: CPU2's SEARCH from *not* seeing CPU1's INSERT
+	 * Forbids: CPU2's LOAD from *not* seeing CPU1's STORE
 	 */
 
 	while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index f4bd306d2cef9..11170d9a2e1a5 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -248,6 +248,14 @@ int vmbus_connect(void)
 	pr_info("Vmbus version:%d.%d\n",
 		version >> 16, version & 0xFFFF);
 
+	vmbus_connection.channels = kcalloc(MAX_CHANNEL_RELIDS,
+					    sizeof(struct vmbus_channel *),
+					    GFP_KERNEL);
+	if (vmbus_connection.channels == NULL) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
 	kfree(msginfo);
 	return 0;
 
@@ -295,33 +303,9 @@ void vmbus_disconnect(void)
  */
 struct vmbus_channel *relid2channel(u32 relid)
 {
-	struct vmbus_channel *channel;
-	struct vmbus_channel *found_channel  = NULL;
-	struct list_head *cur, *tmp;
-	struct vmbus_channel *cur_sc;
-
-	BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
-
-	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
-		if (channel->offermsg.child_relid == relid) {
-			found_channel = channel;
-			break;
-		} else if (!list_empty(&channel->sc_list)) {
-			/*
-			 * Deal with sub-channels.
-			 */
-			list_for_each_safe(cur, tmp, &channel->sc_list) {
-				cur_sc = list_entry(cur, struct vmbus_channel,
-							sc_list);
-				if (cur_sc->offermsg.child_relid == relid) {
-					found_channel = cur_sc;
-					break;
-				}
-			}
-		}
-	}
-
-	return found_channel;
+	if (WARN_ON(relid >= MAX_CHANNEL_RELIDS))
+		return NULL;
+	return READ_ONCE(vmbus_connection.channels[relid]);
 }
 
 /*
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index e2b3310454640..17bf1f229152b 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -117,8 +117,6 @@ int hv_synic_alloc(void)
 			pr_err("Unable to allocate post msg page\n");
 			goto err;
 		}
-
-		INIT_LIST_HEAD(&hv_cpu->chan_list);
 	}
 
 	return 0;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index ce19a4c583884..41b7267da529a 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -132,12 +132,6 @@ struct hv_per_cpu_context {
 	 * basis.
 	 */
 	struct tasklet_struct msg_dpc;
-
-	/*
-	 * To optimize the mapping of relid to channel, maintain
-	 * per-cpu list of the channels based on their CPU affinity.
-	 */
-	struct list_head chan_list;
 };
 
 struct hv_context {
@@ -202,6 +196,8 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
 /* TODO: Need to make this configurable */
 #define MAX_NUM_CHANNELS_SUPPORTED	256
 
+#define MAX_CHANNEL_RELIDS					\
+	max(MAX_NUM_CHANNELS_SUPPORTED, HV_EVENT_FLAGS_COUNT)
 
 enum vmbus_connect_state {
 	DISCONNECTED,
@@ -251,6 +247,9 @@ struct vmbus_connection {
 	struct list_head chn_list;
 	struct mutex channel_mutex;
 
+	/* Array of channels */
+	struct vmbus_channel **channels;
+
 	/*
 	 * An offer message is handled first on the work_queue, and then
 	 * is further handled on handle_primary_chan_wq or
@@ -337,6 +336,9 @@ int vmbus_add_channel_kobj(struct hv_device *device_obj,
 
 void vmbus_remove_channel_attr_group(struct vmbus_channel *channel);
 
+void vmbus_channel_map_relid(struct vmbus_channel *channel);
+void vmbus_channel_unmap_relid(struct vmbus_channel *channel);
+
 struct vmbus_channel *relid2channel(u32 relid);
 
 void vmbus_free_channels(void);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 41ec0c95df33f..0e5a98ebe23b1 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1225,33 +1225,39 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 		if (relid == 0)
 			continue;
 
+		/*
+		 * Pairs with the kfree_rcu() in vmbus_chan_release().
+		 * Guarantees that the channel data structure doesn't
+		 * get freed while the channel pointer below is being
+		 * dereferenced.
+		 */
 		rcu_read_lock();
 
 		/* Find channel based on relid */
-		list_for_each_entry_rcu(channel, &hv_cpu->chan_list, percpu_list) {
-			if (channel->offermsg.child_relid != relid)
-				continue;
+		channel = relid2channel(relid);
+		if (channel == NULL)
+			goto sched_unlock_rcu;
 
-			if (channel->rescind)
-				continue;
+		if (channel->rescind)
+			goto sched_unlock_rcu;
 
-			trace_vmbus_chan_sched(channel);
+		trace_vmbus_chan_sched(channel);
 
-			++channel->interrupts;
+		++channel->interrupts;
 
-			switch (channel->callback_mode) {
-			case HV_CALL_ISR:
-				vmbus_channel_isr(channel);
-				break;
+		switch (channel->callback_mode) {
+		case HV_CALL_ISR:
+			vmbus_channel_isr(channel);
+			break;
 
-			case HV_CALL_BATCHED:
-				hv_begin_read(&channel->inbound);
-				/* fallthrough */
-			case HV_CALL_DIRECT:
-				tasklet_schedule(&channel->callback_event);
-			}
+		case HV_CALL_BATCHED:
+			hv_begin_read(&channel->inbound);
+			fallthrough;
+		case HV_CALL_DIRECT:
+			tasklet_schedule(&channel->callback_event);
 		}
 
+sched_unlock_rcu:
 		rcu_read_unlock();
 	}
 }
@@ -2237,9 +2243,12 @@ static int vmbus_bus_suspend(struct device *dev)
 
 	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
 		/*
-		 * Invalidate the field. Upon resume, vmbus_onoffer() will fix
-		 * up the field, and the other fields (if necessary).
+		 * Remove the channel from the array of channels and invalidate
+		 * the channel's relid.  Upon resume, vmbus_onoffer() will fix
+		 * up the relid (and other fields, if necessary) and add the
+		 * channel back to the array.
 		 */
+		vmbus_channel_unmap_relid(channel);
 		channel->offermsg.child_relid = INVALID_RELID;
 
 		if (is_hvsock_channel(channel)) {
@@ -2475,6 +2484,7 @@ static void __exit vmbus_exit(void)
 	hv_debug_rm_all_dir();
 
 	vmbus_free_channels();
+	kfree(vmbus_connection.channels);
 
 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
 		kmsg_dump_unregister(&hv_kmsg_dumper);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 692c89ccf5dfd..6c794fd5c903e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -854,11 +854,6 @@ struct vmbus_channel {
 	 * Support per-channel state for use by vmbus drivers.
 	 */
 	void *per_channel_state;
-	/*
-	 * To support per-cpu lookup mapping of relid to channel,
-	 * link up channels based on their CPU affinity.
-	 */
-	struct list_head percpu_list;
 
 	/*
 	 * Defer freeing channel until after all cpu's have
-- 
2.24.0



* [PATCH 04/11] hv_netvsc: Disable NAPI before closing the VMBus channel
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft),
	Stephen Hemminger, David S. Miller, netdev

vmbus_chan_sched() might call the netvsc driver callback function that
ends up scheduling NAPI work.  This "work" can access the channel ring
buffer, so we must ensure that any such work is completed and that the
ring buffer is no longer being accessed before freeing the ring buffer
data structure in the channel closure path.  To this end, disable NAPI
before calling vmbus_close() in netvsc_device_remove().
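
A condensed sketch of the resulting teardown order (napi_disable()
sleeps until any poll in progress has completed, so NAPI can no
longer touch the ring buffer afterwards):

	/* netvsc_device_remove(), relevant ordering only */
	for (i = 0; i < net_device->num_chn; i++) {
		napi_disable(&net_device->chan_table[i].napi);
		netif_napi_del(&net_device->chan_table[i].napi);
	}
	vmbus_close(device->channel);	/* ring buffer freed from here on */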

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
---
 drivers/hv/channel.c        | 6 ++++++
 drivers/net/hyperv/netvsc.c | 7 +++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 23f358cb7f494..256ee90c74460 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -609,6 +609,12 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 	 * the former is accessing channel->inbound.ring_buffer, the latter
 	 * could be freeing the ring_buffer pages, so here we must stop it
 	 * first.
+	 *
+	 * vmbus_chan_sched() might call the netvsc driver callback function
+	 * that ends up scheduling NAPI work that accesses the ring buffer.
+	 * At this point, we have to ensure that any such work is completed
+	 * and that the channel ring buffer is no longer being accessed, cf.
+	 * the calls to napi_disable() in netvsc_device_remove().
 	 */
 	tasklet_disable(&channel->callback_event);
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 1b320bcf150a4..806cc85d10033 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -635,9 +635,12 @@ void netvsc_device_remove(struct hv_device *device)
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
-	/* And disassociate NAPI context from device */
-	for (i = 0; i < net_device->num_chn; i++)
+	/* Disable NAPI and disassociate its context from the device. */
+	for (i = 0; i < net_device->num_chn; i++) {
+		/* See also vmbus_reset_channel_cb(). */
+		napi_disable(&net_device->chan_table[i].napi);
 		netif_napi_del(&net_device->chan_table[i].napi);
+	}
 
 	/*
 	 * At this point, no one should be accessing net_device
-- 
2.24.0



* [PATCH 05/11] hv_utils: Always execute the fcopy and vss callbacks in a tasklet
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

The fcopy and vss callback functions could be running in a tasklet
at the same time they are called in hv_poll_channel().  Current code
serializes the invocations of these functions, and their accesses to
the channel ring buffer, by sending an IPI to the CPU that is allowed
to access the ring buffer, cf. hv_poll_channel().  This IPI mechanism
becomes infeasible if we allow changing the CPU that a channel will
interrupt.  Instead, modify the callback wrappers to always execute
the fcopy and vss callbacks in a tasklet, thus mirroring the solution
for the kvp callback functions adopted since commit a3ade8cc474d8
("HV: properly delay KVP packets when negotiation is in progress").
This will ensure that the callback function can't run on two CPUs at
the same time.
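
The property relied upon is the usual tasklet guarantee: a given
tasklet never runs on two CPUs at the same time, regardless of which
CPUs schedule it.  A generic illustration (using the tasklet API as
of this series, where the callback takes an unsigned long):

	#include <linux/interrupt.h>

	static void util_cb(unsigned long data)
	{
		/* reads the channel ring buffer; the tasklet core
		 * serializes this function against itself */
	}
	static DECLARE_TASKLET(util_tasklet, util_cb, 0);

	static void some_event_handler(void)
	{
		/* safe from any context, on any CPU */
		tasklet_schedule(&util_tasklet);
	}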

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/hv_fcopy.c     | 2 +-
 drivers/hv/hv_snapshot.c  | 2 +-
 drivers/hv/hyperv_vmbus.h | 7 +------
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index bb9ba3f7c7949..5040d7e0cd9e9 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -71,7 +71,7 @@ static void fcopy_poll_wrapper(void *channel)
 {
 	/* Transaction is finished, reset the state here to avoid races. */
 	fcopy_transaction.state = HVUTIL_READY;
-	hv_fcopy_onchannelcallback(channel);
+	tasklet_schedule(&((struct vmbus_channel *)channel)->callback_event);
 }
 
 static void fcopy_timeout_func(struct work_struct *dummy)
diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 1c75b38f0d6da..783779e4cc1a5 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -80,7 +80,7 @@ static void vss_poll_wrapper(void *channel)
 {
 	/* Transaction is finished, reset the state here to avoid races. */
 	vss_transaction.state = HVUTIL_READY;
-	hv_vss_onchannelcallback(channel);
+	tasklet_schedule(&((struct vmbus_channel *)channel)->callback_event);
 }
 
 /*
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 41b7267da529a..8df138f98e9e5 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -377,12 +377,7 @@ static inline void hv_poll_channel(struct vmbus_channel *channel,
 {
 	if (!channel)
 		return;
-
-	if (in_interrupt() && (channel->target_cpu == smp_processor_id())) {
-		cb(channel);
-		return;
-	}
-	smp_call_function_single(channel->target_cpu, cb, channel, true);
+	cb(channel);
 }
 
 enum hvutil_device_state {
-- 
2.24.0



* [PATCH 06/11] Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

Since vmbus_chan_sched() dereferences the ring buffer pointer, we have
to make sure that the ring buffer data structures don't get freed while
such dereferencing is happening.  Current code does this by sending an
IPI to the CPU that is allowed to access that ring buffer from interrupt
level; cf. vmbus_reset_channel_cb().  But with the new functionality
to allow changing the CPU that a channel will interrupt, we can't be
sure what CPU will be running the vmbus_chan_sched() function for a
particular channel, so the current IPI mechanism is infeasible.

Instead, synchronize vmbus_chan_sched() and vmbus_reset_channel_cb() by
using the (newly introduced) per-channel spin lock "sched_lock".  Move
the test for onchannel_callback being NULL before the "switch" control
statement in vmbus_chan_sched(), in order to not access the ring buffer
if vmbus_reset_channel_cb() has completed on the channel.
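
The two sides of the new synchronization, condensed from the diff
below to the HV_CALL_ISR case:

	/* vmbus_chan_sched(), interrupt path, any CPU */
	spin_lock(&channel->sched_lock);
	callback_fn = channel->onchannel_callback;
	if (callback_fn != NULL)
		(*callback_fn)(channel->channel_callback_context);
	spin_unlock(&channel->sched_lock);

	/* vmbus_reset_channel_cb(), channel-close path */
	spin_lock_irqsave(&channel->sched_lock, flags);
	channel->onchannel_callback = NULL;	/* no new callbacks */
	spin_unlock_irqrestore(&channel->sched_lock, flags);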

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/channel.c      | 24 +++++++-----------------
 drivers/hv/channel_mgmt.c |  1 +
 drivers/hv/vmbus_drv.c    | 30 +++++++++++++++++-------------
 include/linux/hyperv.h    |  6 ++++++
 4 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 256ee90c74460..132e476f87b2e 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -594,15 +594,10 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
 
-static void reset_channel_cb(void *arg)
-{
-	struct vmbus_channel *channel = arg;
-
-	channel->onchannel_callback = NULL;
-}
-
 void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 {
+	unsigned long flags;
+
 	/*
 	 * vmbus_on_event(), running in the per-channel tasklet, can race
 	 * with vmbus_close_internal() in the case of SMP guest, e.g., when
@@ -618,17 +613,12 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 	 */
 	tasklet_disable(&channel->callback_event);
 
-	channel->sc_creation_callback = NULL;
+	/* See the inline comments in vmbus_chan_sched(). */
+	spin_lock_irqsave(&channel->sched_lock, flags);
+	channel->onchannel_callback = NULL;
+	spin_unlock_irqrestore(&channel->sched_lock, flags);
 
-	/* Stop the callback asap */
-	if (channel->target_cpu != get_cpu()) {
-		put_cpu();
-		smp_call_function_single(channel->target_cpu, reset_channel_cb,
-					 channel, true);
-	} else {
-		reset_channel_cb(channel);
-		put_cpu();
-	}
+	channel->sc_creation_callback = NULL;
 
 	/* Re-enable tasklet for use on re-open */
 	tasklet_enable(&channel->callback_event);
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 8c179f6866ef7..46ea978e4222a 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -315,6 +315,7 @@ static struct vmbus_channel *alloc_channel(void)
 	if (!channel)
 		return NULL;
 
+	spin_lock_init(&channel->sched_lock);
 	spin_lock_init(&channel->lock);
 	init_completion(&channel->rescind_event);
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0e5a98ebe23b1..18bd1c0e09a83 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1174,18 +1174,6 @@ static void vmbus_force_channel_rescinded(struct vmbus_channel *channel)
 }
 #endif /* CONFIG_PM_SLEEP */
 
-/*
- * Direct callback for channels using other deferred processing
- */
-static void vmbus_channel_isr(struct vmbus_channel *channel)
-{
-	void (*callback_fn)(void *);
-
-	callback_fn = READ_ONCE(channel->onchannel_callback);
-	if (likely(callback_fn != NULL))
-		(*callback_fn)(channel->channel_callback_context);
-}
-
 /*
  * Schedule all channels with events pending
  */
@@ -1216,6 +1204,7 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 		return;
 
 	for_each_set_bit(relid, recv_int_page, maxbits) {
+		void (*callback_fn)(void *context);
 		struct vmbus_channel *channel;
 
 		if (!sync_test_and_clear_bit(relid, recv_int_page))
@@ -1241,13 +1230,26 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 		if (channel->rescind)
 			goto sched_unlock_rcu;
 
+		/*
+		 * Make sure that the ring buffer data structure doesn't get
+		 * freed while we dereference the ring buffer pointer.  Test
+		 * for the channel's onchannel_callback being NULL within a
+		 * sched_lock critical section.  See also the inline comments
+		 * in vmbus_reset_channel_cb().
+		 */
+		spin_lock(&channel->sched_lock);
+
+		callback_fn = channel->onchannel_callback;
+		if (unlikely(callback_fn == NULL))
+			goto sched_unlock;
+
 		trace_vmbus_chan_sched(channel);
 
 		++channel->interrupts;
 
 		switch (channel->callback_mode) {
 		case HV_CALL_ISR:
-			vmbus_channel_isr(channel);
+			(*callback_fn)(channel->channel_callback_context);
 			break;
 
 		case HV_CALL_BATCHED:
@@ -1257,6 +1259,8 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 			tasklet_schedule(&channel->callback_event);
 		}
 
+sched_unlock:
+		spin_unlock(&channel->sched_lock);
 sched_unlock_rcu:
 		rcu_read_unlock();
 	}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6c794fd5c903e..ce32ab186192f 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -771,6 +771,12 @@ struct vmbus_channel {
 	void (*onchannel_callback)(void *context);
 	void *channel_callback_context;
 
+	/*
+	 * Synchronize channel scheduling and channel removal; see the inline
+	 * comments in vmbus_chan_sched() and vmbus_reset_channel_cb().
+	 */
+	spinlock_t sched_lock;
+
 	/*
 	 * A channel can be marked for one of three modes of reading:
 	 *   BATCHED - callback called from taslket and should read
-- 
2.24.0



* [PATCH 07/11] PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft),
	Lorenzo Pieralisi, Andrew Murray, Bjorn Helgaas, linux-pci

The current implementation of hv_compose_msi_msg() is incompatible with
the new functionality that allows changing the vCPU a VMBus channel will
interrupt: if this function always calls hv_pci_onchannelcallback() in
the polling loop, the interrupt going to a different CPU could cause
hv_pci_onchannelcallback() to be running simultaneously in a tasklet,
which will break.  The current code also has a problem in that it is not
synchronized with vmbus_reset_channel_cb(): hv_compose_msi_msg() could
be accessing the ring buffer via the call of hv_pci_onchannelcallback()
well after the time that vmbus_reset_channel_cb() has finished.

Fix these issues as follows.  Disable the channel tasklet before
entering the polling loop in hv_compose_msi_msg() and re-enable it when
done.  This will prevent hv_pci_onchannelcallback() from running in a
tasklet on a different CPU.  Moreover, poll by always calling
hv_pci_onchannelcallback(), but check the channel callback function for
NULL and invoke the callback within a sched_lock critical section.  This
will prevent hv_compose_msi_msg() from accessing the ring buffer after
vmbus_reset_channel_cb() has acquired the sched_lock spinlock.
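
The resulting polling skeleton, condensed from the diff below:

	tasklet_disable(&channel->callback_event);
	while (!try_wait_for_completion(&comp.comp_pkt.host_event)) {
		spin_lock_irqsave(&channel->sched_lock, flags);
		if (unlikely(channel->onchannel_callback == NULL)) {
			spin_unlock_irqrestore(&channel->sched_lock, flags);
			goto enable_tasklet;	/* channel being reset */
		}
		hv_pci_onchannelcallback(hbus);
		spin_unlock_irqrestore(&channel->sched_lock, flags);
		udelay(100);
	}
	tasklet_enable(&channel->callback_event);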

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <linux-pci@vger.kernel.org>
---
 drivers/pci/controller/pci-hyperv.c | 44 ++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 9977abff92fc5..e6020480a28b1 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1350,11 +1350,11 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	struct irq_cfg *cfg = irqd_cfg(data);
 	struct hv_pcibus_device *hbus;
+	struct vmbus_channel *channel;
 	struct hv_pci_dev *hpdev;
 	struct pci_bus *pbus;
 	struct pci_dev *pdev;
 	struct cpumask *dest;
-	unsigned long flags;
 	struct compose_comp_ctxt comp;
 	struct tran_int_desc *int_desc;
 	struct {
@@ -1372,6 +1372,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	dest = irq_data_get_effective_affinity_mask(data);
 	pbus = pdev->bus;
 	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
+	channel = hbus->hdev->channel;
 	hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn));
 	if (!hpdev)
 		goto return_null_message;
@@ -1428,43 +1429,52 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 		goto free_int_desc;
 	}
 
+	/*
+	 * Prevents hv_pci_onchannelcallback() from running concurrently
+	 * in the tasklet.
+	 */
+	tasklet_disable(&channel->callback_event);
+
 	/*
 	 * Since this function is called with IRQ locks held, can't
 	 * do normal wait for completion; instead poll.
 	 */
 	while (!try_wait_for_completion(&comp.comp_pkt.host_event)) {
+		unsigned long flags;
+
 		/* 0xFFFF means an invalid PCI VENDOR ID. */
 		if (hv_pcifront_get_vendor_id(hpdev) == 0xFFFF) {
 			dev_err_once(&hbus->hdev->device,
 				     "the device has gone\n");
-			goto free_int_desc;
+			goto enable_tasklet;
 		}
 
 		/*
-		 * When the higher level interrupt code calls us with
-		 * interrupt disabled, we must poll the channel by calling
-		 * the channel callback directly when channel->target_cpu is
-		 * the current CPU. When the higher level interrupt code
-		 * calls us with interrupt enabled, let's add the
-		 * local_irq_save()/restore() to avoid race:
-		 * hv_pci_onchannelcallback() can also run in tasklet.
+		 * Make sure that the ring buffer data structure doesn't get
+		 * freed while we dereference the ring buffer pointer.  Test
+		 * for the channel's onchannel_callback being NULL within a
+		 * sched_lock critical section.  See also the inline comments
+		 * in vmbus_reset_channel_cb().
 		 */
-		local_irq_save(flags);
-
-		if (hbus->hdev->channel->target_cpu == smp_processor_id())
-			hv_pci_onchannelcallback(hbus);
-
-		local_irq_restore(flags);
+		spin_lock_irqsave(&channel->sched_lock, flags);
+		if (unlikely(channel->onchannel_callback == NULL)) {
+			spin_unlock_irqrestore(&channel->sched_lock, flags);
+			goto enable_tasklet;
+		}
+		hv_pci_onchannelcallback(hbus);
+		spin_unlock_irqrestore(&channel->sched_lock, flags);
 
 		if (hpdev->state == hv_pcichild_ejecting) {
 			dev_err_once(&hbus->hdev->device,
 				     "the device is being ejected\n");
-			goto free_int_desc;
+			goto enable_tasklet;
 		}
 
 		udelay(100);
 	}
 
+	tasklet_enable(&channel->callback_event);
+
 	if (comp.comp_pkt.completion_status < 0) {
 		dev_err(&hbus->hdev->device,
 			"Request for interrupt failed: 0x%x",
@@ -1488,6 +1498,8 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	put_pcichild(hpdev);
 	return;
 
+enable_tasklet:
+	tasklet_enable(&channel->callback_event);
 free_int_desc:
 	kfree(int_desc);
 drop_reference:
-- 
2.24.0



* [PATCH 08/11] Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC
  To: linux-kernel
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

The logic is unused since commit 509879bdb30b8 ("Drivers: hv: Introduce
a policy for controlling channel affinity").

This logic assumes that a channel's target_cpu doesn't change during the
lifetime of a channel, but this assumption is incompatible with the new
functionality that allows changing the vCPU a channel will interrupt.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/channel_mgmt.c | 105 +++++++++-----------------------------
 include/linux/hyperv.h    |  27 ----------
 2 files changed, 25 insertions(+), 107 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 46ea978e4222a..9041e5950e367 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -433,14 +433,6 @@ void hv_process_channel_removal(struct vmbus_channel *channel)
 		spin_unlock_irqrestore(&primary_channel->lock, flags);
 	}
 
-	/*
-	 * We need to free the bit for init_vp_index() to work in the case
-	 * of sub-channel, when we reload drivers like hv_netvsc.
-	 */
-	if (channel->affinity_policy == HV_LOCALIZED)
-		cpumask_clear_cpu(channel->target_cpu,
-				  &primary_channel->alloced_cpus_in_node);
-
 	/*
 	 * Upon suspend, an in-use hv_sock channel is marked as "rescinded" and
 	 * the relid is invalidated; after hibernation, when the user-space app
@@ -662,20 +654,21 @@ static DEFINE_SPINLOCK(bind_channel_to_cpu_lock);
 /*
  * Starting with Win8, we can statically distribute the incoming
  * channel interrupt load by binding a channel to VCPU.
- * We distribute the interrupt loads to one or more NUMA nodes based on
- * the channel's affinity_policy.
  *
  * For pre-win8 hosts or non-performance critical channels we assign the
  * first CPU in the first NUMA node.
+ *
+ * Starting with win8, performance critical channels will be distributed
+ * evenly among all the available NUMA nodes.  Once the node is assigned,
+ * we will assign the CPU based on a simple round robin scheme.
  */
 static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 {
-	u32 cur_cpu;
 	bool perf_chn = vmbus_devs[dev_type].perf_device;
-	struct vmbus_channel *primary = channel->primary_channel;
-	int next_node;
 	cpumask_var_t available_mask;
 	struct cpumask *alloced_mask;
+	u32 target_cpu;
+	int numa_node;
 
 	if ((vmbus_proto_version == VERSION_WS2008) ||
 	    (vmbus_proto_version == VERSION_WIN7) || (!perf_chn) ||
@@ -693,31 +686,27 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 		return;
 	}
 
-	spin_lock(&bind_channel_to_cpu_lock);
-
 	/*
-	 * Based on the channel affinity policy, we will assign the NUMA
-	 * nodes.
+	 * Serializes the accesses to the global variable next_numa_node_id.
+	 * See also the header comment of the spin lock declaration.
 	 */
+	spin_lock(&bind_channel_to_cpu_lock);
 
-	if ((channel->affinity_policy == HV_BALANCED) || (!primary)) {
-		while (true) {
-			next_node = next_numa_node_id++;
-			if (next_node == nr_node_ids) {
-				next_node = next_numa_node_id = 0;
-				continue;
-			}
-			if (cpumask_empty(cpumask_of_node(next_node)))
-				continue;
-			break;
+	while (true) {
+		numa_node = next_numa_node_id++;
+		if (numa_node == nr_node_ids) {
+			next_numa_node_id = 0;
+			continue;
 		}
-		channel->numa_node = next_node;
-		primary = channel;
+		if (cpumask_empty(cpumask_of_node(numa_node)))
+			continue;
+		break;
 	}
-	alloced_mask = &hv_context.hv_numa_map[primary->numa_node];
+	channel->numa_node = numa_node;
+	alloced_mask = &hv_context.hv_numa_map[numa_node];
 
 	if (cpumask_weight(alloced_mask) ==
-	    cpumask_weight(cpumask_of_node(primary->numa_node))) {
+	    cpumask_weight(cpumask_of_node(numa_node))) {
 		/*
 		 * We have cycled through all the CPUs in the node;
 		 * reset the alloced map.
@@ -725,57 +714,13 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 		cpumask_clear(alloced_mask);
 	}
 
-	cpumask_xor(available_mask, alloced_mask,
-		    cpumask_of_node(primary->numa_node));
+	cpumask_xor(available_mask, alloced_mask, cpumask_of_node(numa_node));
 
-	cur_cpu = -1;
-
-	if (primary->affinity_policy == HV_LOCALIZED) {
-		/*
-		 * Normally Hyper-V host doesn't create more subchannels
-		 * than there are VCPUs on the node but it is possible when not
-		 * all present VCPUs on the node are initialized by guest.
-		 * Clear the alloced_cpus_in_node to start over.
-		 */
-		if (cpumask_equal(&primary->alloced_cpus_in_node,
-				  cpumask_of_node(primary->numa_node)))
-			cpumask_clear(&primary->alloced_cpus_in_node);
-	}
-
-	while (true) {
-		cur_cpu = cpumask_next(cur_cpu, available_mask);
-		if (cur_cpu >= nr_cpu_ids) {
-			cur_cpu = -1;
-			cpumask_copy(available_mask,
-				     cpumask_of_node(primary->numa_node));
-			continue;
-		}
-
-		if (primary->affinity_policy == HV_LOCALIZED) {
-			/*
-			 * NOTE: in the case of sub-channel, we clear the
-			 * sub-channel related bit(s) in
-			 * primary->alloced_cpus_in_node in
-			 * hv_process_channel_removal(), so when we
-			 * reload drivers like hv_netvsc in SMP guest, here
-			 * we're able to re-allocate
-			 * bit from primary->alloced_cpus_in_node.
-			 */
-			if (!cpumask_test_cpu(cur_cpu,
-					      &primary->alloced_cpus_in_node)) {
-				cpumask_set_cpu(cur_cpu,
-						&primary->alloced_cpus_in_node);
-				cpumask_set_cpu(cur_cpu, alloced_mask);
-				break;
-			}
-		} else {
-			cpumask_set_cpu(cur_cpu, alloced_mask);
-			break;
-		}
-	}
+	target_cpu = cpumask_first(available_mask);
+	cpumask_set_cpu(target_cpu, alloced_mask);
 
-	channel->target_cpu = cur_cpu;
-	channel->target_vp = hv_cpu_number_to_vp_number(cur_cpu);
+	channel->target_cpu = target_cpu;
+	channel->target_vp = hv_cpu_number_to_vp_number(target_cpu);
 
 	spin_unlock(&bind_channel_to_cpu_lock);
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index ce32ab186192f..f8e7c22d41a1a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -689,11 +689,6 @@ union hv_connection_id {
 	} u;
 };
 
-enum hv_numa_policy {
-	HV_BALANCED = 0,
-	HV_LOCALIZED,
-};
-
 enum vmbus_device_type {
 	HV_IDE = 0,
 	HV_SCSI,
@@ -808,10 +803,6 @@ struct vmbus_channel {
 	u32 target_vp;
 	/* The corresponding CPUID in the guest */
 	u32 target_cpu;
-	/*
-	 * State to manage the CPU affiliation of channels.
-	 */
-	struct cpumask alloced_cpus_in_node;
 	int numa_node;
 	/*
 	 * Support for sub-channels. For high performance devices,
@@ -898,18 +889,6 @@ struct vmbus_channel {
 	 */
 	bool low_latency;
 
-	/*
-	 * NUMA distribution policy:
-	 * We support two policies:
-	 * 1) Balanced: Here all performance critical channels are
-	 *    distributed evenly amongst all the NUMA nodes.
-	 *    This policy will be the default policy.
-	 * 2) Localized: All channels of a given instance of a
-	 *    performance critical service will be assigned CPUs
-	 *    within a selected NUMA node.
-	 */
-	enum hv_numa_policy affinity_policy;
-
 	bool probe_done;
 
 	/*
@@ -965,12 +944,6 @@ static inline bool is_sub_channel(const struct vmbus_channel *c)
 	return c->offermsg.offer.sub_channel_index != 0;
 }
 
-static inline void set_channel_affinity_state(struct vmbus_channel *c,
-					      enum hv_numa_policy policy)
-{
-	c->affinity_policy = policy;
-}
-
 static inline void set_channel_read_mode(struct vmbus_channel *c,
 					enum hv_callback_mode mode)
 {
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/11] Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
  2020-04-06  0:15 [PATCH 00/11] VMBus channel interrupt reassignment Andrea Parri (Microsoft)
                   ` (7 preceding siblings ...)
  2020-04-06  0:15 ` [PATCH 08/11] Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic Andrea Parri (Microsoft)
@ 2020-04-06  0:15 ` Andrea Parri (Microsoft)
  2020-04-10 19:33   ` Michael Kelley
  2020-04-06  0:15 ` [PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type Andrea Parri (Microsoft)
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

init_vp_index() may access the cpu_online_mask via its calls to
cpumask_of_node().  Make sure to protect these accesses with a
cpus_read_lock() critical section.

Also, remove some hardcoded instances of CPU 0 from init_vp_index()
and replace them with VMBUS_CONNECT_CPU.  The connect CPU cannot go
offline, since Hyper-V does not provide a way to change it.

Finally, order the accesses to target_cpu from init_vp_index() and
hv_synic_cleanup() by relying on the channel_mutex; this is achieved
by moving the call to init_vp_index() into vmbus_process_offer().
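
The resulting locking in init_vp_index() can be summarized as follows
(a sketch distilled from the diff below; the fallback path and error
handling are omitted):

	cpus_read_lock();	/* no CPU can come up or go down here */
	spin_lock(&bind_channel_to_cpu_lock);	/* next_numa_node_id */
	/* ... pick numa_node and target_cpu via cpumask_of_node() ... */
	spin_unlock(&bind_channel_to_cpu_lock);
	cpus_read_unlock();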

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/channel_mgmt.c | 47 ++++++++++++++++++++++++++++-----------
 drivers/hv/hv.c           |  7 +++---
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 9041e5950e367..3785beead503d 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -18,6 +18,7 @@
 #include <linux/module.h>
 #include <linux/completion.h>
 #include <linux/delay.h>
+#include <linux/cpu.h>
 #include <linux/hyperv.h>
 #include <asm/mshyperv.h>
 
@@ -466,13 +467,8 @@ static void vmbus_add_channel_work(struct work_struct *work)
 		container_of(work, struct vmbus_channel, add_channel_work);
 	struct vmbus_channel *primary_channel = newchannel->primary_channel;
 	unsigned long flags;
-	u16 dev_type;
 	int ret;
 
-	dev_type = hv_get_dev_type(newchannel);
-
-	init_vp_index(newchannel, dev_type);
-
 	/*
 	 * This state is used to indicate a successful open
 	 * so that when we do close the channel normally, we
@@ -504,7 +500,7 @@ static void vmbus_add_channel_work(struct work_struct *work)
 	if (!newchannel->device_obj)
 		goto err_deq_chan;
 
-	newchannel->device_obj->device_id = dev_type;
+	newchannel->device_obj->device_id = hv_get_dev_type(newchannel);
 	/*
 	 * Add the new device to the bus. This will kick off device-driver
 	 * binding which eventually invokes the device driver's AddDevice()
@@ -560,6 +556,25 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 	unsigned long flags;
 	bool fnew = true;
 
+	/*
+	 * Initialize the target_CPU before inserting the channel in
+	 * the chn_list and sc_list lists, within the channel_mutex
+	 * critical section:
+	 *
+	 * CPU1				CPU2
+	 *
+	 * [vmbus_process_offer()]	[hv_synic_cleanup()]
+	 *
+	 * STORE target_cpu		LOCK channel_mutex
+	 * LOCK channel_mutex		SEARCH chn_list
+	 * INSERT chn_list		LOAD target_cpu
+	 * UNLOCK channel_mutex		UNLOCK channel_mutex
+	 *
+	 * Forbids: CPU2's SEARCH from seeing CPU1's INSERT &&
+	 * 		CPU2's LOAD from *not* seeing CPU1's STORE
+	 */
+	init_vp_index(newchannel, hv_get_dev_type(newchannel));
+
 	mutex_lock(&vmbus_connection.channel_mutex);
 
 	/* Remember the channels that should be cleaned up upon suspend. */
@@ -656,7 +671,7 @@ static DEFINE_SPINLOCK(bind_channel_to_cpu_lock);
  * channel interrupt load by binding a channel to VCPU.
  *
  * For pre-win8 hosts or non-performance critical channels we assign the
- * first CPU in the first NUMA node.
+ * VMBUS_CONNECT_CPU.
  *
  * Starting with win8, performance critical channels will be distributed
  * evenly among all the available NUMA nodes.  Once the node is assigned,
@@ -675,17 +690,22 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 	    !alloc_cpumask_var(&available_mask, GFP_KERNEL)) {
 		/*
 		 * Prior to win8, all channel interrupts are
-		 * delivered on cpu 0.
+		 * delivered on VMBUS_CONNECT_CPU.
 		 * Also if the channel is not a performance critical
-		 * channel, bind it to cpu 0.
-		 * In case alloc_cpumask_var() fails, bind it to cpu 0.
+		 * channel, bind it to VMBUS_CONNECT_CPU.
+		 * In case alloc_cpumask_var() fails, bind it to
+		 * VMBUS_CONNECT_CPU.
 		 */
-		channel->numa_node = 0;
-		channel->target_cpu = 0;
-		channel->target_vp = hv_cpu_number_to_vp_number(0);
+		channel->numa_node = cpu_to_node(VMBUS_CONNECT_CPU);
+		channel->target_cpu = VMBUS_CONNECT_CPU;
+		channel->target_vp =
+			hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 		return;
 	}
 
+	/* No CPUs can come up or down during this. */
+	cpus_read_lock();
+
 	/*
 	 * Serializes the accesses to the global variable next_numa_node_id.
 	 * See also the header comment of the spin lock declaration.
@@ -723,6 +743,7 @@ static void init_vp_index(struct vmbus_channel *channel, u16 dev_type)
 	channel->target_vp = hv_cpu_number_to_vp_number(target_cpu);
 
 	spin_unlock(&bind_channel_to_cpu_lock);
+	cpus_read_unlock();
 
 	free_cpumask_var(available_mask);
 }
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 17bf1f229152b..188b42b07f07b 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -256,9 +256,10 @@ int hv_synic_cleanup(unsigned int cpu)
 
 	/*
 	 * Search for channels which are bound to the CPU we're about to
-	 * cleanup. In case we find one and vmbus is still connected we need to
-	 * fail, this will effectively prevent CPU offlining. There is no way
-	 * we can re-bind channels to different CPUs for now.
+	 * cleanup.  In case we find one and vmbus is still connected, we
+	 * fail; this will effectively prevent CPU offlining.
+	 *
+	 * TODO: Re-bind the channels to different CPUs.
 	 */
 	mutex_lock(&vmbus_connection.channel_mutex);
 	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type
  2020-04-06  0:15 [PATCH 00/11] VMBus channel interrupt reassignment Andrea Parri (Microsoft)
                   ` (8 preceding siblings ...)
  2020-04-06  0:15 ` [PATCH 09/11] Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug Andrea Parri (Microsoft)
@ 2020-04-06  0:15 ` Andrea Parri (Microsoft)
  2020-04-10 19:34   ` Michael Kelley
  2020-04-06  0:15 ` [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Andrea Parri (Microsoft)
  2020-04-13 15:48 ` [PATCH 00/11] VMBus channel interrupt reassignment Wei Liu
  11 siblings, 1 reply; 29+ messages in thread
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft)

VMBus version 4.1 and later support the CHANNELMSG_MODIFYCHANNEL(22)
message type which can be used to request Hyper-V to change the vCPU
that a channel will interrupt.

Introduce the CHANNELMSG_MODIFYCHANNEL message type, and define the
vmbus_send_modifychannel() function to send CHANNELMSG_MODIFYCHANNEL
requests to the host via a hypercall.  The function is then used to
define a sysfs "store" operation, which allows changing the (v)CPU
the channel will interrupt via the sysfs interface.  The feature
can be used for load balancing or other purposes.

One interesting catch here is that Hyper-V can *not* currently ACK
CHANNELMSG_MODIFYCHANNEL messages with the promise that (after the ACK
is sent) the channel won't send any more interrupts to the "old" CPU.

This peculiarity of the CHANNELMSG_MODIFYCHANNEL messages is problematic
if the user wants to take a CPU offline: we don't want to take a CPU
offline (and, potentially, "lose" channel interrupts on that CPU) while
the host is still processing a CHANNELMSG_MODIFYCHANNEL message
associated with that CPU.

It is worth mentioning, however, that we have been unable to observe
the above-mentioned "race": in all our tests, CHANNELMSG_MODIFYCHANNEL
requests appeared *as if* they were processed synchronously by the host.
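
For illustration, a (hypothetical) in-kernel caller would retarget a
channel's interrupts to CPU 3 along these lines, using only the API
introduced below:

	int ret;

	/* Ask Hyper-V to retarget the channel's interrupts. */
	ret = vmbus_send_modifychannel(channel->offermsg.child_relid,
				       hv_cpu_number_to_vp_number(3));
	if (ret)
		pr_err("MODIFYCHANNEL failed: %d\n", ret);

From user space, the same request is triggered by writing a CPU number
to the new "cpu" attribute, e.g.
/sys/bus/vmbus/devices/<device>/channels/<relid>/cpu.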

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/channel.c      |  28 ++++++++++
 drivers/hv/channel_mgmt.c |   2 +-
 drivers/hv/hv_trace.h     |  19 +++++++
 drivers/hv/vmbus_drv.c    | 108 +++++++++++++++++++++++++++++++++++++-
 include/linux/hyperv.h    |  10 +++-
 5 files changed, 163 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 132e476f87b2e..90070b337c10d 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -289,6 +289,34 @@ int vmbus_send_tl_connect_request(const guid_t *shv_guest_servie_id,
 }
 EXPORT_SYMBOL_GPL(vmbus_send_tl_connect_request);
 
+/*
+ * Set/change the vCPU (@target_vp) the channel (@child_relid) will interrupt.
+ *
+ * CHANNELMSG_MODIFYCHANNEL messages are asynchronous.  Also, Hyper-V does not
+ * ACK such messages.  IOW we can't know when the host will stop interrupting
+ * the "old" vCPU and start interrupting the "new" vCPU for the given channel.
+ *
+ * The CHANNELMSG_MODIFYCHANNEL message type is supported since VMBus version
+ * VERSION_WIN10_V4_1.
+ */
+int vmbus_send_modifychannel(u32 child_relid, u32 target_vp)
+{
+	struct vmbus_channel_modifychannel conn_msg;
+	int ret;
+
+	memset(&conn_msg, 0, sizeof(conn_msg));
+	conn_msg.header.msgtype = CHANNELMSG_MODIFYCHANNEL;
+	conn_msg.child_relid = child_relid;
+	conn_msg.target_vp = target_vp;
+
+	ret = vmbus_post_msg(&conn_msg, sizeof(conn_msg), true);
+
+	trace_vmbus_send_modifychannel(&conn_msg, ret);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
+
 /*
  * create_gpadl_header - Creates a gpadl for the specified buffer
  */
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 3785beead503d..1058c814ab06e 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -1377,7 +1377,7 @@ channel_message_table[CHANNELMSG_COUNT] = {
 	{ CHANNELMSG_19,			0, NULL },
 	{ CHANNELMSG_20,			0, NULL },
 	{ CHANNELMSG_TL_CONNECT_REQUEST,	0, NULL },
-	{ CHANNELMSG_22,			0, NULL },
+	{ CHANNELMSG_MODIFYCHANNEL,		0, NULL },
 	{ CHANNELMSG_TL_CONNECT_RESULT,		0, NULL },
 };
 
diff --git a/drivers/hv/hv_trace.h b/drivers/hv/hv_trace.h
index e70783e33680f..a43bc76c2d5d0 100644
--- a/drivers/hv/hv_trace.h
+++ b/drivers/hv/hv_trace.h
@@ -296,6 +296,25 @@ TRACE_EVENT(vmbus_send_tl_connect_request,
 		    )
 	);
 
+TRACE_EVENT(vmbus_send_modifychannel,
+	    TP_PROTO(const struct vmbus_channel_modifychannel *msg,
+		     int ret),
+	    TP_ARGS(msg, ret),
+	    TP_STRUCT__entry(
+		    __field(u32, child_relid)
+		    __field(u32, target_vp)
+		    __field(int, ret)
+		    ),
+	    TP_fast_assign(
+		    __entry->child_relid = msg->child_relid;
+		    __entry->target_vp = msg->target_vp;
+		    __entry->ret = ret;
+		    ),
+	    TP_printk("binding child_relid 0x%x to target_vp 0x%x, ret %d",
+		      __entry->child_relid, __entry->target_vp, __entry->ret
+		    )
+	);
+
 DECLARE_EVENT_CLASS(vmbus_channel,
 	TP_PROTO(const struct vmbus_channel *channel),
 	TP_ARGS(channel),
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 18bd1c0e09a83..0f204640c50c2 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1579,8 +1579,24 @@ static ssize_t vmbus_chan_attr_show(struct kobject *kobj,
 	return attribute->show(chan, buf);
 }
 
+static ssize_t vmbus_chan_attr_store(struct kobject *kobj,
+				     struct attribute *attr, const char *buf,
+				     size_t count)
+{
+	const struct vmbus_chan_attribute *attribute
+		= container_of(attr, struct vmbus_chan_attribute, attr);
+	struct vmbus_channel *chan
+		= container_of(kobj, struct vmbus_channel, kobj);
+
+	if (!attribute->store)
+		return -EIO;
+
+	return attribute->store(chan, buf, count);
+}
+
 static const struct sysfs_ops vmbus_chan_sysfs_ops = {
 	.show = vmbus_chan_attr_show,
+	.store = vmbus_chan_attr_store,
 };
 
 static ssize_t out_mask_show(struct vmbus_channel *channel, char *buf)
@@ -1651,11 +1667,99 @@ static ssize_t write_avail_show(struct vmbus_channel *channel, char *buf)
 }
 static VMBUS_CHAN_ATTR_RO(write_avail);
 
-static ssize_t show_target_cpu(struct vmbus_channel *channel, char *buf)
+static ssize_t target_cpu_show(struct vmbus_channel *channel, char *buf)
 {
 	return sprintf(buf, "%u\n", channel->target_cpu);
 }
-static VMBUS_CHAN_ATTR(cpu, S_IRUGO, show_target_cpu, NULL);
+static ssize_t target_cpu_store(struct vmbus_channel *channel,
+				const char *buf, size_t count)
+{
+	ssize_t ret = count;
+	u32 target_cpu;
+
+	if (vmbus_proto_version < VERSION_WIN10_V4_1)
+		return -EIO;
+
+	if (sscanf(buf, "%uu", &target_cpu) != 1)
+		return -EIO;
+
+	/* Validate target_cpu for the cpumask_test_cpu() operation below. */
+	if (target_cpu >= nr_cpumask_bits)
+		return -EINVAL;
+
+	/* No CPUs should come up or down during this. */
+	cpus_read_lock();
+
+	if (!cpumask_test_cpu(target_cpu, cpu_online_mask)) {
+		cpus_read_unlock();
+		return -EINVAL;
+	}
+
+	/*
+	 * Synchronizes target_cpu_store() and channel closure:
+	 *
+	 * { Initially: state = CHANNEL_OPENED }
+	 *
+	 * CPU1				CPU2
+	 *
+	 * [target_cpu_store()]		[vmbus_disconnect_ring()]
+	 *
+	 * LOCK channel_mutex		LOCK channel_mutex
+	 * LOAD r1 = state		LOAD r2 = state
+	 * IF (r1 == CHANNEL_OPENED)	IF (r2 == CHANNEL_OPENED)
+	 *   SEND MODIFYCHANNEL		  STORE state = CHANNEL_OPEN
+	 *   [...]			  SEND CLOSECHANNEL
+	 * UNLOCK channel_mutex		UNLOCK channel_mutex
+	 *
+	 * Forbids: r1 == r2 == CHANNEL_OPENED (i.e., CPU1's LOCK precedes
+	 * 		CPU2's LOCK) && CPU2's SEND precedes CPU1's SEND
+	 *
+	 * Note.  The host processes the channel messages "sequentially", in
+	 * the order in which they are received on a per-partition basis.
+	 */
+	mutex_lock(&vmbus_connection.channel_mutex);
+
+	/*
+	 * Hyper-V will ignore MODIFYCHANNEL messages for "non-open" channels;
+	 * avoid sending the message and fail here for such channels.
+	 */
+	if (channel->state != CHANNEL_OPENED_STATE) {
+		ret = -EIO;
+		goto cpu_store_unlock;
+	}
+
+	if (channel->target_cpu == target_cpu)
+		goto cpu_store_unlock;
+
+	if (vmbus_send_modifychannel(channel->offermsg.child_relid,
+				     hv_cpu_number_to_vp_number(target_cpu))) {
+		ret = -EIO;
+		goto cpu_store_unlock;
+	}
+
+	/*
+	 * Warning.  At this point, there is *no* guarantee that the host will
+	 * have successfully processed the vmbus_send_modifychannel() request.
+	 * See the header comment of vmbus_send_modifychannel() for more info.
+	 *
+	 * Lags in the processing of the above vmbus_send_modifychannel() can
+	 * result in missed interrupts if the "old" target CPU is taken offline
+	 * before Hyper-V starts sending interrupts to the "new" target CPU.
+	 * But apart from this offlining scenario, the code tolerates such
+	 * lags.  It will function correctly even if a channel interrupt comes
+	 * in on a CPU that is different from the channel target_cpu value.
+	 */
+
+	channel->target_cpu = target_cpu;
+	channel->target_vp = hv_cpu_number_to_vp_number(target_cpu);
+	channel->numa_node = cpu_to_node(target_cpu);
+
+cpu_store_unlock:
+	mutex_unlock(&vmbus_connection.channel_mutex);
+	cpus_read_unlock();
+	return ret;
+}
+static VMBUS_CHAN_ATTR(cpu, 0644, target_cpu_show, target_cpu_store);
 
 static ssize_t channel_pending_show(struct vmbus_channel *channel,
 				    char *buf)
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index f8e7c22d41a1a..edfcd42319ef3 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -425,7 +425,7 @@ enum vmbus_channel_message_type {
 	CHANNELMSG_19				= 19,
 	CHANNELMSG_20				= 20,
 	CHANNELMSG_TL_CONNECT_REQUEST		= 21,
-	CHANNELMSG_22				= 22,
+	CHANNELMSG_MODIFYCHANNEL		= 22,
 	CHANNELMSG_TL_CONNECT_RESULT		= 23,
 	CHANNELMSG_COUNT
 };
@@ -620,6 +620,13 @@ struct vmbus_channel_tl_connect_request {
 	guid_t host_service_id;
 } __packed;
 
+/* Modify Channel parameters, cf. vmbus_send_modifychannel() */
+struct vmbus_channel_modifychannel {
+	struct vmbus_channel_message_header header;
+	u32 child_relid;
+	u32 target_vp;
+} __packed;
+
 struct vmbus_channel_version_response {
 	struct vmbus_channel_message_header header;
 	u8 version_supported;
@@ -1505,6 +1512,7 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const guid_t *shv_guest_servie_id,
 				  const guid_t *shv_host_servie_id);
+int vmbus_send_modifychannel(u32 child_relid, u32 target_vp);
 void vmbus_set_event(struct vmbus_channel *channel);
 
 /* Get the start of the ring buffer. */
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
  2020-04-06  0:15 [PATCH 00/11] VMBus channel interrupt reassignment Andrea Parri (Microsoft)
                   ` (9 preceding siblings ...)
  2020-04-06  0:15 ` [PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type Andrea Parri (Microsoft)
@ 2020-04-06  0:15 ` Andrea Parri (Microsoft)
  2020-04-06 19:54   ` Long Li
  2020-04-10 19:35   ` Michael Kelley
  2020-04-13 15:48 ` [PATCH 00/11] VMBus channel interrupt reassignment Wei Liu
  11 siblings, 2 replies; 29+ messages in thread
From: Andrea Parri (Microsoft) @ 2020-04-06  0:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	Vitaly Kuznetsov, Andrea Parri (Microsoft),
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

For each storvsc_device, storvsc keeps track of the channel target CPUs
associated with the device (alloced_cpus) and uses this information to
fill a "cache" (stor_chns) mapping CPU->channel according to a certain
heuristic.  Update the alloced_cpus mask and the stor_chns array when a
channel of the storvsc device is re-assigned to a different CPU.
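
In short, the update performed by the new callback preserves the
following invariant (a condensed sketch of the code added below; the
primary channel's lock is held throughout):

	/* A channel moved from CPU 'old' to CPU 'new':
	 * - if another channel of the device still targets 'old', keep
	 *   'old' in alloced_cpus and re-point stor_chns[old] at it;
	 * - otherwise, drop 'old' from alloced_cpus and purge any stale
	 *   stor_chns[] entries mapping to non-alloced CPUs;
	 * - finally, claim 'new' for the moved channel:
	 */
	WRITE_ONCE(stor_device->stor_chns[new], channel);
	cpumask_set_cpu(new, &stor_device->alloced_cpus);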

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: <linux-scsi@vger.kernel.org>
---
 drivers/hv/vmbus_drv.c     |  4 ++
 drivers/scsi/storvsc_drv.c | 95 ++++++++++++++++++++++++++++++++++----
 include/linux/hyperv.h     |  3 ++
 3 files changed, 94 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0f204640c50c2..f0be41bfcbf57 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1750,6 +1750,10 @@ static ssize_t target_cpu_store(struct vmbus_channel *channel,
 	 * in on a CPU that is different from the channel target_cpu value.
 	 */
 
+	if (channel->change_target_cpu_callback)
+		(*channel->change_target_cpu_callback)(channel,
+				channel->target_cpu, target_cpu);
+
 	channel->target_cpu = target_cpu;
 	channel->target_vp = hv_cpu_number_to_vp_number(target_cpu);
 	channel->numa_node = cpu_to_node(target_cpu);
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index fb41636519ee8..a680592b9d32a 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -621,6 +621,63 @@ static inline struct storvsc_device *get_in_stor_device(
 
 }
 
+void storvsc_change_target_cpu(struct vmbus_channel *channel, u32 old, u32 new)
+{
+	struct storvsc_device *stor_device;
+	struct vmbus_channel *cur_chn;
+	bool old_is_alloced = false;
+	struct hv_device *device;
+	unsigned long flags;
+	int cpu;
+
+	device = channel->primary_channel ?
+			channel->primary_channel->device_obj
+				: channel->device_obj;
+	stor_device = get_out_stor_device(device);
+	if (!stor_device)
+		return;
+
+	/* See storvsc_do_io() -> get_og_chn(). */
+	spin_lock_irqsave(&device->channel->lock, flags);
+
+	/*
+	 * Determines if the storvsc device has other channels assigned to
+	 * the "old" CPU to update the alloced_cpus mask and the stor_chns
+	 * array.
+	 */
+	if (device->channel != channel && device->channel->target_cpu == old) {
+		cur_chn = device->channel;
+		old_is_alloced = true;
+		goto old_is_alloced;
+	}
+	list_for_each_entry(cur_chn, &device->channel->sc_list, sc_list) {
+		if (cur_chn == channel)
+			continue;
+		if (cur_chn->target_cpu == old) {
+			old_is_alloced = true;
+			goto old_is_alloced;
+		}
+	}
+
+old_is_alloced:
+	if (old_is_alloced)
+		WRITE_ONCE(stor_device->stor_chns[old], cur_chn);
+	else
+		cpumask_clear_cpu(old, &stor_device->alloced_cpus);
+
+	/* "Flush" the stor_chns array. */
+	for_each_possible_cpu(cpu) {
+		if (stor_device->stor_chns[cpu] && !cpumask_test_cpu(
+					cpu, &stor_device->alloced_cpus))
+			WRITE_ONCE(stor_device->stor_chns[cpu], NULL);
+	}
+
+	WRITE_ONCE(stor_device->stor_chns[new], channel);
+	cpumask_set_cpu(new, &stor_device->alloced_cpus);
+
+	spin_unlock_irqrestore(&device->channel->lock, flags);
+}
+
 static void handle_sc_creation(struct vmbus_channel *new_sc)
 {
 	struct hv_device *device = new_sc->primary_channel->device_obj;
@@ -648,6 +705,8 @@ static void handle_sc_creation(struct vmbus_channel *new_sc)
 		return;
 	}
 
+	new_sc->change_target_cpu_callback = storvsc_change_target_cpu;
+
 	/* Add the sub-channel to the array of available channels. */
 	stor_device->stor_chns[new_sc->target_cpu] = new_sc;
 	cpumask_set_cpu(new_sc->target_cpu, &stor_device->alloced_cpus);
@@ -876,6 +935,8 @@ static int storvsc_channel_init(struct hv_device *device, bool is_fc)
 	if (stor_device->stor_chns == NULL)
 		return -ENOMEM;
 
+	device->channel->change_target_cpu_callback = storvsc_change_target_cpu;
+
 	stor_device->stor_chns[device->channel->target_cpu] = device->channel;
 	cpumask_set_cpu(device->channel->target_cpu,
 			&stor_device->alloced_cpus);
@@ -1248,8 +1309,10 @@ static struct vmbus_channel *get_og_chn(struct storvsc_device *stor_device,
 	const struct cpumask *node_mask;
 	int num_channels, tgt_cpu;
 
-	if (stor_device->num_sc == 0)
+	if (stor_device->num_sc == 0) {
+		stor_device->stor_chns[q_num] = stor_device->device->channel;
 		return stor_device->device->channel;
+	}
 
 	/*
 	 * Our channel array is sparsley populated and we
@@ -1258,7 +1321,6 @@ static struct vmbus_channel *get_og_chn(struct storvsc_device *stor_device,
 	 * The strategy is simple:
 	 * I. Ensure NUMA locality
 	 * II. Distribute evenly (best effort)
-	 * III. Mapping is persistent.
 	 */
 
 	node_mask = cpumask_of_node(cpu_to_node(q_num));
@@ -1268,8 +1330,10 @@ static struct vmbus_channel *get_og_chn(struct storvsc_device *stor_device,
 		if (cpumask_test_cpu(tgt_cpu, node_mask))
 			num_channels++;
 	}
-	if (num_channels == 0)
+	if (num_channels == 0) {
+		stor_device->stor_chns[q_num] = stor_device->device->channel;
 		return stor_device->device->channel;
+	}
 
 	hash_qnum = q_num;
 	while (hash_qnum >= num_channels)
@@ -1295,6 +1359,7 @@ static int storvsc_do_io(struct hv_device *device,
 	struct storvsc_device *stor_device;
 	struct vstor_packet *vstor_packet;
 	struct vmbus_channel *outgoing_channel, *channel;
+	unsigned long flags;
 	int ret = 0;
 	const struct cpumask *node_mask;
 	int tgt_cpu;
@@ -1308,10 +1373,11 @@ static int storvsc_do_io(struct hv_device *device,
 
 	request->device  = device;
 	/*
-	 * Select an an appropriate channel to send the request out.
+	 * Select an appropriate channel to send the request out.
 	 */
-	if (stor_device->stor_chns[q_num] != NULL) {
-		outgoing_channel = stor_device->stor_chns[q_num];
+	/* See storvsc_change_target_cpu(). */
+	outgoing_channel = READ_ONCE(stor_device->stor_chns[q_num]);
+	if (outgoing_channel != NULL) {
 		if (outgoing_channel->target_cpu == q_num) {
 			/*
 			 * Ideally, we want to pick a different channel if
@@ -1324,7 +1390,10 @@ static int storvsc_do_io(struct hv_device *device,
 					continue;
 				if (tgt_cpu == q_num)
 					continue;
-				channel = stor_device->stor_chns[tgt_cpu];
+				channel = READ_ONCE(
+					stor_device->stor_chns[tgt_cpu]);
+				if (channel == NULL)
+					continue;
 				if (hv_get_avail_to_write_percent(
 							&channel->outbound)
 						> ring_avail_percent_lowater) {
@@ -1350,7 +1419,10 @@ static int storvsc_do_io(struct hv_device *device,
 			for_each_cpu(tgt_cpu, &stor_device->alloced_cpus) {
 				if (cpumask_test_cpu(tgt_cpu, node_mask))
 					continue;
-				channel = stor_device->stor_chns[tgt_cpu];
+				channel = READ_ONCE(
+					stor_device->stor_chns[tgt_cpu]);
+				if (channel == NULL)
+					continue;
 				if (hv_get_avail_to_write_percent(
 							&channel->outbound)
 						> ring_avail_percent_lowater) {
@@ -1360,7 +1432,14 @@ static int storvsc_do_io(struct hv_device *device,
 			}
 		}
 	} else {
+		spin_lock_irqsave(&device->channel->lock, flags);
+		outgoing_channel = stor_device->stor_chns[q_num];
+		if (outgoing_channel != NULL) {
+			spin_unlock_irqrestore(&device->channel->lock, flags);
+			goto found_channel;
+		}
 		outgoing_channel = get_og_chn(stor_device, q_num);
+		spin_unlock_irqrestore(&device->channel->lock, flags);
 	}
 
 found_channel:
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index edfcd42319ef3..9018b89614b78 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -773,6 +773,9 @@ struct vmbus_channel {
 	void (*onchannel_callback)(void *context);
 	void *channel_callback_context;
 
+	void (*change_target_cpu_callback)(struct vmbus_channel *channel,
+			u32 old, u32 new);
+
 	/*
 	 * Synchronize channel scheduling and channel removal; see the inline
 	 * comments in vmbus_chan_sched() and vmbus_reset_channel_cb().
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* RE: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
  2020-04-06  0:15 ` [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Andrea Parri (Microsoft)
@ 2020-04-06 19:54   ` Long Li
  2020-04-07  0:35     ` Andrea Parri
  2020-04-10 19:35   ` Michael Kelley
  1 sibling, 1 reply; 29+ messages in thread
From: Long Li @ 2020-04-06 19:54 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng, vkuznets,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

>Subject: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel
>interrupt is re-assigned
>
>For each storvsc_device, storvsc keeps track of the channel target CPUs
>associated to the device (alloced_cpus) and it uses this information to fill a
>"cache" (stor_chns) mapping CPU->channel according to a certain heuristic.
>Update the alloced_cpus mask and the stor_chns array when a channel of the
>storvsc device is re-assigned to a different CPU.
>
>Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
>Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
>Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
>Cc: <linux-scsi@vger.kernel.org>
>---
> drivers/hv/vmbus_drv.c     |  4 ++
> drivers/scsi/storvsc_drv.c | 95 ++++++++++++++++++++++++++++++++++---
>-
> include/linux/hyperv.h     |  3 ++
> 3 files changed, 94 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index
>0f204640c50c2..f0be41bfcbf57 100644
>--- a/drivers/hv/vmbus_drv.c
>+++ b/drivers/hv/vmbus_drv.c
>@@ -1750,6 +1750,10 @@ static ssize_t target_cpu_store(struct
>vmbus_channel *channel,
> 	 * in on a CPU that is different from the channel target_cpu value.
> 	 */
>
>+	if (channel->change_target_cpu_callback)
>+		(*channel->change_target_cpu_callback)(channel,
>+				channel->target_cpu, target_cpu);
>+
> 	channel->target_cpu = target_cpu;
> 	channel->target_vp = hv_cpu_number_to_vp_number(target_cpu);
> 	channel->numa_node = cpu_to_node(target_cpu); diff --git
>a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index
>fb41636519ee8..a680592b9d32a 100644
>--- a/drivers/scsi/storvsc_drv.c
>+++ b/drivers/scsi/storvsc_drv.c
>@@ -621,6 +621,63 @@ static inline struct storvsc_device
>*get_in_stor_device(
>
> }
>
>+void storvsc_change_target_cpu(struct vmbus_channel *channel, u32 old,
>+u32 new) {
>+	struct storvsc_device *stor_device;
>+	struct vmbus_channel *cur_chn;
>+	bool old_is_alloced = false;
>+	struct hv_device *device;
>+	unsigned long flags;
>+	int cpu;
>+
>+	device = channel->primary_channel ?
>+			channel->primary_channel->device_obj
>+				: channel->device_obj;
>+	stor_device = get_out_stor_device(device);
>+	if (!stor_device)
>+		return;
>+
>+	/* See storvsc_do_io() -> get_og_chn(). */
>+	spin_lock_irqsave(&device->channel->lock, flags);
>+
>+	/*
>+	 * Determines if the storvsc device has other channels assigned to
>+	 * the "old" CPU to update the alloced_cpus mask and the stor_chns
>+	 * array.
>+	 */
>+	if (device->channel != channel && device->channel->target_cpu ==
>old) {
>+		cur_chn = device->channel;
>+		old_is_alloced = true;
>+		goto old_is_alloced;
>+	}
>+	list_for_each_entry(cur_chn, &device->channel->sc_list, sc_list) {
>+		if (cur_chn == channel)
>+			continue;
>+		if (cur_chn->target_cpu == old) {
>+			old_is_alloced = true;
>+			goto old_is_alloced;
>+		}
>+	}
>+
>+old_is_alloced:
>+	if (old_is_alloced)
>+		WRITE_ONCE(stor_device->stor_chns[old], cur_chn);
>+	else
>+		cpumask_clear_cpu(old, &stor_device->alloced_cpus);

If the old cpu is not allocated, is it still necessary to do a cpumask_clear_cpu?

>+
>+	/* "Flush" the stor_chns array. */
>+	for_each_possible_cpu(cpu) {
>+		if (stor_device->stor_chns[cpu] && !cpumask_test_cpu(
>+					cpu, &stor_device->alloced_cpus))
>+			WRITE_ONCE(stor_device->stor_chns[cpu], NULL);
>+	}
>+
>+	WRITE_ONCE(stor_device->stor_chns[new], channel);
>+	cpumask_set_cpu(new, &stor_device->alloced_cpus);
>+
>+	spin_unlock_irqrestore(&device->channel->lock, flags); }
>+
> static void handle_sc_creation(struct vmbus_channel *new_sc)  {
> 	struct hv_device *device = new_sc->primary_channel->device_obj;
>@@ -648,6 +705,8 @@ static void handle_sc_creation(struct vmbus_channel
>*new_sc)
> 		return;
> 	}
>
>+	new_sc->change_target_cpu_callback = storvsc_change_target_cpu;
>+
> 	/* Add the sub-channel to the array of available channels. */
> 	stor_device->stor_chns[new_sc->target_cpu] = new_sc;
> 	cpumask_set_cpu(new_sc->target_cpu, &stor_device-
>>alloced_cpus); @@ -876,6 +935,8 @@ static int storvsc_channel_init(struct
>hv_device *device, bool is_fc)
> 	if (stor_device->stor_chns == NULL)
> 		return -ENOMEM;
>
>+	device->channel->change_target_cpu_callback =
>+storvsc_change_target_cpu;
>+
> 	stor_device->stor_chns[device->channel->target_cpu] = device-
>>channel;
> 	cpumask_set_cpu(device->channel->target_cpu,
> 			&stor_device->alloced_cpus);
>@@ -1248,8 +1309,10 @@ static struct vmbus_channel *get_og_chn(struct
>storvsc_device *stor_device,
> 	const struct cpumask *node_mask;
> 	int num_channels, tgt_cpu;
>
>-	if (stor_device->num_sc == 0)
>+	if (stor_device->num_sc == 0) {
>+		stor_device->stor_chns[q_num] = stor_device->device-
>>channel;
> 		return stor_device->device->channel;
>+	}
>
> 	/*
> 	 * Our channel array is sparsley populated and we @@ -1258,7 +1321,6
>@@ static struct vmbus_channel *get_og_chn(struct storvsc_device
>*stor_device,
> 	 * The strategy is simple:
> 	 * I. Ensure NUMA locality
> 	 * II. Distribute evenly (best effort)
>-	 * III. Mapping is persistent.
> 	 */
>
> 	node_mask = cpumask_of_node(cpu_to_node(q_num));
>@@ -1268,8 +1330,10 @@ static struct vmbus_channel *get_og_chn(struct
>storvsc_device *stor_device,
> 		if (cpumask_test_cpu(tgt_cpu, node_mask))
> 			num_channels++;
> 	}
>-	if (num_channels == 0)
>+	if (num_channels == 0) {
>+		stor_device->stor_chns[q_num] = stor_device->device-
>>channel;
> 		return stor_device->device->channel;
>+	}
>
> 	hash_qnum = q_num;
> 	while (hash_qnum >= num_channels)
>@@ -1295,6 +1359,7 @@ static int storvsc_do_io(struct hv_device *device,
> 	struct storvsc_device *stor_device;
> 	struct vstor_packet *vstor_packet;
> 	struct vmbus_channel *outgoing_channel, *channel;
>+	unsigned long flags;
> 	int ret = 0;
> 	const struct cpumask *node_mask;
> 	int tgt_cpu;
>@@ -1308,10 +1373,11 @@ static int storvsc_do_io(struct hv_device *device,
>
> 	request->device  = device;
> 	/*
>-	 * Select an an appropriate channel to send the request out.
>+	 * Select an appropriate channel to send the request out.
> 	 */
>-	if (stor_device->stor_chns[q_num] != NULL) {
>-		outgoing_channel = stor_device->stor_chns[q_num];
>+	/* See storvsc_change_target_cpu(). */
>+	outgoing_channel = READ_ONCE(stor_device->stor_chns[q_num]);
>+	if (outgoing_channel != NULL) {
> 		if (outgoing_channel->target_cpu == q_num) {
> 			/*
> 			 * Ideally, we want to pick a different channel if @@ -
>1324,7 +1390,10 @@ static int storvsc_do_io(struct hv_device *device,
> 					continue;
> 				if (tgt_cpu == q_num)
> 					continue;
>-				channel = stor_device->stor_chns[tgt_cpu];
>+				channel = READ_ONCE(
>+					stor_device->stor_chns[tgt_cpu]);
>+				if (channel == NULL)
>+					continue;
> 				if (hv_get_avail_to_write_percent(
> 							&channel->outbound)
> 						> ring_avail_percent_lowater)
>{
>@@ -1350,7 +1419,10 @@ static int storvsc_do_io(struct hv_device *device,
> 			for_each_cpu(tgt_cpu, &stor_device->alloced_cpus) {
> 				if (cpumask_test_cpu(tgt_cpu, node_mask))
> 					continue;
>-				channel = stor_device->stor_chns[tgt_cpu];
>+				channel = READ_ONCE(
>+					stor_device->stor_chns[tgt_cpu]);
>+				if (channel == NULL)
>+					continue;
> 				if (hv_get_avail_to_write_percent(
> 							&channel->outbound)
> 						> ring_avail_percent_lowater)
>{
>@@ -1360,7 +1432,14 @@ static int storvsc_do_io(struct hv_device *device,
> 			}
> 		}
> 	} else {
>+		spin_lock_irqsave(&device->channel->lock, flags);
>+		outgoing_channel = stor_device->stor_chns[q_num];
>+		if (outgoing_channel != NULL) {
>+			spin_unlock_irqrestore(&device->channel->lock,
>flags);

Checking outgoing_channel again seems unnecessary. Why not just call get_og_chn()?

>+			goto found_channel;
>+		}
> 		outgoing_channel = get_og_chn(stor_device, q_num);
>+		spin_unlock_irqrestore(&device->channel->lock, flags);
> 	}

With device->channel->lock, now we have one more lock on the I/O issuing path. It doesn't seem optimal, as you are trying to protect the code in storvsc_change_target_cpu(), which doesn't need to block concurrent I/O issuers. Maybe moving to RCU is a better approach?

Thanks,
Long


>
> found_channel:
>diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index
>edfcd42319ef3..9018b89614b78 100644
>--- a/include/linux/hyperv.h
>+++ b/include/linux/hyperv.h
>@@ -773,6 +773,9 @@ struct vmbus_channel {
> 	void (*onchannel_callback)(void *context);
> 	void *channel_callback_context;
>
>+	void (*change_target_cpu_callback)(struct vmbus_channel *channel,
>+			u32 old, u32 new);
>+
> 	/*
> 	 * Synchronize channel scheduling and channel removal; see the inline
> 	 * comments in vmbus_chan_sched() and vmbus_reset_channel_cb().
>--
>2.24.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
  2020-04-06 19:54   ` Long Li
@ 2020-04-07  0:35     ` Andrea Parri
  2020-04-08  2:25       ` Long Li
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Parri @ 2020-04-07  0:35 UTC (permalink / raw)
  To: Long Li
  Cc: linux-kernel, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	vkuznets, James E.J. Bottomley, Martin K. Petersen, linux-scsi

> >@@ -621,6 +621,63 @@ static inline struct storvsc_device
> >*get_in_stor_device(
> >
> > }
> >
> >+void storvsc_change_target_cpu(struct vmbus_channel *channel, u32 old,
> >+u32 new) {
> >+	struct storvsc_device *stor_device;
> >+	struct vmbus_channel *cur_chn;
> >+	bool old_is_alloced = false;
> >+	struct hv_device *device;
> >+	unsigned long flags;
> >+	int cpu;
> >+
> >+	device = channel->primary_channel ?
> >+			channel->primary_channel->device_obj
> >+				: channel->device_obj;
> >+	stor_device = get_out_stor_device(device);
> >+	if (!stor_device)
> >+		return;
> >+
> >+	/* See storvsc_do_io() -> get_og_chn(). */
> >+	spin_lock_irqsave(&device->channel->lock, flags);
> >+
> >+	/*
> >+	 * Determines if the storvsc device has other channels assigned to
> >+	 * the "old" CPU to update the alloced_cpus mask and the stor_chns
> >+	 * array.
> >+	 */
> >+	if (device->channel != channel && device->channel->target_cpu ==
> >old) {
> >+		cur_chn = device->channel;
> >+		old_is_alloced = true;
> >+		goto old_is_alloced;
> >+	}
> >+	list_for_each_entry(cur_chn, &device->channel->sc_list, sc_list) {
> >+		if (cur_chn == channel)
> >+			continue;
> >+		if (cur_chn->target_cpu == old) {
> >+			old_is_alloced = true;
> >+			goto old_is_alloced;
> >+		}
> >+	}
> >+
> >+old_is_alloced:
> >+	if (old_is_alloced)
> >+		WRITE_ONCE(stor_device->stor_chns[old], cur_chn);
> >+	else
> >+		cpumask_clear_cpu(old, &stor_device->alloced_cpus);
> 
> If the old cpu is not allocated, is it still necessary to do a cpumask_clear_cpu?

AFAICT, this really depends on how much we "believe" in the current
heuristic (as implemented by get_og_chn()):  ;-)

The cpumask_clear_cpu() (and the dependent "flush" below) is intended
to re-initialize alloced_cpus and stor_chns so that get_og_chn() can
re-process/update them.
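
Concretely, the invalidation amounts to (condensed from the patch):

	cpumask_clear_cpu(old, &stor_device->alloced_cpus);
	/* Purge cache entries that now point outside alloced_cpus, so
	 * that get_og_chn() recomputes them on the next lookup miss. */
	for_each_possible_cpu(cpu)
		if (stor_device->stor_chns[cpu] &&
		    !cpumask_test_cpu(cpu, &stor_device->alloced_cpus))
			WRITE_ONCE(stor_device->stor_chns[cpu], NULL);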

Also, notice that (both in the current code and after this series)
alloced_cpus can't be offlined and get_og_chn() does rely on this
property (cf., e.g., the loop/check over alloced_cpus/node_mask).

I suspect that giving up on this invariant/property would require
a certain amount of re-design in the heuristic/code in question...


> >@@ -1360,7 +1432,14 @@ static int storvsc_do_io(struct hv_device *device,
> > 			}
> > 		}
> > 	} else {
> >+		spin_lock_irqsave(&device->channel->lock, flags);
> >+		outgoing_channel = stor_device->stor_chns[q_num];
> >+		if (outgoing_channel != NULL) {
> >+			spin_unlock_irqrestore(&device->channel->lock,
> >flags);
> 
> Checking outgoing_channel again seems unnecessary. Why not just call get_og_chn()?

target_cpu_store() might have changed stor_chns (and alloced_cpus)
in the meantime (but before we've acquired the device's lock): the
double check is to make sure we have a consistent, up-to-date view
of stor_chns and alloced_cpus.
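
In other words, the slow path is a plain re-check under the lock
(condensed from the patch):

	spin_lock_irqsave(&device->channel->lock, flags);
	outgoing_channel = stor_device->stor_chns[q_num];
	if (!outgoing_channel)	/* still unset: (re)compute the mapping */
		outgoing_channel = get_og_chn(stor_device, q_num);
	spin_unlock_irqrestore(&device->channel->lock, flags);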


> 
> >+			goto found_channel;
> >+		}
> > 		outgoing_channel = get_og_chn(stor_device, q_num);
> >+		spin_unlock_irqrestore(&device->channel->lock, flags);
> > 	}
> 
> With device->channel->lock, now we have one more lock on the I/O issuing path. It doesn't seem optimal as you are trying to protect the code in storvsc_change_target_cpu(), this doesn't need to block concurrent I/O issuers. Maybe moving to RCU is a better approach?

I don't see this as a problem (and I've validated such conclusion
in experiments, where the "patched kernel" was sometimes performing
slightly better than the "unpatched kernel" and sometimes slightly
worse...):

On the one hand, the stor_chns array "stabilizes" quite early after
system initialization in "normal" (i.e., common) situations (i.e.,
no channel reassignments, no device hotplugs...); IOW, get_og_chn()
really represents the "rare and slow" path here (but not that slow!
after all...).  Furthermore, notice that even in those "rare cases"
the number of "contending" channels is limited to at most 1 per 4
CPUs IIRC (alloced_cpus is "sparsely populated"...).

The latencies of the RCU grace period (on the order of milliseconds)
would be a major concern for the adoption of RCU here (at least, if
we continue to consider get_og_chn() as an "updater").  I'm afraid
that this could be "too slow" even for our slow path...  ;-/
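
For comparison, an RCU-based variant would look roughly like this
(hypothetical sketch, not what the series implements):

	struct vmbus_channel *channel;

	/* Reader (I/O issuing path): */
	rcu_read_lock();
	channel = rcu_dereference(stor_device->stor_chns[q_num]);
	/* ... issue the I/O on "channel", if any ... */
	rcu_read_unlock();

	/* Updater (get_og_chn()/retargeting): */
	rcu_assign_pointer(stor_device->stor_chns[q_num], new_chn);
	synchronize_rcu();	/* ms-scale grace period: the concern above */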

What am I missing?  ;-)

Thanks,
  Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
  2020-04-07  0:35     ` Andrea Parri
@ 2020-04-08  2:25       ` Long Li
  2020-04-08 14:54         ` Andrea Parri
  0 siblings, 1 reply; 29+ messages in thread
From: Long Li @ 2020-04-08  2:25 UTC (permalink / raw)
  To: Andrea Parri
  Cc: linux-kernel, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	vkuznets, James E.J. Bottomley, Martin K. Petersen, linux-scsi

>Subject: Re: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel
>interrupt is re-assigned
>
>> >@@ -621,6 +621,63 @@ static inline struct storvsc_device
>> >*get_in_stor_device(
>> >
>> > }
>> >
>> >+void storvsc_change_target_cpu(struct vmbus_channel *channel, u32
>> >+old,
>> >+u32 new) {
>> >+	struct storvsc_device *stor_device;
>> >+	struct vmbus_channel *cur_chn;
>> >+	bool old_is_alloced = false;
>> >+	struct hv_device *device;
>> >+	unsigned long flags;
>> >+	int cpu;
>> >+
>> >+	device = channel->primary_channel ?
>> >+			channel->primary_channel->device_obj
>> >+				: channel->device_obj;
>> >+	stor_device = get_out_stor_device(device);
>> >+	if (!stor_device)
>> >+		return;
>> >+
>> >+	/* See storvsc_do_io() -> get_og_chn(). */
>> >+	spin_lock_irqsave(&device->channel->lock, flags);
>> >+
>> >+	/*
>> >+	 * Determines if the storvsc device has other channels assigned to
>> >+	 * the "old" CPU to update the alloced_cpus mask and the stor_chns
>> >+	 * array.
>> >+	 */
>> >+	if (device->channel != channel && device->channel->target_cpu ==
>> >old) {
>> >+		cur_chn = device->channel;
>> >+		old_is_alloced = true;
>> >+		goto old_is_alloced;
>> >+	}
>> >+	list_for_each_entry(cur_chn, &device->channel->sc_list, sc_list) {
>> >+		if (cur_chn == channel)
>> >+			continue;
>> >+		if (cur_chn->target_cpu == old) {
>> >+			old_is_alloced = true;
>> >+			goto old_is_alloced;
>> >+		}
>> >+	}
>> >+
>> >+old_is_alloced:
>> >+	if (old_is_alloced)
>> >+		WRITE_ONCE(stor_device->stor_chns[old], cur_chn);
>> >+	else
>> >+		cpumask_clear_cpu(old, &stor_device->alloced_cpus);
>>
>> If the old cpu is not allocated, is it still necessary to do a cpumask_clear_cpu?
>
>AFAICT, this really depends on how much we "believe" in the current heuristic
>(as implemented by get_og_chn()):  ;-)
>
>The cpumask_clear_cpu() (and the below, dependent "flush" as well) are
>intended to re-initialize alloced_cpus and stor_chns in order for get_og_chn()
>to re-process/update them.
>
>Also, notice that (both in the current code and after this series) alloced_cpus
>can't be offlined and get_og_chn() does rely on this property (cf., e.g., the
>loop/check over alloced_cpus/node_mask).
>
>I suspect that giving up on this invariant/property would require a certain
>amount of re-design in the heuristic/code in question...
>
>
>> >@@ -1360,7 +1432,14 @@ static int storvsc_do_io(struct hv_device
>*device,
>> > 			}
>> > 		}
>> > 	} else {
>> >+		spin_lock_irqsave(&device->channel->lock, flags);
>> >+		outgoing_channel = stor_device->stor_chns[q_num];
>> >+		if (outgoing_channel != NULL) {
>> >+			spin_unlock_irqrestore(&device->channel->lock,
>> >flags);
>>
>> Checking outgoing_channel again seems unnecessary. Why not just call
>get_og_chn()?
>
>target_cpu_store() might have changed stor_chns (and alloced_cpus) in the
>meantime (but before we've acquired the device's lock): the double check is
>to make sure we have a "consistent"/an up-to-date view of stor_chns and
>alloced_cpus.
>
>
>>
>> >+			goto found_channel;
>> >+		}
>> > 		outgoing_channel = get_og_chn(stor_device, q_num);
>> >+		spin_unlock_irqrestore(&device->channel->lock, flags);
>> > 	}
>>
>> With device->channel->lock, now we have one more lock on the I/O issuing
>path. It doesn't seem optimal as you are trying to protect the code in
>storvsc_change_target_cpu(), this doesn't need to block concurrent I/O
>issuers. Maybe moving to RCU is a better approach?
>
>I don't see this as a problem (and I've validated such conclusion in
>experiments, where the "patched kernel" was sometimes performing slightly
>better than the "unpatched kernel" and sometimes slightly
>worse...):
>
>On the one hand, the stor_chns array "stabilizes" quite early after system
>initialization in "normal" (i.e., common) situations (i.e., no channel
>reassignments, no device hotplugs...); IOW, get_og_chn() really represents
>the "rare and slow" path here (but not that slow!
>after all...).  Furthermore, notice that even in those "rare cases"
>the number of "contending" channels is limited to at most 1 per 4 CPUs IIRC
>(alloced_cpus is "sparsely populated"...).

Yes I realized it is on the slow path. There is no need to optimize locks.

Reviewed-by: Long Li <longli@microsoft.com>

>
>The latencies of the RCU grace period (in the order of milliseconds) would be a
>major concern for the adoption of RCU here (at least, if we continue to
>consider get_og_chn() as an "updater").  I'm afraid that this could be "too
>slow" even for our slow path...  ;-/
>
>What am I missing?  ;-)
>
>Thanks,
>  Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
  2020-04-08  2:25       ` Long Li
@ 2020-04-08 14:54         ` Andrea Parri
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Parri @ 2020-04-08 14:54 UTC (permalink / raw)
  To: Long Li
  Cc: linux-kernel, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, linux-hyperv, Michael Kelley, Dexuan Cui, Boqun Feng,
	vkuznets, James E.J. Bottomley, Martin K. Petersen, linux-scsi

On Wed, Apr 08, 2020 at 02:25:52AM +0000, Long Li wrote:
> >Subject: Re: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel
> >interrupt is re-assigned
> >
> >> >@@ -621,6 +621,63 @@ static inline struct storvsc_device
> >> >*get_in_stor_device(
> >> >
> >> > }
> >> >
> >> >+void storvsc_change_target_cpu(struct vmbus_channel *channel, u32
> >> >+old,
> >> >+u32 new) {
> >> >+	struct storvsc_device *stor_device;
> >> >+	struct vmbus_channel *cur_chn;
> >> >+	bool old_is_alloced = false;
> >> >+	struct hv_device *device;
> >> >+	unsigned long flags;
> >> >+	int cpu;
> >> >+
> >> >+	device = channel->primary_channel ?
> >> >+			channel->primary_channel->device_obj
> >> >+				: channel->device_obj;
> >> >+	stor_device = get_out_stor_device(device);
> >> >+	if (!stor_device)
> >> >+		return;
> >> >+
> >> >+	/* See storvsc_do_io() -> get_og_chn(). */
> >> >+	spin_lock_irqsave(&device->channel->lock, flags);
> >> >+
> >> >+	/*
> >> >+	 * Determines if the storvsc device has other channels assigned to
> >> >+	 * the "old" CPU to update the alloced_cpus mask and the stor_chns
> >> >+	 * array.
> >> >+	 */
> >> >+	if (device->channel != channel && device->channel->target_cpu ==
> >> >old) {
> >> >+		cur_chn = device->channel;
> >> >+		old_is_alloced = true;
> >> >+		goto old_is_alloced;
> >> >+	}
> >> >+	list_for_each_entry(cur_chn, &device->channel->sc_list, sc_list) {
> >> >+		if (cur_chn == channel)
> >> >+			continue;
> >> >+		if (cur_chn->target_cpu == old) {
> >> >+			old_is_alloced = true;
> >> >+			goto old_is_alloced;
> >> >+		}
> >> >+	}
> >> >+
> >> >+old_is_alloced:
> >> >+	if (old_is_alloced)
> >> >+		WRITE_ONCE(stor_device->stor_chns[old], cur_chn);
> >> >+	else
> >> >+		cpumask_clear_cpu(old, &stor_device->alloced_cpus);
> >>
> >> If the old cpu is not allocated, is it still necessary to do a cpumask_clear_cpu?
> >
> >AFAICT, this really depends on how much we "believe" in the current heuristic
> >(as implemented by get_og_chn()):  ;-)
> >
> >The cpumask_clear_cpu() (and the below, dependent "flush" as well) are
> >intended to re-initialize alloced_cpus and stor_chns in order for get_og_chn()
> >to re-process/update them.
> >
> >Also, notice that (both in the current code and after this series) alloced_cpus
> >can't be offlined and get_og_chn() does rely on this property (cf., e.g., the
> >loop/check over alloced_cpus/node_mask).
> >
> >I suspect that giving up on this invariant/property would require a certain
> >amount of re-design in the heuristic/code in question...
> >
> >
> >> >@@ -1360,7 +1432,14 @@ static int storvsc_do_io(struct hv_device
> >*device,
> >> > 			}
> >> > 		}
> >> > 	} else {
> >> >+		spin_lock_irqsave(&device->channel->lock, flags);
> >> >+		outgoing_channel = stor_device->stor_chns[q_num];
> >> >+		if (outgoing_channel != NULL) {
> >> >+			spin_unlock_irqrestore(&device->channel->lock,
> >> >flags);
> >>
> >> Checking outgoing_channel again seems unnecessary. Why not just call
> >get_og_chn()?
> >
> >target_cpu_store() might have changed stor_chns (and alloced_cpus) in the
> >meantime (but before we've acquired the device's lock): the double check is
> >to make sure we have a "consistent"/an up-to-date view of stor_chns and
> >alloced_cpus.
> >
> >
> >>
> >> >+			goto found_channel;
> >> >+		}
> >> > 		outgoing_channel = get_og_chn(stor_device, q_num);
> >> >+		spin_unlock_irqrestore(&device->channel->lock, flags);
> >> > 	}
> >>
> >> With device->channel->lock, now we have one more lock on the I/O issuing
> >path. It doesn't seem optimal as you are trying to protect the code in
> >storvsc_change_target_cpu(), this doesn't need to block concurrent I/O
> >issuers. Maybe moving to RCU is a better approach?
> >
> >I don't see this as a problem (*and I've validated such conclusion in
> >experiments, where the "patched kernel" was sometimes performing slighlty
> >better than the "unpatched kernel" and sometimes slightly
> >worse...):
> >
> >On the one hand, the stor_chns array "stabilizes" quite early after system
> >initialization in "normal" (i.e., common) situations (i.e., no channel
> >reassignments, no device hotplugs...); IOW, get_og_chn() really represents
> >the "rare and slow" path here (but not that slow!
> >after all...).  Furthermore, notice that even in those "rare cases"
> >the number of "contending" channels is limited to at most 1 per 4 CPUs IIRC
> >(alloced_cpus is "sparsely populated"...).
> 
> Yes I realized it is on the slow path. There is no need to optimize locks.
> 
> Reviewed-by: Long Li <longli@microsoft.com>

Thanks for the review, Long.  I've added the tag.

  Andrea


> 
> >
> >The latencies of the RCU grace period (in the order of milliseconds) would be a
> >major concern for the adoption of RCU here (at least, if we continue to
> >consider get_og_chn() as an "updater").  I'm afraid that this could be "too
> >slow" even for our slow path...  ;-/
> >
> >What am I missing?  ;-)
> >
> >Thanks,
> >  Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 01/11] Drivers: hv: vmbus: Always handle the VMBus messages on CPU0
  2020-04-06  0:15 ` [PATCH 01/11] Drivers: hv: vmbus: Always handle the VMBus messages on CPU0 Andrea Parri (Microsoft)
@ 2020-04-10 17:18   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 17:18 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> A Linux guest has to pick a "connect CPU" to communicate with the
> Hyper-V host.  This CPU cannot be taken offline because Hyper-V does
> not provide a way to change that CPU assignment.
> 
> Current code sets the connect CPU to whatever CPU ends up running the
> function vmbus_negotiate_version(), and this will generate problems if
> that CPU is taken offline.
> 
> Establish CPU0 as the connect CPU, and add logic to prevent the
> connect CPU from being taken offline.  We could pick some other CPU,
> and we could pick that "other CPU" dynamically if there was a reason to
> do so at some point in the future.  But for now, #defining the connect
> CPU to 0 is the most straightforward and least complex solution.
> 
> While at it, add inline comments explaining "why" offer and rescind
> messages should not be handled by the same serialized work queue.
> 
> Suggested-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/hv/connection.c   | 20 +-------------------
>  drivers/hv/hv.c           |  7 +++++++
>  drivers/hv/hyperv_vmbus.h | 11 ++++++-----
>  drivers/hv/vmbus_drv.c    | 20 +++++++++++++++++---
>  4 files changed, 31 insertions(+), 27 deletions(-)

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
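
For reference, the mechanism described above reduces to a short sketch
(condensed and illustrative; only the hot-remove refusal that implements
"prevent the connect CPU from being taken offline" is shown):

/* Establish CPU0 as the connect CPU. */
#define VMBUS_CONNECT_CPU	0

static int hv_synic_cleanup(unsigned int cpu)
{
	/*
	 * Hyper-V provides no way to change the connect CPU once it is
	 * picked, so refuse to take it offline.
	 */
	if (cpu == VMBUS_CONNECT_CPU)
		return -EBUSY;

	/* ... per-CPU synic cleanup for all other CPUs ... */
	return 0;
}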

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 02/11] Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU
  2020-04-06  0:15 ` [PATCH 02/11] Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU Andrea Parri (Microsoft)
@ 2020-04-10 17:23   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 17:23 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> The offer and rescind works are currently scheduled on the so called
> "connect CPU".  However, this is not really needed: we can synchronize
> the works by relying on the usage of the offer_in_progress counter and
> of the channel_mutex mutex.  This synchronization is already in place.
> So, remove this unnecessary "bind to the connect CPU" constraint and
> update the inline comments accordingly.
> 
> Suggested-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> ---
>  drivers/hv/channel_mgmt.c | 21 ++++++++++++++++-----
>  drivers/hv/vmbus_drv.c    | 39 ++++++++++++++++++++++++++++-----------
>  2 files changed, 44 insertions(+), 16 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
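
The synchronization the message relies on has roughly this shape (a rough
sketch of the rescind path; not the patch itself):

	/* Rescind handling: first wait out any in-flight offer work... */
	while (atomic_read(&vmbus_connection.offer_in_progress) != 0)
		msleep(1);

	/* ...then serialize against the other channel operations. */
	mutex_lock(&vmbus_connection.channel_mutex);
	/* ... process the rescind ... */
	mutex_unlock(&vmbus_connection.channel_mutex);

With this ordering in place, the offer and rescind works can be queued
with plain queue_work() instead of queue_work_on(VMBUS_CONNECT_CPU, ...).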

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 03/11] Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels
  2020-04-06  0:15 ` [PATCH 03/11] Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels Andrea Parri (Microsoft)
@ 2020-04-10 17:29   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 17:29 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> When Hyper-V sends an interrupt to the guest, the guest has to figure
> out which channel the interrupt is associated with.  Hyper-V sets a bit
> in a memory page that is shared with the guest, indicating a particular
> "relid" that the interrupt is associated with.  The current Linux code
> then uses a set of per-CPU linked lists to map a given "relid" to a
> pointer to a channel structure.
> 
> This design introduces a synchronization problem if the CPU that Hyper-V
> will interrupt for a certain channel is changed.  If the interrupt comes
> on the "old CPU" and the channel was already moved to the per-CPU list
> of the "new CPU", then the relid -> channel mapping will fail and the
> interrupt is dropped.  Similarly, if the interrupt comes on the new CPU
> but the channel was not moved to the per-CPU list of the new CPU, then
> the mapping will fail and the interrupt is dropped.
> 
> Relids are integers ranging from 0 to 2047.  The mapping from relids to
> channel structures can be done by setting up an array with 2048 entries,
> each entry being a pointer to a channel structure (hence total size ~16K
> bytes, which is not a problem).  The array is global, so there are no
> per-CPU linked lists to update.  The array can be searched and updated
> by loading from/storing to the array at the specified index.  With no
> per-CPU data structures, the above-mentioned synchronization problem is
> avoided and the relid2channel() function gets simpler.
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> ---
> NOTE.  The inline comment in vmbus_channel_map_relid() follows the argument
> discussed in the RFC [1].  An attempt has been made to make this argument
> more 'explicit' (and, hopefully, more 'robust' against future changes) by
> adding a full memory barrier in vmbus_channel_map_relid(): the barrier takes
> the role of the 'implicit' full memory barriers from queue_work() and from
> check_ready_for_resume_event() referred to in the RFC.
> 
> [1] https://lore.kernel.org/linux-hyperv/20200403133826.GA25401@andrea/
> 
>  drivers/hv/channel_mgmt.c | 179 ++++++++++++++++++++++++--------------
>  drivers/hv/connection.c   |  38 +++-----
>  drivers/hv/hv.c           |   2 -
>  drivers/hv/hyperv_vmbus.h |  14 +--
>  drivers/hv/vmbus_drv.c    |  48 ++++++----
>  include/linux/hyperv.h    |   5 --
>  6 files changed, 160 insertions(+), 126 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
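
A minimal, compilable sketch of the relid -> channel map described above
(user-space C11, with acquire/release standing in for the full barrier
mentioned in the NOTE; the kernel code differs in detail):

#include <stdatomic.h>
#include <stddef.h>

#define MAX_CHANNEL_RELIDS 2048		/* relids range from 0 to 2047 */

struct vmbus_channel;			/* opaque for this sketch */

static _Atomic(struct vmbus_channel *) channels[MAX_CHANNEL_RELIDS];

static void vmbus_channel_map_relid(unsigned int relid,
				    struct vmbus_channel *chan)
{
	if (relid >= MAX_CHANNEL_RELIDS)
		return;
	/* Publish the pointer: readers see a fully initialized channel. */
	atomic_store_explicit(&channels[relid], chan, memory_order_release);
}

static struct vmbus_channel *relid2channel(unsigned int relid)
{
	if (relid >= MAX_CHANNEL_RELIDS)
		return NULL;
	/* A single indexed load: no per-CPU lists to search or keep in sync. */
	return atomic_load_explicit(&channels[relid], memory_order_acquire);
}

At 2048 entries of 8 bytes each, the array costs the ~16K bytes noted in
the message.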

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 04/11] hv_netvsc: Disable NAPI before closing the VMBus channel
  2020-04-06  0:15 ` [PATCH 04/11] hv_netvsc: Disable NAPI before closing the VMBus channel Andrea Parri (Microsoft)
@ 2020-04-10 19:20   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:20 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets,
	Stephen Hemminger, David S. Miller, netdev

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> vmbus_chan_sched() might call the netvsc driver callback function that
> ends up scheduling NAPI work.  This "work" can access the channel ring
> buffer, so we must ensure that any such work is completed and that the
> ring buffer is no longer being accessed before freeing the ring buffer
> data structure in the channel closure path.  To this end, disable NAPI
> before calling vmbus_close() in netvsc_device_remove().
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: <netdev@vger.kernel.org>
> ---
>  drivers/hv/channel.c        | 6 ++++++
>  drivers/net/hyperv/netvsc.c | 7 +++++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
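
The resulting teardown ordering in netvsc_device_remove() is, in sketch
(condensed from the description above; loop bounds assumed):

	/* Quiesce NAPI first: no poller may still touch the ring buffer. */
	for (i = 0; i < net_device->num_chn; i++)
		napi_disable(&net_device->chan_table[i].napi);

	/* Only then close the channel, which frees the ring buffer. */
	vmbus_close(device->channel);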

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 05/11] hv_utils: Always execute the fcopy and vss callbacks in a tasklet
  2020-04-06  0:15 ` [PATCH 05/11] hv_utils: Always execute the fcopy and vss callbacks in a tasklet Andrea Parri (Microsoft)
@ 2020-04-10 19:26   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:26 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> The fcopy and vss callback functions could be running in a tasklet
> at the same time they are called in hv_poll_channel().  Current code
> serializes the invocations of these functions, and their accesses to
> the channel ring buffer, by sending an IPI to the CPU that is allowed
> to access the ring buffer, cf. hv_poll_channel().  This IPI mechanism
> becomes infeasible if we allow changing the CPU that a channel will
> interrupt.  Instead modify the callback wrappers to always execute
> the fcopy and vss callbacks in a tasklet, thus mirroring the solution
> for the kvp callback functions adopted since commit a3ade8cc474d8
> ("HV: properly delay KVP packets when negotiation is in progress").
> This will ensure that the callback function can't run on two CPUs at
> the same time.
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> ---
>  drivers/hv/hv_fcopy.c     | 2 +-
>  drivers/hv/hv_snapshot.c  | 2 +-
>  drivers/hv/hyperv_vmbus.h | 7 +------
>  3 files changed, 3 insertions(+), 8 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
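
Mirroring the kvp wrapper cited in the message, the fix is essentially
(sketched here for fcopy; vss is analogous):

static void fcopy_poll_wrapper(void *channel)
{
	/* Transaction is finished, reset the state to avoid races. */
	fcopy_transaction.state = HVUTIL_READY;
	/*
	 * Defer the real callback to the channel's tasklet: a given
	 * tasklet never runs on two CPUs at the same time, so no IPI
	 * to a specific CPU is needed.
	 */
	tasklet_schedule(&((struct vmbus_channel *)channel)->callback_event);
}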

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 06/11] Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal
  2020-04-06  0:15 ` [PATCH 06/11] Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal Andrea Parri (Microsoft)
@ 2020-04-10 19:28   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:28 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> Since vmbus_chan_sched() dereferences the ring buffer pointer, we have
> to make sure that the ring buffer data structures don't get freed while
> such dereferencing is happening.  Current code does this by sending an
> IPI to the CPU that is allowed to access that ring buffer from interrupt
> level, cf., vmbus_reset_channel_cb().  But with the new functionality
> to allow changing the CPU that a channel will interrupt, we can't be
> sure what CPU will be running the vmbus_chan_sched() function for a
> particular channel, so the current IPI mechanism is infeasible.
> 
> Instead synchronize vmbus_chan_sched() and vmbus_reset_channel_cb() by
> using the (newly introduced) per-channel spin lock "sched_lock".  Move
> the test for onchannel_callback being NULL before the "switch" control
> statement in vmbus_chan_sched(), in order to not access the ring buffer
> if the vmbus_reset_channel_cb() has been completed on the channel.
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> ---
>  drivers/hv/channel.c      | 24 +++++++-----------------
>  drivers/hv/channel_mgmt.c |  1 +
>  drivers/hv/vmbus_drv.c    | 30 +++++++++++++++++-------------
>  include/linux/hyperv.h    |  6 ++++++
>  4 files changed, 31 insertions(+), 30 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
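
The two sides of the synchronization then look roughly as follows
(condensed sketch; locking flavors simplified):

	/* vmbus_chan_sched(): invoke the callback under the new lock. */
	spin_lock(&channel->sched_lock);
	callback_fn = channel->onchannel_callback;
	if (callback_fn != NULL)
		callback_fn(channel->channel_callback_context);
	spin_unlock(&channel->sched_lock);

	/*
	 * vmbus_reset_channel_cb(): NULL the callback under the same
	 * lock; afterwards the ring buffer can be freed safely.
	 */
	spin_lock_irqsave(&channel->sched_lock, flags);
	channel->onchannel_callback = NULL;
	spin_unlock_irqrestore(&channel->sched_lock, flags);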

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 07/11] PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality
  2020-04-06  0:15 ` [PATCH 07/11] PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality Andrea Parri (Microsoft)
@ 2020-04-10 19:30   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:30 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets,
	Lorenzo Pieralisi, Andrew Murray, Bjorn Helgaas, linux-pci

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> The current implementation of hv_compose_msi_msg() is incompatible with
> the new functionality that allows changing the vCPU a VMBus channel will
> interrupt: if this function always calls hv_pci_onchannelcallback() in
> the polling loop, the interrupt going to a different CPU could cause
> hv_pci_onchannelcallback() to be running simultaneously in a tasklet,
> which will break.  The current code also has a problem in that it is not
> synchronized with vmbus_reset_channel_cb(): hv_compose_msi_msg() could
> be accessing the ring buffer via the call of hv_pci_onchannelcallback()
> well after the time that vmbus_reset_channel_cb() has finished.
> 
> Fix these issues as follows.  Disable the channel tasklet before
> entering the polling loop in hv_compose_msi_msg() and re-enable it when
> done.  This will prevent hv_pci_onchannelcallback() from running in a
> tasklet on a different CPU.  Moreover, poll by always calling
> hv_pci_onchannelcallback(), but check the channel callback function for
> NULL and invoke the callback within a sched_lock critical section.  This
> will prevent hv_compose_msi_msg() from accessing the ring buffer after
> vmbus_reset_channel_cb() has acquired the sched_lock spinlock.
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: <linux-pci@vger.kernel.org>
> ---
>  drivers/pci/controller/pci-hyperv.c | 44 ++++++++++++++++++-----------
>  1 file changed, 28 insertions(+), 16 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
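
Put together, the polling loop becomes approximately (a sketch following
the description above; completion setup and error paths omitted):

	tasklet_disable(&channel->callback_event);
	while (!try_wait_for_completion(&comp.comp_pkt.host_event)) {
		/* Poll the channel ourselves, under sched_lock, and only
		 * if vmbus_reset_channel_cb() has not yet run. */
		spin_lock_irqsave(&channel->sched_lock, flags);
		if (unlikely(channel->onchannel_callback == NULL)) {
			spin_unlock_irqrestore(&channel->sched_lock, flags);
			goto enable_tasklet;	/* device being torn down */
		}
		hv_pci_onchannelcallback(hbus);
		spin_unlock_irqrestore(&channel->sched_lock, flags);
		udelay(100);
	}
enable_tasklet:
	tasklet_enable(&channel->callback_event);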

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 08/11] Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic
  2020-04-06  0:15 ` [PATCH 08/11] Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic Andrea Parri (Microsoft)
@ 2020-04-10 19:32   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:32 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> The logic is unused since commit 509879bdb30b8 ("Drivers: hv: Introduce
> a policy for controlling channel affinity").
> 
> This logic assumes that a channel target_cpu doesn't change during the
> lifetime of a channel, but this assumption is incompatible with the new
> functionality that allows changing the vCPU a channel will interrupt.
> 
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> ---
>  drivers/hv/channel_mgmt.c | 105 +++++++++-----------------------------
>  include/linux/hyperv.h    |  27 ----------
>  2 files changed, 25 insertions(+), 107 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 09/11] Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
  2020-04-06  0:15 ` [PATCH 09/11] Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug Andrea Parri (Microsoft)
@ 2020-04-10 19:33   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:33 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> init_vp_index() may access the cpu_online_mask via its calls to
> cpumask_of_node().  Make sure to protect these accesses with a
> cpus_read_lock() critical section.
> 
> Also, remove some (hardcoded) instances of CPU(0) from init_vp_index()
> and replace them with VMBUS_CONNECT_CPU.  The connect CPU cannot go
> offline, since Hyper-V does not provide a way to change it.
> 
> Finally, order the accesses of target_cpu from init_vp_index() and
> hv_synic_cleanup() by relying on the channel_mutex; this is achieved
> by moving the call of init_vp_index() into vmbus_process_offer().
> 
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> ---
>  drivers/hv/channel_mgmt.c | 47 ++++++++++++++++++++++++++++-----------
>  drivers/hv/hv.c           |  7 +++---
>  2 files changed, 38 insertions(+), 16 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
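
In shape, the synchronization amounts to (sketched; the CPU-selection
heuristic itself is elided):

	/*
	 * init_vp_index(), now called from vmbus_process_offer() with
	 * channel_mutex held: pin CPU hotplug while consulting the
	 * online masks.
	 */
	cpus_read_lock();
	/* ... pick target_cpu from cpumask_of_node(numa_node),
	 *     skipping offline CPUs ... */
	channel->target_cpu = target_cpu;
	cpus_read_unlock();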

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type
  2020-04-06  0:15 ` [PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type Andrea Parri (Microsoft)
@ 2020-04-10 19:34   ` Michael Kelley
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:34 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> VMBus version 4.1 and later support the CHANNELMSG_MODIFYCHANNEL(22)
> message type which can be used to request Hyper-V to change the vCPU
> that a channel will interrupt.
> 
> Introduce the CHANNELMSG_MODIFYCHANNEL message type, and define the
> vmbus_send_modifychannel() function to send CHANNELMSG_MODIFYCHANNEL
> requests to the host via a hypercall.  The function is then used to
> define a sysfs "store" operation, which allows changing the (v)CPU
> the channel will interrupt via the sysfs interface.  The feature
> can be used for load balancing or other purposes.
> 
> One interesting catch here is that Hyper-V can *not* currently ACK
> CHANNELMSG_MODIFYCHANNEL messages with the promise that (after the ACK
> is sent) the channel won't send any more interrupts to the "old" CPU.
> 
> The peculiarity of the CHANNELMSG_MODIFYCHANNEL messages is problematic
> if the user wants to take a CPU offline, since we don't want to take a
> CPU offline (and, potentially, "lose" channel interrupts on that CPU)
> if the host is still processing a CHANNELMSG_MODIFYCHANNEL message
> associated with that CPU.
> 
> It is worth mentioning, however, that we have been unable to observe
> the above-mentioned "race": in all our tests, CHANNELMSG_MODIFYCHANNEL
> requests appeared *as if* they were processed synchronously by the host.
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> ---
>  drivers/hv/channel.c      |  28 ++++++++++
>  drivers/hv/channel_mgmt.c |   2 +-
>  drivers/hv/hv_trace.h     |  19 +++++++
>  drivers/hv/vmbus_drv.c    | 108 +++++++++++++++++++++++++++++++++++++-
>  include/linux/hyperv.h    |  10 +++-
>  5 files changed, 163 insertions(+), 4 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
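
The user-visible side is a sysfs "store" of roughly this shape (condensed
sketch; error paths and the serialization against channel removal are
omitted):

static ssize_t target_cpu_store(struct vmbus_channel *channel,
				const char *buf, size_t count)
{
	u32 target_cpu;
	ssize_t ret;

	if (sscanf(buf, "%u", &target_cpu) != 1)
		return -EIO;

	cpus_read_lock();	/* the new CPU must stay online meanwhile */
	if (!cpu_online(target_cpu)) {
		cpus_read_unlock();
		return -EINVAL;
	}

	ret = vmbus_send_modifychannel(channel->offermsg.child_relid,
				       hv_cpu_number_to_vp_number(target_cpu));
	if (ret == 0)
		channel->target_cpu = target_cpu;
	cpus_read_unlock();

	return ret ?: count;
}

So that, e.g., "echo 3 > /sys/bus/vmbus/devices/<device>/channels/<relid>/cpu"
re-targets that channel's interrupts to CPU 3.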

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
  2020-04-06  0:15 ` [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Andrea Parri (Microsoft)
  2020-04-06 19:54   ` Long Li
@ 2020-04-10 19:35   ` Michael Kelley
  1 sibling, 0 replies; 29+ messages in thread
From: Michael Kelley @ 2020-04-10 19:35 UTC (permalink / raw)
  To: Andrea Parri (Microsoft), linux-kernel
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	linux-hyperv, Dexuan Cui, Boqun Feng, vkuznets,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

From: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Sent: Sunday, April 5, 2020 5:15 PM
> 
> For each storvsc_device, storvsc keeps track of the channel target CPUs
> associated with the device (alloced_cpus) and uses this information to
> fill a "cache" (stor_chns) mapping CPU->channel according to a certain
> heuristic.  Update the alloced_cpus mask and the stor_chns array when a
> channel of the storvsc device is re-assigned to a different CPU.
> 
> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: <linux-scsi@vger.kernel.org>
> ---
>  drivers/hv/vmbus_drv.c     |  4 ++
>  drivers/scsi/storvsc_drv.c | 95 ++++++++++++++++++++++++++++++++++----
>  include/linux/hyperv.h     |  3 ++
>  3 files changed, 94 insertions(+), 8 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/11] VMBus channel interrupt reassignment
  2020-04-06  0:15 [PATCH 00/11] VMBus channel interrupt reassignment Andrea Parri (Microsoft)
                   ` (10 preceding siblings ...)
  2020-04-06  0:15 ` [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Andrea Parri (Microsoft)
@ 2020-04-13 15:48 ` Wei Liu
  2020-04-13 17:07   ` Andrea Parri
  11 siblings, 1 reply; 29+ messages in thread
From: Wei Liu @ 2020-04-13 15:48 UTC (permalink / raw)
  To: Andrea Parri (Microsoft)
  Cc: linux-kernel, K . Y . Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, linux-hyperv, Michael Kelley,
	Dexuan Cui, Boqun Feng, Vitaly Kuznetsov, David S. Miller,
	Lorenzo Pieralisi, Andrew Murray, Bjorn Helgaas,
	James E.J. Bottomley, Martin K. Petersen

On Mon, Apr 06, 2020 at 02:15:03AM +0200, Andrea Parri (Microsoft) wrote:
> Hi all,
> 
> This is a follow-up on the RFC submission [1].  The series introduces
> changes in the VMBus drivers to reassign the CPU that a VMbus channel
> will interrupt.  This feature can be used for load balancing or other
> purposes (e.g. CPU offlining).  The submission integrates feedback in
> the RFC to amend the handling of the 'array of channels' (patch #3).
> 
> Thanks,
>   Andrea
> 
> [1] https://lkml.kernel.org/r/20200325225505.23998-1-parri.andrea@gmail.com
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> 
> Andrea Parri (Microsoft) (11):
>   Drivers: hv: vmbus: Always handle the VMBus messages on CPU0
>   Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific
>     CPU
>   Drivers: hv: vmbus: Replace the per-CPU channel lists with a global
>     array of channels
>   hv_netvsc: Disable NAPI before closing the VMBus channel
>   hv_utils: Always execute the fcopy and vss callbacks in a tasklet
>   Drivers: hv: vmbus: Use a spin lock for synchronizing channel
>     scheduling vs. channel removal
>   PCI: hv: Prepare hv_compose_msi_msg() for the
>     VMBus-channel-interrupt-to-vCPU reassignment functionality
>   Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity
>     logic
>   Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
>   Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message
>     type
>   scsi: storvsc: Re-init stor_chns when a channel interrupt is
>     re-assigned
> 

Applied to hyperv-next. Thanks.

This hunk in patch 10 doesn't apply cleanly because it conflicts with
Vitaly's patch.

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 3785beead503d..1058c814ab06e 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -1377,7 +1377,7 @@ channel_message_table[CHANNELMSG_COUNT] = {
        { CHANNELMSG_19,                        0, NULL },
        { CHANNELMSG_20,                        0, NULL },
        { CHANNELMSG_TL_CONNECT_REQUEST,        0, NULL },
-       { CHANNELMSG_22,                        0, NULL },
+       { CHANNELMSG_MODIFYCHANNEL,             0, NULL },
        { CHANNELMSG_TL_CONNECT_RESULT,         0, NULL },
 };

I have fixed it up. New hunk looks like:

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index a6b21838e2de..9c62eb5e4135 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -1380,7 +1380,7 @@ channel_message_table[CHANNELMSG_COUNT] = {
        { CHANNELMSG_19,                        0, NULL, 0},
        { CHANNELMSG_20,                        0, NULL, 0},
        { CHANNELMSG_TL_CONNECT_REQUEST,        0, NULL, 0},
-       { CHANNELMSG_22,                        0, NULL, 0},
+       { CHANNELMSG_MODIFYCHANNEL,             0, NULL, 0},
        { CHANNELMSG_TL_CONNECT_RESULT,         0, NULL, 0},
 };

Let me know if I messed it up.

Wei.

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/11] VMBus channel interrupt reassignment
  2020-04-13 15:48 ` [PATCH 00/11] VMBus channel interrupt reassignment Wei Liu
@ 2020-04-13 17:07   ` Andrea Parri
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Parri @ 2020-04-13 17:07 UTC (permalink / raw)
  To: Wei Liu
  Cc: linux-kernel, K . Y . Srinivasan, Haiyang Zhang,
	Stephen Hemminger, linux-hyperv, Michael Kelley, Dexuan Cui,
	Boqun Feng, Vitaly Kuznetsov, David S. Miller, Lorenzo Pieralisi,
	Andrew Murray, Bjorn Helgaas, James E.J. Bottomley,
	Martin K. Petersen

On Mon, Apr 13, 2020 at 04:48:36PM +0100, Wei Liu wrote:
> On Mon, Apr 06, 2020 at 02:15:03AM +0200, Andrea Parri (Microsoft) wrote:
> > Hi all,
> > 
> > This is a follow-up on the RFC submission [1].  The series introduces
> > changes in the VMBus drivers to reassign the CPU that a VMbus channel
> > will interrupt.  This feature can be used for load balancing or other
> > purposes (e.g. CPU offlining).  The submission integrates feedback in
> > the RFC to amend the handling of the 'array of channels' (patch #3).
> > 
> > Thanks,
> >   Andrea
> > 
> > [1] https://lkml.kernel.org/r/20200325225505.23998-1-parri.andrea@gmail.com
> > 
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > 
> > Andrea Parri (Microsoft) (11):
> >   Drivers: hv: vmbus: Always handle the VMBus messages on CPU0
> >   Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific
> >     CPU
> >   Drivers: hv: vmbus: Replace the per-CPU channel lists with a global
> >     array of channels
> >   hv_netvsc: Disable NAPI before closing the VMBus channel
> >   hv_utils: Always execute the fcopy and vss callbacks in a tasklet
> >   Drivers: hv: vmbus: Use a spin lock for synchronizing channel
> >     scheduling vs. channel removal
> >   PCI: hv: Prepare hv_compose_msi_msg() for the
> >     VMBus-channel-interrupt-to-vCPU reassignment functionality
> >   Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity
> >     logic
> >   Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
> >   Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message
> >     type
> >   scsi: storvsc: Re-init stor_chns when a channel interrupt is
> >     re-assigned
> > 
> 
> Applied to hyperv-next. Thanks.
> 
> This hunk in patch 10 doesn't apply cleanly because it conflicts with
> Vitaly's patch.
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 3785beead503d..1058c814ab06e 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -1377,7 +1377,7 @@ channel_message_table[CHANNELMSG_COUNT] = {
>         { CHANNELMSG_19,                        0, NULL },
>         { CHANNELMSG_20,                        0, NULL },
>         { CHANNELMSG_TL_CONNECT_REQUEST,        0, NULL },
> -       { CHANNELMSG_22,                        0, NULL },
> +       { CHANNELMSG_MODIFYCHANNEL,             0, NULL },
>         { CHANNELMSG_TL_CONNECT_RESULT,         0, NULL },
>  };
> 
> I have fixed it up. New hunk looks like:
> 
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index a6b21838e2de..9c62eb5e4135 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -1380,7 +1380,7 @@ channel_message_table[CHANNELMSG_COUNT] = {
>         { CHANNELMSG_19,                        0, NULL, 0},
>         { CHANNELMSG_20,                        0, NULL, 0},
>         { CHANNELMSG_TL_CONNECT_REQUEST,        0, NULL, 0},
> -       { CHANNELMSG_22,                        0, NULL, 0},
> +       { CHANNELMSG_MODIFYCHANNEL,             0, NULL, 0},
>         { CHANNELMSG_TL_CONNECT_RESULT,         0, NULL, 0},
>  };

The fix looks good to me.  Thank you Wei!

Thanks,
  Andrea

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-04-13 17:07 UTC | newest]

Thread overview: 29+ messages
2020-04-06  0:15 [PATCH 00/11] VMBus channel interrupt reassignment Andrea Parri (Microsoft)
2020-04-06  0:15 ` [PATCH 01/11] Drivers: hv: vmbus: Always handle the VMBus messages on CPU0 Andrea Parri (Microsoft)
2020-04-10 17:18   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 02/11] Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU Andrea Parri (Microsoft)
2020-04-10 17:23   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 03/11] Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels Andrea Parri (Microsoft)
2020-04-10 17:29   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 04/11] hv_netvsc: Disable NAPI before closing the VMBus channel Andrea Parri (Microsoft)
2020-04-10 19:20   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 05/11] hv_utils: Always execute the fcopy and vss callbacks in a tasklet Andrea Parri (Microsoft)
2020-04-10 19:26   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 06/11] Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal Andrea Parri (Microsoft)
2020-04-10 19:28   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 07/11] PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality Andrea Parri (Microsoft)
2020-04-10 19:30   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 08/11] Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic Andrea Parri (Microsoft)
2020-04-10 19:32   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 09/11] Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug Andrea Parri (Microsoft)
2020-04-10 19:33   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type Andrea Parri (Microsoft)
2020-04-10 19:34   ` Michael Kelley
2020-04-06  0:15 ` [PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Andrea Parri (Microsoft)
2020-04-06 19:54   ` Long Li
2020-04-07  0:35     ` Andrea Parri
2020-04-08  2:25       ` Long Li
2020-04-08 14:54         ` Andrea Parri
2020-04-10 19:35   ` Michael Kelley
2020-04-13 15:48 ` [PATCH 00/11] VMBus channel interrupt reassignment Wei Liu
2020-04-13 17:07   ` Andrea Parri
