[v2,5/8] hv_balloon: don't check for memhp_auto_online manually
diff mbox series

Message ID 20200317104942.11178-6-david@redhat.com
State In Next
Commit ecb61b02a36027c1f3786859fc0489fbfb85f043
Headers show
Series
  • [v2,1/8] drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE
Related show

Commit Message

David Hildenbrand March 17, 2020, 10:49 a.m. UTC
We get the MEM_ONLINE notifier call if memory is added right from the
kernel via add_memory() or later from user space.

Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
mechanism (->done) for that. Initialize the wait event only once and
reinitialize before adding memory. Unconditionally call complete() and
wait_for_completion_timeout().

If there are no waiters, complete() will only increment ->done - which
will be reset by reinit_completion(). If complete() has already been
called, wait_for_completion_timeout() will not wait.

There is still the chance for a small race between concurrent
reinit_completion() and complete(). If complete() wins, we would not
wait - which is tolerable (and the race exists in current code as well).

Note: We only wait for "some" memory to get onlined, which seems to be
      good enough for now.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: linux-hyperv@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/hv/hv_balloon.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

Comments

Vitaly Kuznetsov March 17, 2020, 4:29 p.m. UTC | #1
David Hildenbrand <david@redhat.com> writes:

> We get the MEM_ONLINE notifier call if memory is added right from the
> kernel via add_memory() or later from user space.
>
> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
> mechanism (->done) for that. Initialize the wait event only once and
> reinitialize before adding memory. Unconditionally call complete() and
> wait_for_completion_timeout().
>
> If there are no waiters, complete() will only increment ->done - which
> will be reset by reinit_completion(). If complete() has already been
> called, wait_for_completion_timeout() will not wait.
>
> There is still the chance for a small race between concurrent
> reinit_completion() and complete(). If complete() wins, we would not
> wait - which is tolerable (and the race exists in current code as
> well).

How can we see concurent reinit_completion() and complete()? Obvioulsy,
we are not onlining new memory in kernel and hv_mem_hot_add() calls are
serialized, we're waiting up to 5*HZ for the added block to come online
before proceeding to the next one. Or do you mean we actually hit this
5*HZ timeout, proceeded to the next block and immediately after
reinit_completion() we saw complete() for the previously added block?
This is tolerable indeed, we're making forward progress (and this all is
'best effort' anyway).

>
> Note: We only wait for "some" memory to get onlined, which seems to be
>       good enough for now.
>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: linux-hyperv@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/hv/hv_balloon.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index a02ce43d778d..af5e09f08130 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -533,7 +533,6 @@ struct hv_dynmem_device {
>  	 * State to synchronize hot-add.
>  	 */
>  	struct completion  ol_waitevent;
> -	bool ha_waiting;
>  	/*
>  	 * This thread handles hot-add
>  	 * requests from the host as well as notifying
> @@ -634,10 +633,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
>  	switch (val) {
>  	case MEM_ONLINE:
>  	case MEM_CANCEL_ONLINE:
> -		if (dm_device.ha_waiting) {
> -			dm_device.ha_waiting = false;
> -			complete(&dm_device.ol_waitevent);
> -		}
> +		complete(&dm_device.ol_waitevent);
>  		break;
>  
>  	case MEM_OFFLINE:
> @@ -726,8 +722,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  		has->covered_end_pfn +=  processed_pfn;
>  		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  
> -		init_completion(&dm_device.ol_waitevent);
> -		dm_device.ha_waiting = !memhp_auto_online;
> +		reinit_completion(&dm_device.ol_waitevent);
>  
>  		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>  		ret = add_memory(nid, PFN_PHYS((start_pfn)),
> @@ -753,15 +748,14 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  		}
>  
>  		/*
> -		 * Wait for the memory block to be onlined when memory onlining
> -		 * is done outside of kernel (memhp_auto_online). Since the hot
> -		 * add has succeeded, it is ok to proceed even if the pages in
> -		 * the hot added region have not been "onlined" within the
> -		 * allowed time.
> +		 * Wait for memory to get onlined. If the kernel onlined the
> +		 * memory when adding it, this will return directly. Otherwise,
> +		 * it will wait for user space to online the memory. This helps
> +		 * to avoid adding memory faster than it is getting onlined. As
> +		 * adding succeeded, it is ok to proceed even if the memory was
> +		 * not onlined in time.
>  		 */
> -		if (dm_device.ha_waiting)
> -			wait_for_completion_timeout(&dm_device.ol_waitevent,
> -						    5*HZ);
> +		wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);
>  		post_status(&dm_device);
>  	}
>  }
> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	set_online_page_callback(&hv_online_page);
>  	register_memory_notifier(&hv_memory_nb);
> +	init_completion(&dm_device.ol_waitevent);
>  #endif
>  
>  	hv_set_drvdata(dev, &dm_device);

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
David Hildenbrand March 17, 2020, 4:33 p.m. UTC | #2
On 17.03.20 17:29, Vitaly Kuznetsov wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> We get the MEM_ONLINE notifier call if memory is added right from the
>> kernel via add_memory() or later from user space.
>>
>> Let's get rid of the "ha_waiting" flag - the wait event has an inbuilt
>> mechanism (->done) for that. Initialize the wait event only once and
>> reinitialize before adding memory. Unconditionally call complete() and
>> wait_for_completion_timeout().
>>
>> If there are no waiters, complete() will only increment ->done - which
>> will be reset by reinit_completion(). If complete() has already been
>> called, wait_for_completion_timeout() will not wait.
>>
>> There is still the chance for a small race between concurrent
>> reinit_completion() and complete(). If complete() wins, we would not
>> wait - which is tolerable (and the race exists in current code as
>> well).
> 
> How can we see concurent reinit_completion() and complete()? Obvioulsy,
> we are not onlining new memory in kernel and hv_mem_hot_add() calls are
> serialized, we're waiting up to 5*HZ for the added block to come online
> before proceeding to the next one. Or do you mean we actually hit this
> 5*HZ timeout, proceeded to the next block and immediately after
> reinit_completion() we saw complete() for the previously added block?

Yes exactly - or if an admin manually offlines+re-onlines a random
memory block.

> This is tolerable indeed, we're making forward progress (and this all is
> 'best effort' anyway).

Exactly my thoughts.

[...]

> 
> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 

Thanks!
David Hildenbrand March 17, 2020, 6:46 p.m. UTC | #3
> @@ -1707,6 +1701,7 @@ static int balloon_probe(struct hv_device *dev,
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	set_online_page_callback(&hv_online_page);
>  	register_memory_notifier(&hv_memory_nb);
> +	init_completion(&dm_device.ol_waitevent);

I'll move this one line up.

>  #endif
>  
>  	hv_set_drvdata(dev, &dm_device);
>

Patch
diff mbox series

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index a02ce43d778d..af5e09f08130 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -533,7 +533,6 @@  struct hv_dynmem_device {
 	 * State to synchronize hot-add.
 	 */
 	struct completion  ol_waitevent;
-	bool ha_waiting;
 	/*
 	 * This thread handles hot-add
 	 * requests from the host as well as notifying
@@ -634,10 +633,7 @@  static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
 	switch (val) {
 	case MEM_ONLINE:
 	case MEM_CANCEL_ONLINE:
-		if (dm_device.ha_waiting) {
-			dm_device.ha_waiting = false;
-			complete(&dm_device.ol_waitevent);
-		}
+		complete(&dm_device.ol_waitevent);
 		break;
 
 	case MEM_OFFLINE:
@@ -726,8 +722,7 @@  static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		has->covered_end_pfn +=  processed_pfn;
 		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
 
-		init_completion(&dm_device.ol_waitevent);
-		dm_device.ha_waiting = !memhp_auto_online;
+		reinit_completion(&dm_device.ol_waitevent);
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
@@ -753,15 +748,14 @@  static void hv_mem_hot_add(unsigned long start, unsigned long size,
 		}
 
 		/*
-		 * Wait for the memory block to be onlined when memory onlining
-		 * is done outside of kernel (memhp_auto_online). Since the hot
-		 * add has succeeded, it is ok to proceed even if the pages in
-		 * the hot added region have not been "onlined" within the
-		 * allowed time.
+		 * Wait for memory to get onlined. If the kernel onlined the
+		 * memory when adding it, this will return directly. Otherwise,
+		 * it will wait for user space to online the memory. This helps
+		 * to avoid adding memory faster than it is getting onlined. As
+		 * adding succeeded, it is ok to proceed even if the memory was
+		 * not onlined in time.
 		 */
-		if (dm_device.ha_waiting)
-			wait_for_completion_timeout(&dm_device.ol_waitevent,
-						    5*HZ);
+		wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);
 		post_status(&dm_device);
 	}
 }
@@ -1707,6 +1701,7 @@  static int balloon_probe(struct hv_device *dev,
 #ifdef CONFIG_MEMORY_HOTPLUG
 	set_online_page_callback(&hv_online_page);
 	register_memory_notifier(&hv_memory_nb);
+	init_completion(&dm_device.ol_waitevent);
 #endif
 
 	hv_set_drvdata(dev, &dm_device);