From: Wei Wang <wei.w.wang@intel.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, mst@redhat.com
Cc: linux-mm@kvack.org, virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, mhocko@kernel.org
Subject: Re: [PATCH v1 1/3] virtio-balloon: replace the coarse-grained balloon_lock
Date: Sun, 22 Oct 2017 19:24:37 +0800
Message-ID: <59EC7FF5.6070906__11366.809186484$1508671372$gmane$org@intel.com>
In-Reply-To: <201710221420.FHG17654.OOMFQSFJVFHLtO@I-love.SAKURA.ne.jp>

On 10/22/2017 01:20 PM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> The balloon_lock was used to synchronize the access demand to elements
>> of struct virtio_balloon and its queue operations (please see commit
>> e22504296d). This prevents the concurrent run of the leak_balloon and
>> fill_balloon functions, thereby resulting in a deadlock issue on OOM:
>>
>> fill_balloon: take balloon_lock and wait for OOM to get some memory;
>> oom_notify: release some inflated memory via leak_balloon();
>> leak_balloon: wait for balloon_lock to be released by fill_balloon.
>>
>> This patch breaks the lock into two fine-grained inflate_lock and
>> deflate_lock, and eliminates the unnecessary use of the shared data
>> (i.e. vb->pnfs, vb->num_pfns). This enables leak_balloon and
>> fill_balloon to run concurrently and solves the deadlock issue.
>>
>> @@ -162,20 +160,20 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>>   			msleep(200);
>>   			break;
>>   		}
>> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>> -		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>> +		set_page_pfns(vb, pfns + num_pfns, page);
>>   		if (!virtio_has_feature(vb->vdev,
>>   					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>>   			adjust_managed_page_count(page, -1);
>>   	}
>>   
>> -	num_allocated_pages = vb->num_pfns;
>> +	mutex_lock(&vb->inflate_lock);
>>   	/* Did we get any? */
>> -	if (vb->num_pfns != 0)
>> -		tell_host(vb, vb->inflate_vq);
>> -	mutex_unlock(&vb->balloon_lock);
>> +	if (num_pfns != 0)
>> +		tell_host(vb, vb->inflate_vq, pfns, num_pfns);
>> +	mutex_unlock(&vb->inflate_lock);
>> +	atomic64_add(num_pfns, &vb->num_pages);
> Isn't this addition too late? If leak_balloon() is called due to
> out_of_memory(), it will fail to find up to dated vb->num_pages value.

Not really. I think the old implementation above,
"vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE",
isn't quite accurate, because "vb->num_pages" should reflect the number
of pages that have already been inflated, i.e. pages that have already
been given to the host via tell_host().

If we update "vb->num_pages" before tell_host(), it will include pages
that haven't been given to the host yet, which I think shouldn't be
counted as inflated pages.

On the other hand, OOM uses leak_balloon() to release pages that have
already been inflated, so it doesn't need to see pages that are still
on their way to the host.

In addition, I think we would also need to move balloon_page_insert(),
which puts the page onto the inflated page list, after tell_host().
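A minimal user-space model of this ordering, assuming C11 atomics stand in for atomic64_t and a stub stands in for tell_host(); locking (inflate_lock) is omitted. The names model_balloon, model_tell_host() and model_fill() are hypothetical, not the driver's code. The point is only the order: PFNs are gathered in a local array, handed to the "host", and only then counted in num_pages and put on the inflated list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PFNS_MAX 256 /* mirrors VIRTIO_BALLOON_ARRAY_PFNS_MAX */

/* Hypothetical stand-in for the relevant part of struct virtio_balloon. */
struct model_balloon {
	atomic_llong num_pages;  /* pages the host has been told about */
	long long host_received; /* PFNs the stub "host" has received */
	long long listed;        /* pages on the inflated list */
	uint32_t next_pfn;       /* fake page allocator state */
};

/* Hypothetical stub for tell_host(): the host learns num_pfns PFNs. */
static void model_tell_host(struct model_balloon *vb,
			    const uint32_t *pfns, size_t num_pfns)
{
	(void)pfns;
	vb->host_received += (long long)num_pfns;
}

/* Inflate up to num pages; shared state is updated only after tell_host(). */
static size_t model_fill(struct model_balloon *vb, size_t num)
{
	uint32_t pfns[PFNS_MAX]; /* local array instead of vb->pfns */
	size_t num_pfns;

	if (num > PFNS_MAX)
		num = PFNS_MAX;

	/* "Allocate" pages; nothing is published yet. */
	for (num_pfns = 0; num_pfns < num; num_pfns++)
		pfns[num_pfns] = vb->next_pfn++;

	if (num_pfns != 0)
		model_tell_host(vb, pfns, num_pfns);

	/* Both updates happen after tell_host(). */
	vb->listed += (long long)num_pfns;                     /* balloon_page_insert() */
	atomic_fetch_add(&vb->num_pages, (long long)num_pfns); /* vb->num_pages */

	/* Single-threaded invariant: never count pages the host hasn't seen. */
	assert(atomic_load(&vb->num_pages) <= vb->host_received);
	return num_pfns;
}
```

With this order, a reader of num_pages (such as the OOM path) can see a value that lags behind pages still being allocated, but never one that includes pages the host does not know about.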



>>   
>> -	return num_allocated_pages;
>> +	return num_pfns;
>>   }
>>   
>>   static void release_pages_balloon(struct virtio_balloon *vb,
>> @@ -194,38 +192,39 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>>   
>>   static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>>   {
>> -	unsigned num_freed_pages;
>>   	struct page *page;
>>   	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>>   	LIST_HEAD(pages);
>> +	unsigned int num_pfns;
>> +	__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
> This array consumes 1024 bytes of kernel stack, doesn't it?
> leak_balloon() might be called from out_of_memory() where kernel stack
> is already largely consumed before entering __alloc_pages_nodemask().
> For reducing possibility of stack overflow, since out_of_memory() is
> serialized by oom_lock, I suggest using static (maybe kmalloc()ed as
> vb->oom_pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX]) buffer when called from
> out_of_memory().

In that case, we might as well use
vb->inflate_pfns = kmalloc(VIRTIO_BALLOON_ARRAY_PFNS_MAX..);
vb->deflate_pfns = kmalloc(VIRTIO_BALLOON_ARRAY_PFNS_MAX..);
which are allocated in probe().
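A user-space sketch of the probe()-time allocation, with calloc()/free() standing in for kmalloc()/kfree(); model_vb, model_probe() and model_remove() are hypothetical names. It also spells out the 1024-byte figure: 256 entries (VIRTIO_BALLOON_ARRAY_PFNS_MAX in the driver) of a 4-byte __virtio32.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
typedef uint32_t virtio32_model; /* stands in for __virtio32 */

/* 256 * 4 bytes: the on-stack cost of the local pfns[] array. */
_Static_assert(sizeof(virtio32_model[VIRTIO_BALLOON_ARRAY_PFNS_MAX]) == 1024,
	       "PFN array should be 1024 bytes");

/* Hypothetical subset of struct virtio_balloon. */
struct model_vb {
	virtio32_model *inflate_pfns; /* used only under inflate_lock */
	virtio32_model *deflate_pfns; /* used only under deflate_lock */
};

/* probe(): allocate both buffers once, so fill/leak keep them off the stack. */
static int model_probe(struct model_vb *vb)
{
	assert(!vb->inflate_pfns && !vb->deflate_pfns); /* probe() runs once */

	vb->inflate_pfns = calloc(VIRTIO_BALLOON_ARRAY_PFNS_MAX,
				  sizeof(*vb->inflate_pfns));
	vb->deflate_pfns = calloc(VIRTIO_BALLOON_ARRAY_PFNS_MAX,
				  sizeof(*vb->deflate_pfns));
	if (!vb->inflate_pfns || !vb->deflate_pfns) {
		free(vb->inflate_pfns);
		free(vb->deflate_pfns);
		vb->inflate_pfns = vb->deflate_pfns = NULL;
		return -1; /* the driver would return -ENOMEM */
	}
	return 0;
}

/* remove(): release what probe() allocated. */
static void model_remove(struct model_vb *vb)
{
	free(vb->inflate_pfns);
	free(vb->deflate_pfns);
	vb->inflate_pfns = vb->deflate_pfns = NULL;
}
```

Keeping one buffer per direction preserves the point of the split locks: an inflation and a deflation never share a PFN array, while OOM-time deflation no longer spends 1 KiB of an already deep stack.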

>>   
>>   	/* We can only do one array worth at a time. */
>> -	num = min(num, ARRAY_SIZE(vb->pfns));
>> +	num = min_t(size_t, num, VIRTIO_BALLOON_ARRAY_PFNS_MAX);
>>   
>> -	mutex_lock(&vb->balloon_lock);
>>   	/* We can't release more pages than taken */
>> -	num = min(num, (size_t)vb->num_pages);
>> -	for (vb->num_pfns = 0; vb->num_pfns < num;
>> -	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
>> +	num = min_t(size_t, num, atomic64_read(&vb->num_pages));
>> +	for (num_pfns = 0; num_pfns < num;
>> +	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
>>   		page = balloon_page_dequeue(vb_dev_info);
> If balloon_page_dequeue() can be concurrently called by both host's request
> and guest's OOM event, is (!dequeued_page) test in balloon_page_dequeue() safe?


I'm not sure I understand the question. "dequeued_page" is a local
variable in the function, so why would it be unsafe for two concurrent
invocations? The shared b_dev_info->pages list is operated on under a
lock.
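The argument can be modeled in user-space C, with a pthread mutex standing in for the balloon_dev_info lock and a singly linked list for b_dev_info->pages. All names are hypothetical, and unlike the kernel helper this model simply returns NULL on an empty list. Each call owns its local "dequeued" flag and touches the shared list only under the lock; what the lock does not promise is that the list is still non-empty, so a caller whose num_pages snapshot is stale must cope with NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct balloon_dev_info. */
struct model_page {
	struct model_page *next;
};

struct model_dev_info {
	pthread_mutex_t pages_lock; /* protects pages */
	struct model_page *pages;   /* inflated pages */
};

static void model_enqueue(struct model_dev_info *b, struct model_page *page)
{
	pthread_mutex_lock(&b->pages_lock);
	page->next = b->pages;
	b->pages = page;
	pthread_mutex_unlock(&b->pages_lock);
}

/* Each caller has its own 'dequeued'; only the list itself is shared. */
static struct model_page *model_dequeue(struct model_dev_info *b)
{
	struct model_page *page = NULL;
	bool dequeued = false;

	pthread_mutex_lock(&b->pages_lock);
	if (b->pages) {
		page = b->pages;
		b->pages = page->next;
		dequeued = true;
	}
	pthread_mutex_unlock(&b->pages_lock);

	/* Another caller may have emptied the list first: report NULL. */
	return dequeued ? page : NULL;
}

struct model_drain_arg {
	struct model_dev_info *b;
	size_t count;
};

/* Usable as a thread body: dequeue until empty, counting successes. */
static void *model_drain(void *arg)
{
	struct model_drain_arg *a = arg;

	while (model_dequeue(a->b))
		a->count++;
	return NULL;
}
```

Running model_drain() from two threads over the same list should leave the two counts summing to the number of enqueued pages, with neither thread seeing a page twice.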



> Is such concurrency needed?

Thanks for this question; it points to another optimization, which I
would like to introduce if this direction is accepted:

I think it is not really necessary for OOM to deflate pages via
leak_balloon() while a host-requested leak_balloon() is already
running. In that case, OOM can just count the pages that are deflated
by the host request.

The implementation logic is simple; here is the major part:

1) Introduce a "vb->deflating" flag to tell whether deflating is in
   progress.

2) At the beginning of leak_balloon():
     if (READ_ONCE(vb->deflating)) {
             npages = atomic64_read(&vb->num_pages);
             /* Wait till the other run of leak_balloon() returns */
             while (READ_ONCE(vb->deflating))
                     cpu_relax();
             npages = npages - atomic64_read(&vb->num_pages);
     } else {
             WRITE_ONCE(vb->deflating, true);
     }
     ...

3) At the end of leak_balloon():
     WRITE_ONCE(vb->deflating, false);

(The vb->deflating flag above doesn't have to live in vb, though; it
could be a static variable inside leak_balloon(). We can discuss the
implementation in more detail when we reach that step.)
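A user-space sketch of steps 1) to 3), assuming C11 atomics in place of READ_ONCE()/WRITE_ONCE()/atomic64_t, a hypothetical model_deflate() in place of the real dequeue-and-tell_host() loop, and that the waiting caller returns the counted pages instead of deflating again. One deliberate change from the outline: the flag is claimed with atomic_exchange() rather than a separate read and write, since with a plain check-then-set two callers could both observe "false" and both deflate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of struct virtio_balloon. */
struct model_leak_vb {
	atomic_bool deflating;  /* a leak_balloon() run is in progress */
	atomic_llong num_pages; /* pages currently inflated */
};

/* Hypothetical stand-in for the dequeue + tell_host() loop. */
static long long model_deflate(struct model_leak_vb *vb, long long num)
{
	long long have = atomic_load(&vb->num_pages);

	if (num > have)
		num = have;
	atomic_fetch_sub(&vb->num_pages, num);
	return num;
}

/*
 * Returns the pages released either by this call or, if a deflation
 * was already running, by that other run.
 */
static long long model_leak_balloon(struct model_leak_vb *vb, long long num)
{
	long long npages;

	/* Claim the flag atomically; true means someone else owns it. */
	if (atomic_exchange(&vb->deflating, true)) {
		npages = atomic_load(&vb->num_pages);
		while (atomic_load(&vb->deflating)) {
			/* kernel code would use cpu_relax() or a wait queue */
		}
		return npages - atomic_load(&vb->num_pages);
	}

	npages = model_deflate(vb, num);
	atomic_store(&vb->deflating, false); /* step 3) */
	return npages;
}
```

Note that the waiting caller only counts what the other run releases after its snapshot; pages freed before the snapshot, or inflated concurrently, are not reflected in its result.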


Best,
Wei


Thread overview:
2017-10-20 11:54 [PATCH v1 0/3] Virtio-balloon Improvement Wei Wang
2017-10-20 11:54 ` [PATCH v1 1/3] virtio-balloon: replace the coarse-grained balloon_lock Wei Wang
2017-10-22  5:20   ` Tetsuo Handa
2017-10-22 11:24     ` Wei Wang [this message]
2017-10-22 11:50       ` Tetsuo Handa
2017-10-24  1:46         ` Wei Wang
2017-10-20 11:54 ` [PATCH v1 2/3] virtio-balloon: deflate up to oom_pages on OOM Wei Wang
2017-10-22  3:21   ` Michael S. Tsirkin
2017-10-22  4:11     ` Tetsuo Handa
2017-10-22 11:31       ` Wei Wang
2017-10-20 11:54 ` [PATCH v1 3/3] virtio-balloon: stop inflating when OOM occurs Wei Wang
2017-10-22 17:13   ` Michael S. Tsirkin
2017-10-24  1:58     ` Wei Wang
2017-10-22  3:19 ` [PATCH v1 0/3] Virtio-balloon Improvement Michael S. Tsirkin
2017-10-22 11:19   ` Wei Wang
2017-11-03  8:35   ` Wei Wang
