From: David Hildenbrand <david@redhat.com>
To: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pratikp@codeaurora.org, pdaly@codeaurora.org,
	sudraja@codeaurora.org, iamjoonsoo.kim@lge.com,
	linux-arm-msm-owner@vger.kernel.org,
	Vinayak Menon <vinmenon@codeaurora.org>,
	linux-kernel-owner@vger.kernel.org
Subject: Re: [PATCH v2] mm: cma: indefinitely retry allocations in cma_alloc
Date: Tue, 15 Sep 2020 09:53:30 +0200
Message-ID: <a3d62a77-4c4f-e86c-de6d-5222c2a747e0@redhat.com>
In-Reply-To: <72ae0f361df527cf70946992e4ab1eb3@codeaurora.org>

On 14.09.20 20:33, Chris Goldsworthy wrote:
> On 2020-09-14 02:31, David Hildenbrand wrote:
>> On 11.09.20 21:17, Chris Goldsworthy wrote:
>>>
>>> So, inside of cma_alloc(), instead of giving up when alloc_contig_range()
>>> returns -EBUSY after having scanned a whole CMA-region bitmap, perform
>>> retries indefinitely, with sleeps, to give the system an opportunity to
>>> unpin any pinned pages.
>>>
>>> Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
>>> Co-developed-by: Vinayak Menon <vinmenon@codeaurora.org>
>>> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
>>> ---
>>>  mm/cma.c | 25 +++++++++++++++++++++++--
>>>  1 file changed, 23 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/cma.c b/mm/cma.c
>>> index 7f415d7..90bb505 100644
>>> --- a/mm/cma.c
>>> +++ b/mm/cma.c
>>> @@ -442,8 +443,28 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
>>>  				bitmap_maxno, start, bitmap_count, mask,
>>>  				offset);
>>>  		if (bitmap_no >= bitmap_maxno) {
>>> -			mutex_unlock(&cma->lock);
>>> -			break;
>>> +			if (ret == -EBUSY) {
>>> +				mutex_unlock(&cma->lock);
>>> +
>>> +				/*
>>> +				 * Page may be momentarily pinned by some other
>>> +				 * process which has been scheduled out, e.g.
>>> +				 * in exit path, during unmap call, or process
>>> +				 * fork and so cannot be freed there. Sleep
>>> +				 * for 100ms and retry the allocation.
>>> +				 */
>>> +				start = 0;
>>> +				ret = -ENOMEM;
>>> +				msleep(100);
>>> +				continue;
>>> +			} else {
>>> +				/*
>>> +				 * ret == -ENOMEM - all bits in cma->bitmap are
>>> +				 * set, so we break accordingly.
>>> +				 */
>>> +				mutex_unlock(&cma->lock);
>>> +				break;
>>> +			}
>>>  		}
>>>  		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
>>>  		/*
>>>
>>
>> What about long-term pinnings? IIRC, that can happen easily e.g., with
>> vfio (and I remember there is a way via vmsplice).
>>
>> Not convinced trying forever is a sane approach in the general case ...
> 
> Hi David,
> 
> I've botched the threading, so the discussions about the previous
> patch-set are missing from this thread. I will summarize them below:
> 
> V1:
> [1] https://lkml.org/lkml/2020/8/5/1097
> [2] https://lkml.org/lkml/2020/8/6/1040
> [3] https://lkml.org/lkml/2020/8/11/893
> [4] https://lkml.org/lkml/2020/8/21/1490
> [5] https://lkml.org/lkml/2020/9/11/1072
> 
> [1] featured a version of the patch with a finite number of retries,
> which has been stable for our kernels. In [2], Andrew questioned whether
> we could find a way of actually solving the problem, on the grounds that
> doing a finite number of retries doesn't fix it (more importantly, in
> [4] Andrew indicated that he would prefer not to merge the patch as it
> doesn't solve the issue).  In [3], I suggest one actual fix for this,
> which is to use preempt_disable/enable() to prevent context switches
> from occurring during the periods in copy_one_pte() and exit_mmap() (I
> forgot to mention this case in the commit text) in which _refcount >
> _mapcount for a page. You would also need to prevent interrupts from
> occurring if we were to fully prevent the issue.  I think this would be
> acceptable for the copy_one_pte() case, since there _refcount >
> _mapcount only for a short time.  For the exit_mmap() case, however,
> _refcount is greater than _mapcount whilst the page-tables are being
> torn down for a process - that could be too long for disabling
> preemption / interrupts.
> 
> So, in [4], Andrew asks about two alternatives to see if they're viable: 
> (1) acquiring locks on the exit_mmap path and migration paths, (2) 
> retrying indefinitely.  In [5], I discuss how using locks could increase 
> the time it takes to perform a CMA allocation, such that a retry 
> approach would avoid increased CMA allocation times. I'm also uncertain 
> about how the locking scheme could be implemented effectively without 
> introducing a new per-page lock that will be used specifically to solve 
> this issue, and I'm not sure this would be accepted.

Thanks for the nice summary!

> 
> We're fine with doing indefinite retries, on the grounds that if there
> is some long-term pinning in place when alloc_contig_range() returns
> -EBUSY, it should be debugged and fixed.  Would it be possible to make
> this infinite retrying something that could be enabled or disabled by a
> defconfig option?

Two thoughts:

1. Most (all?) alloc_contig_range() users are interested in handling
short-term pinnings in a nice way (IOW, make the allocation succeed).
I'd much rather see this handled inside alloc_contig_range() than
having to encode endless loops in each caller. This means I strongly
prefer something like [3] if feasible. But I can understand that the
locking discussed in [5] is complicated. I have to admit that I am not
an expert on the short-term pinning you describe, or on how to
eventually fix it.

2. The issue that I am having is that long-term pinnings are
(unfortunately) a real thing. They are not something to debug and fix as
you suggest. For example, run a VM with VFIO (e.g., PCI passthrough).
While that VM is running, all VM memory will be pinned. If some of that
memory falls into a CMA region, your cma_alloc() will be stuck in a loop
that is effectively endless (it lasts until the VM ends). I am not sure
if all CMA users are fine with that - especially, think about CMA being
used for gigantic pages now.

Assume you want to start a new VM while the other one is running and use
some (new) gigantic pages for it. Suddenly you're trapped in an endless
loop in the kernel. That's nasty.

We do have a similar endless loop on the memory hotunplug/offlining path
(offline_pages()). However, if triggered by a user (echo 0 >
/sys/devices/system/memory/memoryX/online), a user can stop trying by
sending a signal.
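
As a rough illustration of that pattern (retry forever, but let a signal
break the loop), here is a minimal userspace sketch. It is not kernel
code: stop_requested stands in for a signal_pending(current) check, and
retry_until_signal(), demo_attempt() and struct demo_ctx are
hypothetical names made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Set from the signal handler; userspace stand-in for signal_pending(). */
static volatile sig_atomic_t stop_requested;

static void on_signal(int signo)
{
	(void)signo;
	stop_requested = 1;
}

/* One allocation attempt: 0 on success, -EBUSY while pages are pinned. */
typedef int (*attempt_fn)(void *ctx);

/*
 * Retry for as long as the attempt reports -EBUSY, sleeping between
 * attempts, but give up with -EINTR once a signal has been seen.
 */
static int retry_until_signal(attempt_fn attempt, void *ctx, long sleep_ms)
{
	struct timespec delay = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
	int ret;

	assert(attempt != NULL);

	for (;;) {
		ret = attempt(ctx);
		if (ret != -EBUSY)
			return ret;	/* success or a hard error */
		if (stop_requested)
			return -EINTR;	/* the user asked us to stop */
		nanosleep(&delay, NULL);
	}
}

/* Demo state: when the "pin" goes away and when a signal arrives. */
struct demo_ctx {
	int calls;
	int unpin_after;	/* succeed on this call number, 0 = never */
	int signal_after;	/* raise SIGINT on this call number, 0 = never */
};

static int demo_attempt(void *ctx)
{
	struct demo_ctx *d = ctx;

	d->calls++;
	if (d->unpin_after > 0 && d->calls >= d->unpin_after)
		return 0;
	if (d->signal_after > 0 && d->calls == d->signal_after)
		raise(SIGINT);	/* simulate the user interrupting us */
	return -EBUSY;
}
```

A caller would install on_signal() with signal(SIGINT, on_signal) and
then call retry_until_signal(demo_attempt, &ctx, 100). With a long-term
pin the loop still never succeeds, but the user gets a way out.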


If we want to stick to retrying forever, can't we use flags like
__GFP_NOFAIL to explicitly enable this new behavior for selected
cma_alloc() users that really can't tolerate a failure or retry
manually themselves?
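
To make the opt-in idea concrete, a small userspace sketch follows. It
only models the control flow: DEMO_CMA_NOFAIL, struct demo_region,
demo_scan_once() and demo_cma_alloc() are hypothetical names, not
existing kernel interfaces, and the sleep between retries is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opt-in flag, modelled on the __GFP_NOFAIL idea. */
#define DEMO_CMA_NOFAIL 0x1u

/* Simulated region whose range stays pinned for the first N scans. */
struct demo_region {
	int pinned_for;
	int attempts;
};

/* Stand-in for one full scan of the region: 0 or -EBUSY. */
static int demo_scan_once(struct demo_region *r)
{
	r->attempts++;
	return r->attempts > r->pinned_for ? 0 : -EBUSY;
}

/*
 * Default callers get -EBUSY back after a single scan and decide for
 * themselves whether to retry. Only callers passing DEMO_CMA_NOFAIL
 * keep retrying until the pin goes away.
 */
static int demo_cma_alloc(struct demo_region *r, unsigned int flags)
{
	int ret;

	assert(r != NULL);

	do {
		ret = demo_scan_once(r);
		/* a real implementation would sleep before retrying */
	} while (ret == -EBUSY && (flags & DEMO_CMA_NOFAIL));

	return ret;
}
```

With pinned_for = 3, a call without the flag returns -EBUSY after one
scan, while a call with DEMO_CMA_NOFAIL succeeds on the fourth scan.
Users that cannot tolerate blocking (e.g., gigantic page allocation)
would simply not pass the flag.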

-- 
Thanks,

David / dhildenb

