From: Chris Goldsworthy <cgoldswo@codeaurora.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
minchan@kernel.org, linux-arm-msm@vger.kernel.org,
linux-kernel@vger.kernel.org, pratikp@codeaurora.org,
pdaly@codeaurora.org, sudaraja@codeaurora.org,
iamjoonsoo.kim@lge.com, david@redhat.com,
vinmenon@codeaurora.org, minchan.kim@gmail.com
Subject: Re: [PATCH v4] mm: cma: indefinitely retry allocations in cma_alloc
Date: Fri, 09 Oct 2020 13:26:59 -0700 [thread overview]
Message-ID: <3cdd6c30c062cf11eb1a7e3c47ff111e@codeaurora.org> (raw)
In-Reply-To: <20200929055937.GA5332@infradead.org>
On 2020-09-28 22:59, Christoph Hellwig wrote:
> On Mon, Sep 28, 2020 at 01:30:27PM -0700, Chris Goldsworthy wrote:
>> CMA allocations will fail if 'pinned' pages are in a CMA area, since
>> we
>> cannot migrate pinned pages. The _refcount of a struct page being
>> greater
>> than _mapcount for that page can cause pinning for anonymous pages.
>> This
>> is because try_to_unmap(), which (1) is called in the CMA allocation
>> path,
>> and (2) decrements both _refcount and _mapcount for a page, will stop
>> unmapping a page from VMAs once the _mapcount for a page reaches 0.
>> This
>> implies that after try_to_unmap() has finished successfully for a page
>> where _recount > _mapcount, that _refcount will be greater than 0.
>> Later
>> in the CMA allocation path in migrate_page_move_mapping(), we will
>> have one
>> more reference count than intended for anonymous pages, meaning the
>> allocation will fail for that page.
>>
>> If a process ends up causing _refcount > _mapcount for a page (by
>> either
>> incrementing _recount or decrementing _mapcount), such that the
>> process is
>> context switched out after modifying one refcount but before modifying
>> the
>> other, the page will be temporarily pinned.
>>
>> One example of where _refcount can be greater than _mapcount is inside
>> of
>> zap_pte_range(), which is called for all the entries of a PMD when a
>> process is exiting, to unmap the process's memory. Inside of
>> zap_pte_range(), after unammping a page with page_remove_rmap(), we
>> have
>> that _recount > _mapcount. _refcount can only be decremented after a
>> TLB
>> flush is performed for the page - this doesn't occur until enough
>> pages
>> have been batched together for flushing. The flush can either occur
>> inside
>> of zap_pte_range() (during the same invocation or a later one), or if
>> there
>> aren't enough pages collected by the time we unmap all of the pages in
>> a
>> process, the flush will occur in tlb_finish_mmu() in exit_mmap().
>> After
>> the flush has occurred, tlb_batch_pages_flush() will decrement the
>> references on the flushed pages.
>>
>> Another such example like the above is inside of copy_one_pte(), which
>> is
>> called during a fork. For PTEs for which pte_present(pte) == true,
>> copy_one_pte() will increment the _refcount field followed by the
>> _mapcount field of a page.
>>
>> So, inside of cma_alloc(), add the option of letting users pass in
>> __GFP_NOFAIL to indicate that we should retry CMA allocations
>> indefinitely,
>> in the event that alloc_contig_range() returns -EBUSY after having
>> scanned
>> a whole CMA-region bitmap.
>
> And who is going to use this? AS-is this just seems to add code that
> isn't actually used and thus actually tested. (In addition to beeing
> a relly bad idea as discussed before)
Hi Christoph,
That had slipped my mind - what we would have submitted would have been
a modified /drivers/dma-heap/heaps/cma_heap.c, which would have created
a "linux,cma-nofail" heap, that when allocated from, passes GFP_NOFAIL
to cma_alloc(). But, since this retry approach (finite and infinite)
has effectively been nacked, I've gone back to the drawing board to find
either (1) a lock based approach to solving this (as posed by Andrew
Morton here: https://lkml.org/lkml/2020/8/21/1490), or (2) using
preempt_disable() calls.
Thanks,
Chris.
>> --- a/kernel/dma/contiguous.c
>> +++ b/kernel/dma/contiguous.c
>> @@ -196,7 +196,7 @@ struct page *dma_alloc_from_contiguous(struct
>> device *dev, size_t count,
>> if (align > CONFIG_CMA_ALIGNMENT)
>> align = CONFIG_CMA_ALIGNMENT;
>>
>> - return cma_alloc(dev_get_cma_area(dev), count, align, no_warn);
>> + return cma_alloc(dev_get_cma_area(dev), count, align, no_warn ?
>> __GFP_NOWARN : 0);
>
> Also don't add pointlessly overlong lines.
--
The Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum,
a Linux Foundation Collaborative Project
prev parent reply other threads:[~2020-10-09 20:27 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-28 20:30 [PATCH v4] mm: cma: indefinitely retry allocations in cma_alloc Chris Goldsworthy
2020-09-28 20:30 ` Chris Goldsworthy
2020-09-29 5:59 ` Christoph Hellwig
2020-10-09 20:26 ` Chris Goldsworthy [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3cdd6c30c062cf11eb1a7e3c47ff111e@codeaurora.org \
--to=cgoldswo@codeaurora.org \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=hch@infradead.org \
--cc=iamjoonsoo.kim@lge.com \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan.kim@gmail.com \
--cc=minchan@kernel.org \
--cc=pdaly@codeaurora.org \
--cc=pratikp@codeaurora.org \
--cc=sudaraja@codeaurora.org \
--cc=vinmenon@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).