* Can the huge zero page be partially mapped?
@ 2024-03-04 16:54 Matthew Wilcox
2024-03-04 19:19 ` Yang Shi
0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2024-03-04 16:54 UTC (permalink / raw)
To: linux-mm
I looked at the definition of is_huge_zero_page():
static inline bool is_huge_zero_page(struct page *page)
{
return READ_ONCE(huge_zero_page) == page;
}
That made me raise my eyebrows a bit because it will return false for
tail pages of the HZP (that was at least unexpected for me). Then we
have this beauty:
void free_page_and_swap_cache(struct page *page)
{
struct folio *folio = page_folio(page);
free_swap_cache(folio);
if (!is_huge_zero_page(page))
folio_put(folio);
}
So if we can call free_page_and_swap_cache() with a tail of the HZP
we can absolutely screw up its refcounting. Now, we have VM_BUGs
to catch the refcount going below 0, and I haven't seen them being
hit, so I _presume_ it doesn't happen, but maybe somebody inventive
could come up with a way of putting a HZP tail into a page table ...?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Can the huge zero page be partially mapped?
2024-03-04 16:54 Can the huge zero page be partially mapped? Matthew Wilcox
@ 2024-03-04 19:19 ` Yang Shi
2024-03-04 21:52 ` David Hildenbrand
0 siblings, 1 reply; 3+ messages in thread
From: Yang Shi @ 2024-03-04 19:19 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-mm
On Mon, Mar 4, 2024 at 8:54 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> I looked at the definition of is_huge_zero_page():
>
> static inline bool is_huge_zero_page(struct page *page)
> {
> return READ_ONCE(huge_zero_page) == page;
> }
>
> That made me raise my eyebrows a bit because it will return false for
> tail pages of the HZP (that was at least unexpected for me). Then we
> have this beauty:
>
> void free_page_and_swap_cache(struct page *page)
> {
> struct folio *folio = page_folio(page);
>
> free_swap_cache(folio);
> if (!is_huge_zero_page(page))
> folio_put(folio);
> }
>
> So if we can call free_page_and_swap_cache() with a tail of the HZP
> we can absolutely screw up its refcounting. Now, we have VM_BUGs
> to catch the refcount going below 0, and I haven't seen them being
> hit, so I _presume_ it doesn't happen, but maybe somebody inventive
> could come up with a way of putting a HZP tail into a page table ...?
The huge zero pmd split is specially handled by
__split_huge_zero_page_pmd(), which actually replaces every subpages
of HZP to zero page.
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Can the huge zero page be partially mapped?
2024-03-04 19:19 ` Yang Shi
@ 2024-03-04 21:52 ` David Hildenbrand
0 siblings, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2024-03-04 21:52 UTC (permalink / raw)
To: Yang Shi, Matthew Wilcox; +Cc: linux-mm, David Howells
On 04.03.24 20:19, Yang Shi wrote:
> On Mon, Mar 4, 2024 at 8:54 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> I looked at the definition of is_huge_zero_page():
>>
>> static inline bool is_huge_zero_page(struct page *page)
>> {
>> return READ_ONCE(huge_zero_page) == page;
>> }
>>
>> That made me raise my eyebrows a bit because it will return false for
>> tail pages of the HZP (that was at least unexpected for me). Then we
>> have this beauty:
>>
>> void free_page_and_swap_cache(struct page *page)
>> {
>> struct folio *folio = page_folio(page);
>>
>> free_swap_cache(folio);
>> if (!is_huge_zero_page(page))
>> folio_put(folio);
>> }
>>
>> So if we can call free_page_and_swap_cache() with a tail of the HZP
>> we can absolutely screw up its refcounting. Now, we have VM_BUGs
>> to catch the refcount going below 0, and I haven't seen them being
>> hit, so I _presume_ it doesn't happen, but maybe somebody inventive
>> could come up with a way of putting a HZP tail into a page table ...?
>
> The huge zero pmd split is specially handled by
> __split_huge_zero_page_pmd(), which actually replaces every subpages
> of HZP to zero page.
Right.
The only thing that can happen is that we GUP a part of the huge
zeropage (FOLL_PIN only, FOLL_LONGTERM/FOLL_WRITE would trigger a fault
first and map us an anon folio), and unpinning would drop these references.
unpin_user_page()->gup_put_folio()->folio_put_refs() would call
__folio_put().
Not sure if __folio_put() does the right thing, but I hope so :) Did not
look into the details.
In folios_put_refs() we do have is_huge_zero_page() special handling, I
guess that is for ordinary zap/unmap and likely the right thing to do.
Looks a bit inconsistent. (folio_put_refs() vs. folios_put_refs())
Likely, we should also not perform any refcounting on the huge zeropage
in GUP, just like we do for the ordinary zeropage nowdays. [ccing Dave]
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-03-04 21:52 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-04 16:54 Can the huge zero page be partially mapped? Matthew Wilcox
2024-03-04 19:19 ` Yang Shi
2024-03-04 21:52 ` David Hildenbrand
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.