All of lore.kernel.org
 help / color / mirror / Atom feed
* Can the huge zero page be partially mapped?
@ 2024-03-04 16:54 Matthew Wilcox
  2024-03-04 19:19 ` Yang Shi
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2024-03-04 16:54 UTC (permalink / raw)
  To: linux-mm

I looked at the definition of is_huge_zero_page():

static inline bool is_huge_zero_page(struct page *page)
{
        return READ_ONCE(huge_zero_page) == page;
}

That made me raise my eyebrows a bit because it will return false for
tail pages of the HZP (that was at least unexpected for me).  Then we
have this beauty:

void free_page_and_swap_cache(struct page *page)
{
        struct folio *folio = page_folio(page);

        free_swap_cache(folio);
        if (!is_huge_zero_page(page))
                folio_put(folio);
}

So if we can call free_page_and_swap_cache() with a tail of the HZP
we can absolutely screw up its refcounting.  Now, we have VM_BUGs
to catch the refcount going below 0, and I haven't seen them being
hit, so I _presume_ it doesn't happen, but maybe somebody inventive
could come up with a way of putting a HZP tail into a page table ...?


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Can the huge zero page be partially mapped?
  2024-03-04 16:54 Can the huge zero page be partially mapped? Matthew Wilcox
@ 2024-03-04 19:19 ` Yang Shi
  2024-03-04 21:52   ` David Hildenbrand
  0 siblings, 1 reply; 3+ messages in thread
From: Yang Shi @ 2024-03-04 19:19 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm

On Mon, Mar 4, 2024 at 8:54 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> I looked at the definition of is_huge_zero_page():
>
> static inline bool is_huge_zero_page(struct page *page)
> {
>         return READ_ONCE(huge_zero_page) == page;
> }
>
> That made me raise my eyebrows a bit because it will return false for
> tail pages of the HZP (that was at least unexpected for me).  Then we
> have this beauty:
>
> void free_page_and_swap_cache(struct page *page)
> {
>         struct folio *folio = page_folio(page);
>
>         free_swap_cache(folio);
>         if (!is_huge_zero_page(page))
>                 folio_put(folio);
> }
>
> So if we can call free_page_and_swap_cache() with a tail of the HZP
> we can absolutely screw up its refcounting.  Now, we have VM_BUGs
> to catch the refcount going below 0, and I haven't seen them being
> hit, so I _presume_ it doesn't happen, but maybe somebody inventive
> could come up with a way of putting a HZP tail into a page table ...?

The huge zero pmd split is specially handled by
__split_huge_zero_page_pmd(), which actually replaces every subpages
of HZP to zero page.

>


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Can the huge zero page be partially mapped?
  2024-03-04 19:19 ` Yang Shi
@ 2024-03-04 21:52   ` David Hildenbrand
  0 siblings, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2024-03-04 21:52 UTC (permalink / raw)
  To: Yang Shi, Matthew Wilcox; +Cc: linux-mm, David Howells

On 04.03.24 20:19, Yang Shi wrote:
> On Mon, Mar 4, 2024 at 8:54 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> I looked at the definition of is_huge_zero_page():
>>
>> static inline bool is_huge_zero_page(struct page *page)
>> {
>>          return READ_ONCE(huge_zero_page) == page;
>> }
>>
>> That made me raise my eyebrows a bit because it will return false for
>> tail pages of the HZP (that was at least unexpected for me).  Then we
>> have this beauty:
>>
>> void free_page_and_swap_cache(struct page *page)
>> {
>>          struct folio *folio = page_folio(page);
>>
>>          free_swap_cache(folio);
>>          if (!is_huge_zero_page(page))
>>                  folio_put(folio);
>> }
>>
>> So if we can call free_page_and_swap_cache() with a tail of the HZP
>> we can absolutely screw up its refcounting.  Now, we have VM_BUGs
>> to catch the refcount going below 0, and I haven't seen them being
>> hit, so I _presume_ it doesn't happen, but maybe somebody inventive
>> could come up with a way of putting a HZP tail into a page table ...?
> 
> The huge zero pmd split is specially handled by
> __split_huge_zero_page_pmd(), which actually replaces every subpages
> of HZP to zero page.

Right.

The only thing that can happen is that we GUP a part of the huge 
zeropage (FOLL_PIN only, FOLL_LONGTERM/FOLL_WRITE would trigger a fault 
first and map us an anon folio), and unpinning would drop these references.

unpin_user_page()->gup_put_folio()->folio_put_refs() would call 
__folio_put().

Not sure if __folio_put() does the right thing, but I hope so :) Did not 
look into the details.

In folios_put_refs() we do have is_huge_zero_page() special handling, I 
guess that is for ordinary zap/unmap and likely the right thing to do.

Looks a bit inconsistent. (folio_put_refs() vs. folios_put_refs())

Likely, we should also not perform any refcounting on the huge zeropage 
in GUP, just like we do for the ordinary zeropage nowdays. [ccing Dave]

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-03-04 21:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-04 16:54 Can the huge zero page be partially mapped? Matthew Wilcox
2024-03-04 19:19 ` Yang Shi
2024-03-04 21:52   ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.