From: Mike Kravetz <mike.kravetz@oracle.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Peter Xu <peterx@redhat.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Michal Hocko <mhocko@suse.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	stable@vger.kernel.org
Subject: Re: [PATCH] mm/hugetlb: expand restore_reserve_on_error functionality
Date: Fri, 4 Jun 2021 10:33:25 -0700
Message-ID: <aa145d1d-93ff-2507-3c4b-20e509a3a048@oracle.com>
In-Reply-To: <CAHS8izM-GU8_v2A9p0-ez+QNU=J9nh2zM6ZAkAMsNK7ubnsK3A@mail.gmail.com>

On 6/3/21 4:59 PM, Mina Almasry wrote:
> On Wed, Jun 2, 2021 at 4:50 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> The routine restore_reserve_on_error is called to restore reservation
>> information when an error occurs after page allocation.  The routine
>> alloc_huge_page modifies the mapping reserve map and potentially the
>> reserve count during allocation.  If code calling alloc_huge_page
>> encounters an error after allocation and needs to free the page, the
>> reservation information needs to be adjusted.
>>
>> Currently, restore_reserve_on_error only takes action on pages for which
>> the reserve count was adjusted (HPageRestoreReserve flag).  There is
>> nothing wrong with these adjustments.  However, alloc_huge_page ALWAYS
>> modifies the reserve map during allocation even if the reserve count is
>> not adjusted.  This can cause issues as observed during development of
>> this patch [1].
>>
>> One specific series of operations causing an issue is:
>> - Create a shared hugetlb mapping
>>   Reservations for all pages created by default
>> - Fault in a page in the mapping
>>   Reservation exists so reservation count is decremented
>> - Punch a hole in the file/mapping at index previously faulted
>>   Reservation and any associated pages will be removed
>> - Allocate a page to fill the hole
>>   No reservation entry, so reserve count unmodified
>>   Reservation entry added to map by alloc_huge_page
>> - Error after allocation and before instantiating the page
>>   Reservation entry remains in map
>> - Allocate a page to fill the hole
>>   Reservation entry exists, so decrement reservation count
>> This will cause a reservation count underflow as the reservation count
>> was decremented twice for the same index.
>>
>> A user would observe a very large number for HugePages_Rsvd in
>> /proc/meminfo.  This would also likely cause subsequent allocations of
>> hugetlb pages to fail as it would 'appear' that all pages are reserved.
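The accounting in the sequence above can be replayed with a small C sketch. It is a hypothetical toy model of a single index in a shared mapping, not kernel code: `toy_mapping`, `toy_alloc()`, `toy_error_path()` and `toy_replay()` are invented names, and one `unsigned long` stands in for the global reserve count reported as HugePages_Rsvd. The `fixed` argument switches between the old error handling and the behaviour proposed in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of ONE index in a shared hugetlb mapping.
 * Not kernel code: all toy_* names are made up for illustration.
 */
struct toy_mapping {
	bool resv_entry;          /* shared mapping: entry => reservation exists */
	unsigned long resv_count; /* stands in for the global HugePages_Rsvd */
};

/*
 * Allocate for the index.  Returns true if a reservation was consumed,
 * the role HPageRestoreReserve plays in the real code.
 */
static bool toy_alloc(struct toy_mapping *m)
{
	bool consumed = m->resv_entry;

	if (consumed)
		m->resv_count--;  /* unsigned: 0 - 1 wraps to ULONG_MAX */
	m->resv_entry = true;     /* the entry is ALWAYS added */
	return consumed;
}

/* Error after allocation, before the page is instantiated. */
static void toy_error_path(struct toy_mapping *m, bool consumed, bool fixed)
{
	if (consumed)
		m->resv_count++;        /* freeing a flagged page returns the reservation */
	else if (fixed)
		m->resv_entry = false;  /* patched behaviour: drop the entry alloc added */
}

/* Replay the sequence from the description; returns the final count. */
static unsigned long toy_replay(bool fixed)
{
	/* Create the mapping: one reservation for the index. */
	struct toy_mapping m = { .resv_entry = true, .resv_count = 1 };
	bool consumed;

	toy_alloc(&m);                       /* fault: reservation consumed */
	m.resv_entry = false;                /* punch hole: entry removed */
	consumed = toy_alloc(&m);            /* refill: no entry, count untouched */
	toy_error_path(&m, consumed, fixed); /* error before instantiation */
	toy_alloc(&m);                       /* refill again */
	return m.resv_count;
}
```

Under these assumptions, `toy_replay(false)` ends with the count wrapped to `ULONG_MAX` because the index was charged twice, while `toy_replay(true)` ends at 0.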
>>
>> This sequence of operations is unlikely to happen; however, it was
>> easily reproduced and observed using hacked-up code as described in [1].
>>
>> Address the issue by having the routine restore_reserve_on_error take
>> action on pages where HPageRestoreReserve is not set.  In this case, we
>> need to remove any reserve map entry created by alloc_huge_page.  A new
>> helper routine vma_del_reservation assists with this operation.
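The new helper follows the existing pattern of thin wrappers around one common routine selected by a mode value: the hunk quoted later in this message shows `vma_add_reservation()` and `vma_del_reservation()` both calling `__vma_reservation_common()` with `VMA_ADD_RESV` and `VMA_DEL_RESV`. A minimal sketch of that pattern, assuming a hypothetical bool-array reserve map in place of the kernel's region lists; all `toy_*` names and the return convention are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real code works on region lists, not an array. */
#define TOY_NPAGES 8

enum toy_resv_mode { TOY_ADD_RESV, TOY_DEL_RESV };

struct toy_resv_map {
	bool entry[TOY_NPAGES];
};

/* One common routine, selected by mode, in the spirit of the real helper. */
static long toy_reservation_common(struct toy_resv_map *map, long idx,
				   enum toy_resv_mode mode)
{
	bool was_present;

	if (idx < 0 || idx >= TOY_NPAGES)
		return -1;  /* toy error code */
	was_present = map->entry[idx];
	switch (mode) {
	case TOY_ADD_RESV:
		map->entry[idx] = true;
		break;
	case TOY_DEL_RESV:
		map->entry[idx] = false;
		break;
	}
	return was_present;  /* toy convention: 1 if an entry existed before */
}

/* Thin wrappers: each just picks the mode. */
static long toy_add_reservation(struct toy_resv_map *map, long idx)
{
	return toy_reservation_common(map, idx, TOY_ADD_RESV);
}

static long toy_del_reservation(struct toy_resv_map *map, long idx)
{
	return toy_reservation_common(map, idx, TOY_DEL_RESV);
}
```

For example, adding index 3 to an empty toy map returns 0, deleting it afterwards returns 1, and an out-of-range index returns the toy error code -1.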
>>
>> There are three callers of alloc_huge_page which do not currently call
>> restore_reserve_on_error before freeing a page on error paths.  Add
>> those missing calls.
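The ordering these callers need can be sketched as: restore the reservation information first, then free the page, since the free routine cannot adjust the reserve map. This is a hypothetical toy model in C, assuming a single faulting index and a `restore_reserve` field standing in for HPageRestoreReserve; none of the `toy_*` functions are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy objects; none of these are kernel APIs. */
struct toy_page {
	bool restore_reserve;  /* plays the role of HPageRestoreReserve */
};

struct toy_state {
	bool resv_entry;           /* reserve map entry for the faulting index */
	unsigned long resv_count;  /* global reserve count */
};

static struct toy_page *toy_alloc_page(struct toy_state *s)
{
	struct toy_page *page = malloc(sizeof(*page));

	if (!page)
		return NULL;
	page->restore_reserve = s->resv_entry;
	if (s->resv_entry)
		s->resv_count--;   /* reservation consumed */
	s->resv_entry = true;      /* entry always added by allocation */
	return page;
}

static void toy_restore_reserve_on_error(struct toy_state *s,
					 struct toy_page *page)
{
	if (!page->restore_reserve)
		s->resv_entry = false;  /* undo the entry the allocation added */
}

static void toy_free_page(struct toy_state *s, struct toy_page *page)
{
	if (page->restore_reserve)
		s->resv_count++;   /* freeing returns a consumed reservation */
	free(page);
}

/* Caller pattern: on failure after allocation, restore THEN free. */
static int toy_fault(struct toy_state *s, bool copy_fails)
{
	struct toy_page *page = toy_alloc_page(s);

	if (!page)
		return -1;
	if (copy_fails)
		goto out_restore;
	/* Toy success path: the page would be mapped; we only release memory. */
	page->restore_reserve = false;
	free(page);
	return 0;

out_restore:
	toy_restore_reserve_on_error(s, page);  /* must precede the free */
	toy_free_page(s, page);
	return -1;
}
```

With no reservation in place, a failed toy fault leaves neither a stale entry nor a changed count; with a reservation in place, the entry stays and the count is restored.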
>>
>> [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/
>> Fixes: 96b96a96ddee ("mm/hugetlb: fix huge page reservation leak in private mapping error paths")
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: <stable@vger.kernel.org>
> 
> Yep, works perfectly. Thanks!
> 

Thank you for testing!

>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> @@ -2360,25 +2376,39 @@ static long vma_add_reservation(struct hstate *h,
>>         return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV);
>>  }
>>
>> +static long vma_del_reservation(struct hstate *h,
>> +                       struct vm_area_struct *vma, unsigned long addr)
>> +{
>> +       return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV);
>> +}
>> +
>>  /*
>> - * This routine is called to restore a reservation on error paths.  In the
>> - * specific error paths, a huge page was allocated (via alloc_huge_page)
>> - * and is about to be freed.  If a reservation for the page existed,
>> - * alloc_huge_page would have consumed the reservation and set
>> - * HPageRestoreReserve in the newly allocated page.  When the page is freed
>> - * via free_huge_page, the global reservation count will be incremented if
>> - * HPageRestoreReserve is set.  However, free_huge_page can not adjust the
>> - * reserve map.  Adjust the reserve map here to be consistent with global
>> - * reserve count adjustments to be made by free_huge_page.
>> + * This routine is called to restore reservation information on error paths.
>> + * It should ONLY be called for pages allocated via alloc_huge_page(), and
>> + * the hugetlb mutex should remain held when calling this routine.
>> + *
>> + * It handles two specific cases:
>> + * 1) A reservation was in place and page consumed the reservation.
>> + *    HPageRestoreRsvCnt is set in the page.
> 
> HPageRestoreReserve, not HPageRestoreRsvCnt.
> 

Oops, that name is left over from a previous attempt at a fix in which I
renamed the flag.

I took some time to think through exactly what is needed in error paths
after page allocation, and this patch is the result.  I tried to add plenty
of comments describing what is being done and why, because this code is
complicated and full of subtle details.  One example is that entries in
the reserve map have opposite meanings for shared and private mappings.
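A small sketch of that difference, following the semantics described in the kernel's hugetlbfs reservation documentation: for a shared mapping an entry means a reservation exists (or existed) for the page, while for a private mapping entries are added as reservations are consumed. The `toy_*` names are hypothetical, and the sketch assumes the mapping was created with reservations (no MAP_NORESERVE).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; not kernel code. */
enum toy_map_type { TOY_SHARED, TOY_PRIVATE };

/*
 * Is a reservation available for an index?  Assumes the mapping was
 * created with reservations (no MAP_NORESERVE).
 */
static bool toy_reservation_available(enum toy_map_type type,
				      bool has_entry, bool page_present)
{
	if (type == TOY_SHARED)
		return has_entry && !page_present; /* entry: exists or existed */
	return !has_entry;                         /* private: entry means consumed */
}

/* Make a reservation available again; the map operation depends on type. */
static void toy_give_back_reservation(enum toy_map_type type, bool *has_entry)
{
	*has_entry = (type == TOY_SHARED);  /* shared: add entry; private: remove */
}
```

One consequence is that giving a reservation back means adding a map entry for a shared mapping but removing one for a private mapping, which is one reason the reservation helpers need to know the mapping type.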

>> + * 2) No reservation was in place for the page, so HPageRestoreRsvCnt is
> 
> Same.

Will be fixed in v2.

> 
> Otherwise it looks good to me. Thanks!
> 
> Reviewed-by: Mina Almasry <almasrymina@google.com>

Thank you,
-- 
Mike Kravetz

Thread overview: 7+ messages
2021-06-02 23:49 [PATCH] mm/hugetlb: expand restore_reserve_on_error functionality Mike Kravetz
2021-06-03  0:38 ` Mina Almasry
2021-06-03  0:38   ` Mina Almasry
2021-06-03  1:07   ` Mike Kravetz
2021-06-03 23:59 ` Mina Almasry
2021-06-03 23:59   ` Mina Almasry
2021-06-04 17:33   ` Mike Kravetz [this message]
