From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-23.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9AD20C433B4 for ; Wed, 12 May 2021 19:42:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0BA3761406 for ; Wed, 12 May 2021 19:42:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0BA3761406 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 59B306B006C; Wed, 12 May 2021 15:42:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 54AC76B006E; Wed, 12 May 2021 15:42:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 39D146B0070; Wed, 12 May 2021 15:42:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0092.hostedemail.com [216.40.44.92]) by kanga.kvack.org (Postfix) with ESMTP id 04C336B006C for ; Wed, 12 May 2021 15:42:45 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 96DEF181AF5CC for ; Wed, 12 May 2021 19:42:45 +0000 (UTC) X-FDA: 78133601490.16.5A40C30 Received: from mail-pg1-f169.google.com (mail-pg1-f169.google.com [209.85.215.169]) by imf25.hostedemail.com (Postfix) with ESMTP id B47A7600013E for ; Wed, 12 May 2021 19:42:36 +0000 (UTC) Received: by mail-pg1-f169.google.com with SMTP id m37so19001424pgb.8 for ; Wed, 12 May 2021 12:42:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=PNyPKfewiOG+4JoXBtJw18X3m/UEfPNiCXv5phkC7h8=; b=o5UsP0rDLBAs6y1mbILO2meD6IT6Od+Ft5nlgQk0YYxULqecly13h60X5jo0/KH34M nDakxS4l2wvVzfBQUQHwq7pyhOW4HvkNfufVP7lWvXV9dv5dzi8Q84xOTKzktCDaZVnT tNmmIPxy6i0XyMr8ZrgBugntU3oCme2qFKhIbLxkAWpfb4XnIWTjQQQ6QcfOMxg97tOB hni9AtGwTG5h0VuQPAjhtXwrUwu1wKjDPdUw3WqtH6k7xRbEwUNFOM6G2494lfLfr7+R RhHETiU0rN3fBKpql5Ej3uGA8KGawv/gL1aHE91gI1VjZhX6Lm53AnRBQzwtovhAn40s BpIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=PNyPKfewiOG+4JoXBtJw18X3m/UEfPNiCXv5phkC7h8=; b=biSREDmUbJI33uNZi5HPTzSlrG6W938TfIvgh+jbl02idDoi/mfoOR5Z3sFasrb7J2 dhy4lnOvomrMNEgqupelkjYok8MfnguWCMs3rKb7Vw1LiHt+4zZboIlKv0VqZKGpLwv/ v30HufojLplZLBHX9IywC04eZZNcPT2dYi6LNJuhijJIRhhbUgKn9LtNxFQqHK27652l 3ioKPm7wNa5IJUMFKFLLQj669y+nGLYO/WYAYuXTw2qspIMChWcFQC9SLn/E7CwvQ0WV cii0F29DXGAdnV/P6aipJVX4oKFTK7IMlmRS+ZiiYLNE46Xxq+N/tBMRn4xwIP87n7Gu KixA== X-Gm-Message-State: AOAM533FLib/m0/ud6IEGuzhr/xbJH7fnkjQq8gXdfcojsAM7SdN9CV4 lKn4j7w705Ju3i1g8adxbTI48+oQEEZsqaRilXNb9w== X-Google-Smtp-Source: ABdhPJxdjiDtYqG8gqEjdRdc5eyzwsSYcqQAijzO2agw0h+mdwirm/gE8rrNbZQe2evUTaA7XXaMf5kfpPij0Oetxig= X-Received: by 2002:aa7:904e:0:b029:28f:da01:1a5f with SMTP id n14-20020aa7904e0000b029028fda011a5fmr36651523pfo.67.1620848563984; Wed, 12 May 2021 12:42:43 -0700 (PDT) MIME-Version: 1.0 References: <20210512065813.89270-1-almasrymina@google.com> In-Reply-To: From: Mina Almasry Date: Wed, 12 May 2021 12:42:32 -0700 Message-ID: Subject: Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY To: Mike Kravetz , Andrew Morton , Linux-MM , open list Cc: Axel Rasmussen , Peter Xu Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: B47A7600013E Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20161025 header.b=o5UsP0rD; spf=pass (imf25.hostedemail.com: domain of almasrymina@google.com designates 209.85.215.169 as permitted sender) smtp.mailfrom=almasrymina@google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam04 X-Stat-Signature: 3bto9cxyzux77d6qy3tammdahoka4rpp Received-SPF: none (google.com>: No applicable sender policy available) receiver=imf25; identity=mailfrom; envelope-from=""; helo=mail-pg1-f169.google.com; client-ip=209.85.215.169 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1620848556-680115 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, May 12, 2021 at 10:22 AM Mike Kravetz wrote: > > On 5/12/21 8:53 AM, Axel Rasmussen wrote: > > On Tue, May 11, 2021 at 11:58 PM Mina Almasry wrote: > >> > >> When hugetlb_mcopy_atomic_pte() is called with: > >> - mode==MCOPY_ATOMIC_NORMAL and, > >> - we already have a page in the page cache corresponding to the > >> associated address, > >> > >> We will allocate a huge page from the reserves, and then fail to insert it > >> into the cache and return -EEXIST. > >> > >> In this case, we need to return -EEXIST without allocating a new page as > >> the page already exists in the cache. Allocating the extra page causes > >> the resv_huge_pages to underflow temporarily until the extra page is > >> freed. > >> > >> Also, add the warning so that future similar instances of resv_huge_pages > >> underflowing will be caught. > >> > >> Also, minor drive-by cleanups to this code path: > >> - pagep is an out param never set by calling code, so delete code > >> assuming there could be a valid page in there. > >> - use hugetlbfs_pagecache_page() instead of repeating its > >> implementation. > >> > >> Tested using: > >> ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 1024 200 \ > >> /mnt/huge > >> > >> Test passes, and dmesg shows no underflow warnings. > >> > >> Signed-off-by: Mina Almasry > >> Cc: Mike Kravetz > >> Cc: Axel Rasmussen > >> Cc: Peter Xu > >> > >> --- > >> mm/hugetlb.c | 33 ++++++++++++++++++++------------- > >> 1 file changed, 20 insertions(+), 13 deletions(-) > >> > >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c > >> index 629aa4c2259c..40f4ad1bca29 100644 > >> --- a/mm/hugetlb.c > >> +++ b/mm/hugetlb.c > >> @@ -1165,6 +1165,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h, > >> page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask); > >> if (page && !avoid_reserve && vma_has_reserves(vma, chg)) { > >> SetHPageRestoreReserve(page); > >> + WARN_ON_ONCE(!h->resv_huge_pages); > > This warning catches underflow in a relatively specific case. In a > previous email, you mentioned that you have seem underflow on production > systems several times. Was it this specific case? I am also assuming > that the underflow you saw was transitive and corrected itself. The > value did not remain negative? > > As mentioned above, this warning only catches the specific case where > resv_huge_pages goes negative in this routine. If we believe this is > possible, then there are likely more cases where resv_huge_pages is simply > decremented when it should not. For example: resv_huge_pages temporarily > goes to 2034 from 2035 when it should not. Also, there are several > other places where resv_huge_pages could incorrectly be modified and go > negative. > My only motivation for adding this particular warning is to make sure this particular issue remains fixed and doesn't get re-introduced in the future. If that's not too useful then I can remove it, no problem, I'm not too attached to it or anything. > I would prefer not to add this warning unless you have seen this > specific case in production or some other environments. If so, then > please add the specifics. I am not opposed to adding warnings or code to > detect underflow or other accounting issues. We just need to make sure > they are likely to provide useful data. > I've actually looked at all the resv_huge_pages underflow issues we have internally, and upon a closer look I find that they are all on kernels so old they don't have 1b1edc140dc7 ("hugetlbfs: dirty pages as they are added to pagecache") or any of the others patches that fixed resv_huge_pages issues recently. I can't seem to find instances new enough that they would be useful to look at, so I take back what I said earlier. If any underflow issues pop up on our newer kernels I'll bring this up again, but for now, it seems it's just this issue related to userfaultfd. Sorry for the noise :( > >> h->resv_huge_pages--; > >> } > >> > >> @@ -4868,30 +4869,39 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > >> struct page **pagep) > >> { > >> bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); > >> - struct address_space *mapping; > >> - pgoff_t idx; > >> + struct hstate *h = hstate_vma(dst_vma); > >> + struct address_space *mapping = dst_vma->vm_file->f_mapping; > >> + pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr); > >> unsigned long size; > >> int vm_shared = dst_vma->vm_flags & VM_SHARED; > >> - struct hstate *h = hstate_vma(dst_vma); > >> pte_t _dst_pte; > >> spinlock_t *ptl; > >> - int ret; > >> + int ret = -ENOMEM; > >> struct page *page; > >> int writable; > >> > >> - mapping = dst_vma->vm_file->f_mapping; > >> - idx = vma_hugecache_offset(h, dst_vma, dst_addr); > >> + /* Out parameter. */ > >> + WARN_ON(*pagep); > > > > I don't think this warning works, because we do set *pagep, in the > > copy_huge_page_from_user failure case. In that case, the following > > happens: > > > > 1. We set *pagep, and return immediately. > > 2. Our caller notices this particular error, drops mmap_lock, and then > > calls us again with *pagep set. > > > > In this path, we're supposed to just re-use this existing *pagep > > instead of allocating a second new page. > > > > I think this also means we need to keep the "else" case where *pagep > > is set below. > > > > +1 to Peter's comment. > Gah, sorry about that. I'll fix in v2. > -- > Mike Kravetz