Date: Sat, 16 Jan 2021 14:16:24 -0800 (PST)
From: Hugh Dickins
To: Andrew Morton
Cc: Sergey Senozhatsky, "Kirill A. Shutemov", Suleiman Souhlal, Jann Horn,
    Hugh Dickins, Matthew Wilcox, Andrea Arcangeli, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH] mm: thp: fix MADV_REMOVE deadlock on shmem THP

Sergey reported a deadlock between kswapd correctly doing its usual
lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and
madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing
down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).

This happened when shmem_fallocate(punch hole)'s unmap_mapping_range()
reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock
could occur when partially truncating a mapped huge tmpfs file, or using
fallocate(FALLOC_FL_PUNCH_HOLE) on it.

__split_huge_pmd()'s page lock was added in 5.8, to make sure that any
concurrent use of reuse_swap_page() (holding page lock) could not catch
the anon THP's mapcounts and swapcounts while they were being split.

Fortunately, reuse_swap_page() is never applied to a shmem or file THP
(not even by khugepaged, which checks PageSwapCache before calling), and
anonymous THPs are never created in shmem or file areas: so that
__split_huge_pmd()'s page lock can only be necessary for anonymous THPs,
on which there is no risk of deadlock with i_mmap_rwsem.
Reported-by: Sergey Senozhatsky
Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()")
Signed-off-by: Hugh Dickins
Reviewed-by: Andrea Arcangeli
Cc: stable@vger.kernel.org
---
The status of reuse_swap_page(), and its use on THPs, is currently under
discussion, and may need to be changed: but this patch is a simple fix
to the reported deadlock, which can go in now, and be easily backported
to whichever stable and longterm releases took in 5.8's c444eb564fb1.

 mm/huge_memory.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

--- 5.11-rc3/mm/huge_memory.c	2020-12-27 20:39:37.667932292 -0800
+++ linux/mm/huge_memory.c	2021-01-16 08:02:08.265551393 -0800
@@ -2202,7 +2202,7 @@ void __split_huge_pmd(struct vm_area_str
 {
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
-	bool was_locked = false;
+	bool do_unlock_page = false;
 	pmd_t _pmd;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
@@ -2218,7 +2218,6 @@ void __split_huge_pmd(struct vm_area_str
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		was_locked = true;
 		if (page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2227,19 +2226,29 @@ repeat:
 	if (pmd_trans_huge(*pmd)) {
 		if (!page) {
 			page = pmd_page(*pmd);
-			if (unlikely(!trylock_page(page))) {
-				get_page(page);
-				_pmd = *pmd;
-				spin_unlock(ptl);
-				lock_page(page);
-				spin_lock(ptl);
-				if (unlikely(!pmd_same(*pmd, _pmd))) {
-					unlock_page(page);
+			/*
+			 * An anonymous page must be locked, to ensure that a
+			 * concurrent reuse_swap_page() sees stable mapcount;
+			 * but reuse_swap_page() is not used on shmem or file,
+			 * and page lock must not be taken when zap_pmd_range()
+			 * calls __split_huge_pmd() while i_mmap_lock is held.
+			 */
+			if (PageAnon(page)) {
+				if (unlikely(!trylock_page(page))) {
+					get_page(page);
+					_pmd = *pmd;
+					spin_unlock(ptl);
+					lock_page(page);
+					spin_lock(ptl);
+					if (unlikely(!pmd_same(*pmd, _pmd))) {
+						unlock_page(page);
+						put_page(page);
+						page = NULL;
+						goto repeat;
+					}
 					put_page(page);
-					page = NULL;
-					goto repeat;
 				}
-				put_page(page);
+				do_unlock_page = true;
 			}
 		}
 		if (PageMlocked(page))
@@ -2249,7 +2258,7 @@ repeat:
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
 	spin_unlock(ptl);
-	if (!was_locked && page)
+	if (do_unlock_page)
 		unlock_page(page);
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback.