From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Alistair Popple, Andrew Morton, Andrea Arcangeli,
    David Hildenbrand, Matthew Wilcox, John Hubbard, Hugh Dickins,
    Vlastimil Babka, Yang Shi, Kirill A. Shutemov
Subject: [PATCH v3 1/4] mm: Don't skip swap entry even if zap_details specified
Date: Fri, 28 Jan 2022 12:54:09 +0800
Message-Id: <20220128045412.18695-2-peterx@redhat.com>
In-Reply-To: <20220128045412.18695-1-peterx@redhat.com>
References: <20220128045412.18695-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0

The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.

For example, when the caller specifies details->zap_mapping==NULL, it means
the caller wants to zap all the pages (including COWed pages); then we need
to look into swap entries too, because there can be private COWed pages that
were swapped out.

Skipping some swap entries when details is non-NULL may lead to wrongly
leaving some of the swap entries in place while we should have zapped them.

A reproducer of the problem:

===8<===
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

int page_size;
int shmem_fd;
char *buffer;

void main(void)
{
        int ret;
        char val;

        page_size = getpagesize();
        shmem_fd = memfd_create("test", 0);
        assert(shmem_fd >= 0);

        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, shmem_fd, 0);
        assert(buffer != MAP_FAILED);

        /* Write private page, swap it out */
        buffer[page_size] = 1;
        madvise(buffer, page_size * 2, MADV_PAGEOUT);

        /* This should drop private buffer[page_size] already */
        ret = ftruncate(shmem_fd, page_size);
        assert(ret == 0);
        /* Recover the size */
        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        /* Re-read the data, it should be all zero */
        val = buffer[page_size];
        if (val == 0)
                printf("Good\n");
        else
                printf("BUG\n");
}
===8<===

We don't need to touch up the pmd path, because pmd entries never had this
issue with swap entries: shmem pmd migration will always be split into pte
level first, and the same goes for swapping on anonymous memory.

Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.

This patch drops that trick (skipping swap entries whenever details is
non-NULL), so we handle swap ptes coherently.  Meanwhile we should do the
same check upon migration entry, hwpoison entry and genuine swap entries
too.
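(Not part of the patch: purely for illustration, a minimal userspace sketch
of the decision the two helpers implement.  The helper names mirror the diff
below; the struct address_space here is just an opaque mock.)

===8<===
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Mock stand-ins for struct address_space and struct zap_details */
struct address_space;
struct zap_details {
        struct address_space *zap_mapping;
};

/* Same rule as the new kernel helper: zap COWed (private) pages unless
 * the caller restricted the zap to one specific mapping. */
static bool should_zap_cows(const struct zap_details *details)
{
        if (!details)
                return true;            /* by default, zap everything */
        return !details->zap_mapping;   /* NULL mapping == "even COWed pages" */
}

/* Mirror of zap_skip_check_mapping(): only skip a page whose mapping
 * differs from the one being zapped. */
static bool zap_skip_check_mapping(const struct zap_details *details,
                                   struct address_space *page_mapping)
{
        if (should_zap_cows(details))
                return false;
        if (!page_mapping)              /* e.g. the zero page */
                return false;
        return details->zap_mapping != page_mapping;
}

int main(void)
{
        struct address_space *file = (struct address_space *)0x1;
        struct address_space *other = (struct address_space *)0x2;
        struct zap_details shared_only = { .zap_mapping = file };
        struct zap_details even_cows = { .zap_mapping = NULL };

        printf("no details  -> zap private swap entry: %d\n",
               should_zap_cows(NULL));
        printf("even_cows   -> zap private swap entry: %d\n",
               should_zap_cows(&even_cows));
        printf("one mapping -> zap private swap entry: %d\n",
               should_zap_cows(&shared_only));
        printf("one mapping -> skip other-mapping page: %d\n",
               zap_skip_check_mapping(&shared_only, other));
        return 0;
}
===8<===

It prints 1, 1, 0, 1: private (COWed) entries are zapped unless the caller
restricted the zap to one mapping, and only pages belonging to a different
mapping are skipped.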
To be explicit, we should still remember to keep the private entries if
even_cows==false, and always zap them when even_cows==true.

The issue seems to exist starting from the initial commit of git.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..4bfeaca7cbc7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
         struct folio *single_folio;    /* Locked folio to be unmapped */
 };
 
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+        /* By default, zap all pages */
+        if (!details)
+                return true;
+
+        /* Or, we zap COWed pages only if the caller wants to */
+        return !details->zap_mapping;
+}
+
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
  * pages.  Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
 static inline bool
 zap_skip_check_mapping(struct zap_details *details, struct page *page)
 {
-        if (!details || !page)
+        /* If we can make a decision without *page.. */
+        if (should_zap_cows(details))
                 return false;
 
-        return details->zap_mapping &&
-                (details->zap_mapping != page_rmapping(page));
+        /* E.g. zero page */
+        if (!page)
+                return false;
+
+        return details->zap_mapping != page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                         continue;
                 }
 
-                /* If details->check_mapping, we leave swap entries. */
-                if (unlikely(details))
-                        continue;
-
-                if (!non_swap_entry(entry))
+                if (!non_swap_entry(entry)) {
+                        /*
+                         * If this is a genuine swap entry, then it must be an
+                         * private anon page.  If the caller wants to skip
+                         * COWed pages, ignore it.
+                         */
+                        if (!should_zap_cows(details))
+                                continue;
                         rss[MM_SWAPENTS]--;
-                else if (is_migration_entry(entry)) {
+                } else if (is_migration_entry(entry)) {
                         struct page *page;
 
                         page = pfn_swap_entry_to_page(entry);
+                        if (zap_skip_check_mapping(details, page))
+                                continue;
                         rss[mm_counter(page)]--;
+                } else if (is_hwpoison_entry(entry)) {
+                        /* If the caller wants to skip COWed pages, ignore it */
+                        if (!should_zap_cows(details))
+                                continue;
+                } else {
+                        /* We should have covered all the swap entry types */
+                        WARN_ON_ONCE(1);
                 }
                 if (unlikely(!free_swap_and_cache(entry)))
                         print_bad_pte(vma, addr, ptent, NULL);
-- 
2.32.0
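(Also not part of the patch: for contrast with the reproducer above, the
same sequence against a MAP_SHARED mapping is expected to print "Good"
with or without this fix, because a shared page always belongs to the
shmem file and truncation drops it, swap entry included; only the private
COWed copy depends on the pte-level zap fixed here.  A sketch, assuming
memfd_create() and MADV_PAGEOUT are available as in the reproducer:)

===8<===
#define _GNU_SOURCE
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
        int page_size = getpagesize();
        int shmem_fd = memfd_create("test-shared", 0);
        char *buffer;
        int ret;

        assert(shmem_fd >= 0);
        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                      MAP_SHARED, shmem_fd, 0);
        assert(buffer != MAP_FAILED);

        /* Dirty the shared page, then push it out of memory */
        buffer[page_size] = 1;
        madvise(buffer, page_size * 2, MADV_PAGEOUT);

        /* Truncation drops the shared page (or its swap entry) from the file */
        ret = ftruncate(shmem_fd, page_size);
        assert(ret == 0);
        /* Re-extend: the range is now a hole */
        ret = ftruncate(shmem_fd, page_size * 2);
        assert(ret == 0);

        /* A hole reads back as zeroes through the shared mapping */
        printf("%s\n", buffer[page_size] == 0 ? "Good" : "BUG");
        return 0;
}
===8<===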