All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Josef Bacik <josef@toxicpanda.com>,
	Rik van Riel <riel@surriel.com>, Chris Mason <clm@fb.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 5.15 06/69] mm: fix page leak with multiple threads mapping the same page
Date: Mon,  1 Aug 2022 13:46:30 +0200	[thread overview]
Message-ID: <20220801114134.726736393@linuxfoundation.org> (raw)
In-Reply-To: <20220801114134.468284027@linuxfoundation.org>

From: Josef Bacik <josef@toxicpanda.com>

commit 3fe2895cfecd03ac74977f32102b966b6589f481 upstream.

We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size.  This application started
leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges.  This very quickly reproduced the problem.

The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped.  One thread maps
the PMD, the other thread loses the race and then returns 0.  However at
this point we already have the page, and we are no longer putting this
page into the processes address space, and so we leak the page.  We
actually did the correct thing prior to f9ce0be71d1f, however it looks
like Kirill copied what we do in the anonymous page case.  In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything.  Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case, this makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.

[josef@toxicpanda.com: v2]
  Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4071,9 +4071,12 @@ vm_fault_t finish_fault(struct vm_fault
 		}
 	}
 
-	/* See comment in handle_pte_fault() */
+	/*
+	 * See comment in handle_pte_fault() for how this scenario happens, we
+	 * need to return NOPAGE so that we drop this page.
+	 */
 	if (pmd_devmap_trans_unstable(vmf->pmd))
-		return 0;
+		return VM_FAULT_NOPAGE;
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				      vmf->address, &vmf->ptl);



  parent reply	other threads:[~2022-08-01 12:03 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-01 11:46 [PATCH 5.15 00/69] 5.15.59-rc1 review Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 01/69] Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 02/69] Revert "ocfs2: mount shared volume without ha stack" Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 03/69] ntfs: fix use-after-free in ntfs_ucsncmp() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 04/69] fs: sendfile handles O_NONBLOCK of out_fd Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 05/69] secretmem: fix unhandled fault in truncate Greg Kroah-Hartman
2022-08-01 11:46 ` Greg Kroah-Hartman [this message]
2022-08-01 11:46 ` [PATCH 5.15 07/69] hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 08/69] asm-generic: remove a broken and needless ifdef conditional Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 09/69] s390/archrandom: prevent CPACF trng invocations in interrupt context Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 10/69] nouveau/svm: Fix to migrate all requested pages Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 11/69] drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 12/69] watch_queue: Fix missing rcu annotation Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 13/69] watch_queue: Fix missing locking in add_watch_to_object() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 14/69] tcp: Fix data-races around sysctl_tcp_dsack Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 15/69] tcp: Fix a data-race around sysctl_tcp_app_win Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 16/69] tcp: Fix a data-race around sysctl_tcp_adv_win_scale Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 17/69] tcp: Fix a data-race around sysctl_tcp_frto Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 18/69] tcp: Fix a data-race around sysctl_tcp_nometrics_save Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 19/69] tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 20/69] ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS) Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 21/69] ice: do not setup vlan for loopback VSI Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 22/69] scsi: ufs: host: Hold reference returned by of_parse_phandle() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 23/69] Revert "tcp: change pingpong threshold to 3" Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 24/69] octeontx2-pf: Fix UDP/TCP src and dst port tc filters Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 25/69] tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 26/69] tcp: Fix a data-race around sysctl_tcp_limit_output_bytes Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 27/69] tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 28/69] scsi: core: Fix warning in scsi_alloc_sgtables() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 29/69] scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 30/69] net: ping6: Fix memleak in ipv6_renew_options() Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 31/69] ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 32/69] net/tls: Remove the context from the list in tls_device_down Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 33/69] igmp: Fix data-races around sysctl_igmp_qrv Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 34/69] net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii Greg Kroah-Hartman
2022-08-01 11:46 ` [PATCH 5.15 35/69] net: sungem_phy: Add of_node_put() for reference returned by of_get_parent() Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 36/69] tcp: Fix a data-race around sysctl_tcp_min_tso_segs Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 37/69] tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 38/69] tcp: Fix a data-race around sysctl_tcp_autocorking Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 39/69] tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 40/69] Documentation: fix sctp_wmem in ip-sysctl.rst Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 41/69] macsec: fix NULL deref in macsec_add_rxsa Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 42/69] macsec: fix error message in macsec_add_rxsa and _txsa Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 43/69] macsec: limit replay window size with XPN Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 44/69] macsec: always read MACSEC_SA_ATTR_PN as a u64 Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 45/69] net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa() Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 46/69] net: mld: fix reference count leak in mld_{query | report}_work() Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 47/69] tcp: Fix data-races around sk_pacing_rate Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 48/69] net: Fix data-races around sysctl_[rw]mem(_offset)? Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 49/69] tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 50/69] tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 51/69] tcp: Fix a data-race around sysctl_tcp_comp_sack_nr Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 52/69] tcp: Fix data-races around sysctl_tcp_reflect_tos Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 53/69] ipv4: Fix data-races around sysctl_fib_notify_on_flag_change Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 54/69] i40e: Fix interface init with MSI interrupts (no MSI-X) Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 55/69] sctp: fix sleep in atomic context bug in timer handlers Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 56/69] octeontx2-pf: cn10k: Fix egress ratelimit configuration Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 57/69] netfilter: nf_queue: do not allow packet truncation below transport header offset Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 58/69] virtio-net: fix the race between refill work and close Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 59/69] perf symbol: Correct address for bss symbols Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 60/69] sfc: disable softirqs for ptp TX Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 61/69] sctp: leave the err path free in sctp_stream_init to sctp_stream_free Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 62/69] ARM: crypto: comment out gcc warning that breaks clang builds Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 63/69] mm/hmm: fault non-owner device private entries Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 64/69] page_alloc: fix invalid watermark check on a negative value Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 65/69] ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 66/69] EDAC/ghes: Set the DIMM label unconditionally Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 67/69] docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 68/69] locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter Greg Kroah-Hartman
2022-08-01 11:47 ` [PATCH 5.15 69/69] x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available Greg Kroah-Hartman
2022-08-01 14:26 ` [PATCH 5.15 00/69] 5.15.59-rc1 review Jon Hunter
2022-08-01 21:19 ` Florian Fainelli
2022-08-01 21:53 ` Daniel Díaz
2022-08-01 22:18 ` Shuah Khan
2022-08-02  4:18 ` Bagas Sanjaya
2022-08-02  5:29 ` Guenter Roeck
2022-08-02 11:08 ` Ron Economos
2022-08-02 17:45 ` Sudip Mukherjee (Codethink)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220801114134.726736393@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=clm@fb.com \
    --cc=josef@toxicpanda.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=riel@surriel.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.