Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "mk@cm4all.com" <mk@cm4all.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Cc: "zhangliguang@linux.alibaba.com" <zhangliguang@linux.alibaba.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
Date: Fri, 12 Jul 2019 20:16:04 +0000
Message-ID: <45b015b735ad741205205ef8144dd4f748245ee0.camel@hammerspace.com> (raw)
In-Reply-To: <20190712152813.GB13940@kroah.com>

On Fri, 2019-07-12 at 17:28 +0200, Greg KH wrote:
> On Fri, Jul 12, 2019 at 04:18:06PM +0200, Max Kellermann wrote:
> > This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564.
> > 
> > That commit caused a severe memory leak in nfs_readdir_make_qstr().
> > 
> > When listing a directory with more than 100 files (this is how many
> > struct nfs_cache_array_entry elements fit in one 4kB page), all
> > allocated file name strings past those 100 leak.
> > 
> > The root of the leakage is that those string pointers are managed
> > in
> > pages which are never linked into the page cache.
> > 
> > fs/nfs/dir.c puts pages into the page cache by calling
> > read_cache_page(); the callback function nfs_readdir_filler() will
> > then fill the given page struct which was passed to it, which is
> > already linked in the page cache (by do_read_cache_page() calling
> > add_to_page_cache_lru()).
> > 
> > Commit be4c2d4723a4 added another (local) array of allocated pages,
> > to
> > be filled with more data, instead of discarding excess items
> > received
> > from the NFS server.  Those additional pages can be used by the
> > next
> > nfs_readdir_filler() call (from within the same nfs_readdir()
> > call).
> > 
> > The leak happens when some of those additional pages are never used
> > (copied to the page cache using copy_highpage()).  The pages will
> > be
> > freed by nfs_readdir_free_pages(), but their contents will
> > not.  The
> > commit did not invoke nfs_readdir_clear_array() (and doing so would
> > have been dangerous, because it did not track which of those pages
> > were already copied to the page cache, risking double free bugs).
> > 
> > How to reproduce the leak:
> > 
> > - Use a kernel with CONFIG_SLUB_DEBUG_ON.
> > 
> > - Create a directory on a NFS mount with more than 100 files with
> >   names long enough to use the "kmalloc-32" slab (so we can easily
> >   look up the allocation counts):
> > 
> >   for i in `seq 110`; do touch ${i}_0123456789abcdef; done
> > 
> > - Drop all caches:
> > 
> >   echo 3 >/proc/sys/vm/drop_caches
> > 
> > - Check the allocation counter:
> > 
> >   grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
> >   30564391 nfs_readdir_add_to_array+0x73/0xd0
> > age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1
> > 
> > - Request a directory listing and check the allocation counters
> > again:
> > 
> >   ls
> >   [...]
> >   grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
> >   30564511 nfs_readdir_add_to_array+0x73/0xd0
> > age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1
> > 
> > There are now 120 new allocations.
> > 
> > - Drop all caches and check the counters again:
> > 
> >   echo 3 >/proc/sys/vm/drop_caches
> >   grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
> >   30564401 nfs_readdir_add_to_array+0x73/0xd0
> > age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1
> > 
> > 110 allocations are gone, but 10 have leaked and will never be
> > freed.
> > 
> > Unhelpfully, those allocations are explicitly excluded from
> > KMEMLEAK,
> > that's why my initial attempts with KMEMLEAK were not successful:
> > 
> > 	/*
> > 	 * Avoid a kmemleak false positive. The pointer to the name is
> > stored
> > 	 * in a page cache page which kmemleak does not scan.
> > 	 */
> > 	kmemleak_not_leak(string->name);
> > 
> > It would be possible to solve this bug without reverting the whole
> > commit:
> > 
> > - keep track of which pages were not used, and call
> >   nfs_readdir_clear_array() on them, or
> > - manually link those pages into the page cache
> > 
> > But for now I have decided to just revert the commit, because the
> > real
> > fix would require complex considerations, risking more dangerous
> > (crash) bugs, which may seem unsuitable for the stable branches.
> > 
> > Signed-off-by: Max Kellermann <mk@cm4all.com>
> > ---
> >  fs/nfs/dir.c      | 90 ++++---------------------------------------
> > ----
> >  fs/nfs/internal.h |  3 +-
> >  2 files changed, 7 insertions(+), 86 deletions(-)
> 
> No cc: stable@vger on this patch to get it backported?
> 

I've added one. I've also backed out the 3 fixes for other problems
with the same patch that were in the linux-next tree. (St. Stephen
Rothwell please forgive me, for I have sinned...)

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



      reply index

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-12 14:18 Max Kellermann
2019-07-12 15:28 ` Greg KH
2019-07-12 20:16   ` Trond Myklebust [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45b015b735ad741205205ef8144dd4f748245ee0.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mk@cm4all.com \
    --cc=zhangliguang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org linux-nfs@archiver.kernel.org
	public-inbox-index linux-nfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox