From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755211AbcFGRKm (ORCPT );
	Tue, 7 Jun 2016 13:10:42 -0400
Received: from mail-yw0-f196.google.com ([209.85.161.196]:33772 "EHLO
	mail-yw0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751412AbcFGRKj (ORCPT );
	Tue, 7 Jun 2016 13:10:39 -0400
Message-ID: <1465319435.3024.25.camel@poochiereds.net>
Subject: Re: Files leak from nfsd in 4.7.1-rc1 (and more?)
From: Jeff Layton
To: Oleg Drokin, "J. Bruce Fields"
Cc: linux-nfs@vger.kernel.org,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Date: Tue, 07 Jun 2016 13:10:35 -0400
In-Reply-To: <4EDA6CFD-1FE8-4FCA-ACCF-84250BE342CB@linuxhacker.ru>
References: <4EDA6CFD-1FE8-4FCA-ACCF-84250BE342CB@linuxhacker.ru>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.5.2 (3.18.5.2-1.fc23)
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2016-06-07 at 11:37 -0400, Oleg Drokin wrote:
> Hello!
> 
>    I've been trying to better understand a problem I was having where a
>    formerly NFS-exported mountpoint sometimes becomes unmountable (after
>    nfsd stop).
> 
>    I finally traced it to a leaked file descriptor that was allocated from
>    nfsd4_open()->nfsd4_process_open2()->nfs4_get_vfs_file()->nfsd_open().
> 
>    Together with it we also see leaked credentials allocated along the same
>    path from fh_verify(), and groups allocated from
>    svcauth_unix_accept()->groups_alloc() that are presumably used by the
>    credentials.
> 
>    Unfortunately I was not able to make total sense of the state handling
>    in nfsd, but it's clear that one of the file descriptors inside
>    struct nfs4_file is lost. I added a patch like this (always a good idea;
>    I'm surprised it was not there already):
> @@ -271,6 +274,9 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
>  {
>         struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu);
>  
> +       WARN_ON(fp->fi_fds[0]);
> +       WARN_ON(fp->fi_fds[1]);
> +       WARN_ON(fp->fi_fds[2]);
>         kmem_cache_free(file_slab, fp);
>  }
> 
>    And when the problem is hit, I also trigger this warning (always this
>    one, which is fd[1]):
> [ 3588.143002] ------------[ cut here ]------------
> [ 3588.143662] WARNING: CPU: 5 PID: 9 at /home/green/bk/linux/fs/nfsd/nfs4state.c:278 nfsd4_free_file_rcu+0x65/0x80 [nfsd]
> [ 3588.144947] Modules linked in: loop rpcsec_gss_krb5 joydev acpi_cpufreq tpm_tis i2c_piix4 tpm virtio_console pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm floppy serio_raw virtio_blk
> [ 3588.147135] CPU: 5 PID: 9 Comm: rcuos/0 Not tainted 4.7.0-rc1-vm-nfs+ #120
> [ 3588.153826] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 3588.153830]  0000000000000286 00000000e2d5ccdf ffff88011965bd50 ffffffff814a11a5
> [ 3588.153832]  0000000000000000 0000000000000000 ffff88011965bd90 ffffffff8108806b
> [ 3588.153834]  0000011600000000 ffff8800c476a0b8 ffff8800c476a048 ffffffffc0110fc0
> [ 3588.153834] Call Trace:
> [ 3588.153839]  [] dump_stack+0x86/0xc1
> [ 3588.153841]  [] __warn+0xcb/0xf0
> [ 3588.153852]  [] ? trace_raw_output_fh_want_write+0x60/0x60 [nfsd]
> [ 3588.153853]  [] warn_slowpath_null+0x1d/0x20
> [ 3588.153859]  [] nfsd4_free_file_rcu+0x65/0x80 [nfsd]
> [ 3588.153861]  [] rcu_nocb_kthread+0x335/0x510
> [ 3588.153862]  [] ? rcu_nocb_kthread+0x27f/0x510
> [ 3588.153863]  [] ? rcu_cpu_notify+0x3e0/0x3e0
> [ 3588.153866]  [] kthread+0x101/0x120
> [ 3588.153868]  [] ? trace_hardirqs_on_caller+0xf4/0x1b0
> [ 3588.153871]  [] ret_from_fork+0x1f/0x40
> [ 3588.153872]  [] ? kthread_create_on_node+0x250/0x250
> 
> 
>   release_all_access() seems to be doing the correct job of all that
>   cleaning, so there must be some other path that I do not quite see.
> 
>   Hopefully you are more familiar with the code and can see the problem
>   right away ;)
> 
> Bye,
>     Oleg

Hmm...well I'm not seeing it right offhand, and I haven't been able to
reproduce the problem so far after a couple of attempts by hand.

What sort of workload are you running before you see that warning pop?
-- 
Jeff Layton
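
[Editor's note: for readers unfamiliar with the structures discussed above,
here is a minimal userspace sketch of the fd-caching pattern that struct
nfs4_file uses. It is NOT the kernel code: fake_nfs4_file, get_vfs_file,
put_access and free_file are simplified stand-ins for the real fi_fds /
fi_access machinery, kept only close enough to show how a missed "put" on
an error path leaves a cached fd behind, which is exactly the condition the
WARN_ON()s in Oleg's patch would catch at final free time.]

/* fdcache_leak.c: build with `cc -Wall fdcache_leak.c` and run. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

enum { ACC_READ, ACC_WRITE, ACC_RDWR, ACC_MAX };

struct fake_nfs4_file {
	int fds[ACC_MAX];	/* cached open fds, -1 if unset (cf. fi_fds) */
	int access[ACC_MAX];	/* per-mode refcounts (cf. fi_access) */
};

static int get_vfs_file(struct fake_nfs4_file *fp, int acc, const char *path)
{
	static const int oflags[ACC_MAX] = { O_RDONLY, O_WRONLY, O_RDWR };

	if (fp->fds[acc] < 0) {
		fp->fds[acc] = open(path, oflags[acc]);
		if (fp->fds[acc] < 0)
			return -1;
	}
	fp->access[acc]++;		/* caller now holds one access ref */
	return 0;
}

static void put_access(struct fake_nfs4_file *fp, int acc)
{
	if (--fp->access[acc] == 0) {	/* last ref: close the cached fd */
		close(fp->fds[acc]);
		fp->fds[acc] = -1;
	}
}

static void free_file(struct fake_nfs4_file *fp)
{
	int i;

	/* analogue of the WARN_ONs added to nfsd4_free_file_rcu() */
	for (i = 0; i < ACC_MAX; i++)
		if (fp->fds[i] >= 0)
			fprintf(stderr, "leak: fd still cached in slot %d\n", i);
	(void)put_access;		/* silence -Wunused in this sketch */
}

int main(void)
{
	struct fake_nfs4_file f = { .fds = { -1, -1, -1 } };

	if (get_vfs_file(&f, ACC_WRITE, "/dev/null"))
		return 1;
	/*
	 * Imagine a later permission or state check fails here and the
	 * caller bails out without calling put_access(&f, ACC_WRITE):
	 * the refcount stays at 1 and the fd is never closed.
	 */
	free_file(&f);			/* reports the leaked slot */
	return 0;
}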