All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <chuck.lever@oracle.com>
To: Dai Ngo <dai.ngo@oracle.com>, Bruce Fields <bfields@fieldses.org>,
	Olga Kornievskaia <aglo@umich.edu>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy
Date: Mon, 1 Mar 2021 18:15:49 +0000	[thread overview]
Message-ID: <4F649BEC-C222-4D16-B8C1-D1432690498F@oracle.com> (raw)
In-Reply-To: <1d52c88d-0c30-b9a1-9724-596bf7de314c@oracle.com>



> On Feb 25, 2021, at 1:58 PM, dai.ngo@oracle.com wrote:
> 
> 
> On 2/24/21 6:26 PM, dai.ngo@oracle.com wrote:
>> Hi Olga and Bruce,
>> 
>> On 2/24/21 2:35 PM, Olga Kornievskaia wrote:
>>> On Mon, Feb 22, 2021 at 5:09 PM <dai.ngo@oracle.com> wrote:
>>>> 
>>>> On 2/22/21 2:01 PM, dai.ngo@oracle.com wrote:
>>>>> On 2/22/21 1:46 PM, dai.ngo@oracle.com wrote:
>>>>>> On 2/22/21 10:34 AM, dai.ngo@oracle.com wrote:
>>>>>>> On 2/20/21 8:16 PM, dai.ngo@oracle.com wrote:
>>>>>>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>>>>>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>>>>>>>> <bfields@fieldses.org> wrote:
>>>>>>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, dai.ngo@oracle.com wrote:
>>>>>>>>>>> If this is the cause why we don't drop the mount after the copy
>>>>>>>>>>> then I can restore the patch and look into this problem.
>>>>>>>>>>> Unfortunately,
>>>>>>>>>>> all my test machines are down for maintenance until Sunday/Monday.
>>>>>>>>>> I think we can take some time to figure out what's actually going on
>>>>>>>>>> here before reverting anything.
>>>>>>>>> Yes I agree. We need to fix the use-after-free and also make sure
>>>>>>>>> that
>>>>>>>>> reference will go away.
>>>>>>> I reverted the patch, verified the warning message is back:
>>>>>>> 
>>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
>>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
>>>>>>> 
>>>>>>> then did a inter-server copy and waited for more than 20 mins and
>>>>>>> the destination server still maintains the session with the source
>>>>>>> server.  It must be some other references that prevent the mount
>>>>>>> to go away.
>>>>>> This change fixed the unmount after inter-server copy problem:
>>>>>> 
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index 8d6d2678abad..87687cd18938 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>>                          struct nfsd_file *dst)
>>>>>>   {
>>>>>>          nfs42_ssc_close(src->nf_file);
>>>>>> -       /* 'src' is freed by nfsd4_do_async_copy */
>>>>>> +       nfsd_file_put(src);
>>>>>>          nfsd_file_put(dst);
>>>>>>          mntput(ss_mnt);
>>>>>>   }
>>>> This change is not need. It's left over from my testing to
>>>> reproduce the warning messages. Only the change in
>>>> nfsd4_do_async_copy is needed for the unmount problem.
>>>> 
>>>> -Dai
>>>> 
>>>>>> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>                  copy->nf_src = kzalloc(sizeof(struct nfsd_file),
>>>>>> GFP_KERNEL);
>>>>>>                  if (!copy->nf_src) {
>>>>>>                          copy->nfserr = nfserr_serverfault;
>>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>                          goto do_callback;
>>>>>>                  }
>>>>>>                  copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
>>>>>> &copy->c_fh,
>>>>>> &copy->stateid);
>>>>>>                  if (IS_ERR(copy->nf_src->nf_file)) {
>>>>>>                          copy->nfserr = nfserr_offload_denied;
>>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>                          goto do_callback;
>>>>>>                  }
>>>>>>          }
>>>>>> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>>                          &nfsd4_cb_offload_ops,
>>>>>> NFSPROC4_CLNT_CB_OFFLOAD);
>>>>>>          nfsd4_run_cb(&cb_copy->cp_cb);
>>>>>>   out:
>>>>>> +       nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>>          if (!copy->cp_intra)
>>>>>>                  kfree(copy->nf_src);
>>>>>>          cleanup_async_copy(copy);
>>>>>> 
>>>>>> But there is something new. I tried inter-server copy twice.
>>>>>> First time I can verify from tshark capture that a session was
>>>>>> created and destroy, along with all the NFS ops. On 2nd try,
>>>>>> I can
>>> Hi Dai/Bruce,
>>> 
>>> While I believe the fix works (as in the mount goes away), I'm not
>>> comfortable with this fix as I believe we will be leaking resources.
>>> Server calls nfs42_ssc_open() which creates a legit file pointer (yes
>>> it takes a reference on superblock but it also allocates memory for
>>> "file" structure). Normally a file structure requires doing an fput()
>>> which if I look thru the code does a bunch of things before in the end
>>> calling mntput(). While we free the copy->nf_src in
>>> nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated
>>> separately and would have been freed from calling fput() on it.
>>> 
>>> So I guess it's not correct to do kfree(copy->nf_src) in teh
>>> nfs4_do_async_copy() I think that's probably where use-after-free
>>> comes in. We need to keep it until the cleanup. I'm thinking perhaps
>>> call fput(copy->nf_src->nf_file) first and then free it? Just an idea
>>> but I haven't tested it.
>> 
>> I think the unmount can be treated separately and the fix seems
>> to be valid for this issue. I think kfree(copy->nf_src) is called
>> after nfsd4_cleanup_inter_ssc so it's not the reason for the
>> 'use-after-free' warning. A quick look at nfsd4_do_async_copy
>> I see nf_src was kzalloc'ed but no ref count added to it.
>> 
>> I will review the whole cleanup part and report back.
> 
> I think this would fix the resource leak problem:
> 
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 57b3821d975a..742fc128fdc8 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -405,6 +405,11 @@ static void __nfs42_ssc_close(struct file *filep)
>        struct nfs_open_context *ctx = nfs_file_open_context(filep);
>         ctx->state->flags = 0;
> +       if (!filep)
> +               return;
> +       get_file(filep);
> +       filp_close(filep, NULL);
> +       fput(filep);
> }
>  static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = {
> 
> -Dai

I'm not clear on the final outcome. Is a code change needed?
If so, can someone post a patch (with a full description and
Signed-off-by) or point me to one that's been posted already?

Thanks!

--
Chuck Lever




  reply	other threads:[~2021-03-01 18:19 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-29 19:07 [PATCH 0/2] NFSD: Fix use-after-free warning when doing inter-server copy Dai Ngo
2020-10-29 19:07 ` [PATCH 1/2] " Dai Ngo
2021-02-20  0:18   ` Olga Kornievskaia
2021-02-20  1:09     ` J. Bruce Fields
2021-02-20  1:15       ` dai.ngo
2021-02-20  1:31       ` dai.ngo
2021-02-20  3:20         ` J. Bruce Fields
2021-02-20  3:41           ` dai.ngo
2021-02-20 14:08           ` Olga Kornievskaia
2021-02-21  4:16             ` dai.ngo
2021-02-22 18:34               ` dai.ngo
2021-02-22 21:46                 ` dai.ngo
2021-02-22 22:01                   ` dai.ngo
2021-02-22 22:08                     ` dai.ngo
2021-02-24 22:35                       ` Olga Kornievskaia
2021-02-25  2:26                         ` dai.ngo
2021-02-25 18:58                           ` dai.ngo
2021-03-01 18:15                             ` Chuck Lever [this message]
2020-10-29 19:07 ` [PATCH 2/2] NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy Dai Ngo
2020-11-05 22:25 ` [PATCH 0/2] NFSD: Fix use-after-free warning when doing inter-server copy J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F649BEC-C222-4D16-B8C1-D1432690498F@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=aglo@umich.edu \
    --cc=bfields@fieldses.org \
    --cc=dai.ngo@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.