From: Alex Lyakas
Date: Tue, 10 Sep 2019 22:00:24 +0300
Subject: Re: [RFC-PATCH] nfsd: provide a procfs entry to release stateids of a particular local filesystem
To: "J. Bruce Fields"
Cc: linux-nfs@vger.kernel.org, Shyam Kaushik
In-Reply-To: <20190906161236.GF17204@fieldses.org>
References: <1567518908-1720-1-git-send-email-alex@zadara.com> <20190906161236.GF17204@fieldses.org>

Hi Bruce,

Thanks for reviewing the patch.
I addressed your comments and ran the patch through checkpatch.pl. Patch v2
is on its way.

On Fri, Sep 6, 2019 at 7:12 PM J. Bruce Fields wrote:
>
> On Tue, Sep 03, 2019 at 04:55:08PM +0300, Alex Lyakas wrote:
> > This patch addresses the following issue:
>
> Thanks for the patch and the good explanation!
>
> I'd rather we just call nfs4_release_stateids() from write_unlock_fs().
>
> That modifies the behavior of the existing unlock_filesystem interface,
> so, yes, adding a new file would be the more conservative approach.
>
> But really I think that the only use of unlock_filesystem is to allow
> unmounting, and the fact that it only handles NLM locks is just a bug
> that we should fix.

(For reference, a rough sketch of what calling nfs4_release_stateids() from
write_unlock_fs() could look like is appended at the end of this mail.)

> You'll want to cover delegations as well. And probably pNFS layouts.
> It'd be OK to do that incrementally in followup patches.

Unfortunately, I don't have much understanding of what these are and how to
cover them.

> Style nits:
>
> I assume all the print statements are just for temporary debugging.

I left only two prints with pr_info, which give some indication that the
operation has started and whether it did anything useful. The rest of the
prints use pr_debug.

> Try to keep lines to 80 characters. I'd break the inner loop of
> nfs4_release_stateids() into a separate nfs4_release_client_stateids(sb,
> clp).
>
> --b.
>
> > - Create two local file systems, FS1 and FS2, on the server machine S.
> > - Export both FS1 and FS2 through nfsd to the same nfs client, running
> >   on client machine C.
> > - On C, mount both exported file systems and start writing files to
> >   both of them.
> > - After a few minutes, on server machine S, un-export FS1 only.
> > - Do not unmount FS1 on the client machine C prior to un-exporting.
> > - Also, FS2 remains exported to C.
> > - Now we want to unmount FS1 on the server machine S, but we fail,
> >   because there are still open files on FS1 held by nfsd.
> >
> > Debugging this issue showed the following root cause: there is an
> > nfs4_client entry for the client C. This entry has two nfs4_openowners,
> > for FS1 and FS2, although FS1 was un-exported. Looking at the stateids
> > of both openowners, we see that they have stateids of kind
> > NFS4_OPEN_STID, and each stateid is holding an nfs4_file. The reason we
> > cannot unmount FS1 is that we still have an openowner for FS1, holding
> > open-stateids, which hold open files on FS1.
> >
> > The laundromat doesn't help in this case, because it can only decide
> > per-nfs4_client whether it should be purged. But in this case, since
> > FS2 is still exported to C, there is no reason to purge the
> > nfs4_client.
> >
> > This situation remains until we un-export FS2 as well. Then the whole
> > nfs4_client is purged, all the files get closed, and we can unmount
> > both FS1 and FS2.
> >
> > This patch allows user-space to tell nfsd to release stateids of a
> > particular local filesystem. After that, it is possible to unmount the
> > local filesystem.
> >
> > This patch is based on kernel 4.14.99, which we currently use.
> > That's why it is marked as RFC.
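
For anyone who wants to try this out, here is a minimal user-space sketch of
driving the new transaction file. It assumes the nfsd filesystem is mounted
at its usual /proc/fs/nfsd location and uses /mnt/fs1 as an example mount
point; neither path is part of the patch itself.

/*
 * Hypothetical caller of the proposed "release_stateids" transaction file.
 * write_release_stateids() below expects a '\n'-terminated absolute path
 * of the local filesystem whose stateids should be released.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *fs_path = "/mnt/fs1\n";     /* example mount point, '\n'-terminated */
        int fd = open("/proc/fs/nfsd/release_stateids", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, fs_path, strlen(fs_path)) != (ssize_t)strlen(fs_path)) {
                perror("write");
                close(fd);
                return 1;
        }
        close(fd);
        /*
         * The local filesystem should now be unmountable on the server,
         * provided nothing else holds files open on it.
         */
        return 0;
}
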
> >
> > Signed-off-by: Alex Lyakas
> > ---
> >  fs/nfsd/nfs4state.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  fs/nfsd/nfsctl.c    |  46 ++++++++++++++++++++++
> >  fs/nfsd/state.h     |   2 +
> >  3 files changed, 154 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 3cf0b2e..4081753 100755
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -6481,13 +6481,13 @@ struct nfs4_client_reclaim *
> >          return nfs_ok;
> >  }
> >
> > -#ifdef CONFIG_NFSD_FAULT_INJECTION
> >  static inline void
> >  put_client(struct nfs4_client *clp)
> >  {
> >          atomic_dec(&clp->cl_refcount);
> >  }
> >
> > +#ifdef CONFIG_NFSD_FAULT_INJECTION
> >  static struct nfs4_client *
> >  nfsd_find_client(struct sockaddr_storage *addr, size_t addr_size)
> >  {
> > @@ -6811,6 +6811,7 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
> >
> >          return count;
> >  }
> > +#endif /* CONFIG_NFSD_FAULT_INJECTION */
> >
> >  static void
> >  nfsd_reap_openowners(struct list_head *reaplist)
> > @@ -6826,6 +6827,7 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
> >          }
> >  }
> >
> > +#ifdef CONFIG_NFSD_FAULT_INJECTION
> >  u64
> >  nfsd_inject_forget_client_openowners(struct sockaddr_storage *addr,
> >                                       size_t addr_size)
> > @@ -7072,6 +7074,109 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
> >  #endif /* CONFIG_NFSD_FAULT_INJECTION */
> >
> >  /*
> > + * Attempts to release the stateids that have open files on the specified superblock.
> > + */
> > +void
> > +nfs4_release_stateids(struct super_block * sb)
> > +{
> > +        struct nfsd_net *nn = net_generic(current->nsproxy->net_ns, nfsd_net_id);
> > +        struct nfs4_client *clp = NULL;
> > +        struct nfs4_openowner *oo = NULL, *oo_next = NULL;
> > +        LIST_HEAD(openowner_reaplist);
> > +        unsigned int n_openowners = 0;
> > +
> > +        if (!nfsd_netns_ready(nn))
> > +                return;
> > +
> > +        pr_info("=== Release stateids for sb=%p ===\n", sb);
> > +
> > +        spin_lock(&nn->client_lock);
> > +        list_for_each_entry(clp, &nn->client_lru, cl_lru) {
> > +                char cl_addr[INET6_ADDRSTRLEN] = {'\0'};
> > +
> > +                rpc_ntop((struct sockaddr*)&clp->cl_addr, cl_addr, sizeof(cl_addr));
> > +                pr_debug("Looking at client=%p/%s cl_clientid=%u:%u refcnt=%d\n",
> > +                         clp, cl_addr, clp->cl_clientid.cl_boot, clp->cl_clientid.cl_id,
> > +                         atomic_read(&clp->cl_refcount));
> > +
> > +                spin_lock(&clp->cl_lock);
> > +                list_for_each_entry_safe(oo, oo_next, &clp->cl_openowners, oo_perclient) {
> > +                        struct nfs4_ol_stateid *stp = NULL;
> > +                        bool found_my_sb = false, found_other_sb = false;
> > +                        struct super_block *other_sb = NULL;
> > +
> > +                        pr_debug(" Openowner %p %.*s\n", oo, oo->oo_owner.so_owner.len, oo->oo_owner.so_owner.data);
> > +                        pr_debug(" oo_close_lru=%s oo_last_closed_stid=%p refcnt=%d so_is_open_owner=%u\n",
> > +                                 list_empty(&oo->oo_close_lru) ? "N" : "Y", oo->oo_last_closed_stid,
> > +                                 atomic_read(&oo->oo_owner.so_count), oo->oo_owner.so_is_open_owner);
> > +
> > +                        list_for_each_entry(stp, &oo->oo_owner.so_stateids, st_perstateowner) {
> > +                                struct nfs4_file *fp = NULL;
> > +                                struct file *filp = NULL;
> > +                                struct super_block *f_sb = NULL;
> > +
> > +                                if (stp->st_stid.sc_file == NULL)
> > +                                        continue;
> > +
> > +                                fp = stp->st_stid.sc_file;
> > +                                filp = find_any_file(fp);
> > +                                if (filp != NULL)
> > +                                        f_sb = file_inode(filp)->i_sb;
> > +                                pr_debug(" filp=%p sb=%p my_sb=%p\n", filp, f_sb, sb);
> > +                                if (f_sb == sb) {
> > +                                        found_my_sb = true;
> > +                                } else {
> > +                                        found_other_sb = true;
> > +                                        other_sb = f_sb;
> > +                                }
> > +                                if (filp != NULL)
> > +                                        fput(filp);
> > +                        }
> > +
> > +                        /* openowner does not have files from the needed fs, skip it */
> > +                        if (!found_my_sb)
> > +                                continue;
> > +
> > +                        /*
> > +                         * We do not expect the same openowner to have open files from more than one fs,
> > +                         * but if it happens, we cannot release this openowner.
> > +                         */
> > +                        if (found_other_sb) {
> > +                                pr_warn(" client=%p/%s openowner %p %.*s has files from sb=%p but also from sb=%p, skipping it!\n",
> > +                                        clp, cl_addr, oo, oo->oo_owner.so_owner.len, oo->oo_owner.so_owner.data, sb, other_sb);
> > +                                continue;
> > +                        }
> > +
> > +                        /*
> > +                         * Each OPEN stateid holds a refcnt on the openowner (and a LOCK stateid holds a refcnt on the lockowner).
> > +                         * This refcnt is dropped when nfs4_free_ol_stateid is called, which calls nfs4_put_stateowner.
> > +                         * The last refcnt drop unhashes and frees the openowner.
> > +                         * As a result, after we free the last stateid, the openowner will also be freed.
> > +                         * But we still need the openowner to be around, because we need to call release_last_closed_stateid(),
> > +                         * which is what release_openowner() does (we are doing the equivalent of that).
> > +                         * So we need to grab an extra refcnt for the openowner here.
> > +                         */
> > +                        nfs4_get_stateowner(&oo->oo_owner);
> > +
> > +                        /* see: nfsd_collect_client_openowners(), nfsd_foreach_client_openowner() */
> > +                        unhash_openowner_locked(oo);
> > +                        /*
> > +                         * By incrementing cl_refcount under "nn->client_lock" we, hopefully, protect that client
> > +                         * from being killed via mark_client_expired_locked().
> > +                         * We increment cl_refcount once per openowner.
> > +                         */
> > +                        atomic_inc(&clp->cl_refcount);
> > +                        list_add(&oo->oo_perclient, &openowner_reaplist);
> > +                        ++n_openowners;
> > +                }
> > +                spin_unlock(&clp->cl_lock);
> > +        }
> > +        spin_unlock(&nn->client_lock);
> > +
> > +        pr_info("Collected %u openowners for removal (sb=%p)\n", n_openowners, sb);
> > +
> > +        nfsd_reap_openowners(&openowner_reaplist);
> > +}
> > +
> > +/*
> >   * Since the lifetime of a delegation isn't limited to that of an open, a
> >   * client may quite reasonably hang on to a delegation as long as it has
> >   * the inode cached.  This becomes an obvious problem the first time a
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index 4824363..8b38186 100755
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -37,6 +37,7 @@ enum {
> >          NFSD_Fh,
> >          NFSD_FO_UnlockIP,
> >          NFSD_FO_UnlockFS,
> > +        NFSD_FO_ReleaseStateIds,
> >          NFSD_Threads,
> >          NFSD_Pool_Threads,
> >          NFSD_Pool_Stats,
> > @@ -64,6 +65,7 @@ enum {
> >  static ssize_t write_filehandle(struct file *file, char *buf, size_t size);
> >  static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size);
> >  static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size);
> > +static ssize_t write_release_stateids(struct file *file, char *buf, size_t size);
> >  static ssize_t write_threads(struct file *file, char *buf, size_t size);
> >  static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
> >  static ssize_t write_versions(struct file *file, char *buf, size_t size);
> > @@ -81,6 +83,7 @@ static ssize_t (*write_op[])(struct file *, char *, size_t) = {
> >          [NFSD_Fh] = write_filehandle,
> >          [NFSD_FO_UnlockIP] = write_unlock_ip,
> >          [NFSD_FO_UnlockFS] = write_unlock_fs,
> > +        [NFSD_FO_ReleaseStateIds] = write_release_stateids,
> >          [NFSD_Threads] = write_threads,
> >          [NFSD_Pool_Threads] = write_pool_threads,
> >          [NFSD_Versions] = write_versions,
> > @@ -328,6 +331,47 @@ static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size)
> >  }
> >
> >  /**
> > + * write_release_stateids - Release stateids of a local file system
> > + *
> > + * Experimental.
> > + *
> > + * Input:
> > + *        buf:        '\n'-terminated C string containing the
> > + *                    absolute pathname of a local file system
> > + *        size:       length of C string in @buf
> > + * Output:
> > + *        On success: returns zero if all openowners were released
> > + *        On error:   return code is a negative errno value
> > + */
> > +static ssize_t write_release_stateids(struct file *file, char *buf, size_t size)
> > +{
> > +        struct path path;
> > +        char *fo_path = NULL;
> > +        int error = 0;
> > +
> > +        /* sanity check */
> > +        if (size == 0)
> > +                return -EINVAL;
> > +
> > +        if (buf[size-1] != '\n')
> > +                return -EINVAL;
> > +
> > +        fo_path = buf;
> > +        error = qword_get(&buf, fo_path, size);
> > +        if (error < 0)
> > +                return -EINVAL;
> > +
> > +        error = kern_path(fo_path, 0, &path);
> > +        if (error)
> > +                return error;
> > +
> > +        nfs4_release_stateids(path.dentry->d_sb);
> > +
> > +        path_put(&path);
> > +        return 0;
> > +}
> > +
> > +/**
> >   * write_filehandle - Get a variable-length NFS file handle by path
> >   *
> >   * On input, the buffer contains a '\n'-terminated C string comprised of
> > @@ -1167,6 +1211,8 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
> >                                          &transaction_ops, S_IWUSR|S_IRUSR},
> >                  [NFSD_FO_UnlockFS] = {"unlock_filesystem",
> >                                          &transaction_ops, S_IWUSR|S_IRUSR},
> > +                [NFSD_FO_ReleaseStateIds] = {"release_stateids",
> > +                                        &transaction_ops, S_IWUSR|S_IRUSR},
> >                  [NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
> >                  [NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
> >                  [NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index 86aa92d..acee094 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -632,6 +632,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
> >                                                  struct nfsd_net *nn);
> >  extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
> >
> > +extern void nfs4_release_stateids(struct super_block *sb);
> > +
> >  struct nfs4_file *find_file(struct knfsd_fh *fh);
> >  void put_nfs4_file(struct nfs4_file *fi);
> >  static inline void get_nfs4_file(struct nfs4_file *fi)
> > --
> > 1.9.1
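
Regarding calling nfs4_release_stateids() from write_unlock_fs(): below is a
rough sketch of what that could look like, based on the current
write_unlock_fs() in fs/nfsd/nfsctl.c. This is only an illustration of the
suggestion, not part of the posted patch, and the exact error handling is a
guess.

static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size)
{
        struct path path;
        char *fo_path;
        int error;

        /* sanity check */
        if (size == 0)
                return -EINVAL;

        if (buf[size-1] != '\n')
                return -EINVAL;

        fo_path = buf;
        error = qword_get(&buf, fo_path, size);
        if (error)
                return -EINVAL;

        error = kern_path(fo_path, 0, &path);
        if (error)
                return error;

        /* existing behaviour: drop NLM (lockd) locks held on this superblock */
        error = nlmsvc_unlock_all_by_sb(path.dentry->d_sb);

        /* suggested addition: also release NFSv4 open stateids on it */
        if (!error)
                nfs4_release_stateids(path.dentry->d_sb);

        path_put(&path);
        return error;
}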