All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kinglong Mee <kinglongmee@gmail.com>
To: NeilBrown <neilb@suse.de>, "J. Bruce Fields" <bfields@fieldses.org>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	kinglongmee@gmail.com
Subject: Re: [PATCH RFC] NFSD: fix cannot umounting mount points under pseudo root
Date: Thu, 23 Apr 2015 20:52:40 +0800	[thread overview]
Message-ID: <5538EB18.7080802@gmail.com> (raw)
In-Reply-To: <20150423094431.1a8aa68b@notabene.brown>

On 4/23/2015 7:44 AM, NeilBrown wrote:
> On Wed, 22 Apr 2015 11:07:03 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> wrote:
>>> Reference of dentry/mnt is like a cache, avoids re-finding of them,
>>> it is necessary to store them in svc_export.
>>>
>>> Neil points out another way of 'fs_pin', I will check that.
>>
>> Yes, that'd be interesting.  On a very quick look--I don't understand
>> what it's meant to do at all.  But if it does provide a way to get a
>> callback on umount, that'd certainly be interesting....
> 
> Yeah, on a quick look it isn't really obvious at all.
> 
> But I didn't read the code.  I read
>  https://lwn.net/Articles/636730/
> 
> which says:
> 
>     In its place is a mechanism by which an object can be added to a vfsmount
>     structure (which represents a mounted filesystem); that object supports 
>     only one method: kill(). These "pin" objects hang around until the final
>     reference to the vfsmount goes away, at which point each one's kill() function
>     is called. It thus is a clean mechanism for performing cleanup when a filesystem
>     goes away.
> 
> This is used to close "BSD process accounting" files on umount, and sound
> like the perfect interface for flushing things from the sunrpc cache on
> umount.

I have review those codes in fs/fs_pin.c and kernel/acct.c.
'fs_pin' is used to fix the race between file->f_path.mnt for acct
and umount, file->f_path.mnt just holds the reference of vfsmount
but not keep busy, umount will check the reference and return busy.

The logical is same as sunrpc cache for exports.

Before commit 3064c3563b ("death to mnt_pinned"), acct uses a hack
method, acct get a pin to vfsmount and then put the reference,
so umount finds no other reference and callback those pins in vfsmount,
at last umount success.

After that commit, besides pin to original vfsmount and put the reference
of the original vfsmount, acct clone the vfsmount storing in file->f_path.mnt.

The different between those two methods is the value of file->f_path.mnt,
the first one, it contains the original vfsmnt without reference,
the second one, contains a cloned vfsmnt with reference.

I have test the first method, pins to vfsmount for all exports cache,
nfsd can work but sometimes will delay or hang at rpc.mountd's calling
of name_to_handle_at, I can't find the reason.

Also test the second method, there are many problem caused by the cloned
vfsmount, mnt_clone_internal() change mnt_root values to the path dentry.

Those cases are tested for all exports cache, but, I think nfsd should
put the reference of vfsmount when finding exports upcall fail.
The success exports cache should also take it.

The following is a draft of the first method.

thanks,
Kinglong Mee

----------------------------------------------------------------------
diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 611b540..553e8b1 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -17,6 +17,7 @@ void pin_remove(struct fs_pin *pin)
 	wake_up_locked(&pin->wait);
 	spin_unlock_irq(&pin->wait.lock);
 }
+EXPORT_SYMBOL(pin_remove);
 
 void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
 {
@@ -26,11 +27,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
 	hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
 	spin_unlock(&pin_lock);
 }
+EXPORT_SYMBOL(pin_insert_group);
 
 void pin_insert(struct fs_pin *pin, struct vfsmount *m)
 {
 	pin_insert_group(pin, m, &m->mnt_sb->s_pins);
 }
+EXPORT_SYMBOL(pin_insert);
 
 void pin_kill(struct fs_pin *p)
 {
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index c3e3b6e..3e3df8c 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -309,7 +309,7 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
 static void svc_export_put(struct kref *ref)
 {
 	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
-	path_put(&exp->ex_path);
+	pin_remove(&exp->ex_pin);
 	auth_domain_put(exp->ex_client);
 	nfsd4_fslocs_free(&exp->ex_fslocs);
 	kfree(exp->ex_uuid);
@@ -695,15 +695,26 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
 		orig->ex_path.mnt == new->ex_path.mnt;
 }
 
+static void export_pin_kill(struct fs_pin *pin)
+{
+	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
+
+	write_lock(&exp->cd->hash_lock);
+	cache_mark_expired(&exp->h);
+	pin_remove(pin);
+	write_unlock(&exp->cd->hash_lock);
+}
+
 static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
 {
 	struct svc_export *new = container_of(cnew, struct svc_export, h);
 	struct svc_export *item = container_of(citem, struct svc_export, h);
 
+	init_fs_pin(&new->ex_pin, export_pin_kill);
 	kref_get(&item->ex_client->ref);
 	new->ex_client = item->ex_client;
 	new->ex_path = item->ex_path;
-	path_get(&item->ex_path);
+	pin_insert_group(&new->ex_pin, new->ex_path.mnt, NULL);
 	new->ex_fslocs.locations = NULL;
 	new->ex_fslocs.locations_count = 0;
 	new->ex_fslocs.migrated = 0;
diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 1f52bfc..718a27e 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -4,6 +4,7 @@
 #ifndef NFSD_EXPORT_H
 #define NFSD_EXPORT_H
 
+#include <linux/fs_pin.h>
 #include <linux/sunrpc/cache.h>
 #include <uapi/linux/nfsd/export.h>
 
@@ -49,6 +50,7 @@ struct svc_export {
 	struct auth_domain *	ex_client;
 	int			ex_flags;
 	struct path		ex_path;
+	struct fs_pin		ex_pin;
 	kuid_t			ex_anon_uid;
 	kgid_t			ex_anon_gid;
 	int			ex_fsid;
diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6..6936684 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -206,6 +206,11 @@ static inline int cache_is_expired(struct cache_detail *detail, struct cache_hea
 		(detail->flush_time > h->last_refresh);
 }
 
+static inline void cache_mark_expired(struct cache_head *h)
+{
+	h->expiry_time = seconds_since_boot() - 1;
+}
+
 extern int cache_check(struct cache_detail *detail,
 		       struct cache_head *h, struct cache_req *rqstp);
 extern void cache_flush(void);


  reply	other threads:[~2015-04-23 12:52 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-21 14:50 [PATCH RFC] NFSD: fix cannot umounting mount points under pseudo root Kinglong Mee
2015-04-21 21:54 ` J. Bruce Fields
2015-04-22  5:07   ` NeilBrown
2015-04-22 11:11   ` Kinglong Mee
2015-04-22 15:07     ` J. Bruce Fields
2015-04-22 23:44       ` NeilBrown
2015-04-23 12:52         ` Kinglong Mee [this message]
2015-04-24  3:00           ` NeilBrown
2015-04-27 12:11             ` Kinglong Mee
2015-04-29  2:57               ` NeilBrown
2015-04-29  8:45                 ` Kinglong Mee
2015-04-29 19:19                 ` J. Bruce Fields
2015-04-29 21:52                   ` NeilBrown
2015-04-30 21:36                     ` J. Bruce Fields
2015-05-01  1:53                       ` NeilBrown
2015-05-01  2:03                         ` Al Viro
2015-05-01  2:23                           ` NeilBrown
2015-05-01  2:29                             ` Al Viro
2015-05-01  3:08                               ` NeilBrown
2015-05-01 13:29                                 ` J. Bruce Fields
2015-05-02 23:16                                   ` NeilBrown
2015-05-03  0:37                                     ` J. Bruce Fields
2015-05-04  4:11                                       ` NeilBrown
2015-05-04 21:48                                     ` J. Bruce Fields
2015-05-05 22:27                                       ` NeilBrown
2015-05-04 22:01                         ` J. Bruce Fields
2015-05-05 13:54                           ` Kinglong Mee
2015-05-05 14:18                             ` J. Bruce Fields
2015-05-05 15:52                               ` J. Bruce Fields
2015-05-05 22:26                                 ` NeilBrown
2015-05-08 16:15                                   ` J. Bruce Fields
2015-05-08 20:01                                     ` [PATCH] nfsd: don't hold i_mutex over userspace upcalls J. Bruce Fields
2015-06-03 15:18                                       ` J. Bruce Fields
     [not found]                                         ` <20150603151819.GA8441-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-07-05 11:27                                           ` Kinglong Mee
2015-07-05 11:27                                             ` Kinglong Mee
2015-07-06 18:22                                             ` J. Bruce Fields
2015-08-18 19:10                                           ` J. Bruce Fields
2015-08-18 19:10                                             ` J. Bruce Fields
     [not found]                                             ` <20150818191028.GA3957-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-11-12 21:22                                               ` J. Bruce Fields
2015-11-12 21:22                                                 ` J. Bruce Fields
2015-05-07 15:31                                 ` [PATCH RFC] NFSD: fix cannot umounting mount points under pseudo root J. Bruce Fields
2015-05-07 22:42                                   ` NeilBrown
2015-05-08 14:10                                     ` J. Bruce Fields
2015-05-05  3:53                       ` Kinglong Mee
2015-05-05  4:19                         ` NeilBrown
2015-05-05  8:32                           ` Kinglong Mee
2015-05-05 13:52                             ` J. Bruce Fields
2015-06-26 23:14                             ` Kinglong Mee
2015-06-26 23:35                               ` NeilBrown
2015-07-02  9:42                                 ` Kinglong Mee
2015-05-01  1:55                     ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5538EB18.7080802@gmail.com \
    --to=kinglongmee@gmail.com \
    --cc=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.