* [PATCH 0/3 v3] Fixups for l_pid
@ 2017-06-06 17:19 Benjamin Coddington
2017-06-06 17:19 ` [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk() Benjamin Coddington
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 17:19 UTC (permalink / raw)
To: Jeff Layton, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
LTP fcntl tests (fcntl11 fcntl14 fcntl17 fcntl19 fcntl20 fcntl21) have been
failing for NFSv4 mounts due to an unexpected l_pid. What follows are some
fixups:
v2: - Rebase onto linux-next
- Revert back to using the stack in locks_mandatory_area(), and fixup
patch description for 1/3
v3: - The lkp-robot found some serious per_thread_ops performance
regressions for v1 and v2, so this version changes things around to not
acquire a reference to struct pid in fl_nspid for every lock. Instead,
it drops fl_nspid altogether, and defers the lookup of the
namespace-translated pid until it actually needed.
Benjamin Coddington (3):
fs/locks: Use allocation rather than the stack in fcntl_getlk()
fs/locks: Remove fl_nspid
fs/locks: Use fs-specific l_pid for remote locks
fs/locks.c | 122 +++++++++++++++++++++++++++++++++--------------------
include/linux/fs.h | 2 +-
2 files changed, 78 insertions(+), 46 deletions(-)
--
2.9.3
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk()
2017-06-06 17:19 [PATCH 0/3 v3] Fixups for l_pid Benjamin Coddington
@ 2017-06-06 17:19 ` Benjamin Coddington
2017-06-06 17:19 ` [PATCH 2/3] fs/locks: Remove fl_nspid Benjamin Coddington
2017-06-06 17:19 ` [PATCH 3/3] fs/locks: Use fs-specific l_pid for remote locks Benjamin Coddington
2 siblings, 0 replies; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 17:19 UTC (permalink / raw)
To: Jeff Layton, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
Struct file_lock is fairly large, so let's save some space on the stack by
using an allocation for struct file_lock in fcntl_getlk(), just as we do
for fcntl_setlk().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
fs/locks.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index afefeb4ad6de..d7daa6c8932f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2086,14 +2086,17 @@ static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
*/
int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
{
- struct file_lock file_lock;
+ struct file_lock *fl;
int error;
+ fl = locks_alloc_lock();
+ if (fl == NULL)
+ return -ENOMEM;
error = -EINVAL;
if (flock->l_type != F_RDLCK && flock->l_type != F_WRLCK)
goto out;
- error = flock_to_posix_lock(filp, &file_lock, flock);
+ error = flock_to_posix_lock(filp, fl, flock);
if (error)
goto out;
@@ -2103,23 +2106,22 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
goto out;
cmd = F_GETLK;
- file_lock.fl_flags |= FL_OFDLCK;
- file_lock.fl_owner = filp;
+ fl->fl_flags |= FL_OFDLCK;
+ fl->fl_owner = filp;
}
- error = vfs_test_lock(filp, &file_lock);
+ error = vfs_test_lock(filp, fl);
if (error)
goto out;
- flock->l_type = file_lock.fl_type;
- if (file_lock.fl_type != F_UNLCK) {
- error = posix_lock_to_flock(flock, &file_lock);
+ flock->l_type = fl->fl_type;
+ if (fl->fl_type != F_UNLCK) {
+ error = posix_lock_to_flock(flock, fl);
if (error)
- goto rel_priv;
+ goto out;
}
-rel_priv:
- locks_release_private(&file_lock);
out:
+ locks_free_lock(fl);
return error;
}
@@ -2298,14 +2300,18 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
*/
int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 *flock)
{
- struct file_lock file_lock;
+ struct file_lock *fl;
int error;
+ fl = locks_alloc_lock();
+ if (fl == NULL)
+ return -ENOMEM;
+
error = -EINVAL;
if (flock->l_type != F_RDLCK && flock->l_type != F_WRLCK)
goto out;
- error = flock64_to_posix_lock(filp, &file_lock, flock);
+ error = flock64_to_posix_lock(filp, fl, flock);
if (error)
goto out;
@@ -2315,20 +2321,20 @@ int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 *flock)
goto out;
cmd = F_GETLK64;
- file_lock.fl_flags |= FL_OFDLCK;
- file_lock.fl_owner = filp;
+ fl->fl_flags |= FL_OFDLCK;
+ fl->fl_owner = filp;
}
- error = vfs_test_lock(filp, &file_lock);
+ error = vfs_test_lock(filp, fl);
if (error)
goto out;
- flock->l_type = file_lock.fl_type;
- if (file_lock.fl_type != F_UNLCK)
- posix_lock_to_flock64(flock, &file_lock);
+ flock->l_type = fl->fl_type;
+ if (fl->fl_type != F_UNLCK)
+ posix_lock_to_flock64(flock, fl);
- locks_release_private(&file_lock);
out:
+ locks_free_lock(fl);
return error;
}
--
2.9.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/3] fs/locks: Remove fl_nspid
2017-06-06 17:19 [PATCH 0/3 v3] Fixups for l_pid Benjamin Coddington
2017-06-06 17:19 ` [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk() Benjamin Coddington
@ 2017-06-06 17:19 ` Benjamin Coddington
2017-06-06 18:00 ` Jeff Layton
2017-06-08 6:50 ` kbuild test robot
2017-06-06 17:19 ` [PATCH 3/3] fs/locks: Use fs-specific l_pid for remote locks Benjamin Coddington
2 siblings, 2 replies; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 17:19 UTC (permalink / raw)
To: Jeff Layton, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be
atomic with the stateid update", NFSv4 has been inserting locks in rpciod
worker context. The result is that the file_lock's fl_nspid is the
kworker's pid instead of the original userspace pid.
The fl_nspid is only used to represent the namespaced virtual pid number
when displaying locks or returning from F_GETLK. There's no reason to set
it for every inserted lock, since we can usually just look it up from
fl_pid. So, instead of looking up and holding struct pid for every lock,
let's just look up the virtual pid number from fl_pid when it is needed.
That means we can remove fl_nspid entirely.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
fs/locks.c | 58 ++++++++++++++++++++++++++++++++----------------------
include/linux/fs.h | 1 -
2 files changed, 35 insertions(+), 24 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index d7daa6c8932f..104398ccc9b9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -733,7 +733,6 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
static void
locks_insert_lock_ctx(struct file_lock *fl, struct list_head *before)
{
- fl->fl_nspid = get_pid(task_tgid(current));
list_add_tail(&fl->fl_list, before);
locks_insert_global_locks(fl);
}
@@ -743,10 +742,6 @@ locks_unlink_lock_ctx(struct file_lock *fl)
{
locks_delete_global_locks(fl);
list_del_init(&fl->fl_list);
- if (fl->fl_nspid) {
- put_pid(fl->fl_nspid);
- fl->fl_nspid = NULL;
- }
locks_wake_up_blocks(fl);
}
@@ -823,8 +818,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
if (posix_locks_conflict(fl, cfl)) {
locks_copy_conflock(fl, cfl);
- if (cfl->fl_nspid)
- fl->fl_pid = pid_vnr(cfl->fl_nspid);
goto out;
}
}
@@ -2048,6 +2041,31 @@ int vfs_test_lock(struct file *filp, struct file_lock *fl)
}
EXPORT_SYMBOL_GPL(vfs_test_lock);
+/**
+ * locks_translate_pid - translate a pid number into a namespace
+ * @nr: The pid number in the init_pid_ns
+ * @ns: The namespace into which the pid should be translated
+ *
+ * Used to tranlate a fl_pid into a namespace virtual pid number
+ */
+static pid_t locks_translate_pid(int init_nr, struct pid_namespace *ns)
+{
+ pid_t vnr = 0;
+ struct task_struct *task;
+
+ rcu_read_lock();
+ task = find_task_by_pid_ns(init_nr, &init_pid_ns);
+ if (task)
+ get_task_struct(task);
+ rcu_read_unlock();
+ if (!task)
+ goto out;
+ vnr = task_pid_nr_ns(task, ns);
+ put_task_struct(task);
+out:
+ return vnr;
+}
+
static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
{
flock->l_pid = IS_OFDLCK(fl) ? -1 : fl->fl_pid;
@@ -2584,22 +2602,16 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
{
struct inode *inode = NULL;
unsigned int fl_pid;
+ struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
- if (fl->fl_nspid) {
- struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
-
- /* Don't let fl_pid change based on who is reading the file */
- fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
-
- /*
- * If there isn't a fl_pid don't display who is waiting on
- * the lock if we are called from locks_show, or if we are
- * called from __show_fd_info - skip lock entirely
- */
- if (fl_pid == 0)
- return;
- } else
- fl_pid = fl->fl_pid;
+ fl_pid = locks_translate_pid(fl->fl_pid, proc_pidns);
+ /*
+ * If there isn't a fl_pid don't display who is waiting on
+ * the lock if we are called from locks_show, or if we are
+ * called from __show_fd_info - skip lock entirely
+ */
+ if (fl_pid == 0)
+ return;
if (fl->fl_file != NULL)
inode = locks_inode(fl->fl_file);
@@ -2674,7 +2686,7 @@ static int locks_show(struct seq_file *f, void *v)
fl = hlist_entry(v, struct file_lock, fl_link);
- if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
+ if (locks_translate_pid(fl->fl_pid, proc_pidns) == 0)
return 0;
lock_get_status(f, fl, iter->li_pos, "");
diff --git a/include/linux/fs.h b/include/linux/fs.h
index aa4affb38c39..b013fac515f7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -984,7 +984,6 @@ struct file_lock {
unsigned char fl_type;
unsigned int fl_pid;
int fl_link_cpu; /* what cpu's list is this on? */
- struct pid *fl_nspid;
wait_queue_head_t fl_wait;
struct file *fl_file;
loff_t fl_start;
--
2.9.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs/locks: Remove fl_nspid
2017-06-06 17:19 ` [PATCH 2/3] fs/locks: Remove fl_nspid Benjamin Coddington
@ 2017-06-06 18:00 ` Jeff Layton
2017-06-06 18:25 ` Jeff Layton
2017-06-08 6:50 ` kbuild test robot
1 sibling, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2017-06-06 18:00 UTC (permalink / raw)
To: Benjamin Coddington, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
On Tue, 2017-06-06 at 13:19 -0400, Benjamin Coddington wrote:
> Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be
> atomic with the stateid update", NFSv4 has been inserting locks in rpciod
> worker context. The result is that the file_lock's fl_nspid is the
> kworker's pid instead of the original userspace pid.
>
> The fl_nspid is only used to represent the namespaced virtual pid number
> when displaying locks or returning from F_GETLK. There's no reason to set
> it for every inserted lock, since we can usually just look it up from
> fl_pid. So, instead of looking up and holding struct pid for every lock,
> let's just look up the virtual pid number from fl_pid when it is needed.
> That means we can remove fl_nspid entirely.
>
With this set, I think we ought to codify that the stored pid must be
relative
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
> fs/locks.c | 58 ++++++++++++++++++++++++++++++++----------------------
> include/linux/fs.h | 1 -
> 2 files changed, 35 insertions(+), 24 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index d7daa6c8932f..104398ccc9b9 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -733,7 +733,6 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
> static void
> locks_insert_lock_ctx(struct file_lock *fl, struct list_head *before)
> {
> - fl->fl_nspid = get_pid(task_tgid(current));
> list_add_tail(&fl->fl_list, before);
> locks_insert_global_locks(fl);
> }
> @@ -743,10 +742,6 @@ locks_unlink_lock_ctx(struct file_lock *fl)
> {
> locks_delete_global_locks(fl);
> list_del_init(&fl->fl_list);
> - if (fl->fl_nspid) {
> - put_pid(fl->fl_nspid);
> - fl->fl_nspid = NULL;
> - }
> locks_wake_up_blocks(fl);
> }
>
> @@ -823,8 +818,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
> if (posix_locks_conflict(fl, cfl)) {
> locks_copy_conflock(fl, cfl);
> - if (cfl->fl_nspid)
> - fl->fl_pid = pid_vnr(cfl->fl_nspid);
> goto out;
> }
> }
> @@ -2048,6 +2041,31 @@ int vfs_test_lock(struct file *filp, struct file_lock *fl)
> }
> EXPORT_SYMBOL_GPL(vfs_test_lock);
>
> +/**
> + * locks_translate_pid - translate a pid number into a namespace
> + * @nr: The pid number in the init_pid_ns
> + * @ns: The namespace into which the pid should be translated
> + *
> + * Used to tranlate a fl_pid into a namespace virtual pid number
> + */
> +static pid_t locks_translate_pid(int init_nr, struct pid_namespace *ns)
> +{
> + pid_t vnr = 0;
> + struct task_struct *task;
> +
> + rcu_read_lock();
> + task = find_task_by_pid_ns(init_nr, &init_pid_ns);
> + if (task)
> + get_task_struct(task);
> + rcu_read_unlock();
Is that safe? What prevents get_task_struct from doing a 0->1 transition
there after the task usage count has already gone 1->0 and is on its way
to being freed?
> + if (!task)
> + goto out;
> + vnr = task_pid_nr_ns(task, ns);
> + put_task_struct(task);
> +out:
> + return vnr;
> +}
> +
> static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
> {
> flock->l_pid = IS_OFDLCK(fl) ? -1 : fl->fl_pid;
> @@ -2584,22 +2602,16 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
> {
> struct inode *inode = NULL;
> unsigned int fl_pid;
> + struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>
> - if (fl->fl_nspid) {
> - struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> -
> - /* Don't let fl_pid change based on who is reading the file */
> - fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
> -
> - /*
> - * If there isn't a fl_pid don't display who is waiting on
> - * the lock if we are called from locks_show, or if we are
> - * called from __show_fd_info - skip lock entirely
> - */
> - if (fl_pid == 0)
> - return;
> - } else
> - fl_pid = fl->fl_pid;
> + fl_pid = locks_translate_pid(fl->fl_pid, proc_pidns);
> + /*
> + * If there isn't a fl_pid don't display who is waiting on
> + * the lock if we are called from locks_show, or if we are
> + * called from __show_fd_info - skip lock entirely
> + */
> + if (fl_pid == 0)
> + return;
>
> if (fl->fl_file != NULL)
> inode = locks_inode(fl->fl_file);
> @@ -2674,7 +2686,7 @@ static int locks_show(struct seq_file *f, void *v)
>
> fl = hlist_entry(v, struct file_lock, fl_link);
>
> - if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
> + if (locks_translate_pid(fl->fl_pid, proc_pidns) == 0)
> return 0;
>
> lock_get_status(f, fl, iter->li_pos, "");
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index aa4affb38c39..b013fac515f7 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -984,7 +984,6 @@ struct file_lock {
> unsigned char fl_type;
> unsigned int fl_pid;
> int fl_link_cpu; /* what cpu's list is this on? */
> - struct pid *fl_nspid;
> wait_queue_head_t fl_wait;
> struct file *fl_file;
> loff_t fl_start;
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs/locks: Remove fl_nspid
2017-06-06 18:00 ` Jeff Layton
@ 2017-06-06 18:25 ` Jeff Layton
2017-06-06 18:57 ` Benjamin Coddington
0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2017-06-06 18:25 UTC (permalink / raw)
To: Benjamin Coddington, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
On Tue, 2017-06-06 at 14:00 -0400, Jeff Layton wrote:
> On Tue, 2017-06-06 at 13:19 -0400, Benjamin Coddington wrote:
> > Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be
> > atomic with the stateid update", NFSv4 has been inserting locks in rpciod
> > worker context. The result is that the file_lock's fl_nspid is the
> > kworker's pid instead of the original userspace pid.
> >
> > The fl_nspid is only used to represent the namespaced virtual pid number
> > when displaying locks or returning from F_GETLK. There's no reason to set
> > it for every inserted lock, since we can usually just look it up from
> > fl_pid. So, instead of looking up and holding struct pid for every lock,
> > let's just look up the virtual pid number from fl_pid when it is needed.
> > That means we can remove fl_nspid entirely.
> >
>
> With this set, I think we ought to codify that the stored pid must be
> relative
...to the init_pid_ns. Let's make that clear in the comments for
filesystem authors.
>
>
> > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> > ---
> > fs/locks.c | 58 ++++++++++++++++++++++++++++++++----------------------
> > include/linux/fs.h | 1 -
> > 2 files changed, 35 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index d7daa6c8932f..104398ccc9b9 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -733,7 +733,6 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
> > static void
> > locks_insert_lock_ctx(struct file_lock *fl, struct list_head *before)
> > {
> > - fl->fl_nspid = get_pid(task_tgid(current));
> > list_add_tail(&fl->fl_list, before);
> > locks_insert_global_locks(fl);
> > }
> > @@ -743,10 +742,6 @@ locks_unlink_lock_ctx(struct file_lock *fl)
> > {
> > locks_delete_global_locks(fl);
> > list_del_init(&fl->fl_list);
> > - if (fl->fl_nspid) {
> > - put_pid(fl->fl_nspid);
> > - fl->fl_nspid = NULL;
> > - }
> > locks_wake_up_blocks(fl);
> > }
> >
> > @@ -823,8 +818,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> > list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
> > if (posix_locks_conflict(fl, cfl)) {
> > locks_copy_conflock(fl, cfl);
> > - if (cfl->fl_nspid)
> > - fl->fl_pid = pid_vnr(cfl->fl_nspid);
> > goto out;
> > }
> > }
> > @@ -2048,6 +2041,31 @@ int vfs_test_lock(struct file *filp, struct file_lock *fl)
> > }
> > EXPORT_SYMBOL_GPL(vfs_test_lock);
> >
> > +/**
> > + * locks_translate_pid - translate a pid number into a namespace
> > + * @nr: The pid number in the init_pid_ns
> > + * @ns: The namespace into which the pid should be translated
> > + *
> > + * Used to tranlate a fl_pid into a namespace virtual pid number
> > + */
> > +static pid_t locks_translate_pid(int init_nr, struct pid_namespace *ns)
> > +{
> > + pid_t vnr = 0;
> > + struct task_struct *task;
> > +
> > + rcu_read_lock();
> > + task = find_task_by_pid_ns(init_nr, &init_pid_ns);
> > + if (task)
> > + get_task_struct(task);
> > + rcu_read_unlock();
>
> Is that safe? What prevents get_task_struct from doing a 0->1 transition
> there after the task usage count has already gone 1->0 and is on its way
> to being freed?
>
> > + if (!task)
> > + goto out;
> > + vnr = task_pid_nr_ns(task, ns);
> > + put_task_struct(task);
> > +out:
> > + return vnr;
> > +}
Now that I look, I think it might be best to just do all of that under
the rcu_read_lock and don't muck with the refcount at all. So something
like:
+static pid_t locks_translate_pid(int init_nr, struct pid_namespace *ns)
+{
+ pid_t vnr = 0;
+ struct task_struct *task;
+
+ rcu_read_lock();
+ task = find_task_by_pid_ns(init_nr, &init_pid_ns);
+ if (task)
+ vnr = task_pid_nr_ns(task, ns);
+ rcu_read_unlock();
+ return vnr;
+}
That should be fine since task_pid_nr_ns pretty much does all of its
work under the rcu_read_lock anyway.
> > static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
> > {
> > flock->l_pid = IS_OFDLCK(fl) ? -1 : fl->fl_pid;
> > @@ -2584,22 +2602,16 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
> > {
> > struct inode *inode = NULL;
> > unsigned int fl_pid;
> > + struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> >
> > - if (fl->fl_nspid) {
> > - struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> > -
> > - /* Don't let fl_pid change based on who is reading the file */
> > - fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
> > -
> > - /*
> > - * If there isn't a fl_pid don't display who is waiting on
> > - * the lock if we are called from locks_show, or if we are
> > - * called from __show_fd_info - skip lock entirely
> > - */
> > - if (fl_pid == 0)
> > - return;
> > - } else
> > - fl_pid = fl->fl_pid;
> > + fl_pid = locks_translate_pid(fl->fl_pid, proc_pidns);
> > + /*
> > + * If there isn't a fl_pid don't display who is waiting on
> > + * the lock if we are called from locks_show, or if we are
> > + * called from __show_fd_info - skip lock entirely
> > + */
> > + if (fl_pid == 0)
> > + return;
> >
> > if (fl->fl_file != NULL)
> > inode = locks_inode(fl->fl_file);
> > @@ -2674,7 +2686,7 @@ static int locks_show(struct seq_file *f, void *v)
> >
> > fl = hlist_entry(v, struct file_lock, fl_link);
> >
> > - if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
> > + if (locks_translate_pid(fl->fl_pid, proc_pidns) == 0)
> > return 0;
> >
> > lock_get_status(f, fl, iter->li_pos, "");
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index aa4affb38c39..b013fac515f7 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -984,7 +984,6 @@ struct file_lock {
> > unsigned char fl_type;
> > unsigned int fl_pid;
> > int fl_link_cpu; /* what cpu's list is this on? */
> > - struct pid *fl_nspid;
> > wait_queue_head_t fl_wait;
> > struct file *fl_file;
> > loff_t fl_start;
>
>
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs/locks: Remove fl_nspid
2017-06-06 18:25 ` Jeff Layton
@ 2017-06-06 18:57 ` Benjamin Coddington
2017-06-06 20:41 ` Benjamin Coddington
2017-06-06 23:05 ` Jeff Layton
0 siblings, 2 replies; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 18:57 UTC (permalink / raw)
To: Jeff Layton; +Cc: bfields, Alexander Viro, linux-fsdevel, linux-kernel
On 6 Jun 2017, at 14:25, Jeff Layton wrote:
> On Tue, 2017-06-06 at 14:00 -0400, Jeff Layton wrote:
>> On Tue, 2017-06-06 at 13:19 -0400, Benjamin Coddington wrote:
>>> Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must
>>> be
>>> atomic with the stateid update", NFSv4 has been inserting locks in
>>> rpciod
>>> worker context. The result is that the file_lock's fl_nspid is the
>>> kworker's pid instead of the original userspace pid.
>>>
>>> The fl_nspid is only used to represent the namespaced virtual pid
>>> number
>>> when displaying locks or returning from F_GETLK. There's no reason
>>> to set
>>> it for every inserted lock, since we can usually just look it up
>>> from
>>> fl_pid. So, instead of looking up and holding struct pid for every
>>> lock,
>>> let's just look up the virtual pid number from fl_pid when it is
>>> needed.
>>> That means we can remove fl_nspid entirely.
>>>
>>
>> With this set, I think we ought to codify that the stored pid must be
>> relative
>
> ...to the init_pid_ns. Let's make that clear in the comments for
> filesystem authors.
OK, but I think you mean fl_pid should always be current->tgid or the
pid as
it is in init_pid_ns. We translate that pid into the virtual pid of the
process doing F_GETLK or reading /proc/locks.
>>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>>> ---
>>> fs/locks.c | 58
>>> ++++++++++++++++++++++++++++++++----------------------
>>> include/linux/fs.h | 1 -
>>> 2 files changed, 35 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/fs/locks.c b/fs/locks.c
>>> index d7daa6c8932f..104398ccc9b9 100644
>>> --- a/fs/locks.c
>>> +++ b/fs/locks.c
>>> @@ -733,7 +733,6 @@ static void locks_wake_up_blocks(struct
>>> file_lock *blocker)
>>> static void
>>> locks_insert_lock_ctx(struct file_lock *fl, struct list_head
>>> *before)
>>> {
>>> - fl->fl_nspid = get_pid(task_tgid(current));
>>> list_add_tail(&fl->fl_list, before);
>>> locks_insert_global_locks(fl);
>>> }
>>> @@ -743,10 +742,6 @@ locks_unlink_lock_ctx(struct file_lock *fl)
>>> {
>>> locks_delete_global_locks(fl);
>>> list_del_init(&fl->fl_list);
>>> - if (fl->fl_nspid) {
>>> - put_pid(fl->fl_nspid);
>>> - fl->fl_nspid = NULL;
>>> - }
>>> locks_wake_up_blocks(fl);
>>> }
>>>
>>> @@ -823,8 +818,6 @@ posix_test_lock(struct file *filp, struct
>>> file_lock *fl)
>>> list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
>>> if (posix_locks_conflict(fl, cfl)) {
>>> locks_copy_conflock(fl, cfl);
>>> - if (cfl->fl_nspid)
>>> - fl->fl_pid = pid_vnr(cfl->fl_nspid);
>>> goto out;
>>> }
>>> }
>>> @@ -2048,6 +2041,31 @@ int vfs_test_lock(struct file *filp, struct
>>> file_lock *fl)
>>> }
>>> EXPORT_SYMBOL_GPL(vfs_test_lock);
>>>
>>> +/**
>>> + * locks_translate_pid - translate a pid number into a namespace
>>> + * @nr: The pid number in the init_pid_ns
>>> + * @ns: The namespace into which the pid should be translated
>>> + *
>>> + * Used to tranlate a fl_pid into a namespace virtual pid number
>>> + */
>>> +static pid_t locks_translate_pid(int init_nr, struct pid_namespace
>>> *ns)
>>> +{
>>> + pid_t vnr = 0;
>>> + struct task_struct *task;
>>> +
>>> + rcu_read_lock();
>>> + task = find_task_by_pid_ns(init_nr, &init_pid_ns);
>>> + if (task)
>>> + get_task_struct(task);
>>> + rcu_read_unlock();
>>
>> Is that safe? What prevents get_task_struct from doing a 0->1
>> transition
>> there after the task usage count has already gone 1->0 and is on its
>> way
>> to being freed?
Uh, no -- seems not safe. I copied that directly from fs/proc/base.c,
and
seems a problem there too.
Changing this to the below avoids the race with the struct task being
released:
rcu_read_lock();
struct pid = find_pid_ns(init_nr, &init_pid_ns)
vnr = pid_vnr(pid);
rcu_read_unlock();
Ben
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs/locks: Remove fl_nspid
2017-06-06 18:57 ` Benjamin Coddington
@ 2017-06-06 20:41 ` Benjamin Coddington
2017-06-06 23:05 ` Jeff Layton
1 sibling, 0 replies; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 20:41 UTC (permalink / raw)
To: Jeff Layton; +Cc: bfields, Alexander Viro, linux-fsdevel, linux-kernel
On 6 Jun 2017, at 14:57, Benjamin Coddington wrote:
> On 6 Jun 2017, at 14:25, Jeff Layton wrote:
>
>> On Tue, 2017-06-06 at 14:00 -0400, Jeff Layton wrote:
>>> On Tue, 2017-06-06 at 13:19 -0400, Benjamin Coddington wrote:
>>>> Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock
>>>> must be
>>>> atomic with the stateid update", NFSv4 has been inserting locks in
>>>> rpciod
>>>> worker context. The result is that the file_lock's fl_nspid is the
>>>> kworker's pid instead of the original userspace pid.
>>>>
>>>> The fl_nspid is only used to represent the namespaced virtual pid
>>>> number
>>>> when displaying locks or returning from F_GETLK. There's no reason
>>>> to set
>>>> it for every inserted lock, since we can usually just look it up
>>>> from
>>>> fl_pid. So, instead of looking up and holding struct pid for every
>>>> lock,
>>>> let's just look up the virtual pid number from fl_pid when it is
>>>> needed.
>>>> That means we can remove fl_nspid entirely.
>>>>
>>>
>>> With this set, I think we ought to codify that the stored pid must
>>> be
>>> relative
>>
>> ...to the init_pid_ns. Let's make that clear in the comments for
>> filesystem authors.
>
> OK, but I think you mean fl_pid should always be current->tgid or the
> pid as
> it is in init_pid_ns. We translate that pid into the virtual pid of
> the
> process doing F_GETLK or reading /proc/locks.
And in that sense, we shouldn't have to add a comment, because the
expected
behavior is the same as it is today -- namely that fl_pid is
current->tgid,
or the real pid number, which is the pid number in init_pid_ns.
So, unless you feel strongly that this should be explained in a comment,
I
think I'll resend this without additional comments after making the
change
to use find_pid_ns().
Ben
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs/locks: Remove fl_nspid
2017-06-06 18:57 ` Benjamin Coddington
2017-06-06 20:41 ` Benjamin Coddington
@ 2017-06-06 23:05 ` Jeff Layton
1 sibling, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2017-06-06 23:05 UTC (permalink / raw)
To: Benjamin Coddington; +Cc: bfields, Alexander Viro, linux-fsdevel, linux-kernel
On Tue, 2017-06-06 at 14:57 -0400, Benjamin Coddington wrote:
> On 6 Jun 2017, at 14:25, Jeff Layton wrote:
>
> > On Tue, 2017-06-06 at 14:00 -0400, Jeff Layton wrote:
> > > On Tue, 2017-06-06 at 13:19 -0400, Benjamin Coddington wrote:
> > > > Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must
> > > > be
> > > > atomic with the stateid update", NFSv4 has been inserting locks in
> > > > rpciod
> > > > worker context. The result is that the file_lock's fl_nspid is the
> > > > kworker's pid instead of the original userspace pid.
> > > >
> > > > The fl_nspid is only used to represent the namespaced virtual pid
> > > > number
> > > > when displaying locks or returning from F_GETLK. There's no reason
> > > > to set
> > > > it for every inserted lock, since we can usually just look it up
> > > > from
> > > > fl_pid. So, instead of looking up and holding struct pid for every
> > > > lock,
> > > > let's just look up the virtual pid number from fl_pid when it is
> > > > needed.
> > > > That means we can remove fl_nspid entirely.
> > > >
> > >
> > > With this set, I think we ought to codify that the stored pid must be
> > > relative
> >
> > ...to the init_pid_ns. Let's make that clear in the comments for
> > filesystem authors.
>
> OK, but I think you mean fl_pid should always be current->tgid or the
> pid as
> it is in init_pid_ns. We translate that pid into the virtual pid of the
> process doing F_GETLK or reading /proc/locks.
>
> > > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> > > > ---
> > > > fs/locks.c | 58
> > > > ++++++++++++++++++++++++++++++++----------------------
> > > > include/linux/fs.h | 1 -
> > > > 2 files changed, 35 insertions(+), 24 deletions(-)
> > > >
> > > > diff --git a/fs/locks.c b/fs/locks.c
> > > > index d7daa6c8932f..104398ccc9b9 100644
> > > > --- a/fs/locks.c
> > > > +++ b/fs/locks.c
> > > > @@ -733,7 +733,6 @@ static void locks_wake_up_blocks(struct
> > > > file_lock *blocker)
> > > > static void
> > > > locks_insert_lock_ctx(struct file_lock *fl, struct list_head
> > > > *before)
> > > > {
> > > > - fl->fl_nspid = get_pid(task_tgid(current));
> > > > list_add_tail(&fl->fl_list, before);
> > > > locks_insert_global_locks(fl);
> > > > }
> > > > @@ -743,10 +742,6 @@ locks_unlink_lock_ctx(struct file_lock *fl)
> > > > {
> > > > locks_delete_global_locks(fl);
> > > > list_del_init(&fl->fl_list);
> > > > - if (fl->fl_nspid) {
> > > > - put_pid(fl->fl_nspid);
> > > > - fl->fl_nspid = NULL;
> > > > - }
> > > > locks_wake_up_blocks(fl);
> > > > }
> > > >
> > > > @@ -823,8 +818,6 @@ posix_test_lock(struct file *filp, struct
> > > > file_lock *fl)
> > > > list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
> > > > if (posix_locks_conflict(fl, cfl)) {
> > > > locks_copy_conflock(fl, cfl);
> > > > - if (cfl->fl_nspid)
> > > > - fl->fl_pid = pid_vnr(cfl->fl_nspid);
> > > > goto out;
> > > > }
> > > > }
> > > > @@ -2048,6 +2041,31 @@ int vfs_test_lock(struct file *filp, struct
> > > > file_lock *fl)
> > > > }
> > > > EXPORT_SYMBOL_GPL(vfs_test_lock);
> > > >
> > > > +/**
> > > > + * locks_translate_pid - translate a pid number into a namespace
> > > > + * @nr: The pid number in the init_pid_ns
> > > > + * @ns: The namespace into which the pid should be translated
> > > > + *
> > > > + * Used to tranlate a fl_pid into a namespace virtual pid number
> > > > + */
> > > > +static pid_t locks_translate_pid(int init_nr, struct pid_namespace
> > > > *ns)
> > > > +{
> > > > + pid_t vnr = 0;
> > > > + struct task_struct *task;
> > > > +
> > > > + rcu_read_lock();
> > > > + task = find_task_by_pid_ns(init_nr, &init_pid_ns);
> > > > + if (task)
> > > > + get_task_struct(task);
> > > > + rcu_read_unlock();
> > >
> > > Is that safe? What prevents get_task_struct from doing a 0->1
> > > transition
> > > there after the task usage count has already gone 1->0 and is on its
> > > way
> > > to being freed?
>
> Uh, no -- seems not safe. I copied that directly from fs/proc/base.c,
> and
> seems a problem there too.
>
> Changing this to the below avoids the race with the struct task being
> released:
>
> rcu_read_lock();
> struct pid = find_pid_ns(init_nr, &init_pid_ns)
> vnr = pid_vnr(pid);
> rcu_read_unlock();
>
Actually now that I've looked a little more closely, I think it may be
ok. call_rcu callback is what ends up putting the last reference when
the task is released, so if you can find the thing via
find_task_by_pid_ns, then you know that the refcount is at least 1 until
the rcu_read_lock is dropped.
Still, I think doing the pid assignment under rcu_read_lock and not
bothering with the refcount is better in this case...
Thanks,
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] fs/locks: Remove fl_nspid
2017-06-06 17:19 ` [PATCH 2/3] fs/locks: Remove fl_nspid Benjamin Coddington
2017-06-06 18:00 ` Jeff Layton
@ 2017-06-08 6:50 ` kbuild test robot
1 sibling, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-06-08 6:50 UTC (permalink / raw)
To: Benjamin Coddington
Cc: kbuild-all, Jeff Layton, bfields, Alexander Viro, linux-fsdevel,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4434 bytes --]
Hi Benjamin,
[auto build test WARNING on jlayton/linux-next]
[also build test WARNING on next-20170607]
[cannot apply to linus/master linux/master v4.12-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Benjamin-Coddington/fs-locks-Use-allocation-rather-than-the-stack-in-fcntl_getlk/20170608-015328
base: git://git.samba.org/jlayton/linux linux-next
reproduce: make htmldocs
All warnings (new ones prefixed by >>):
fs/inode.c:1664: warning: No description found for parameter 'rcu'
>> fs/locks.c:2052: warning: No description found for parameter 'init_nr'
>> fs/locks.c:2052: warning: Excess function parameter 'nr' description in 'locks_translate_pid'
>> fs/locks.c:2052: warning: No description found for parameter 'init_nr'
>> fs/locks.c:2052: warning: Excess function parameter 'nr' description in 'locks_translate_pid'
include/linux/jbd2.h:442: warning: No description found for parameter 'i_transaction'
include/linux/jbd2.h:442: warning: No description found for parameter 'i_next_transaction'
include/linux/jbd2.h:442: warning: No description found for parameter 'i_list'
include/linux/jbd2.h:442: warning: No description found for parameter 'i_vfs_inode'
include/linux/jbd2.h:442: warning: No description found for parameter 'i_flags'
include/linux/jbd2.h:496: warning: No description found for parameter 'h_rsv_handle'
include/linux/jbd2.h:496: warning: No description found for parameter 'h_reserved'
include/linux/jbd2.h:496: warning: No description found for parameter 'h_type'
include/linux/jbd2.h:496: warning: No description found for parameter 'h_line_no'
include/linux/jbd2.h:496: warning: No description found for parameter 'h_start_jiffies'
include/linux/jbd2.h:496: warning: No description found for parameter 'h_requested_credits'
include/linux/jbd2.h:496: warning: No description found for parameter 'saved_alloc_context'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_chkpt_bhs'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_devname'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_average_commit_time'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_min_batch_time'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_max_batch_time'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_commit_callback'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_failed_commit'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_chksum_driver'
include/linux/jbd2.h:1049: warning: No description found for parameter 'j_csum_seed'
fs/jbd2/transaction.c:434: warning: No description found for parameter 'rsv_blocks'
fs/jbd2/transaction.c:434: warning: No description found for parameter 'gfp_mask'
fs/jbd2/transaction.c:434: warning: No description found for parameter 'type'
fs/jbd2/transaction.c:434: warning: No description found for parameter 'line_no'
fs/jbd2/transaction.c:511: warning: No description found for parameter 'type'
fs/jbd2/transaction.c:511: warning: No description found for parameter 'line_no'
fs/jbd2/transaction.c:641: warning: No description found for parameter 'gfp_mask'
vim +/init_nr +2052 fs/locks.c
2036 {
2037 if (filp->f_op->lock && is_remote_lock(filp))
2038 return filp->f_op->lock(filp, F_GETLK, fl);
2039 posix_test_lock(filp, fl);
2040 return 0;
2041 }
2042 EXPORT_SYMBOL_GPL(vfs_test_lock);
2043
2044 /**
2045 * locks_translate_pid - translate a pid number into a namespace
2046 * @nr: The pid number in the init_pid_ns
2047 * @ns: The namespace into which the pid should be translated
2048 *
2049 * Used to tranlate a fl_pid into a namespace virtual pid number
2050 */
2051 static pid_t locks_translate_pid(int init_nr, struct pid_namespace *ns)
> 2052 {
2053 pid_t vnr = 0;
2054 struct task_struct *task;
2055
2056 rcu_read_lock();
2057 task = find_task_by_pid_ns(init_nr, &init_pid_ns);
2058 if (task)
2059 get_task_struct(task);
2060 rcu_read_unlock();
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6634 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/3] fs/locks: Use fs-specific l_pid for remote locks
2017-06-06 17:19 [PATCH 0/3 v3] Fixups for l_pid Benjamin Coddington
2017-06-06 17:19 ` [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk() Benjamin Coddington
2017-06-06 17:19 ` [PATCH 2/3] fs/locks: Remove fl_nspid Benjamin Coddington
@ 2017-06-06 17:19 ` Benjamin Coddington
2 siblings, 0 replies; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 17:19 UTC (permalink / raw)
To: Jeff Layton, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
Now that we're translating fl_pid for F_GETLK and /proc/locks, we need to
handle the case where a remote filesystem directly sets fl_pid. In that
case, the fl_pid should not be translated into a local pid namespace. If
the filesystem implements the lock operation, set a flag to return the
lock's fl_pid value directly, rather translate it.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
fs/locks.c | 22 ++++++++++++++++++----
include/linux/fs.h | 1 +
2 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 104398ccc9b9..6858ec9802d2 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2034,8 +2034,10 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
*/
int vfs_test_lock(struct file *filp, struct file_lock *fl)
{
- if (filp->f_op->lock && is_remote_lock(filp))
+ if (filp->f_op->lock && is_remote_lock(filp)) {
+ fl->fl_flags |= FL_PID_PRIV;
return filp->f_op->lock(filp, F_GETLK, fl);
+ }
posix_test_lock(filp, fl);
return 0;
}
@@ -2066,9 +2068,18 @@ static pid_t locks_translate_pid(int init_nr, struct pid_namespace *ns)
return vnr;
}
+static pid_t flock_translate_pid(struct file_lock *fl)
+{
+ if (IS_OFDLCK(fl))
+ return -1;
+ if (fl->fl_flags & FL_PID_PRIV)
+ return fl->fl_pid;
+ return locks_translate_pid(fl->fl_pid, task_active_pid_ns(current));
+}
+
static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
{
- flock->l_pid = IS_OFDLCK(fl) ? -1 : fl->fl_pid;
+ flock->l_pid = flock_translate_pid(fl);
#if BITS_PER_LONG == 32
/*
* Make sure we can represent the posix lock via
@@ -2090,7 +2101,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
#if BITS_PER_LONG == 32
static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
{
- flock->l_pid = IS_OFDLCK(fl) ? -1 : fl->fl_pid;
+ flock->l_pid = flock_translate_pid(fl);
flock->l_start = fl->fl_start;
flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
fl->fl_end - fl->fl_start + 1;
@@ -2604,7 +2615,10 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
unsigned int fl_pid;
struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
- fl_pid = locks_translate_pid(fl->fl_pid, proc_pidns);
+ if (fl->fl_flags & FL_PID_PRIV)
+ fl_pid = fl->fl_pid;
+ else
+ fl_pid = locks_translate_pid(fl->fl_pid, proc_pidns);
/*
* If there isn't a fl_pid don't display who is waiting on
* the lock if we are called from locks_show, or if we are
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b013fac515f7..179496a9719d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -908,6 +908,7 @@ static inline struct file *get_file(struct file *f)
#define FL_UNLOCK_PENDING 512 /* Lease is being broken */
#define FL_OFDLCK 1024 /* lock is "owned" by struct file */
#define FL_LAYOUT 2048 /* outstanding pNFS layout */
+#define FL_PID_PRIV 4096 /* F_GETLK should report fl_pid */
#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
--
2.9.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 0/3 v4] Fixups for l_pid
@ 2017-06-06 20:45 Benjamin Coddington
2017-06-06 20:45 ` [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk() Benjamin Coddington
0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 20:45 UTC (permalink / raw)
To: Jeff Layton, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
LTP fcntl tests (fcntl11 fcntl14 fcntl17 fcntl19 fcntl20 fcntl21) have been
failing for NFSv4 mounts due to an unexpected l_pid. What follows are some
fixups:
v2: - Rebase onto linux-next
- Revert back to using the stack in locks_mandatory_area(), and fixup
patch description for 1/3
v3: - The lkp-robot found some serious per_thread_ops performance
regressions for v1 and v2, so this version changes things around to not
acquire a reference to struct pid in fl_nspid for every lock. Instead,
it drops fl_nspid altogether, and defers the lookup of the
namespace-translated pid until it actually needed.
v4: - Instead of looking up the virtual pid by way of referencing the struct
task of the that pid, instead use find_pid_ns() and pid_nr_ns(), which
avoids a the problem where we race to get a reference to the struct task
while it may be freed.
Benjamin Coddington (3):
fs/locks: Use allocation rather than the stack in fcntl_getlk()
fs/locks: Remove fl_nspid
fs/locks: Use fs-specific l_pid for remote locks
fs/locks.c | 116 ++++++++++++++++++++++++++++++++---------------------
include/linux/fs.h | 2 +-
2 files changed, 72 insertions(+), 46 deletions(-)
--
2.9.3
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk()
2017-06-06 20:45 [PATCH 0/3 v4] Fixups for l_pid Benjamin Coddington
@ 2017-06-06 20:45 ` Benjamin Coddington
0 siblings, 0 replies; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-06 20:45 UTC (permalink / raw)
To: Jeff Layton, bfields, Alexander Viro; +Cc: linux-fsdevel, linux-kernel
Struct file_lock is fairly large, so let's save some space on the stack by
using an allocation for struct file_lock in fcntl_getlk(), just as we do
for fcntl_setlk().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
fs/locks.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index afefeb4ad6de..d7daa6c8932f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2086,14 +2086,17 @@ static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
*/
int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
{
- struct file_lock file_lock;
+ struct file_lock *fl;
int error;
+ fl = locks_alloc_lock();
+ if (fl == NULL)
+ return -ENOMEM;
error = -EINVAL;
if (flock->l_type != F_RDLCK && flock->l_type != F_WRLCK)
goto out;
- error = flock_to_posix_lock(filp, &file_lock, flock);
+ error = flock_to_posix_lock(filp, fl, flock);
if (error)
goto out;
@@ -2103,23 +2106,22 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
goto out;
cmd = F_GETLK;
- file_lock.fl_flags |= FL_OFDLCK;
- file_lock.fl_owner = filp;
+ fl->fl_flags |= FL_OFDLCK;
+ fl->fl_owner = filp;
}
- error = vfs_test_lock(filp, &file_lock);
+ error = vfs_test_lock(filp, fl);
if (error)
goto out;
- flock->l_type = file_lock.fl_type;
- if (file_lock.fl_type != F_UNLCK) {
- error = posix_lock_to_flock(flock, &file_lock);
+ flock->l_type = fl->fl_type;
+ if (fl->fl_type != F_UNLCK) {
+ error = posix_lock_to_flock(flock, fl);
if (error)
- goto rel_priv;
+ goto out;
}
-rel_priv:
- locks_release_private(&file_lock);
out:
+ locks_free_lock(fl);
return error;
}
@@ -2298,14 +2300,18 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
*/
int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 *flock)
{
- struct file_lock file_lock;
+ struct file_lock *fl;
int error;
+ fl = locks_alloc_lock();
+ if (fl == NULL)
+ return -ENOMEM;
+
error = -EINVAL;
if (flock->l_type != F_RDLCK && flock->l_type != F_WRLCK)
goto out;
- error = flock64_to_posix_lock(filp, &file_lock, flock);
+ error = flock64_to_posix_lock(filp, fl, flock);
if (error)
goto out;
@@ -2315,20 +2321,20 @@ int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 *flock)
goto out;
cmd = F_GETLK64;
- file_lock.fl_flags |= FL_OFDLCK;
- file_lock.fl_owner = filp;
+ fl->fl_flags |= FL_OFDLCK;
+ fl->fl_owner = filp;
}
- error = vfs_test_lock(filp, &file_lock);
+ error = vfs_test_lock(filp, fl);
if (error)
goto out;
- flock->l_type = file_lock.fl_type;
- if (file_lock.fl_type != F_UNLCK)
- posix_lock_to_flock64(flock, &file_lock);
+ flock->l_type = fl->fl_type;
+ if (fl->fl_type != F_UNLCK)
+ posix_lock_to_flock64(flock, fl);
- locks_release_private(&file_lock);
out:
+ locks_free_lock(fl);
return error;
}
--
2.9.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 0/3 v6] Fixups for l_pid
@ 2017-06-27 15:18 Benjamin Coddington
[not found] ` <cover.1498572504.git.bcodding-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Coddington @ 2017-06-27 15:18 UTC (permalink / raw)
To: Oleg Drokin, Andreas Dilger, James Simmons, Greg Kroah-Hartman,
Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov, Yan, Zheng,
Sage Weil, Ilya Dryomov, Steve French, Christine Caulfield,
David Teigland, Miklos Szeredi, Alexander Viro, Jeff Layton,
J. Bruce Fields, Vitaly Fertman, John L. Hammond, Andriy Skulysh,
Benjamin Coddington, Emoly Liu
Cc: devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
linux-cifs-u79uwXL29TY76Z2rM5mHXA,
samba-technical-w/Ol4Ecudpl8XjKLYN78aQ,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cluster-devel-H+wXaHxf7aLQT0dZR+AlfA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
v9fs-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
ceph-devel-u79uwXL29TY76Z2rM5mHXA,
lustre-devel-aLEFhgZF4x6X6Mz3xDxJMA
LTP fcntl tests (fcntl11 fcntl14 fcntl17 fcntl19 fcntl20 fcntl21) have been
failing for NFSv4 mounts due to an unexpected l_pid. What follows are some
fixups:
v2: - Rebase onto linux-next
- Revert back to using the stack in locks_mandatory_area(), and fixup
patch description for 1/3
v3: - The lkp-robot found some serious per_thread_ops performance
regressions for v1 and v2, so this version changes things around to not
acquire a reference to struct pid in fl_nspid for every lock. Instead,
it drops fl_nspid altogether, and defers the lookup of the
namespace-translated pid until it actually needed.
v4: - Instead of looking up the virtual pid by way of referencing the struct
task of the that pid, instead use find_pid_ns() and pid_nr_ns(), which
avoids a the problem where we race to get a reference to the struct task
while it may be freed.
v5: - Squash previous 2/3 and 3/3 to avoid regression where F_GETLK would
return the init_ns pid instead of a translated pid.
v6: - State clearly how the differing cases of l_pid translation should be
handled, specifically regarding remote locks on remote files: that
fl_pid ought to be returned from the filesystem as <= 0 to indicate that
it makes no sense to translate this l_pid.
- Follow up with a patch to have filesystems negate fl_pid for remote
locks on remote files.
Benjamin Coddington (3):
fs/locks: Use allocation rather than the stack in fcntl_getlk()
fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks
staging/lustre, 9p, ceph, cifs, dlm: negate remote pids for F_GETLK
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 2 +-
fs/9p/vfs_file.c | 2 +-
fs/ceph/locks.c | 2 +-
fs/cifs/cifssmb.c | 2 +-
fs/dlm/plock.c | 2 +-
fs/fuse/file.c | 6 +-
fs/locks.c | 108 ++++++++++++++----------
include/linux/fs.h | 2 +-
8 files changed, 72 insertions(+), 54 deletions(-)
--
2.9.3
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-06-27 15:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-06 17:19 [PATCH 0/3 v3] Fixups for l_pid Benjamin Coddington
2017-06-06 17:19 ` [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk() Benjamin Coddington
2017-06-06 17:19 ` [PATCH 2/3] fs/locks: Remove fl_nspid Benjamin Coddington
2017-06-06 18:00 ` Jeff Layton
2017-06-06 18:25 ` Jeff Layton
2017-06-06 18:57 ` Benjamin Coddington
2017-06-06 20:41 ` Benjamin Coddington
2017-06-06 23:05 ` Jeff Layton
2017-06-08 6:50 ` kbuild test robot
2017-06-06 17:19 ` [PATCH 3/3] fs/locks: Use fs-specific l_pid for remote locks Benjamin Coddington
2017-06-06 20:45 [PATCH 0/3 v4] Fixups for l_pid Benjamin Coddington
2017-06-06 20:45 ` [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk() Benjamin Coddington
2017-06-27 15:18 [PATCH 0/3 v6] Fixups for l_pid Benjamin Coddington
[not found] ` <cover.1498572504.git.bcodding-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-06-27 15:18 ` [PATCH 1/3] fs/locks: Use allocation rather than the stack in fcntl_getlk() Benjamin Coddington
2017-06-27 15:18 ` Benjamin Coddington
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.