* [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 14:42 ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-02 14:42 UTC (permalink / raw)
  To: jlayton, viro
  Cc: bfields, linux-kernel, linux-fsdevel, ebiederm, containers,
	serge.hallyn, Nikolay Borisov

Currently when /proc/locks is read it will show all the file locks
which are currently created on the machine. For containers hosted
on busy servers, this means that doing lsof can be very slow. I
observed stalls of up to 5 seconds reading 50k locks, while the container
itself had only a small number of relevant entries. Fix it by
filtering the locks listed by the pidns of the current process
and the process which created the lock.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
---
 fs/locks.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 6333263b7bc8..53e96df4c583 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2615,9 +2615,17 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
+	pr_info ("Current pid_ns: %p init_pid_ns: %p, fl->fl_nspid: %p nspidof:%p\n", pid_ns, &init_pid_ns,
+		 fl->fl_nspid, ns_of_pid(fl->fl_nspid));
+	if ((pid_ns != &init_pid_ns) && fl->fl_nspid &&
+		(pid_ns != ns_of_pid(fl->fl_nspid)))
+		    return 0;
+
 	lock_get_status(f, fl, iter->li_pos, "");
 
 	list_for_each_entry(bfl, &fl->fl_block, fl_block)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 14:45     ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-02 14:45 UTC (permalink / raw)
  To: jlayton, viro
  Cc: serge.hallyn, containers, linux-kernel, bfields, ebiederm, linux-fsdevel



On 08/02/2016 05:42 PM, Nikolay Borisov wrote:
> Currently when /proc/locks is read it will show all the file locks
> which are currently created on the machine. For containers hosted
> on busy servers, this means that doing lsof can be very slow. I
> observed stalls of up to 5 seconds reading 50k locks, while the container
> itself had only a small number of relevant entries. Fix it by
> filtering the locks listed by the pidns of the current process
> and the process which created the lock.
> 
> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> ---
>  fs/locks.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 6333263b7bc8..53e96df4c583 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2615,9 +2615,17 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *pid_ns = task_active_pid_ns(current);
> +
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	pr_info ("Current pid_ns: %p init_pid_ns: %p, fl->fl_nspid: %p nspidof:%p\n", pid_ns, &init_pid_ns,
> +		 fl->fl_nspid, ns_of_pid(fl->fl_nspid));

Obviously I don't intend to include that in the final submission.

> +	if ((pid_ns != &init_pid_ns) && fl->fl_nspid &&
> +		(pid_ns != ns_of_pid(fl->fl_nspid)))
> +		    return 0;
> +
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
  2016-08-02 14:42 ` Nikolay Borisov
@ 2016-08-02 15:05 ` J. Bruce Fields
  0 siblings, 1 reply; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-02 15:05 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: jlayton, viro, linux-kernel, linux-fsdevel, ebiederm, containers,
	serge.hallyn

On Tue, Aug 02, 2016 at 05:42:23PM +0300, Nikolay Borisov wrote:
> Currently when /proc/locks is read it will show all the file locks
> which are currently created on the machine. For containers hosted
> on busy servers, this means that doing lsof can be very slow. I
> observed stalls of up to 5 seconds reading 50k locks,

Do you mean just that the reading process itself was blocked, or that
others were getting stuck on blocked_lock_lock?

(And what process was actually reading /proc/locks, out of curiosity?)

> while the container
> itself had only a small number of relevant entries. Fix it by
> filtering the locks listed by the pidns of the current process
> and the process which created the lock.

Thanks, that's interesting.  So you show a lock if it was created by
someone in the current pid namespace.  With a special exception for the
init namespace so that everything is still visible there.

If a filesystem is shared between containers, that means you won't
necessarily be able to figure out from within a container which lock is
conflicting with your lock.  (I don't know if that's really a problem.
I'm unfortunately short on evidence about what people actually use
/proc/locks for....)

--b.

> 
> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> ---
>  fs/locks.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 6333263b7bc8..53e96df4c583 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2615,9 +2615,17 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *pid_ns = task_active_pid_ns(current);
> +
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	pr_info ("Current pid_ns: %p init_pid_ns: %p, fl->fl_nspid: %p nspidof:%p\n", pid_ns, &init_pid_ns,
> +		 fl->fl_nspid, ns_of_pid(fl->fl_nspid));
> +	if ((pid_ns != &init_pid_ns) && fl->fl_nspid &&
> +		(pid_ns != ns_of_pid(fl->fl_nspid)))
> +		    return 0;
> +
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> -- 
> 2.5.0

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 15:20       ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-02 15:20 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: jlayton, viro, linux-kernel, linux-fsdevel, ebiederm, containers,
	serge.hallyn



On 08/02/2016 06:05 PM, J. Bruce Fields wrote:
> On Tue, Aug 02, 2016 at 05:42:23PM +0300, Nikolay Borisov wrote:
>> Currently when /proc/locks is read it will show all the file locks
>> which are currently created on the machine. For containers hosted
>> on busy servers, this means that doing lsof can be very slow. I
>> observed stalls of up to 5 seconds reading 50k locks,
> 
> Do you mean just that the reading process itself was blocked, or that
> others were getting stuck on blocked_lock_lock?

I mean the listing process. Here is a simplified example from cat: 

cat-15084 [010] 3394000.190341: funcgraph_entry:      # 6156.641 us |  vfs_read();
cat-15084 [010] 3394000.196568: funcgraph_entry:      # 6096.618 us |  vfs_read();
cat-15084 [010] 3394000.202743: funcgraph_entry:      # 6060.097 us |  vfs_read();
cat-15084 [010] 3394000.208937: funcgraph_entry:      # 6111.374 us |  vfs_read();


> 
> (And what process was actually reading /proc/locks, out of curiosity?)

lsof in my case

> 
>> while the container
>> itself had only a small number of relevant entries. Fix it by
>> filtering the locks listed by the pidns of the current process
>> and the process which created the lock.
> 
> Thanks, that's interesting.  So you show a lock if it was created by
> someone in the current pid namespace.  With a special exception for the
> init namespace so that everything is still visible there.

I admit this is a rather naive approach. Something else I was pondering was
checking whether the user_ns of the lock creator's pidns is the same as the
reader's user_ns. That should potentially solve your concerns re.
shared filesystems, no? Or whether the reader's user_ns is an ancestor
of the user_ns of the creator's pidns? Maybe Eric can elaborate on whether
this would make sense?
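
Roughly, that ancestry check might look like the sketch below (untested;
it assumes fl->fl_nspid has already been checked for NULL by the caller):

static bool lock_visible_in_userns(struct file_lock *fl,
				   struct user_namespace *reader_ns)
{
	struct user_namespace *ns = ns_of_pid(fl->fl_nspid)->user_ns;

	/* Show the lock if reader_ns is the user_ns owning the
	 * creator's pidns, or any ancestor of it. */
	for (; ns; ns = ns->parent)
		if (ns == reader_ns)
			return true;
	return false;
}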

> 
> If a filesystem is shared between containers, that means you won't
> necessarily be able to figure out from within a container which lock is
> conflicting with your lock.  (I don't know if that's really a problem.
> I'm unfortunately short on evidence about what people actually use
> /proc/locks for....)
> 
> --b.
> 
>>
>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
>> ---
>>  fs/locks.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 6333263b7bc8..53e96df4c583 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -2615,9 +2615,17 @@ static int locks_show(struct seq_file *f, void *v)
>>  {
>>  	struct locks_iterator *iter = f->private;
>>  	struct file_lock *fl, *bfl;
>> +	struct pid_namespace *pid_ns = task_active_pid_ns(current);
>> +
>>  
>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>  
>> +	pr_info ("Current pid_ns: %p init_pid_ns: %p, fl->fl_nspid: %p nspidof:%p\n", pid_ns, &init_pid_ns,
>> +		 fl->fl_nspid, ns_of_pid(fl->fl_nspid));
>> +	if ((pid_ns != &init_pid_ns) && fl->fl_nspid &&
>> +		(pid_ns != ns_of_pid(fl->fl_nspid)))
>> +		    return 0;
>> +
>>  	lock_get_status(f, fl, iter->li_pos, "");
>>  
>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>> -- 
>> 2.5.0

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 15:43           ` J. Bruce Fields
  0 siblings, 0 replies; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-02 15:43 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: jlayton, viro, linux-kernel, linux-fsdevel, ebiederm, containers,
	serge.hallyn

On Tue, Aug 02, 2016 at 06:20:32PM +0300, Nikolay Borisov wrote:
> On 08/02/2016 06:05 PM, J. Bruce Fields wrote:
> > (And what process was actually reading /proc/locks, out of curiosity?)
> 
> lsof in my case

Oh, thanks, and you said that at the start, and I overlooked
it--apologies.

> >> while the container
> >> itself had only a small number of relevant entries. Fix it by
> >> filtering the locks listed by the pidns of the current process
> >> and the process which created the lock.
> > 
> > Thanks, that's interesting.  So you show a lock if it was created by
> > someone in the current pid namespace.  With a special exception for the
> > init namespace so that everything is still visible there.
> 
> I admit this is a rather naive approach. Something else I was pondering was
> checking whether the user_ns of the lock creator's pidns is the same as the
> reader's user_ns. That should potentially solve your concerns re.
> shared filesystems, no? Or whether the reader's user_ns is an ancestor
> of the user_ns of the creator's pidns? Maybe Eric can elaborate on whether
> this would make sense?

If I could just imagine myself king of the world for a moment--I wish I
could have an interface that took a path or a filehandle and gave back a
list of locks on the associated filesystem.  Then if lsof wanted a
global list, it would go through /proc/mounts and request the list of
locks for each filesystem.
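
Something purely hypothetical like the following -- nothing of the sort
exists, and the names and record layout are invented just to illustrate:

struct lock_desc {
	pid_t	owner;		/* lock owner, translated into the caller's pidns */
	int	type;		/* F_RDLCK, F_WRLCK, lease, flock, ... */
	loff_t	start, end;	/* byte range covered, for posix locks */
	__u64	ino;		/* inode the lock applies to */
};

/* Fill @buf with up to @nr locks held on the filesystem backing @dirfd. */
long fs_list_locks(int dirfd, struct lock_desc *buf, size_t nr);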

For /proc/locks it might be nice if we could restrict to locks on
filesystems that are somehow visible to the current process, but I don't
know if there's a simple way to do that.

--b.

> 
> > 
> > If a filesystem is shared between containers, that means you won't
> > necessarily be able to figure out from within a container which lock is
> > conflicting with your lock.  (I don't know if that's really a problem.
> > I'm unfortunately short on evidence about what people actually use
> > /proc/locks for....)
> > 
> > --b.
> > 
> >>
> >> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> >> ---
> >>  fs/locks.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >>
> >> diff --git a/fs/locks.c b/fs/locks.c
> >> index 6333263b7bc8..53e96df4c583 100644
> >> --- a/fs/locks.c
> >> +++ b/fs/locks.c
> >> @@ -2615,9 +2615,17 @@ static int locks_show(struct seq_file *f, void *v)
> >>  {
> >>  	struct locks_iterator *iter = f->private;
> >>  	struct file_lock *fl, *bfl;
> >> +	struct pid_namespace *pid_ns = task_active_pid_ns(current);
> >> +
> >>  
> >>  	fl = hlist_entry(v, struct file_lock, fl_link);
> >>  
> >> +	pr_info ("Current pid_ns: %p init_pid_ns: %p, fl->fl_nspid: %p nspidof:%p\n", pid_ns, &init_pid_ns,
> >> +		 fl->fl_nspid, ns_of_pid(fl->fl_nspid));
> >> +	if ((pid_ns != &init_pid_ns) && fl->fl_nspid &&
> >> +		(pid_ns != ns_of_pid(fl->fl_nspid)))
> >> +		    return 0;
> >> +
> >>  	lock_get_status(f, fl, iter->li_pos, "");
> >>  
> >>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> >> -- 
> >> 2.5.0

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 16:00     ` Eric W. Biederman
  0 siblings, 0 replies; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-02 16:00 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: jlayton, viro, bfields, linux-kernel, linux-fsdevel, containers,
	serge.hallyn

Nikolay Borisov <kernel@kyup.com> writes:

> Currently when /proc/locks is read it will show all the file locks
> which are currently created on the machine. For containers hosted
> on busy servers, this means that doing lsof can be very slow. I
> observed stalls of up to 5 seconds reading 50k locks, while the container
> itself had only a small number of relevant entries. Fix it by
> filtering the locks listed by the pidns of the current process
> and the process which created the lock.

The locks always confuse me so I am not 100% sure that connecting locks
to a pid namespace is appropriate.

That said, if you are going to filter by pid namespace, please use the pid
namespace of proc, not the pid namespace of the process reading the
file.

Different contents of files depending on who opens them is generally to
be discouraged.
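
A minimal sketch of that (assuming procfs keeps the pid namespace it was
mounted with in sb->s_fs_info, and treating a pid_nr_ns() of 0 as "not
visible in this namespace") would be:

static int locks_show(struct seq_file *f, void *v)
{
	struct locks_iterator *iter = f->private;
	struct file_lock *fl, *bfl;
	/* The namespace of the /proc instance backing this seq_file,
	 * not of whichever task happens to be reading it. */
	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;

	fl = hlist_entry(v, struct file_lock, fl_link);

	/* Skip locks created in a pidns this proc mount cannot see. */
	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
		return 0;
	...
}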

Eric

> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> ---
>  fs/locks.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 6333263b7bc8..53e96df4c583 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2615,9 +2615,17 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *pid_ns = task_active_pid_ns(current);
> +
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	pr_info ("Current pid_ns: %p init_pid_ns: %p, fl->fl_nspid: %p nspidof:%p\n", pid_ns, &init_pid_ns,
> +		 fl->fl_nspid, ns_of_pid(fl->fl_nspid));
> +	if ((pid_ns != &init_pid_ns) && fl->fl_nspid &&
> +		(pid_ns != ns_of_pid(fl->fl_nspid)))
> +		    return 0;
> +
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 17:40         ` J. Bruce Fields
  0 siblings, 0 replies; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-02 17:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Nikolay Borisov, jlayton, viro, linux-kernel, linux-fsdevel,
	containers, serge.hallyn

On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
> Nikolay Borisov <kernel@kyup.com> writes:
> 
> > Currently when /proc/locks is read it will show all the file locks
> > which are currently created on the machine. For containers hosted
> > on busy servers, this means that doing lsof can be very slow. I
> > observed stalls of up to 5 seconds reading 50k locks, while the container
> > itself had only a small number of relevant entries. Fix it by
> > filtering the locks listed by the pidns of the current process
> > and the process which created the lock.
> 
> The locks always confuse me so I am not 100% sure that connecting locks
> to a pid namespace is appropriate.
> 
> That said, if you are going to filter by pid namespace, please use the pid
> namespace of proc, not the pid namespace of the process reading the
> file.

Oh, that makes sense, thanks.

What does /proc/mounts use, out of curiosity?  The mount namespace that
/proc was originally mounted in?

--b.

> 
> Different contents of files depending on who opens them is generally to
> be discouraged.
> 
> Eric
> 
> > Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> > ---
> >  fs/locks.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 6333263b7bc8..53e96df4c583 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -2615,9 +2615,17 @@ static int locks_show(struct seq_file *f, void *v)
> >  {
> >  	struct locks_iterator *iter = f->private;
> >  	struct file_lock *fl, *bfl;
> > +	struct pid_namespace *pid_ns = task_active_pid_ns(current);
> > +
> >  
> >  	fl = hlist_entry(v, struct file_lock, fl_link);
> >  
> > +	pr_info ("Current pid_ns: %p init_pid_ns: %p, fl->fl_nspid: %p nspidof:%p\n", pid_ns, &init_pid_ns,
> > +		 fl->fl_nspid, ns_of_pid(fl->fl_nspid));
> > +	if ((pid_ns != &init_pid_ns) && fl->fl_nspid &&
> > +		(pid_ns != ns_of_pid(fl->fl_nspid)))
> > +		    return 0;
> > +
> >  	lock_get_status(f, fl, iter->li_pos, "");
> >  
> >  	list_for_each_entry(bfl, &fl->fl_block, fl_block)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 19:09             ` Eric W. Biederman
  0 siblings, 0 replies; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-02 19:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Nikolay Borisov, jlayton, viro, linux-kernel, linux-fsdevel,
	containers, serge.hallyn

"J. Bruce Fields" <bfields@fieldses.org> writes:

> On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
>> Nikolay Borisov <kernel@kyup.com> writes:
>> 
>> > Currently when /proc/locks is read it will show all the file locks
>> > which are currently created on the machine. For containers hosted
>> > on busy servers, this means that doing lsof can be very slow. I
>> > observed stalls of up to 5 seconds reading 50k locks, while the container
>> > itself had only a small number of relevant entries. Fix it by
>> > filtering the locks listed by the pidns of the current process
>> > and the process which created the lock.
>> 
>> The locks always confuse me so I am not 100% sure that connecting locks
>> to a pid namespace is appropriate.
>> 
>> That said, if you are going to filter by pid namespace, please use the pid
>> namespace of proc, not the pid namespace of the process reading the
>> file.
>
> Oh, that makes sense, thanks.
>
> What does /proc/mounts use, out of curiosity?  The mount namespace that
> /proc was originally mounted in?

/proc/mounts -> /proc/self/mounts

/proc/[pid]/mounts lists mounts from the mount namespace of the
appropriate process.

That is another way to go, but it is a tread-carefully thing: when changing
things that way it is easy to surprise apparmor or selinux rules and be
surprised you broke someone's userspace in a way that prevents booting.
Although I suspect /proc/locks isn't too bad.

Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 19:44                 ` J. Bruce Fields
  0 siblings, 0 replies; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-02 19:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Nikolay Borisov, jlayton, viro, linux-kernel, linux-fsdevel,
	containers, serge.hallyn

On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
> 
> > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
> >> Nikolay Borisov <kernel@kyup.com> writes:
> >> 
> >> > Currently when /proc/locks is read it will show all the file locks
> >> > which are currently created on the machine. For containers hosted
> >> > on busy servers, this means that doing lsof can be very slow. I
> >> > observed stalls of up to 5 seconds reading 50k locks, while the container
> >> > itself had only a small number of relevant entries. Fix it by
> >> > filtering the locks listed by the pidns of the current process
> >> > and the process which created the lock.
> >> 
> >> The locks always confuse me so I am not 100% sure that connecting locks
> >> to a pid namespace is appropriate.
> >> 
> >> That said, if you are going to filter by pid namespace, please use the pid
> >> namespace of proc, not the pid namespace of the process reading the
> >> file.
> >
> > Oh, that makes sense, thanks.
> >
> > What does /proc/mounts use, out of curiosity?  The mount namespace that
> > /proc was originally mounted in?
> 
> /proc/mounts -> /proc/self/mounts

D'oh, I knew that.

> /proc/[pid]/mounts lists mounts from the mount namespace of the
> appropriate process.
> 
> That is another way to go, but it is a tread-carefully thing: when changing
> things that way it is easy to surprise apparmor or selinux rules and be
> surprised you broke someone's userspace in a way that prevents booting.
> Although I suspect /proc/locks isn't too bad.

OK, thanks.

/proc/[pid]/locks might be confusing.  I'd expect it to be "all the
locks owned by this task", rather than "all the locks owned by pids in
the same pid namespace", or whatever criterion we choose.

Uh, I'm still trying to think of the Obviously Right solution here, and
it's not coming.

--b.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 20:01                     ` Jeff Layton
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2016-08-02 20:01 UTC (permalink / raw)
  To: J. Bruce Fields, Eric W. Biederman
  Cc: Nikolay Borisov, viro, linux-kernel, linux-fsdevel, containers,
	serge.hallyn

On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote:
> On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
> > 
> > > > "J. Bruce Fields" <bfields@fieldses.org> writes:
> > 
> > > 
> > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
> > > > 
> > > > > > > > Nikolay Borisov <kernel@kyup.com> writes:
> > > > 
> > > > > 
> > > > > Currently when /proc/locks is read it will show all the file locks
> > > > > which are currently created on the machine. For containers hosted
> > > > > on busy servers, this means that doing lsof can be very slow. I
> > > > > observed stalls of up to 5 seconds reading 50k locks, while the container
> > > > > itself had only a small number of relevant entries. Fix it by
> > > > > filtering the locks listed by the pidns of the current process
> > > > > and the process which created the lock.
> > > > 
> > > > The locks always confuse me so I am not 100% sure that connecting locks
> > > > to a pid namespace is appropriate.
> > > > 
> > > > That said, if you are going to filter by pid namespace, please use the pid
> > > > namespace of proc, not the pid namespace of the process reading the
> > > > file.
> > > 
> > > Oh, that makes sense, thanks.
> > > 
> > > What does /proc/mounts use, out of curiosity?  The mount namespace that
> > > /proc was originally mounted in?
> > 
> > /proc/mounts -> /proc/self/mounts
> 
> D'oh, I knew that.
> 
> > 
> > /proc/[pid]/mounts lists mounts from the mount namespace of the
> > appropriate process.
> > 
> > That is another way to go, but it is a tread-carefully thing: when changing
> > things that way it is easy to surprise apparmor or selinux rules and be
> > surprised you broke someone's userspace in a way that prevents booting.
> > Although I suspect /proc/locks isn't too bad.
> 
> OK, thanks.
> 
> /proc/[pid]/locks might be confusing.  I'd expect it to be "all the
> locks owned by this task", rather than "all the locks owned by pids in
> the same pid namespace", or whatever criterion we choose.
> 
> Uh, I'm still trying to think of the Obviously Right solution here, and
> it's not coming.
> 
> --b.


I'm a little leery of changing how this works. It has always been
maintained as a legacy interface, so do we run the risk of breaking
something if we turn it into a per-namespace thing? This also doesn't
solve the problem of slow traversal in the init_pid_ns -- only in a
container.

I also can't help but feel that /proc/locks is just showing its age. It
was fine in the late '90s, but its limitations are just becoming more
apparent as things get more complex. It was never designed for
performance, as you end up thrashing several spinlocks when reading it.

Maybe it's time to think about presenting this info in another way? A
global view of all locks on the system is interesting but maybe it
would be better to present it more granularly somehow?

I guess I should go look at what lsof actually does with this info...

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
  2016-08-02 20:01                     ` Jeff Layton
@ 2016-08-02 20:11                         ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-02 20:11 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Serge Hallyn, Linux Containers, LKML, J. Bruce Fields,
	Nikolay Borisov, Eric W. Biederman,
	linux-fsdevel, Alexander Viro

On Tue, Aug 2, 2016 at 11:01 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote:
>> On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
>> >
>> > > > "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> writes:
>> >
>> > >
>> > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
>> > > >
>> > > > > > > > Nikolay Borisov <kernel@kyup.com> writes:
>> > > >
>> > > > >
>> > > > > Currently when /proc/locks is read it will show all the file locks
>> > > > > which are currently created on the machine. For containers hosted
>> > > > > on busy servers, this means that doing lsof can be very slow. I
>> > > > > observed stalls of up to 5 seconds reading 50k locks, while the container
>> > > > > itself had only a small number of relevant entries. Fix it by
>> > > > > filtering the locks listed by the pidns of the current process
>> > > > > and the process which created the lock.
>> > > >
>> > > > The locks always confuse me so I am not 100% sure that connecting locks
>> > > > to a pid namespace is appropriate.
>> > > > 
>> > > > That said, if you are going to filter by pid namespace, please use the pid
>> > > > namespace of proc, not the pid namespace of the process reading the
>> > > > file.
>> > >
>> > > Oh, that makes sense, thanks.
>> > >
>> > > What does /proc/mounts use, out of curiosity?  The mount namespace that
>> > > /proc was originally mounted in?
>> >
>> > /proc/mounts -> /proc/self/mounts
>>
>> D'oh, I knew that.
>>
>> >
>> > /proc/[pid]/mounts lists mounts from the mount namespace of the
>> > appropriate process.
>> >
>> > That is another way to go, but it is a tread-carefully thing: when changing
>> > things that way it is easy to surprise apparmor or selinux rules and be
>> > surprised you broke someone's userspace in a way that prevents booting.
>> > Although I suspect /proc/locks isn't too bad.
>>
>> OK, thanks.
>>
>> /proc/[pid]/locks might be confusing.  I'd expect it to be "all the
>> locks owned by this task", rather than "all the locks owned by pids in
>> the same pid namespace", or whatever criterion we choose.
>>
>> Uh, I'm still trying to think of the Obviously Right solution here, and
>> it's not coming.
>>
>> --b.
>
>
> I'm a little leery of changing how this works. It has always been
> maintained as a legacy interface, so do we run the risk of breaking
> something if we turn it into a per-namespace thing? This also doesn't
> solve the problem of slow traversal in the init_pid_ns -- only in a
> container.
>
> I also can't help but feel that /proc/locks is just showing its age. It
> was fine in the late '90s, but its limitations are just becoming more
> apparent as things get more complex. It was never designed for
> performance, as you end up thrashing several spinlocks when reading it.

I believe it's also used by CRIU, though in this case you'd be doing
that from the init ns so I guess it's not that big of a problem there.

>
> Maybe it's time to think about presenting this info in another way? A
> global view of all locks on the system is interesting but maybe it
> would be better to present it more granularly somehow?
>
> I guess I should go look at what lsof actually does with this info...
>
> --
> Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 20:11                         ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-02 20:11 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Eric W. Biederman, Nikolay Borisov,
	Alexander Viro, LKML, linux-fsdevel, Linux Containers,
	Serge Hallyn

On Tue, Aug 2, 2016 at 11:01 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote:
>> On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
>> >
>> > > > "J. Bruce Fields" <bfields@fieldses.org> writes:
>> >
>> > >
>> > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
>> > > >
>> > > > > > > > Nikolay Borisov <kernel@kyup.com> writes:
>> > > >
>> > > > >
>> > > > > Currently when /proc/locks is read it will show all the file locks
>> > > > > which are currently created on the machine. On containers, hosted
>> > > > > on busy servers this means that doing lsof can be very slow. I
>> > > > > observed up to 5 seconds stalls reading 50k locks, while the container
>> > > > > itself had only a small number of relevant entries. Fix it by
>> > > > > filtering the locks listed by the pidns of the current process
>> > > > > and the process which created the lock.
>> > > >
>> > > > The locks always confuse me so I am not 100% sure that connecting locks
>> > > > to a pid namespace is appropriate.
>> > > >
>> > > > That said if you are going to filter by pid namespace please use the pid
>> > > > namespace of proc, not the pid namespace of the process reading the
>> > > > file.
>> > >
>> > > Oh, that makes sense, thanks.
>> > >
>> > > What does /proc/mounts use, out of curiosity?  The mount namespace that
>> > > /proc was originally mounted in?
>> >
>> > /proc/mounts -> /proc/self/mounts
>>
>> D'oh, I knew that.
>>
>> >
>> > /proc/[pid]/mounts lists mounts from the mount namespace of the
>> > appropriate process.
>> >
>> > That is another way to go but it is a tread carefully thing as changing
>> > things that way it is easy to surprise apparmor or selinux rules and be
>> > surprised you broke someone's userspace in a way that prevents booting.
>> > Although I suspect /proc/locks isn't too bad.
>>
>> OK, thanks.
>>
>> /proc/[pid]/locks might be confusing.  I'd expect it to be "all the
>> locks owned by this task", rather than "all the locks owned by pid's in
>> the same pid namespace", or whatever criterion we choose.
>>
>> Uh, I'm still trying to think of the Obviously Right solution here, and
>> it's not coming.
>>
>> --b.
>
>
> I'm a little leery of changing how this works. It has always been
> maintained as a legacy interface, so do we run the risk of breaking
> something if we turn it into a per-namespace thing? This also doesn't
> solve the problem of slow traversal in the init_pid_ns -- only in a
> container.
>
> I also can't help but feel that /proc/locks is just showing its age. It
> was fine in the late 90's, but its limitations are just becoming more
> apparent as things get more complex. It was never designed for
> performance as you end up thrashing several spinlocks when reading it.

I believe it's also used by CRIU, though in this case you'd be doing
that from the init ns so I guess it's not that big of a problem there.

>
> Maybe it's time to think about presenting this info in another way? A
> global view of all locks on the system is interesting but maybe it
> would be better to present it more granularly somehow?
>
> I guess I should go look at what lsof actually does with this info...
>
> --
> Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [RFC PATCH] locks: Show only file_locks created in the same pidns as current process
@ 2016-08-02 20:34                         ` J. Bruce Fields
  0 siblings, 0 replies; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-02 20:34 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Eric W. Biederman, Nikolay Borisov, viro, linux-kernel,
	linux-fsdevel, containers, serge.hallyn

On Tue, Aug 02, 2016 at 04:01:22PM -0400, Jeff Layton wrote:
> On Tue, 2016-08-02 at 15:44 -0400, J. Bruce Fields wrote:
> > On Tue, Aug 02, 2016 at 02:09:22PM -0500, Eric W. Biederman wrote:
> > > 
> > > > > "J. Bruce Fields" <bfields@fieldses.org> writes:
> > > 
> > > > 
> > > > On Tue, Aug 02, 2016 at 11:00:39AM -0500, Eric W. Biederman wrote:
> > > > > 
> > > > > > > > > Nikolay Borisov <kernel@kyup.com> writes:
> > > > > 
> > > > > > 
> > > > > > Currently when /proc/locks is read it will show all the file locks
> > > > > > which are currently created on the machine. On containers, hosted
> > > > > > on busy servers this means that doing lsof can be very slow. I
> > > > > > observed up to 5 seconds stalls reading 50k locks, while the container
> > > > > > itself had only a small number of relevant entries. Fix it by
> > > > > > filtering the locks listed by the pidns of the current process
> > > > > > and the process which created the lock.
> > > > > 
> > > > > The locks always confuse me so I am not 100% sure that connecting locks
> > > > > to a pid namespace is appropriate.
> > > > > 
> > > > > That said if you are going to filter by pid namespace please use the pid
> > > > > namespace of proc, not the pid namespace of the process reading the
> > > > > file.
> > > > 
> > > > Oh, that makes sense, thanks.
> > > > 
> > > > What does /proc/mounts use, out of curiosity?  The mount namespace that
> > > > /proc was originally mounted in?
> > > 
> > > /proc/mounts -> /proc/self/mounts
> > 
> > D'oh, I knew that.
> > 
> > > 
> > > /proc/[pid]/mounts lists mounts from the mount namespace of the
> > > appropriate process.
> > > 
> > > That is another way to go but it is a tread carefully thing as changing
> > > things that way it is easy to surprise apparmor or selinux rules and be
> > > surprised you broke someone's userspace in a way that prevents booting.
> > > Although I suspect /proc/locks isn't too bad.
> > 
> > OK, thanks.
> > 
> > /proc/[pid]/locks might be confusing.  I'd expect it to be "all the
> > locks owned by this task", rather than "all the locks owned by pid's in
> > the same pid namespace", or whatever criterion we choose.
> > 
> > Uh, I'm still trying to think of the Obviously Right solution here, and
> > it's not coming.
> > 
> > --b.
> 
> 
> I'm a little leery of changing how this works. It has always been
> maintained as a legacy interface, so do we run the risk of breaking
> something if we turn it into a per-namespace thing?

The namespace work is all about making interfaces per-namespace.  I
guess it works as long as it contributes to the illusion that each
container is its own machine.

Thinking about it, I might be sold on the per-pidns approach (with
Eric's modification to use the pidns of /proc not the reader).

My complaint about not being able to see conflicting locks would apply
just as well to conflicts from nfs locks held by other clients.  A disk
filesystem shared across multiple containers is a little like an nfs
filesystem shared between nfs clients.

That'd solve this immediate problem without requiring an lsof upgrade as
well.

> This also doesn't
> solve the problem of slow traversal in the init_pid_ns -- only in a
> container.
> 
> I also can't help but feel that /proc/locks is just showing its age. It
> was fine in the late 90's, but its limitations are just becoming more
> apparent as things get more complex. It was never designed for
> performance as you end up thrashing several spinlocks when reading it.
> 
> Maybe it's time to think about presenting this info in another way? A
> global view of all locks on the system is interesting but maybe it
> would be better to present it more granularly somehow?

But, yes, that might be a good idea.

--b.

> 
> I guess I should go look at what lsof actually does with this info...
> 
> -- 
> Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03  7:35     ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-03  7:35 UTC (permalink / raw)
  To: jlayton, bfields
  Cc: viro, linux-kernel, linux-fsdevel, ebiederm, containers, Nikolay Borisov

On busy container servers, reading /proc/locks shows all the locks
created by all clients. This can cause large latency spikes. In my
case I observed lsof taking up to 5-10 seconds while processing around
50k locks. Fix this by limiting the locks shown to only those created
in the same pidns as the one the proc was mounted in. When reading
/proc/locks from the init_pid_ns, show everything.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
---
 fs/locks.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index ee1b15f6fc13..751673d7f7fc 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
+	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+	struct pid_namespace *current_pidns = task_active_pid_ns(current);
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
+	if ((current_pidns != &init_pid_ns) && fl->fl_nspid
+	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
+		return 0;
+
 	lock_get_status(f, fl, iter->li_pos, "");
 
 	list_for_each_entry(bfl, &fl->fl_block, fl_block)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03 13:46         ` Jeff Layton
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2016-08-03 13:46 UTC (permalink / raw)
  To: Nikolay Borisov, bfields
  Cc: viro, linux-kernel, linux-fsdevel, ebiederm, containers

On Wed, 2016-08-03 at 10:35 +0300, Nikolay Borisov wrote:
> On busy container servers, reading /proc/locks shows all the locks
> created by all clients. This can cause large latency spikes. In my
> case I observed lsof taking up to 5-10 seconds while processing around
> 50k locks. Fix this by limiting the locks shown to only those created
> in the same pidns as the one the proc was mounted in. When reading
> /proc/locks from the init_pid_ns, show everything.
> 
> > Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> ---
>  fs/locks.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index ee1b15f6fc13..751673d7f7fc 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
>  {
> >  	struct locks_iterator *iter = f->private;
> >  	struct file_lock *fl, *bfl;
> > +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> > +	struct pid_namespace *current_pidns = task_active_pid_ns(current);
>  
> >  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> > > +	if ((current_pidns != &init_pid_ns) && fl->fl_nspid

Ok, so when you read from a process that's in the init_pid_ns
namespace, then you'll get the whole pile of locks, even when reading
this from a filesystem that was mounted in a different pid_ns?

That seems odd to me if so. Any reason not to just uniformly use the
proc_pidns here?
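
That is, just drop the current_pidns check and key everything off the
mount; an untested sketch of what I mean:

	if (fl->fl_nspid && proc_pidns != ns_of_pid(fl->fl_nspid))
		return 0;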

> > > +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> > +		return 0;
> +
> >  	lock_get_status(f, fl, iter->li_pos, "");
>  
> >  	list_for_each_entry(bfl, &fl->fl_block, fl_block)

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03 14:17             ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-03 14:17 UTC (permalink / raw)
  To: Jeff Layton, bfields
  Cc: viro, linux-kernel, linux-fsdevel, ebiederm, containers,
	Andrey Vagin, xemul



On 08/03/2016 04:46 PM, Jeff Layton wrote:
> On Wed, 2016-08-03 at 10:35 +0300, Nikolay Borisov wrote:
>> On busy container servers, reading /proc/locks shows all the locks
>> created by all clients. This can cause large latency spikes. In my
>> case I observed lsof taking up to 5-10 seconds while processing around
>> 50k locks. Fix this by limiting the locks shown to only those created
>> in the same pidns as the one the proc was mounted in. When reading
>> /proc/locks from the init_pid_ns, show everything.
>>
>>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
>> ---
>>  fs/locks.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index ee1b15f6fc13..751673d7f7fc 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
>>  {
>>>  	struct locks_iterator *iter = f->private;
>>>  	struct file_lock *fl, *bfl;
>>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>> +	struct pid_namespace *current_pidns = task_active_pid_ns(current);
>>  
>>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>  
>>>> +	if ((current_pidns != &init_pid_ns) && fl->fl_nspid
> 
> Ok, so when you read from a process that's in the init_pid_ns
> namespace, then you'll get the whole pile of locks, even when reading
> this from a filesystem that was mounted in a different pid_ns?
> 
> That seems odd to me if so. Any reason not to just uniformly use the
> proc_pidns here?

[CCing some people from openvz/CRIU]

My train of thought was "we should have a means which would be the one
universal truth about everything, and this would be a process in the
init_pid_ns". I don't have a strong preference as long as I'm not breaking
userspace. As I said before, I think the CRIU guys might be using that
interface.

> 
>>>> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
>>> +		return 0;
>> +
>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>  
>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
  2016-08-03 14:17             ` Nikolay Borisov
  (?)
@ 2016-08-03 14:28             ` J. Bruce Fields
       [not found]               ` <20160803142850.GA27072-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  -1 siblings, 1 reply; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-03 14:28 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Jeff Layton, viro, linux-kernel, linux-fsdevel, ebiederm,
	containers, Andrey Vagin, xemul

On Wed, Aug 03, 2016 at 05:17:09PM +0300, Nikolay Borisov wrote:
> 
> 
> On 08/03/2016 04:46 PM, Jeff Layton wrote:
> > On Wed, 2016-08-03 at 10:35 +0300, Nikolay Borisov wrote:
> >> On busy container servers, reading /proc/locks shows all the locks
> >> created by all clients. This can cause large latency spikes. In my
> >> case I observed lsof taking up to 5-10 seconds while processing around
> >> 50k locks. Fix this by limiting the locks shown to only those created
> >> in the same pidns as the one the proc was mounted in. When reading
> >> /proc/locks from the init_pid_ns, show everything.
> >>
> >>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> >> ---
> >>  fs/locks.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/fs/locks.c b/fs/locks.c
> >> index ee1b15f6fc13..751673d7f7fc 100644
> >> --- a/fs/locks.c
> >> +++ b/fs/locks.c
> >> @@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
> >>  {
> >>>  	struct locks_iterator *iter = f->private;
> >>>  	struct file_lock *fl, *bfl;
> >>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> >>> +	struct pid_namespace *current_pidns = task_active_pid_ns(current);
> >>  
> >>>  	fl = hlist_entry(v, struct file_lock, fl_link);
> >>  
> >>>> +	if ((current_pidns != &init_pid_ns) && fl->fl_nspid
> > 
> > Ok, so when you read from a process that's in the init_pid_ns
> > namespace, then you'll get the whole pile of locks, even when reading
> > this from a filesystem that was mounted in a different pid_ns?
> > 
> > That seems odd to me if so. Any reason not to just uniformly use the
> > proc_pidns here?
> 
> [CCing some people from openvz/CRIU]
> 
> My train of thought was "we should have means which would be the one
> universal truth about everything and this would be a process in the
> init_pid_ns".

OK, but why not make that means "mount proc from the init_pid_ns and
read /proc/locks there"?  So just replace current_pidns with proc_pidns
in the above.  I think that's all Jeff was suggesting.
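
(Purely as an illustration, a whole-machine view would then still be
available by reading locks through a proc instance mounted by a task in
the init pid namespace, e.g. something like:

	# from a task in the init pid namespace
	mount -t proc proc /mnt
	cat /mnt/locks

assuming the filtering keys off the mount as above.)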

--b.

> I don't have strong preference as long as I'm not breaking
> userspace. As I said before - I think the CRIU guys might be using that
> interface.
> 
> > 
> >>>> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> >>> +		return 0;
> >> +
> >>>  	lock_get_status(f, fl, iter->li_pos, "");
> >>  
> >>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> > 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03 14:33                   ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-03 14:33 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, viro, linux-kernel, linux-fsdevel, ebiederm,
	containers, Andrey Vagin, xemul



On 08/03/2016 05:28 PM, J. Bruce Fields wrote:
> On Wed, Aug 03, 2016 at 05:17:09PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 08/03/2016 04:46 PM, Jeff Layton wrote:
>>> On Wed, 2016-08-03 at 10:35 +0300, Nikolay Borisov wrote:
>>>> On busy container servers, reading /proc/locks shows all the locks
>>>> created by all clients. This can cause large latency spikes. In my
>>>> case I observed lsof taking up to 5-10 seconds while processing around
>>>> 50k locks. Fix this by limiting the locks shown to only those created
>>>> in the same pidns as the one the proc was mounted in. When reading
>>>> /proc/locks from the init_pid_ns, show everything.
>>>>
>>>>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
>>>> ---
>>>>  fs/locks.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index ee1b15f6fc13..751673d7f7fc 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
>>>>  {
>>>>>  	struct locks_iterator *iter = f->private;
>>>>>  	struct file_lock *fl, *bfl;
>>>>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>>>> +	struct pid_namespace *current_pidns = task_active_pid_ns(current);
>>>>  
>>>>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>>>  
>>>>>> +	if ((current_pidns != &init_pid_ns) && fl->fl_nspid
>>>
>>> Ok, so when you read from a process that's in the init_pid_ns
>>> namespace, then you'll get the whole pile of locks, even when reading
>>> this from a filesystem that was mounted in a different pid_ns?
>>>
>>> That seems odd to me if so. Any reason not to just uniformly use the
>>> proc_pidns here?
>>
>> [CCing some people from openvz/CRIU]
>>
>> My train of thought was "we should have means which would be the one
>> universal truth about everything and this would be a process in the
>> init_pid_ns".
> 
> OK, but why not make that means be "mount proc from the init_pid_ns and
> read /proc/locks there".  So just replace current_pidns with proc_pidns
> in the above.  I think that's all Jeff was suggesting.

Oh, you are right. Silly me, yes, I'm happy with this and I will send a
patch.


> 
> --b.
> 
>> I don't have strong preference as long as I'm not breaking
>> userspace. As I said before - I think the CRIU guys might be using that
>> interface.
>>
>>>
>>>>>> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
>>>>> +		return 0;
>>>> +
>>>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>>>  
>>>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found] ` <1470148943-21835-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-08-03  7:35     ` Nikolay Borisov
@ 2016-08-03 14:54   ` Nikolay Borisov
       [not found]     ` <1470236078-2389-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
  2016-08-04  7:26   ` [PATCHv4] " Nikolay Borisov
  2016-08-05  7:30   ` [PATCHv5] " Nikolay Borisov
  6 siblings, 1 reply; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-03 14:54 UTC (permalink / raw)
  To: bfields, jlayton
  Cc: containers, xemul, Nikolay Borisov, avagin, ebiederm

On busy container servers, reading /proc/locks shows all the locks
created by all clients. This can cause large latency spikes. In my
case I observed lsof taking up to 5-10 seconds while processing around
50k locks. Fix this by limiting the locks shown to only those created
in the same pidns as the one the proc fs was mounted in. When reading
/proc/locks from a proc instance mounted in the init_pid_ns, no
filtering is performed.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
---
 fs/locks.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index ee1b15f6fc13..65e75810a836 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2648,9 +2648,14 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
+	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
+	if ((proc_pidns != &init_pid_ns) && fl->fl_nspid
+	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
+		return 0;
+
 	lock_get_status(f, fl, iter->li_pos, "");
 
 	list_for_each_entry(bfl, &fl->fl_block, fl_block)
-- 
2.5.0
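
For illustration only, with filtering like this in place the difference
should be visible from a shell (the counts here are hypothetical):

	# inside a container's pid namespace
	$ wc -l < /proc/locks
	3

	# through a proc instance mounted in the init_pid_ns
	$ wc -l < /proc/locks
	50000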

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03 14:54                 ` Pavel Emelyanov
  0 siblings, 0 replies; 62+ messages in thread
From: Pavel Emelyanov @ 2016-08-03 14:54 UTC (permalink / raw)
  To: Nikolay Borisov, Jeff Layton, bfields
  Cc: viro, linux-kernel, linux-fsdevel, ebiederm, containers, Andrey Vagin

On 08/03/2016 05:17 PM, Nikolay Borisov wrote:
> 
> 
> On 08/03/2016 04:46 PM, Jeff Layton wrote:
>> On Wed, 2016-08-03 at 10:35 +0300, Nikolay Borisov wrote:
>>> On busy container servers, reading /proc/locks shows all the locks
>>> created by all clients. This can cause large latency spikes. In my
>>> case I observed lsof taking up to 5-10 seconds while processing around
>>> 50k locks. Fix this by limiting the locks shown to only those created
>>> in the same pidns as the one the proc was mounted in. When reading
>>> /proc/locks from the init_pid_ns, show everything.
>>>
>>>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
>>> ---
>>>  fs/locks.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/fs/locks.c b/fs/locks.c
>>> index ee1b15f6fc13..751673d7f7fc 100644
>>> --- a/fs/locks.c
>>> +++ b/fs/locks.c
>>> @@ -2648,9 +2648,15 @@ static int locks_show(struct seq_file *f, void *v)
>>>  {
>>>>  	struct locks_iterator *iter = f->private;
>>>>  	struct file_lock *fl, *bfl;
>>>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>>> +	struct pid_namespace *current_pidns = task_active_pid_ns(current);
>>>  
>>>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>>  
>>>>> +	if ((current_pidns != &init_pid_ns) && fl->fl_nspid
>>
>> Ok, so when you read from a process that's in the init_pid_ns
>> namespace, then you'll get the whole pile of locks, even when reading
>> this from a filesystem that was mounted in a different pid_ns?
>>
>> That seems odd to me if so. Any reason not to just uniformly use the
>> proc_pidns here?
> 
> [CCing some people from openvz/CRIU]

Thanks :)

> My train of thought was "we should have means which would be the one
> universal truth about everything and this would be a process in the
> init_pid_ns". I don't have strong preference as long as I'm not breaking
> userspace. As I said before - I think the CRIU guys might be using that
> interface.

This particular change won't break us mostly because we've switched to
reading the /proc/pid/fdinfo/n files for locks.

-- Pavel

>>
>>>>> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
>>>> +		return 0;
>>> +
>>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>>  
>>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>>
> .
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
  2016-08-03 14:54                 ` Pavel Emelyanov
  (?)
  (?)
@ 2016-08-03 15:00                 ` Nikolay Borisov
       [not found]                   ` <57A20702.3040805-6AxghH7DbtA@public.gmane.org>
  -1 siblings, 1 reply; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-03 15:00 UTC (permalink / raw)
  To: Pavel Emelyanov, Nikolay Borisov, Jeff Layton, bfields
  Cc: Andrey Vagin, containers, linux-kernel, ebiederm, linux-fsdevel, viro



On 08/03/2016 05:54 PM, Pavel Emelyanov wrote:
> On 08/03/2016 05:17 PM, Nikolay Borisov wrote:
>>
>>
[SNIP]
>>
>> [CCing some people from openvz/CRIU]
> 
> Thanks :)
> 
>> My train of thought was "we should have means which would be the one
>> universal truth about everything and this would be a process in the
>> init_pid_ns". I don't have strong preference as long as I'm not breaking
>> userspace. As I said before - I think the CRIU guys might be using that
>> interface.
> 
> This particular change won't break us mostly because we've switched to
> reading the /proc/pid/fdinfo/n files for locks.

[thinking out loud here]

I've never actually looked into those files, but now that I have, it seems
to make sense to also switch 'lsof' to reading the locks from the
available per-pid directories rather than relying on the global
/proc/locks interface. Oh well :)

[/thinking out loud here]
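
Roughly, an untested sketch of what such a per-pid scan could look like:

	for f in /proc/[0-9]*/fdinfo/*; do
		grep -H '^lock:' "$f" 2>/dev/null
	done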

> 
> -- Pavel
> 
>>>
>>>>>> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
>>>>> +		return 0;
>>>> +
>>>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>>>  
>>>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>>>
>> .
>>
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03 15:06                       ` J. Bruce Fields
  0 siblings, 0 replies; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-03 15:06 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Pavel Emelyanov, Jeff Layton, Andrey Vagin, containers,
	linux-kernel, ebiederm, linux-fsdevel, viro

On Wed, Aug 03, 2016 at 06:00:18PM +0300, Nikolay Borisov wrote:
> 
> 
> On 08/03/2016 05:54 PM, Pavel Emelyanov wrote:
> > On 08/03/2016 05:17 PM, Nikolay Borisov wrote:
> >>
> >>
> [SNIP]
> >>
> >> [CCing some people from openvz/CRIU]
> > 
> > Thanks :)
> > 
> >> My train of thought was "we should have means which would be the one
> >> universal truth about everything and this would be a process in the
> >> init_pid_ns". I don't have strong preference as long as I'm not breaking
> >> userspace. As I said before - I think the CRIU guys might be using that
> >> interface.
> > 
> > This particular change won't break us mostly because we've switched to
> > reading the /proc/pid/fdinfo/n files for locks.
> 
> [thinking out loud here]
> 
> I've never actually looked into those files but now that I have it seems
> to make sense to also switch 'lsof' to actually reading the locks from
> the available pids directories rather than relying on the global
> /proc/locks interface. Oh well :)

Digging around...  Oh, I see, there's an optional 'lock:' line in
the /proc/[pid]/fdinfo/[fd] files, is that what you're looking at?  I'd
forgotten.  Yeah, maybe that would make more sense long term.
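
For reference, the fdinfo entry looks roughly like this (pid, fd and
the values shown are all illustrative):

	$ cat /proc/1234/fdinfo/3
	pos:	0
	flags:	0100002
	mnt_id:	25
	lock:	1: POSIX  ADVISORY  WRITE 1234 08:02:131211 0 EOF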

--b.

> 
> [/thinking out loud here]
> 
> > 
> > -- Pavel
> > 
> >>>
> >>>>>> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> >>>>> +		return 0;
> >>>> +
> >>>>>  	lock_get_status(f, fl, iter->li_pos, "");
> >>>>  
> >>>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> >>>
> >> .
> >>
> > 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03 15:10                           ` Nikolay Borisov
  0 siblings, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-03 15:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Pavel Emelyanov, Jeff Layton, Andrey Vagin, containers,
	linux-kernel, ebiederm, linux-fsdevel, viro



On 08/03/2016 06:06 PM, J. Bruce Fields wrote:
> Digging around...  Oh, I see, there's an optional 'lock:' line in
> the /proc/[pid]/fdinfo/[fd] files, is that what you're looking at?  I'd
> forgotten.  Yeah, maybe that would make more sense long term.

Yep, that's the one, but this requires userspace to be updated to use
that interface. In the meantime we could do away with some of the
maintenance burden of the existing /proc/locks :)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]     ` <1470236078-2389-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
@ 2016-08-03 15:24       ` Jeff Layton
  2016-08-03 16:23       ` Eric W. Biederman
  2016-08-03 17:40       ` Eric W. Biederman
  2 siblings, 0 replies; 62+ messages in thread
From: Jeff Layton @ 2016-08-03 15:24 UTC (permalink / raw)
  To: Nikolay Borisov, bfields
  Cc: containers, xemul, avagin, ebiederm

On Wed, 2016-08-03 at 17:54 +0300, Nikolay Borisov wrote:
> On busy container servers reading /proc/locks shows all the locks
> created by all clients. This can cause large latency spikes. In my
> case I observed lsof taking up to 5-10 seconds while processing around
> 50k locks. Fix this by limiting the locks shown only to those created
> in the same pidns as the one the proc fs was mounted in. When reading
> /proc/locks from the init_pid_ns proc instance, perform no filtering.
> 
> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> ---
>  fs/locks.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index ee1b15f6fc13..65e75810a836 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2648,9 +2648,14 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	if ((proc_pidns != &init_pid_ns) && fl->fl_nspid
> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> +		return 0;
> +
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)


Yeah, that makes much more sense to me. I'll plan to merge this for
v4.9 unless there are objections between now and the next merge window.

Thanks,
-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]     ` <1470236078-2389-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
  2016-08-03 15:24       ` Jeff Layton
@ 2016-08-03 16:23       ` Eric W. Biederman
       [not found]         ` <87k2fxom8a.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  2016-08-03 17:40       ` Eric W. Biederman
  2 siblings, 1 reply; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-03 16:23 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, jlayton-vpEMnDpepFuMZCB2o+C8xQ,
	xemul-5HdwGun5lf+gSpxsJD1C4w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	avagin-GEFAQzZX7r8dnm+yROfE0A

Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> writes:

> On busy container servers reading /proc/locks shows all the locks
> created by all clients. This can cause large latency spikes. In my
> case I observed lsof taking up to 5-10 seconds while processing around
> 50k locks. Fix this by limiting the locks shown only to those created
> in the same pidns as the one the proc fs was mounted in. When reading
> /proc/locks from the init_pid_ns proc instance, perform no filtering.

If we are going to do this, this should be a recursive belonging test
(because pid namespaces are recursive).

Right now the test looks like it will filter out child pid namespaces.

Special casing the init_pid_ns should be an optimization, not something
that is necessary for correctness (as it appears here).

Eric


> Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> ---
>  fs/locks.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index ee1b15f6fc13..65e75810a836 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2648,9 +2648,14 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	if ((proc_pidns != &init_pid_ns) && fl->fl_nspid
> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> +		return 0;
> +
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]         ` <87k2fxom8a.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2016-08-03 16:50           ` Jeff Layton
       [not found]             ` <1470243015.13804.7.camel-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff Layton @ 2016-08-03 16:50 UTC (permalink / raw)
  To: Eric W. Biederman, Nikolay Borisov
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	avagin-GEFAQzZX7r8dnm+yROfE0A, xemul-5HdwGun5lf+gSpxsJD1C4w

On Wed, 2016-08-03 at 11:23 -0500, Eric W. Biederman wrote:
> Nikolay Borisov <kernel@kyup.com> writes:
> 
> > 
> > On busy container servers reading /proc/locks shows all the locks
> > created by all clients. This can cause large latency spikes. In my
> > case I observed lsof taking up to 5-10 seconds while processing
> > around 50k locks. Fix this by limiting the locks shown only to those
> > created in the same pidns as the one the proc fs was mounted in.
> > When reading /proc/locks from the init_pid_ns proc instance, perform
> > no filtering.
> 
> If we are going to do this, this should be a recursive belonging test
> (because pid namespaces are recursive).
> 
> Right now the test looks like it will filter out child pid
> namespaces.
> 
> Special casing the init_pid_ns should be an optimization, not
> something that is necessary for correctness (as it appears here).
> 
> Eric
> 
> 

Ok, thanks. I'm still not that namespace savvy -- so there's a
hierarchy of pid_namespaces?

If so, then yeah does sound better. Is there an interface that allows
you to tell whether a pid is a descendant of a particular
pid_namespace?

> > 
> > Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> > ---
> >  fs/locks.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index ee1b15f6fc13..65e75810a836 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -2648,9 +2648,14 @@ static int locks_show(struct seq_file *f,
> > void *v)
> >  {
> >  	struct locks_iterator *iter = f->private;
> >  	struct file_lock *fl, *bfl;
> > +	struct pid_namespace *proc_pidns = file_inode(f->file)-
> > >i_sb->s_fs_info;
> >  
> >  	fl = hlist_entry(v, struct file_lock, fl_link);
> >  
> > +	if ((proc_pidns != &init_pid_ns) && fl->fl_nspid
> > +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> > +		return 0;
> > +
> >  	lock_get_status(f, fl, iter->li_pos, "");
> >  
> >  	list_for_each_entry(bfl, &fl->fl_block, fl_block)

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2] locks: Filter /proc/locks output on proc pid ns
@ 2016-08-03 17:35                               ` Eric W. Biederman
  0 siblings, 0 replies; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-03 17:35 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: J. Bruce Fields, Pavel Emelyanov, Jeff Layton, Andrey Vagin,
	containers, linux-kernel, linux-fsdevel, viro

Nikolay Borisov <kernel@kyup.com> writes:

> On 08/03/2016 06:06 PM, J. Bruce Fields wrote:
>> Digging around...  Oh, I see, there's an optional 'lock:..' line in
>> /proc/[pid]/fdinfo/[fd] file, is that what you're looking at?  I'd
>> forgotten.  Yeah, maybe that would make more sense long term.
>
> Yep, that's the one, but this requires userspace to be updated to use
> that interface. In the meantime the existing /proc/locks could do with
> some maintenance :)

I am tempted to say let's not change /proc/locks at all, but if locks
really are in a pid namespace then I do think it makes sense to filter
them in /proc, just so there is not excessive visibility outside of the
pid namespace.

Excessive visibility is a problem on its own.

Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]     ` <1470236078-2389-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
  2016-08-03 15:24       ` Jeff Layton
  2016-08-03 16:23       ` Eric W. Biederman
@ 2016-08-03 17:40       ` Eric W. Biederman
  2 siblings, 0 replies; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-03 17:40 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, jlayton-vpEMnDpepFuMZCB2o+C8xQ,
	xemul-5HdwGun5lf+gSpxsJD1C4w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	avagin-GEFAQzZX7r8dnm+yROfE0A

Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> writes:

> On busy container servers reading /proc/locks shows all the locks
> created by all clients. This can cause large latency spikes. In my
> case I observed lsof taking up to 5-10 seconds while processing around
> 50k locks. Fix this by limiting the locks shown only to those created
> in the same pidns as the one the proc fs was mounted in. When reading
> /proc/locks from the init_pid_ns proc instance, perform no filtering.
>
> Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> ---
>  fs/locks.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index ee1b15f6fc13..65e75810a836 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2648,9 +2648,14 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	if ((proc_pidns != &init_pid_ns) && fl->fl_nspid
> +	    && (proc_pidns != ns_of_pid(fl->fl_nspid)))
> +		return 0;
> +

With no loss of generality you can simplify this check to:

	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
		return 0;

>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)


Of course now I am staring at the crazy use of pid_vnr in
lock_get_status.  That should probably be pid_nr_ns(fl->fl_nspid, proc_pidns) as well.
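
(For reference, pid_vnr() is defined in include/linux/pid.h relative to
the reading task, which is exactly why the printed value changes
depending on who reads the file -- roughly:

static inline pid_t pid_vnr(struct pid *pid)
{
	return pid_nr_ns(pid, task_active_pid_ns(current));
}

so translating against the proc mount's namespace instead makes the
output stable.)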


Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]             ` <1470243015.13804.7.camel-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
@ 2016-08-03 21:09               ` Eric W. Biederman
       [not found]                 ` <87twf1ftk9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-03 21:09 UTC (permalink / raw)
  To: Jeff Layton
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nikolay Borisov, avagin-GEFAQzZX7r8dnm+yROfE0A,
	xemul-5HdwGun5lf+gSpxsJD1C4w

Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org> writes:

> On Wed, 2016-08-03 at 11:23 -0500, Eric W. Biederman wrote:
>> Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> writes:
>> 
>> > 
>> > On busy container servers reading /proc/locks shows all the locks
>> > created by all clients. This can cause large latency spikes. In my
>> > case I observed lsof taking up to 5-10 seconds while processing
>> > around 50k locks. Fix this by limiting the locks shown only to
>> > those created in the same pidns as the one the proc fs was mounted
>> > in. When reading /proc/locks from the init_pid_ns proc instance,
>> > perform no filtering.
>> 
>> If we are going to do this, this should be a recursive belonging test
>> (because pid namespaces are recursive).
>> 
>> Right now the test looks like it will filter out child pid
>> namespaces.
>> 
>> Special casing the init_pid_ns should be an optimization, not
>> something that is necessary for correctness (as it appears here).
>> 
>> Eric
>> 
>> 
>
> Ok, thanks. I'm still not that namespace savvy -- so there's a
> hierarchy of pid_namespaces?

There is.

> If so, then yeah does sound better. Is there an interface that allows
> you to tell whether a pid is a descendant of a particular
> pid_namespace?

Yes.  And each pid has an array of the pid namespaces it is in so it is
an O(1) operation to see if that struct pid is in a pid namespace.
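
(The check is cheap because of that array.  A sketch of the helper, as
it looks in kernel/pid.c: each struct pid carries one upid entry per
namespace level, so the lookup is a single array index:

pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)
{
	struct upid *upid;
	pid_t nr = 0;

	if (pid && ns->level <= pid->level) {
		/* the entry at ns->level is the pid's number in that
		 * namespace, if the pid is visible there at all */
		upid = &pid->numbers[ns->level];
		if (upid->ns == ns)
			nr = upid->nr;
	}
	return nr;	/* 0 means "not visible in ns" */
}

A return of 0 is what the later patch versions use as the "no mapping
in this namespace" test.)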

Dumb question: does anyone know the difference between fl_nspid and
fl_pid off the top of your heads?  I am looking at the code and I am
confused why we have both.  I am afraid that there was some
sloppiness when the pid namespace was implemented and this was the
result.  I remember that file locks were a rough spot during the
conversion but I don't recall the details off the top of my head.

Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]                 ` <87twf1ftk9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2016-08-03 21:26                   ` Nikolay Borisov
       [not found]                     ` <a0a58f75-0e40-c14f-d8e3-8f094e9fc62c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-03 21:26 UTC (permalink / raw)
  To: Eric W. Biederman, Jeff Layton
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nikolay Borisov, avagin-GEFAQzZX7r8dnm+yROfE0A,
	xemul-5HdwGun5lf+gSpxsJD1C4w



On  4.08.2016 00:09, Eric W. Biederman wrote:
> Dumb question: does anyone know the difference between fl_nspid and
> fl_pid off the top of your heads?  I am looking at the code and I am
> confused why we have both.  I am afraid that there was some
> sloppiness when the pid namespace was implemented and this was the
> result.  I remember that file locks were a rough spot during the
> conversion but I don't recall the details off the top of my head.

I think the fl_nspid is the actual pid in the namespace where the lock
was created, whereas fl_pid can be just a unique value required by NFS. I
gathered that information while reading the changelogs and the mailing
list discussion here:
https://lists.linuxfoundation.org/pipermail/containers/2007-December/009044.html

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]                     ` <a0a58f75-0e40-c14f-d8e3-8f094e9fc62c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-08-04  4:18                       ` Eric W. Biederman
       [not found]                         ` <87eg659ngh.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-04  4:18 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: avagin-GEFAQzZX7r8dnm+yROfE0A,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, Nikolay Borisov, Jeff Layton,
	xemul-5HdwGun5lf+gSpxsJD1C4w

Nikolay Borisov <n.borisov.lkml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On  4.08.2016 00:09, Eric W. Biederman wrote:
>> Dumb question: does anyone know the difference between fl_nspid and
>> fl_pid off the top of your heads?  I am looking at the code and I am
>> confused why we have both.  I am afraid that there was some
>> sloppiness when the pid namespace was implemented and this was the
>> result.  I remember that file locks were a rough spot during the
>> conversion but I don't recall the details off the top of my head.
>
> I think the fl_nspid is the actual pid in the namespace where the lock
> was created, whereas fl_pid can be just a unique value required by NFS. I
> gathered that information while reading the changelogs and the mailing
> list discussion here:
> https://lists.linuxfoundation.org/pipermail/containers/2007-December/009044.html

Thanks for the old thread.

Researching myself I see that for posix locks we have struct flock
that has a field l_pid that must be the pid of the process holding
the lock.

It looks like the explanation given in the old thread was inadequate,
and the code in the kernel is definitely incorrect with respect to pid
namespaces.  If the process holding the lock is in a different pid
namespace than the process waiting for the lock l_pid will be set
incorrectly.
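
(A minimal userspace sketch of where that l_pid comes from -- the
demo.lock filename and the sleep-based synchronization are only for
illustration: the child takes a write lock, and the parent's F_GETLK
probe has the kernel fill in the holder's pid:

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int fd = open("demo.lock", O_RDWR | O_CREAT, 0644);
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	pid_t child;

	if (fd < 0)
		return 1;

	child = fork();
	if (child == 0) {
		/* child: take a whole-file write lock and sit on it */
		fcntl(fd, F_SETLK, &fl);
		pause();
	}

	sleep(1);	/* crude: let the child take the lock first */

	/* parent: ask who would block a write lock; the kernel
	 * answers by filling in l_pid with the holder's pid */
	fl.l_type = F_WRLCK;
	if (fcntl(fd, F_GETLK, &fl) == 0 && fl.l_type != F_UNLCK)
		printf("conflicting lock held by pid %d\n", (int)fl.l_pid);

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	return 0;
}

If parent and child were in different pid namespaces, the l_pid
reported here is the number that needs translating.)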

Looking at the code it appears from the perspective of struct file_lock
the only difference between a remote lock and a local lock is that
fl_owner is set to point at a different structure.

Looking at the nfs case if I am reading my sources correctly the struct
nlm field named svid is the process id on the remote system, and other
nlm fields distinguish the host.

Is the desired behavior for nfs that for a remote lock we set l_pid
to the remote process id, and don't report a hint about which machine
the process id is on?

It does seem to make sense to have both fl_pid as a value usable
remotely and otherwise and fl_nspid (arguably it should be fl_local_pid)
as a value usable on the local machine.

I think the code that sets fl_nspid today is anything but obviously
correct and probably needs to be rewritten.

Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv3] locks: Filter /proc/locks output on proc pid ns
       [not found]                         ` <87eg659ngh.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2016-08-04  5:07                           ` Eric W. Biederman
  0 siblings, 0 replies; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-04  5:07 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: avagin-GEFAQzZX7r8dnm+yROfE0A,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, Nikolay Borisov, Jeff Layton,
	xemul-5HdwGun5lf+gSpxsJD1C4w

ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:

> Nikolay Borisov <n.borisov.lkml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> On  4.08.2016 00:09, Eric W. Biederman wrote:
>>> Dumb question: does anyone know the difference between fl_nspid and
>>> fl_pid off the top of your heads?  I am looking at the code and I am
>>> confused why we have both.  I am afraid that there was some
>>> sloppiness when the pid namespace was implemented and this was the
>>> result.  I remember that file locks were a rough spot during the
>>> conversion but I don't recall the details off the top of my head.
>>
>> I think the fl_nspid is the actual pid in the namespace where the lock
>> was created, whereas fl_pid can be just a unique value required by NFS. I
>> gathered that information while reading the changelogs and the mailing
>> list discussion here:
>> https://lists.linuxfoundation.org/pipermail/containers/2007-December/009044.html
>
> Thanks for the old thread.
>
> Researching myself I see that for posix locks we have struct flock
> that has a field l_pid that must be the pid of the process holding
> the lock.
>
> It looks like the explanation given in the old thread was inadequate,
> and the code in the kernel is definitely incorrect with respect to pid
> namespaces.  If the process holding the lock is in a different pid
> namespace than the process waiting for the lock l_pid will be set
> incorrectly.
>
> Looking at the code it appears from the perspective of struct file_lock
> the only difference between a remote lock and a local lock is that
> fl_owner is set to point at a different structure.
>
> Looking at the nfs case if I am reading my sources correctly the struct
> nlm field named svid is the process id on the remote system, and other
> nlm fields distinguish the host.
>
> Is the desired behavior for nfs that for a remote lock we set l_pid
> to the remote process id, and don't report a hint about which machine
> the process id is on?
>
> It does seem to make sense to have both fl_pid as a value usable
> remotely and otherwise and fl_nspid (arguably it should be fl_local_pid)
> as a value usable on the local machine.
>
> I think the code that sets fl_nspid today is anything but obviously
> correct and probably needs to be rewritten.

Bah.  I take it back, F_GETLK is not broken when crossing pid
namespaces; the code is just dangerously clever.

Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCHv4] locks: Filter /proc/locks output on proc pid ns
       [not found] ` <1470148943-21835-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-08-03 14:54   ` [PATCHv3] " Nikolay Borisov
@ 2016-08-04  7:26   ` Nikolay Borisov
       [not found]     ` <1470295588-9803-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
  2016-08-05  7:30   ` [PATCHv5] " Nikolay Borisov
  6 siblings, 1 reply; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-04  7:26 UTC (permalink / raw)
  To: ebiederm-aS9lmoZGLiVWk0Htik3J/w, jlayton-vpEMnDpepFuMZCB2o+C8xQ
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nikolay Borisov

On busy container servers reading /proc/locks shows all the locks
created by all clients. This can cause large latency spikes. In my
case I observed lsof taking up to 5-10 seconds while processing around
50k locks. Fix this by limiting the locks shown only to those created
in the same pidns as the one the proc fs was mounted in. When reading
/proc/locks from the init_pid_ns proc instance, perform no filtering.

Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
---
 fs/locks.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index ee1b15f6fc13..df038c27b19f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2648,9 +2648,13 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
+	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
+	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
+		return 0;
+
 	lock_get_status(f, fl, iter->li_pos, "");
 
 	list_for_each_entry(bfl, &fl->fl_block, fl_block)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCHv4] locks: Filter /proc/locks output on proc pid ns
       [not found]     ` <1470295588-9803-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
@ 2016-08-04 11:29       ` Jeff Layton
       [not found]         ` <1470310175.22052.3.camel-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff Layton @ 2016-08-04 11:29 UTC (permalink / raw)
  To: Nikolay Borisov, ebiederm-aS9lmoZGLiVWk0Htik3J/w
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, 2016-08-04 at 10:26 +0300, Nikolay Borisov wrote:
> On busy container servers reading /proc/locks shows all the locks
> created by all clients. This can cause large latency spikes. In my
> case I observed lsof taking up to 5-10 seconds while processing around
> 50k locks. Fix this by limiting the locks shown only to those created
> in the same pidns as the one the proc fs was mounted in. When reading
> /proc/locks from the init_pid_ns proc instance, perform no filtering.
> 
> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> ---
>  fs/locks.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index ee1b15f6fc13..df038c27b19f 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2648,9 +2648,13 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
> +		return 0;
> +
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)

Looks reasonable to me. Eric, any comments? If this looks alright I'll
go ahead and merge into my -next branch for v4.9.

Thanks,
-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv4] locks: Filter /proc/locks output on proc pid ns
       [not found]         ` <1470310175.22052.3.camel-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
@ 2016-08-04 14:09           ` Eric W. Biederman
       [not found]             ` <874m707hhm.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-04 14:09 UTC (permalink / raw)
  To: Jeff Layton
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nikolay Borisov

Jeff Layton <jlayton@poochiereds.net> writes:

> On Thu, 2016-08-04 at 10:26 +0300, Nikolay Borisov wrote:
>> On busy container servers reading /proc/locks shows all the locks
>> created by all clients. This can cause large latency spikes. In my
>> case I observed lsof taking up to 5-10 seconds while processing around
>> 50k locks. Fix this by limiting the locks shown only to those created
>> in the same pidns as the one the proc fs was mounted in. When reading
>> /proc/locks from the init_pid_ns proc instance, perform no filtering.
>> 
>> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
>> ---
>>  fs/locks.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/fs/locks.c b/fs/locks.c
>> index ee1b15f6fc13..df038c27b19f 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -2648,9 +2648,13 @@ static int locks_show(struct seq_file *f, void *v)
>>  {
>>  	struct locks_iterator *iter = f->private;
>>  	struct file_lock *fl, *bfl;
>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>  
>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>  
>> +	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
>> +		return 0;
>> +
>>  	lock_get_status(f, fl, iter->li_pos, "");
>>  
>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>
> Looks reasonable to me. Eric, any comments? If this looks alright I'll
> go ahead and merge into my -next branch for v4.9.

Generally this looks good to me.

Some related nits.
- We are not filtering the processes that are blocked waiting on the
  lock.

- The same issue shows up in show_fd_locks.

- In lock_get_status the code should say:
  if (fl->fl_nspid) {
  	/* Don't let fl_pid change depending on who is reading the file */
  	fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
        /* If there isn't a fl_pid don't display who is waiting on the lock */
        if (fl_pid == 0)
           return;
  } else {
  	fl_pid = fl->fl_pid;
  }

  All of which implies that lock_get_status needs to take proc_pidns
  from its caller, or derive proc_pidns from the seq_file.
  
Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv4] locks: Filter /proc/locks output on proc pid ns
       [not found]             ` <874m707hhm.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2016-08-04 14:34               ` Nikolay Borisov
  2016-08-04 15:09               ` Nikolay Borisov
  1 sibling, 0 replies; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-04 14:34 UTC (permalink / raw)
  To: Eric W. Biederman, Jeff Layton
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nikolay Borisov



On 08/04/2016 05:09 PM, Eric W. Biederman wrote:
> - In lock_get_status the code should say:
>   if (fl->fl_nspid) {
>   	/* Don't let fl_pid change depending on who is reading the file */
>   	fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
>         /* If there isn't a fl_pid don't display who is waiting on the lock */
>         if (fl_pid == 0)
>            return;
>   } else {

Shall I fold this into my patch and resend, or would you prefer this
change to be a separate patch?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv4] locks: Filter /proc/locks output on proc pid ns
       [not found]             ` <874m707hhm.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  2016-08-04 14:34               ` Nikolay Borisov
@ 2016-08-04 15:09               ` Nikolay Borisov
       [not found]                 ` <57A35AC7.7040105-6AxghH7DbtA@public.gmane.org>
  1 sibling, 1 reply; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-04 15:09 UTC (permalink / raw)
  To: Eric W. Biederman, Jeff Layton
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



On 08/04/2016 05:09 PM, Eric W. Biederman wrote:
> Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org> writes:
> 
>> On Thu, 2016-08-04 at 10:26 +0300, Nikolay Borisov wrote:
>>> On busy container servers reading /proc/locks shows all the locks
>>> created by all clients. This can cause large latency spikes. In my
>>> case I observed lsof taking up to 5-10 seconds while processing around
>>> 50k locks. Fix this by limiting the locks shown only to those created
>>> in the same pidns as the one the proc fs was mounted in. When reading
>>> /proc/locks from the init_pid_ns proc instance, perform no filtering.
>>>
>>> Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
>>> ---
>>>  fs/locks.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/locks.c b/fs/locks.c
>>> index ee1b15f6fc13..df038c27b19f 100644
>>> --- a/fs/locks.c
>>> +++ b/fs/locks.c
>>> @@ -2648,9 +2648,13 @@ static int locks_show(struct seq_file *f, void *v)
>>>  {
>>>  	struct locks_iterator *iter = f->private;
>>>  	struct file_lock *fl, *bfl;
>>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>>  
>>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>>  
>>> +	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
>>> +		return 0;
>>> +
>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>>  
>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>>
>> Looks reasonable to me. Eric, any comments? If this looks alright I'll
>> go ahead and merge into my -next branch for v4.9.
> 
> Generally this looks good to me.
> 
> Some related nits.
> - We are not filtering the processes that are blocked waiting on the
>   lock.
> 
> - The same issue shows up in show_fd_locks.
> 
> - In lock_get_status the code should say:
>   if (fl->fl_nspid) {
>   	/* Don't let fl_pid change depending on who is reading the file */
>   	fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
>         /* If there isn't a fl_pid don't display who is waiting on the lock */
>         if (fl_pid == 0)
>            return;
>   } else {
>   	fl_pid = fl->fl_pid;
>   }
> 
>   All of which implies that lock_get_status needs to take proc_pidns
>   from its caller, or derive proc_pidns from the seq_file.

Just had a quick look at the code. If the aforementioned change is
introduced in lock_get_status and proc_pidns is derived from the
seq_file, then the issue in show_fd_locks would also be fixed, correct?

We essentially want to skip showing locks whose owner we don't have a
mapping for in the current pidns hierarchy?


>   
> Eric
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv4] locks: Filter /proc/locks output on proc pid ns
       [not found]                 ` <57A35AC7.7040105-6AxghH7DbtA@public.gmane.org>
@ 2016-08-04 15:21                   ` Eric W. Biederman
  0 siblings, 0 replies; 62+ messages in thread
From: Eric W. Biederman @ 2016-08-04 15:21 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, Jeff Layton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> writes:

> On 08/04/2016 05:09 PM, Eric W. Biederman wrote:
>> Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org> writes:
>> 
>>> On Thu, 2016-08-04 at 10:26 +0300, Nikolay Borisov wrote:
>>>> On busy container servers reading /proc/locks shows all the locks
>>>> created by all clients. This can cause large latency spikes. In my
>>>> case I observed lsof taking up to 5-10 seconds while processing around
>>>> 50k locks. Fix this by limiting the locks shown only to those created
>>>> in the same pidns as the one the proc fs was mounted in. When reading
>>>> /proc/locks from the init_pid_ns proc instance, perform no filtering.
>>>>
>>>> Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
>>>> ---
>>>>  fs/locks.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index ee1b15f6fc13..df038c27b19f 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -2648,9 +2648,13 @@ static int locks_show(struct seq_file *f, void *v)
>>>>  {
>>>>  	struct locks_iterator *iter = f->private;
>>>>  	struct file_lock *fl, *bfl;
>>>> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>>>>  
>>>>  	fl = hlist_entry(v, struct file_lock, fl_link);
>>>>  
>>>> +	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
>>>> +		return 0;
>>>> +
>>>>  	lock_get_status(f, fl, iter->li_pos, "");
>>>>  
>>>>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>>>
>>> Looks reasonable to me. Eric, any comments? If this looks alright I'll
>>> go ahead and merge into my -next branch for v4.9.
>> 
>> Generally this looks good to me.
>> 
>> Some related nits.
>> - We are not filtering the processes that are blocked waiting on the
>>   lock.
>> 
>> - The same issue shows up in show_fd_locks.
>> 
>> - In lock_get_status the code should say:
>>   if (fl->fl_nspid) {
>>   	/* Don't let fl_pid change depending on who is reading the file */
>>   	fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
>>         /* If there isn't a fl_pid don't display who is waiting on the lock */
>>         if (fl_pid == 0)
>>            return;
>>   } else {
>>   	fl_pid = fl->fl_pid;
>>   }
>> 
>>   All of which implies that lock_get_status needs to take proc_pidns
>>   from its caller, or derive proc_pidns from the seq_file.
>
> Just had a quick look at the code. If the aforementioned change is
> introduced in lock_get_status and proc_pidns is derived from the
> seq_file, then the issue in show_fd_locks would also be fixed, correct?

Yes I believe so.

> We essentially want to skip showing locks whose owner we don't have a
> mapping for in the current pidns hierarchy?

Yes.  That is the semantic reason why this change is ok.  Don't display
things that are not part of the current pid namespace.

It probably makes sense to fold all of these fixes together as they are
logically one semantic change.  Only show locks that are valid in proc's
pid namespace.

Eric

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCHv5] locks: Filter /proc/locks output on proc pid ns
       [not found] ` <1470148943-21835-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-08-04  7:26   ` [PATCHv4] " Nikolay Borisov
@ 2016-08-05  7:30   ` Nikolay Borisov
       [not found]     ` <1470382204-21480-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
  6 siblings, 1 reply; 62+ messages in thread
From: Nikolay Borisov @ 2016-08-05  7:30 UTC (permalink / raw)
  To: jlayton-vpEMnDpepFuMZCB2o+C8xQ
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nikolay Borisov, ebiederm-aS9lmoZGLiVWk0Htik3J/w

On busy container servers reading /proc/locks shows all the locks
created by all clients. This can cause large latency spikes. In my
case I observed lsof taking up to 5-10 seconds while processing around
50k locks. Fix this by limiting the locks shown only to those created
in the same pidns as the one the proc fs was mounted in. When reading
/proc/locks from the init_pid_ns proc instance, perform no filtering.

Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
Suggested-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/locks.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ee1b15f6fc13..484b7e106076 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2574,9 +2574,19 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 	struct inode *inode = NULL;
 	unsigned int fl_pid;
 
-	if (fl->fl_nspid)
-		fl_pid = pid_vnr(fl->fl_nspid);
-	else
+	if (fl->fl_nspid) {
+		struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+
+		/* Don't let fl_pid change depending on who is reading the file */
+		fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
+
+		/* If there isn't a fl_pid don't display who is waiting on the lock
+		 * if we are called from locks_show, or if we are called from
+		 * __show_fd_info - skip lock entirely
+		 */
+		if (fl_pid == 0)
+			return;
+	} else
 		fl_pid = fl->fl_pid;
 
 	if (fl->fl_file != NULL)
@@ -2648,9 +2658,13 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
+	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
+	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
+		return 0;
+
 	lock_get_status(f, fl, iter->li_pos, "");
 
 	list_for_each_entry(bfl, &fl->fl_block, fl_block)
-- 
2.5.0
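
(For testing the end result, a small sketch -- it assumes root and a
kernel with namespace support, and error handling is trimmed.  Create a
new pid namespace, mount a fresh proc inside it, and read /proc/locks
through that instance; only locks visible in the new pidns should show:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	pid_t child;

	/* the new pid namespace applies to our children; the mount
	 * namespace keeps the /proc remount private to this test */
	if (unshare(CLONE_NEWPID | CLONE_NEWNS) < 0) {
		perror("unshare (are you root?)");
		return 1;
	}

	child = fork();
	if (child == 0) {
		/* don't leak the remount back to the parent */
		mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);
		/* this proc instance is "mounted in" the new pidns, so
		 * its s_fs_info points at it and /proc/locks is
		 * filtered accordingly */
		mount("proc", "/proc", "proc", 0, NULL);
		execlp("cat", "cat", "/proc/locks", (char *)NULL);
		_exit(1);
	}
	waitpid(child, NULL, 0);
	return 0;
}

Reading the host's /proc/locks afterwards should still show everything,
since the init_pid_ns instance is unfiltered.)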

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCHv5] locks: Filter /proc/locks output on proc pid ns
       [not found]     ` <1470382204-21480-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
@ 2016-08-05 10:47       ` Jeff Layton
       [not found]         ` <1470394036.8100.2.camel-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff Layton @ 2016-08-05 10:47 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	J. Bruce Fields, ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Fri, 2016-08-05 at 10:30 +0300, Nikolay Borisov wrote:
> On busy container servers reading /proc/locks shows all the locks
> created by all clients. This can cause large latency spikes. In my
> case I observed lsof taking up to 5-10 seconds while processing around
> 50k locks. Fix this by limiting the locks shown only to those created
> in the same pidns as the one the proc fs was mounted in. When reading
> /proc/locks from the init_pid_ns proc instance, perform no filtering.
> 
> Signed-off-by: Nikolay Borisov <kernel@kyup.com>
> Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  fs/locks.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index ee1b15f6fc13..484b7e106076 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2574,9 +2574,19 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
>  	struct inode *inode = NULL;
>  	unsigned int fl_pid;
>  
> -	if (fl->fl_nspid)
> -		fl_pid = pid_vnr(fl->fl_nspid);
> -	else
> +	if (fl->fl_nspid) {
> +		struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> +
> +		/* Don't let fl_pid change depending on who is reading the file */
> +		fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
> +
> +		/* If there isn't a fl_pid don't display who is waiting on the lock
> +		 * if we are called from locks_show, or if we are called from
> +		 * __show_fd_info - skip lock entirely
> +		 */
> +		if (fl_pid == 0)
> +			return;
> +	} else
>  		fl_pid = fl->fl_pid;
>  
>  	if (fl->fl_file != NULL)
> @@ -2648,9 +2658,13 @@ static int locks_show(struct seq_file *f, void *v)
>  {
>  	struct locks_iterator *iter = f->private;
>  	struct file_lock *fl, *bfl;
> +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>  
>  	fl = hlist_entry(v, struct file_lock, fl_link);
>  
> +	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
> +		return 0;
> +
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
>  	list_for_each_entry(bfl, &fl->fl_block, fl_block)

Looks good to me. I'll go ahead and merge this into my locks branch for
v4.9 and get it into -next.

Thanks!
-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCHv5] locks: Filter /proc/locks output on proc pid ns
       [not found]         ` <1470394036.8100.2.camel-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
@ 2016-08-05 14:58           ` J. Bruce Fields
  0 siblings, 0 replies; 62+ messages in thread
From: J. Bruce Fields @ 2016-08-05 14:58 UTC (permalink / raw)
  To: Jeff Layton
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nikolay Borisov, ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Fri, Aug 05, 2016 at 06:47:16AM -0400, Jeff Layton wrote:
> On Fri, 2016-08-05 at 10:30 +0300, Nikolay Borisov wrote:
> > On busy container servers reading /proc/locks shows all the locks
> > created by all clients. This can cause large latency spikes. In my
> > case I observed lsof taking up to 5-10 seconds while processing around
> > 50k locks. Fix this by limiting the locks shown only to those created
> > in the same pidns as the one the proc fs was mounted in. When reading
> > /proc/locks from the init_pid_ns proc instance, perform no filtering.
> > 
> > Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> > Suggested-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> > ---
> >  fs/locks.c | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index ee1b15f6fc13..484b7e106076 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -2574,9 +2574,19 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
> >  	struct inode *inode = NULL;
> >  	unsigned int fl_pid;
> >  
> > -	if (fl->fl_nspid)
> > -		fl_pid = pid_vnr(fl->fl_nspid);
> > -	else
> > +	if (fl->fl_nspid) {
> > +		struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> > +
> > +		/* Don't let fl_pid change depending on who is reading the file */
> > +		fl_pid = pid_nr_ns(fl->fl_nspid, proc_pidns);
> > +
> > +		/* If there isn't a fl_pid don't display who is waiting on the lock
> > +		 * if we are called from locks_show, or if we are called from
> > +		 * __show_fd_info - skip lock entirely
> > +		 */
> > +		if (fl_pid == 0)
> > +			return;
> > +	} else
> >  		fl_pid = fl->fl_pid;
> >  
> >  	if (fl->fl_file != NULL)
> > @@ -2648,9 +2658,13 @@ static int locks_show(struct seq_file *f, void *v)
> >  {
> >  	struct locks_iterator *iter = f->private;
> >  	struct file_lock *fl, *bfl;
> > +	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
> >  
> >  	fl = hlist_entry(v, struct file_lock, fl_link);
> >  
> > +	if (fl->fl_nspid && !pid_nr_ns(fl->fl_nspid, proc_pidns))
> > +		return 0;
> > +
> >  	lock_get_status(f, fl, iter->li_pos, "");
> >  
> >  	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> 
> Looks good to me. I'll go ahead and merge this into my locks branch for
> v4.9 and get it into -next.

Makes sense to me.  Thanks also to Eric for the help.

--b.

^ permalink raw reply	[flat|nested] 62+ messages in thread
