linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] proc: report open files as size in stat() for /proc/pid/fd
@ 2022-09-20 19:06 Ivan Babrou
  2022-09-21 10:21 ` David Laight
  0 siblings, 1 reply; 3+ messages in thread
From: Ivan Babrou @ 2022-09-20 19:06 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-kernel, kernel-team, Alexey Dobriyan, Al Viro,
	Theodore Ts'o, Ivan Babrou, Jonathan Corbet, Andrew Morton,
	David Hildenbrand, Johannes Weiner, Christoph Anton Mitterer,
	Vincent Whitchurch, Mike Rapoport, Yang Shi, Paul Gortmaker,
	Kalesh Singh

Many monitoring tools include open file count as a metric. Currently
the only way to get this number is to enumerate the files in /proc/pid/fd.

The problem with the current approach is that it does many things people
generally don't care about when they need one number for a metric.
In our tests for cadvisor, which reports open file counts per cgroup,
we observed that reading the number of open files is slow. Out of 35.23%
of CPU time spent in `proc_readfd_common`, we see 29.43% spent in
`proc_fill_cache`, which is responsible for filling dentry info.
Some of this extra time is spinlock contention, but it's a contention
for the lock we don't want to take to begin with.

We considered putting the number of open files in /proc/pid/status.
Unfortunately, counting the number of fds involves iterating the open_files
bitmap, which has a linear complexity in proportion with the number
of open files (bitmap slots really, but it's close). We don't want
to make /proc/pid/status any slower, so instead we put this info
in /proc/pid/fd as a size member of the stat syscall result.
Previously the reported number was zero, so there's very little
risk of breaking anything, while still providing a somewhat logical
way to count the open files with a fallback if it's zero.

RFC for this patch included iterating open fds under RCU. Thanks
to Frank Hofmann for the suggestion to use the bitmap instead.

Previously:

```
$ sudo stat /proc/1/fd | head -n2
  File: /proc/1/fd
  Size: 0         	Blocks: 0          IO Block: 1024   directory
```

With this patch:

```
$ sudo stat /proc/1/fd | head -n2
  File: /proc/1/fd
  Size: 65        	Blocks: 0          IO Block: 1024   directory
```

Correctness check:

```
$ sudo ls /proc/1/fd | wc -l
65
```

I added the docs for /proc/<pid>/fd while I'm at it.

Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
---
 Documentation/filesystems/proc.rst | 17 ++++++++++++++
 fs/proc/fd.c                       | 36 ++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index e7aafc82be99..394548d26187 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,6 +47,7 @@ fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>    June 9 2009
   3.10  /proc/<pid>/timerslack_ns - Task timerslack value
   3.11	/proc/<pid>/patch_state - Livepatch patch operation state
   3.12	/proc/<pid>/arch_status - Task architecture specific information
+  3.13  /proc/<pid>/fd - List of symlinks to open files
 
   4	Configuring procfs
   4.1	Mount options
@@ -2145,6 +2146,22 @@ AVX512_elapsed_ms
   the task is unlikely an AVX512 user, but depends on the workload and the
   scheduling scenario, it also could be a false negative mentioned above.
 
+3.13 /proc/<pid>/fd - List of symlinks to open files
+-------------------------------------------------------
+This directory contains symbolic links which represent open files
+the process is maintaining.  Example output::
+
+  lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null
+  l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null
+  lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]'
+  lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]'
+  lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]'
+
+The number of open files for the process is stored in 'size' member
+of stat() output for /proc/<pid>/fd for fast access.
+-------------------------------------------------------
+
+
 Chapter 4: Configuring procfs
 =============================
 
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 913bef0d2a36..c3c4bf2939a7 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -279,6 +279,26 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 	return 0;
 }
 
+static int proc_readfd_count(struct inode *inode)
+{
+	struct task_struct *p = get_proc_task(inode);
+	struct fdtable *fdt;
+	unsigned int i, size, open_fds = 0;
+
+	if (!p)
+		return -ENOENT;
+
+	if (p->files) {
+		fdt = files_fdtable(p->files);
+		size = fdt->max_fds;
+
+		for (i = size / BITS_PER_LONG; i > 0;)
+			open_fds += hweight64(fdt->open_fds[--i]);
+	}
+
+	return open_fds;
+}
+
 static int proc_readfd(struct file *file, struct dir_context *ctx)
 {
 	return proc_readfd_common(file, ctx, proc_fd_instantiate);
@@ -319,9 +339,25 @@ int proc_fd_permission(struct user_namespace *mnt_userns,
 	return rv;
 }
 
+static int proc_fd_getattr(struct user_namespace *mnt_userns,
+			const struct path *path, struct kstat *stat,
+			u32 request_mask, unsigned int query_flags)
+{
+	struct inode *inode = d_inode(path->dentry);
+
+	generic_fillattr(&init_user_ns, inode, stat);
+
+	/* If it's a directory, put the number of open fds there */
+	if (S_ISDIR(inode->i_mode))
+		stat->size = proc_readfd_count(inode);
+
+	return 0;
+}
+
 const struct inode_operations proc_fd_inode_operations = {
 	.lookup		= proc_lookupfd,
 	.permission	= proc_fd_permission,
+	.getattr	= proc_fd_getattr,
 	.setattr	= proc_setattr,
 };
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* RE: [PATCH] proc: report open files as size in stat() for /proc/pid/fd
  2022-09-20 19:06 [PATCH] proc: report open files as size in stat() for /proc/pid/fd Ivan Babrou
@ 2022-09-21 10:21 ` David Laight
  2022-09-21 18:08   ` Ivan Babrou
  0 siblings, 1 reply; 3+ messages in thread
From: David Laight @ 2022-09-21 10:21 UTC (permalink / raw)
  To: 'Ivan Babrou', linux-fsdevel
  Cc: linux-kernel, kernel-team, Alexey Dobriyan, Al Viro,
	Theodore Ts'o, Jonathan Corbet, Andrew Morton,
	David Hildenbrand, Johannes Weiner, Christoph Anton Mitterer,
	Vincent Whitchurch, Mike Rapoport, Yang Shi, Paul Gortmaker,
	Kalesh Singh

From: Ivan Babrou
> Sent: 20 September 2022 20:06
...
> 
> +static int proc_readfd_count(struct inode *inode)
> +{
> +	struct task_struct *p = get_proc_task(inode);
> +	struct fdtable *fdt;
> +	unsigned int i, size, open_fds = 0;
> +
> +	if (!p)
> +		return -ENOENT;
> +
> +	if (p->files) {
> +		fdt = files_fdtable(p->files);
> +		size = fdt->max_fds;
> +
> +		for (i = size / BITS_PER_LONG; i > 0;)
> +			open_fds += hweight64(fdt->open_fds[--i]);
> +	}
> +
> +	return open_fds;
> +}
> +

Doesn't that need (at least) rcu protection?
There might also be issues reading p->files twice.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] proc: report open files as size in stat() for /proc/pid/fd
  2022-09-21 10:21 ` David Laight
@ 2022-09-21 18:08   ` Ivan Babrou
  0 siblings, 0 replies; 3+ messages in thread
From: Ivan Babrou @ 2022-09-21 18:08 UTC (permalink / raw)
  To: David Laight
  Cc: linux-fsdevel, linux-kernel, kernel-team, Alexey Dobriyan,
	Al Viro, Theodore Ts'o, Jonathan Corbet, Andrew Morton,
	David Hildenbrand, Johannes Weiner, Christoph Anton Mitterer,
	Vincent Whitchurch, Mike Rapoport, Yang Shi, Paul Gortmaker,
	Kalesh Singh

On Wed, Sep 21, 2022 at 3:21 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Ivan Babrou
> > Sent: 20 September 2022 20:06
> ...
> >
> > +static int proc_readfd_count(struct inode *inode)
> > +{
> > +     struct task_struct *p = get_proc_task(inode);
> > +     struct fdtable *fdt;
> > +     unsigned int i, size, open_fds = 0;
> > +
> > +     if (!p)
> > +             return -ENOENT;
> > +
> > +     if (p->files) {
> > +             fdt = files_fdtable(p->files);
> > +             size = fdt->max_fds;
> > +
> > +             for (i = size / BITS_PER_LONG; i > 0;)
> > +                     open_fds += hweight64(fdt->open_fds[--i]);
> > +     }

I'm missing put_task_struct(p) here.

> > +
> > +     return open_fds;
> > +}
> > +
>
> Doesn't that need (at least) rcu protection?

Should I enclose the "if" in rcu_read_lock() / rcu_read_unlock()?

files_fdtable() is this:

* https://elixir.bootlin.com/linux/v6.0-rc6/source/include/linux/fdtable.h#L77

#define files_fdtable(files) \
    rcu_dereference_check_fdtable((files), (files)->fdt)

And rcu_dereference_check_fdtable() is

#define rcu_dereference_check_fdtable(files, fdtfd) \
  rcu_dereference_check((fdtfd), lockdep_is_held(&(files)->file_lock))

I definitely need some help with locking here.

> There might also be issues reading p->files twice.

This block for reading twice:

if (p->files) {
    fdt = files_fdtable(p->files);

Already exists in fs/proc/array.c in task_state():

* https://elixir.bootlin.com/linux/v6.0-rc6/source/fs/proc/array.c#L173

There's task_lock(p) there, so maybe that's why it's allowed.

Should I run files_fdtable(p->files) unconditionally and then check
the result instead? I'm happy to change it to something else if you
can tell me what it should be.

If there are kernel options I should enable for testing this, I'd be
happy to hear them. So far we've tried running with KASAN with no
issues.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-09-21 18:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-20 19:06 [PATCH] proc: report open files as size in stat() for /proc/pid/fd Ivan Babrou
2022-09-21 10:21 ` David Laight
2022-09-21 18:08   ` Ivan Babrou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).