All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Kalesh Singh <kaleshsingh@google.com>
Cc: ilkos@google.com, tjmercier@google.com, surenb@google.com,
	kernel-team@android.com, Jonathan Corbet <corbet@lwn.net>,
	Sumit Semwal <sumit.semwal@linaro.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>,
	Kees Cook <keescook@chromium.org>,
	Mike Rapoport <rppt@kernel.org>, Colin Cross <ccross@google.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org
Subject: Re: [RFC PATCH] procfs: Add file path and size to /proc/<pid>/fdinfo
Date: Fri, 20 May 2022 08:29:18 +0200	[thread overview]
Message-ID: <4e35dc30-1157-50b3-e3b6-954481a0524d@amd.com> (raw)
In-Reply-To: <20220519214021.3572840-1-kaleshsingh@google.com>

Am 19.05.22 um 23:40 schrieb Kalesh Singh:
> Processes can pin shared memory by keeping a handle to it through a
> file descriptor; for instance dmabufs, memfd, and ashsmem (in Android).
>
> In the case of a memory leak, to identify the process pinning the
> memory, userspace needs to:
>    - Iterate the /proc/<pid>/fd/* for each process
>    - Do a readlink on each entry to identify the type of memory from
>      the file path.
>    - stat() each entry to get the size of the memory.
>
> The file permissions on /proc/<pid>/fd/* only allows for the owner
> or root to perform the operations above; and so is not suitable for
> capturing the system-wide state in a production environment.
>
> This issue was addressed for dmabufs by making /proc/*/fdinfo/*
> accessible to a process with PTRACE_MODE_READ_FSCREDS credentials[1]
> To allow the same kind of tracking for other types of shared memory,
> add the following fields to /proc/<pid>/fdinfo/<fd>:
>
> path - This allows identifying the type of memory based on common
>         prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."
>
>         This was not an issued when dmabuf tracking was introduced
>         because the exp_name field of dmabuf fdinfo could be used
>         to distinguish dmabuf fds from other types.
>
> size - To track the amount of memory that is being pinned.
>
>         dmabufs expose size as an additional field in fdinfo. Remove
>         this and make it a common field for all fds.
>
> Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
> -- the same as for /proc/<pid>/maps which also exposes the path and
> size for mapped memory regions.
>
> This allows for a system process with PTRACE_MODE_READ_FSCREDS to
> account the pinned per-process memory via fdinfo.

I think this should be split into two patches, one adding the size and 
one adding the path.

Adding the size is completely unproblematic, but the path might raise 
some eyebrows.

>
> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20210308170651.919148-1-kaleshsingh%40google.com%2F&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C95ee7bf71c2c4aa342fa08da39e03398%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637885932392014544%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=kf%2B2es12hV3z5zjOFhx3EyxI1XEMeHexqTLNpNoDhAY%3D&amp;reserved=0
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>   Documentation/filesystems/proc.rst | 22 ++++++++++++++++++++--
>   drivers/dma-buf/dma-buf.c          |  1 -
>   fs/proc/fd.c                       |  9 +++++++--
>   3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 061744c436d9..ad66d78aca51 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1922,13 +1922,16 @@ if precise results are needed.
>   3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
>   ---------------------------------------------------------------
>   This file provides information associated with an opened file. The regular
> -files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
> +files have at least six fields -- 'pos', 'flags', 'mnt_id', 'ino', 'size',
> +and 'path'.
> +
>   The 'pos' represents the current offset of the opened file in decimal
>   form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
>   file has been created with [see open(2) for details] and 'mnt_id' represents
>   mount ID of the file system containing the opened file [see 3.5
>   /proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
> -the file.
> +the file, 'size' represents the size of the file in bytes, and 'path'
> +represents the file path.
>   
>   A typical output is::
>   
> @@ -1936,6 +1939,8 @@ A typical output is::
>   	flags:	0100002
>   	mnt_id:	19
>   	ino:	63107
> +        size:   0
> +        path:   /dev/null
>   
>   All locks associated with a file descriptor are shown in its fdinfo too::
>   
> @@ -1953,6 +1958,8 @@ Eventfd files
>   	flags:	04002
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[eventfd]
>   	eventfd-count:	5a
>   
>   where 'eventfd-count' is hex value of a counter.
> @@ -1966,6 +1973,8 @@ Signalfd files
>   	flags:	04002
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[signalfd]
>   	sigmask:	0000000000000200
>   
>   where 'sigmask' is hex value of the signal mask associated
> @@ -1980,6 +1989,8 @@ Epoll files
>   	flags:	02
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[eventpoll]
>   	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
>   
>   where 'tfd' is a target file descriptor number in decimal form,
> @@ -1998,6 +2009,8 @@ For inotify files the format is the following::
>   	flags:	02000000
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:inotify
>   	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
>   
>   where 'wd' is a watch descriptor in decimal form, i.e. a target file
> @@ -2021,6 +2034,8 @@ For fanotify files the format is::
>   	flags:	02
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[fanotify]
>   	fanotify flags:10 event-flags:0
>   	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
>   	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
> @@ -2046,6 +2061,8 @@ Timerfd files
>   	flags:	02
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[timerfd]
>   	clockid: 0
>   	ticks: 0
>   	settime flags: 01
> @@ -2070,6 +2087,7 @@ DMA Buffer files
>   	mnt_id:	9
>   	ino:	63107
>   	size:   32768
> +        path:   /dmabuf:
>   	count:  2
>   	exp_name:  system-heap
>   
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index b1e25ae98302..d61183ff3c30 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -377,7 +377,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
>   {
>   	struct dma_buf *dmabuf = file->private_data;
>   
> -	seq_printf(m, "size:\t%zu\n", dmabuf->size);
>   	/* Don't count the temporary reference taken inside procfs seq_show */
>   	seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
>   	seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index 913bef0d2a36..a8a968bc58f0 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -54,10 +54,15 @@ static int seq_show(struct seq_file *m, void *v)
>   	if (ret)
>   		return ret;
>   
> -	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
> +	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\nsize:\t%zu\n",
>   		   (long long)file->f_pos, f_flags,
>   		   real_mount(file->f_path.mnt)->mnt_id,
> -		   file_inode(file)->i_ino);
> +		   file_inode(file)->i_ino,
> +		   file_inode(file)->i_size);

We might consider splitting this into multiple seq_printf calls, one for 
each printed attribute.

It becomes a bit unreadable and the minimal additional overhead 
shouldn't matter that much.

Regards,
Christian.

> +
> +	seq_puts(m, "path:\t");
> +	seq_file_path(m, file, "\n");
> +	seq_putc(m, '\n');
>   
>   	/* show_fd_locks() never deferences files so a stale value is safe */
>   	show_fd_locks(m, file, files);
>
> base-commit: b015dcd62b86d298829990f8261d5d154b8d7af5


WARNING: multiple messages have this Message-ID (diff)
From: "Christian König" <christian.koenig@amd.com>
To: Kalesh Singh <kaleshsingh@google.com>
Cc: linux-doc@vger.kernel.org, Kees Cook <keescook@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>,
	kernel-team@android.com, Randy Dunlap <rdunlap@infradead.org>,
	ilkos@google.com, linux-kernel@vger.kernel.org,
	Colin Cross <ccross@google.com>,
	tjmercier@google.com, linaro-mm-sig@lists.linaro.org,
	dri-devel@lists.freedesktop.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	surenb@google.com, Sumit Semwal <sumit.semwal@linaro.org>,
	Mike Rapoport <rppt@kernel.org>,
	linux-media@vger.kernel.org
Subject: Re: [RFC PATCH] procfs: Add file path and size to /proc/<pid>/fdinfo
Date: Fri, 20 May 2022 08:29:18 +0200	[thread overview]
Message-ID: <4e35dc30-1157-50b3-e3b6-954481a0524d@amd.com> (raw)
In-Reply-To: <20220519214021.3572840-1-kaleshsingh@google.com>

Am 19.05.22 um 23:40 schrieb Kalesh Singh:
> Processes can pin shared memory by keeping a handle to it through a
> file descriptor; for instance dmabufs, memfd, and ashsmem (in Android).
>
> In the case of a memory leak, to identify the process pinning the
> memory, userspace needs to:
>    - Iterate the /proc/<pid>/fd/* for each process
>    - Do a readlink on each entry to identify the type of memory from
>      the file path.
>    - stat() each entry to get the size of the memory.
>
> The file permissions on /proc/<pid>/fd/* only allows for the owner
> or root to perform the operations above; and so is not suitable for
> capturing the system-wide state in a production environment.
>
> This issue was addressed for dmabufs by making /proc/*/fdinfo/*
> accessible to a process with PTRACE_MODE_READ_FSCREDS credentials[1]
> To allow the same kind of tracking for other types of shared memory,
> add the following fields to /proc/<pid>/fdinfo/<fd>:
>
> path - This allows identifying the type of memory based on common
>         prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."
>
>         This was not an issued when dmabuf tracking was introduced
>         because the exp_name field of dmabuf fdinfo could be used
>         to distinguish dmabuf fds from other types.
>
> size - To track the amount of memory that is being pinned.
>
>         dmabufs expose size as an additional field in fdinfo. Remove
>         this and make it a common field for all fds.
>
> Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
> -- the same as for /proc/<pid>/maps which also exposes the path and
> size for mapped memory regions.
>
> This allows for a system process with PTRACE_MODE_READ_FSCREDS to
> account the pinned per-process memory via fdinfo.

I think this should be split into two patches, one adding the size and 
one adding the path.

Adding the size is completely unproblematic, but the path might raise 
some eyebrows.

>
> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20210308170651.919148-1-kaleshsingh%40google.com%2F&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C95ee7bf71c2c4aa342fa08da39e03398%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637885932392014544%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=kf%2B2es12hV3z5zjOFhx3EyxI1XEMeHexqTLNpNoDhAY%3D&amp;reserved=0
>
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>   Documentation/filesystems/proc.rst | 22 ++++++++++++++++++++--
>   drivers/dma-buf/dma-buf.c          |  1 -
>   fs/proc/fd.c                       |  9 +++++++--
>   3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 061744c436d9..ad66d78aca51 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1922,13 +1922,16 @@ if precise results are needed.
>   3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
>   ---------------------------------------------------------------
>   This file provides information associated with an opened file. The regular
> -files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
> +files have at least six fields -- 'pos', 'flags', 'mnt_id', 'ino', 'size',
> +and 'path'.
> +
>   The 'pos' represents the current offset of the opened file in decimal
>   form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
>   file has been created with [see open(2) for details] and 'mnt_id' represents
>   mount ID of the file system containing the opened file [see 3.5
>   /proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
> -the file.
> +the file, 'size' represents the size of the file in bytes, and 'path'
> +represents the file path.
>   
>   A typical output is::
>   
> @@ -1936,6 +1939,8 @@ A typical output is::
>   	flags:	0100002
>   	mnt_id:	19
>   	ino:	63107
> +        size:   0
> +        path:   /dev/null
>   
>   All locks associated with a file descriptor are shown in its fdinfo too::
>   
> @@ -1953,6 +1958,8 @@ Eventfd files
>   	flags:	04002
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[eventfd]
>   	eventfd-count:	5a
>   
>   where 'eventfd-count' is hex value of a counter.
> @@ -1966,6 +1973,8 @@ Signalfd files
>   	flags:	04002
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[signalfd]
>   	sigmask:	0000000000000200
>   
>   where 'sigmask' is hex value of the signal mask associated
> @@ -1980,6 +1989,8 @@ Epoll files
>   	flags:	02
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[eventpoll]
>   	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
>   
>   where 'tfd' is a target file descriptor number in decimal form,
> @@ -1998,6 +2009,8 @@ For inotify files the format is the following::
>   	flags:	02000000
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:inotify
>   	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
>   
>   where 'wd' is a watch descriptor in decimal form, i.e. a target file
> @@ -2021,6 +2034,8 @@ For fanotify files the format is::
>   	flags:	02
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[fanotify]
>   	fanotify flags:10 event-flags:0
>   	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
>   	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
> @@ -2046,6 +2061,8 @@ Timerfd files
>   	flags:	02
>   	mnt_id:	9
>   	ino:	63107
> +        size:   0
> +        path:   anon_inode:[timerfd]
>   	clockid: 0
>   	ticks: 0
>   	settime flags: 01
> @@ -2070,6 +2087,7 @@ DMA Buffer files
>   	mnt_id:	9
>   	ino:	63107
>   	size:   32768
> +        path:   /dmabuf:
>   	count:  2
>   	exp_name:  system-heap
>   
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index b1e25ae98302..d61183ff3c30 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -377,7 +377,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
>   {
>   	struct dma_buf *dmabuf = file->private_data;
>   
> -	seq_printf(m, "size:\t%zu\n", dmabuf->size);
>   	/* Don't count the temporary reference taken inside procfs seq_show */
>   	seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
>   	seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index 913bef0d2a36..a8a968bc58f0 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -54,10 +54,15 @@ static int seq_show(struct seq_file *m, void *v)
>   	if (ret)
>   		return ret;
>   
> -	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
> +	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\nsize:\t%zu\n",
>   		   (long long)file->f_pos, f_flags,
>   		   real_mount(file->f_path.mnt)->mnt_id,
> -		   file_inode(file)->i_ino);
> +		   file_inode(file)->i_ino,
> +		   file_inode(file)->i_size);

We might consider splitting this into multiple seq_printf calls, one for 
each printed attribute.

It becomes a bit unreadable and the minimal additional overhead 
shouldn't matter that much.

Regards,
Christian.

> +
> +	seq_puts(m, "path:\t");
> +	seq_file_path(m, file, "\n");
> +	seq_putc(m, '\n');
>   
>   	/* show_fd_locks() never deferences files so a stale value is safe */
>   	show_fd_locks(m, file, files);
>
> base-commit: b015dcd62b86d298829990f8261d5d154b8d7af5


  parent reply	other threads:[~2022-05-20  6:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-19 21:40 [RFC PATCH] procfs: Add file path and size to /proc/<pid>/fdinfo Kalesh Singh
2022-05-19 21:40 ` Kalesh Singh
2022-05-19 21:46 ` Randy Dunlap
2022-05-19 21:46   ` Randy Dunlap
2022-05-19 21:53   ` Kalesh Singh
2022-05-19 21:53     ` Kalesh Singh
2022-05-20  1:49 ` Kees Cook
2022-05-20  1:49   ` Kees Cook
2022-05-20  4:01   ` Kalesh Singh
2022-05-20  4:01     ` Kalesh Singh
2022-05-20  2:18 ` kernel test robot
2022-05-20  2:33 ` kernel test robot
2022-05-20  6:29 ` Christian König [this message]
2022-05-20  6:29   ` Christian König
2022-05-20 16:12   ` Kalesh Singh
2022-05-20 16:12     ` Kalesh Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e35dc30-1157-50b3-e3b6-954481a0524d@amd.com \
    --to=christian.koenig@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=ccross@google.com \
    --cc=corbet@lwn.net \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ilkos@google.com \
    --cc=kaleshsingh@google.com \
    --cc=keescook@chromium.org \
    --cc=kernel-team@android.com \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=mail@christoph.anton.mitterer.name \
    --cc=rdunlap@infradead.org \
    --cc=rppt@kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=surenb@google.com \
    --cc=tjmercier@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.