All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] procfs: Add file path and size to /proc/<pid>/fdinfo
@ 2022-06-23 22:06 ` Kalesh Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-23 22:06 UTC (permalink / raw)
  To: ckoenig.leichtzumerken, christian.koenig, viro, hch,
	stephen.s.brennan, David.Laight
  Cc: Randy Dunlap, linux-doc, Jonathan Corbet,
	Christoph Anton Mitterer, kernel-team, Johannes Weiner, ilkos,
	linux-kernel, Colin Cross, Sumit Semwal, linaro-mm-sig,
	Paul Gortmaker, dri-devel, Kalesh Singh, linux-fsdevel,
	Andrew Morton, surenb, tjmercier, Mike Rapoport, linux-media

Hi all,

This is v2 of the fdinfo patches. The main update is adding path
field only for files with anon inodes. Rebased on 5.19-rc3.

The previous cover letter is copied below for convenience.

Thanks,
Kalesh

-----------

Processes can pin shared memory by keeping a handle to it through a
file descriptor; for instance dmabufs, memfd, and ashmem (in Android).

In the case of a memory leak, to identify the process pinning the
memory, userspace needs to:
  - Iterate the /proc/<pid>/fd/* for each process
  - Do a readlink on each entry to identify the type of memory from
    the file path.
  - stat() each entry to get the size of the memory.

The file permissions on /proc/<pid>/fd/* only allows for the owner
or root to perform the operations above; and so is not suitable for
capturing the system-wide state in a production environment.

This issue was addressed for dmabufs by making /proc/*/fdinfo/*
accessible to a process with PTRACE_MODE_READ_FSCREDS credentials[1]
To allow the same kind of tracking for other types of shared memory,
add the following fields to /proc/<pid>/fdinfo/<fd>:

path - This allows identifying the type of memory based on common
       prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."

       This was not an issued when dmabuf tracking was introduced
       because the exp_name field of dmabuf fdinfo could be used
       to distinguish dmabuf fds from other types.

size - To track the amount of memory that is being pinned.

       dmabufs expose size as an additional field in fdinfo. Remove
       this and make it a common field for all fds.

Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
-- the same as for /proc/<pid>/maps which also exposes the path and
size for mapped memory regions.

This allows for a system process with PTRACE_MODE_READ_FSCREDS to
account the pinned per-process memory via fdinfo.


Kalesh Singh (2):
  procfs: Add 'size' to /proc/<pid>/fdinfo/
  procfs: Add 'path' to /proc/<pid>/fdinfo/

 Documentation/filesystems/proc.rst | 22 ++++++++++++++++++++--
 drivers/dma-buf/dma-buf.c          |  1 -
 fs/libfs.c                         |  9 +++++++++
 fs/proc/fd.c                       | 18 ++++++++++++++----
 include/linux/fs.h                 |  1 +
 5 files changed, 44 insertions(+), 7 deletions(-)


base-commit: a111daf0c53ae91e71fd2bfe7497862d14132e3e
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 0/2] procfs: Add file path and size to /proc/<pid>/fdinfo
@ 2022-06-23 22:06 ` Kalesh Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-23 22:06 UTC (permalink / raw)
  To: ckoenig.leichtzumerken, christian.koenig, viro, hch,
	stephen.s.brennan, David.Laight
  Cc: ilkos, tjmercier, surenb, kernel-team, Kalesh Singh,
	Jonathan Corbet, Sumit Semwal, Andrew Morton,
	Christoph Anton Mitterer, Johannes Weiner, Mike Rapoport,
	Colin Cross, Paul Gortmaker, Randy Dunlap, linux-kernel,
	linux-fsdevel, linux-doc, linux-media, dri-devel, linaro-mm-sig

Hi all,

This is v2 of the fdinfo patches. The main update is adding path
field only for files with anon inodes. Rebased on 5.19-rc3.

The previous cover letter is copied below for convenience.

Thanks,
Kalesh

-----------

Processes can pin shared memory by keeping a handle to it through a
file descriptor; for instance dmabufs, memfd, and ashmem (in Android).

In the case of a memory leak, to identify the process pinning the
memory, userspace needs to:
  - Iterate the /proc/<pid>/fd/* for each process
  - Do a readlink on each entry to identify the type of memory from
    the file path.
  - stat() each entry to get the size of the memory.

The file permissions on /proc/<pid>/fd/* only allows for the owner
or root to perform the operations above; and so is not suitable for
capturing the system-wide state in a production environment.

This issue was addressed for dmabufs by making /proc/*/fdinfo/*
accessible to a process with PTRACE_MODE_READ_FSCREDS credentials[1]
To allow the same kind of tracking for other types of shared memory,
add the following fields to /proc/<pid>/fdinfo/<fd>:

path - This allows identifying the type of memory based on common
       prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."

       This was not an issued when dmabuf tracking was introduced
       because the exp_name field of dmabuf fdinfo could be used
       to distinguish dmabuf fds from other types.

size - To track the amount of memory that is being pinned.

       dmabufs expose size as an additional field in fdinfo. Remove
       this and make it a common field for all fds.

Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
-- the same as for /proc/<pid>/maps which also exposes the path and
size for mapped memory regions.

This allows for a system process with PTRACE_MODE_READ_FSCREDS to
account the pinned per-process memory via fdinfo.


Kalesh Singh (2):
  procfs: Add 'size' to /proc/<pid>/fdinfo/
  procfs: Add 'path' to /proc/<pid>/fdinfo/

 Documentation/filesystems/proc.rst | 22 ++++++++++++++++++++--
 drivers/dma-buf/dma-buf.c          |  1 -
 fs/libfs.c                         |  9 +++++++++
 fs/proc/fd.c                       | 18 ++++++++++++++----
 include/linux/fs.h                 |  1 +
 5 files changed, 44 insertions(+), 7 deletions(-)


base-commit: a111daf0c53ae91e71fd2bfe7497862d14132e3e
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-23 22:06 ` Kalesh Singh
@ 2022-06-23 22:06   ` Kalesh Singh
  -1 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-23 22:06 UTC (permalink / raw)
  To: ckoenig.leichtzumerken, christian.koenig, viro, hch,
	stephen.s.brennan, David.Laight
  Cc: Randy Dunlap, linux-doc, Jonathan Corbet,
	Christoph Anton Mitterer, kernel-team, Johannes Weiner, ilkos,
	linux-kernel, dri-devel, Sumit Semwal, linaro-mm-sig,
	Paul Gortmaker, Kalesh Singh, linux-fsdevel, Andrew Morton,
	surenb, tjmercier, Mike Rapoport, linux-media

To be able to account the amount of memory a process is keeping pinned
by open file descriptors add a 'size' field to fdinfo output.

dmabufs fds already expose a 'size' field for this reason, remove this
and make it a common field for all fds. This allows tracking of
other types of memory (e.g. memfd and ashmem in Android).

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v2:
  - Add Christian's Reviewed-by

Changes from rfc:
  - Split adding 'size' and 'path' into a separate patches, per Christian
  - Split fdinfo seq_printf into separate lines, per Christian
  - Fix indentation (use tabs) in documentaion, per Randy

 Documentation/filesystems/proc.rst | 12 ++++++++++--
 drivers/dma-buf/dma-buf.c          |  1 -
 fs/proc/fd.c                       |  9 +++++----
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 1bc91fb8c321..779c05528e87 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1886,13 +1886,14 @@ if precise results are needed.
 3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
 ---------------------------------------------------------------
 This file provides information associated with an opened file. The regular
-files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
+files have at least five fields -- 'pos', 'flags', 'mnt_id', 'ino', and 'size'.
+
 The 'pos' represents the current offset of the opened file in decimal
 form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
 file has been created with [see open(2) for details] and 'mnt_id' represents
 mount ID of the file system containing the opened file [see 3.5
 /proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
-the file.
+the file, and 'size' represents the size of the file in bytes.
 
 A typical output is::
 
@@ -1900,6 +1901,7 @@ A typical output is::
 	flags:	0100002
 	mnt_id:	19
 	ino:	63107
+	size:	0
 
 All locks associated with a file descriptor are shown in its fdinfo too::
 
@@ -1917,6 +1919,7 @@ Eventfd files
 	flags:	04002
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	eventfd-count:	5a
 
 where 'eventfd-count' is hex value of a counter.
@@ -1930,6 +1933,7 @@ Signalfd files
 	flags:	04002
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	sigmask:	0000000000000200
 
 where 'sigmask' is hex value of the signal mask associated
@@ -1944,6 +1948,7 @@ Epoll files
 	flags:	02
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
 
 where 'tfd' is a target file descriptor number in decimal form,
@@ -1962,6 +1967,7 @@ For inotify files the format is the following::
 	flags:	02000000
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
 
 where 'wd' is a watch descriptor in decimal form, i.e. a target file
@@ -1985,6 +1991,7 @@ For fanotify files the format is::
 	flags:	02
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	fanotify flags:10 event-flags:0
 	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
 	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2010,6 +2017,7 @@ Timerfd files
 	flags:	02
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	clockid: 0
 	ticks: 0
 	settime flags: 01
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 32f55640890c..5f2ae38c960f 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -378,7 +378,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
 {
 	struct dma_buf *dmabuf = file->private_data;
 
-	seq_printf(m, "size:\t%zu\n", dmabuf->size);
 	/* Don't count the temporary reference taken inside procfs seq_show */
 	seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
 	seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 913bef0d2a36..464bc3f55759 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -54,10 +54,11 @@ static int seq_show(struct seq_file *m, void *v)
 	if (ret)
 		return ret;
 
-	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
-		   (long long)file->f_pos, f_flags,
-		   real_mount(file->f_path.mnt)->mnt_id,
-		   file_inode(file)->i_ino);
+	seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
+	seq_printf(m, "flags:\t0%o\n", f_flags);
+	seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
+	seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
+	seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
 
 	/* show_fd_locks() never deferences files so a stale value is safe */
 	show_fd_locks(m, file, files);
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-23 22:06   ` Kalesh Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-23 22:06 UTC (permalink / raw)
  To: ckoenig.leichtzumerken, christian.koenig, viro, hch,
	stephen.s.brennan, David.Laight
  Cc: ilkos, tjmercier, surenb, kernel-team, Kalesh Singh,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Paul Gortmaker, Mike Rapoport,
	Randy Dunlap, linux-kernel, linux-fsdevel, linux-doc,
	linux-media, dri-devel, linaro-mm-sig

To be able to account the amount of memory a process is keeping pinned
by open file descriptors add a 'size' field to fdinfo output.

dmabufs fds already expose a 'size' field for this reason, remove this
and make it a common field for all fds. This allows tracking of
other types of memory (e.g. memfd and ashmem in Android).

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---

Changes in v2:
  - Add Christian's Reviewed-by

Changes from rfc:
  - Split adding 'size' and 'path' into a separate patches, per Christian
  - Split fdinfo seq_printf into separate lines, per Christian
  - Fix indentation (use tabs) in documentaion, per Randy

 Documentation/filesystems/proc.rst | 12 ++++++++++--
 drivers/dma-buf/dma-buf.c          |  1 -
 fs/proc/fd.c                       |  9 +++++----
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 1bc91fb8c321..779c05528e87 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1886,13 +1886,14 @@ if precise results are needed.
 3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
 ---------------------------------------------------------------
 This file provides information associated with an opened file. The regular
-files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
+files have at least five fields -- 'pos', 'flags', 'mnt_id', 'ino', and 'size'.
+
 The 'pos' represents the current offset of the opened file in decimal
 form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
 file has been created with [see open(2) for details] and 'mnt_id' represents
 mount ID of the file system containing the opened file [see 3.5
 /proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
-the file.
+the file, and 'size' represents the size of the file in bytes.
 
 A typical output is::
 
@@ -1900,6 +1901,7 @@ A typical output is::
 	flags:	0100002
 	mnt_id:	19
 	ino:	63107
+	size:	0
 
 All locks associated with a file descriptor are shown in its fdinfo too::
 
@@ -1917,6 +1919,7 @@ Eventfd files
 	flags:	04002
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	eventfd-count:	5a
 
 where 'eventfd-count' is hex value of a counter.
@@ -1930,6 +1933,7 @@ Signalfd files
 	flags:	04002
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	sigmask:	0000000000000200
 
 where 'sigmask' is hex value of the signal mask associated
@@ -1944,6 +1948,7 @@ Epoll files
 	flags:	02
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
 
 where 'tfd' is a target file descriptor number in decimal form,
@@ -1962,6 +1967,7 @@ For inotify files the format is the following::
 	flags:	02000000
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
 
 where 'wd' is a watch descriptor in decimal form, i.e. a target file
@@ -1985,6 +1991,7 @@ For fanotify files the format is::
 	flags:	02
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	fanotify flags:10 event-flags:0
 	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
 	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2010,6 +2017,7 @@ Timerfd files
 	flags:	02
 	mnt_id:	9
 	ino:	63107
+	size:   0
 	clockid: 0
 	ticks: 0
 	settime flags: 01
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 32f55640890c..5f2ae38c960f 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -378,7 +378,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
 {
 	struct dma_buf *dmabuf = file->private_data;
 
-	seq_printf(m, "size:\t%zu\n", dmabuf->size);
 	/* Don't count the temporary reference taken inside procfs seq_show */
 	seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
 	seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 913bef0d2a36..464bc3f55759 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -54,10 +54,11 @@ static int seq_show(struct seq_file *m, void *v)
 	if (ret)
 		return ret;
 
-	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
-		   (long long)file->f_pos, f_flags,
-		   real_mount(file->f_path.mnt)->mnt_id,
-		   file_inode(file)->i_ino);
+	seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
+	seq_printf(m, "flags:\t0%o\n", f_flags);
+	seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
+	seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
+	seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
 
 	/* show_fd_locks() never deferences files so a stale value is safe */
 	show_fd_locks(m, file, files);
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2] procfs: Add 'path' to /proc/<pid>/fdinfo/
  2022-06-23 22:06 ` Kalesh Singh
@ 2022-06-23 22:06   ` Kalesh Singh
  -1 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-23 22:06 UTC (permalink / raw)
  To: ckoenig.leichtzumerken, christian.koenig, viro, hch,
	stephen.s.brennan, David.Laight
  Cc: ilkos, tjmercier, surenb, kernel-team, Kalesh Singh,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Colin Cross, Paul Gortmaker,
	Randy Dunlap, linux-kernel, linux-fsdevel, linux-doc,
	linux-media, dri-devel, linaro-mm-sig

In order to identify the type of memory a process has pinned through
its open fds, add the file path to fdinfo output. This allows
identifying memory types based on common prefixes:
  e.g. "/memfd...", "/dmabuf...", "/dev/ashmem...".

To be cautious, only expose the paths for anonymous inodes, and this
also avoids printing path names with strange characters.

Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
the same as /proc/<pid>/maps which also exposes the file path of
mappings; so the security permissions for accessing path is consistent
with that of /proc/<pid>/maps.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v2:
  - Only add path field for files with anon inodes.

Changes from rfc:
  - Split adding 'size' and 'path' into a separate patches, per Christian
  - Fix indentation (use tabs) in documentaion, per Randy

 Documentation/filesystems/proc.rst | 10 ++++++++++
 fs/libfs.c                         |  9 +++++++++
 fs/proc/fd.c                       | 13 +++++++++++--
 include/linux/fs.h                 |  1 +
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 779c05528e87..ca23a9af4845 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1907,6 +1907,9 @@ All locks associated with a file descriptor are shown in its fdinfo too::
 
     lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF
 
+Files with anonymous inodes have an additional 'path' field which represents
+the anonymous file path.
+
 The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags
 pair provide additional information particular to the objects they represent.
 
@@ -1920,6 +1923,7 @@ Eventfd files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[eventfd]
 	eventfd-count:	5a
 
 where 'eventfd-count' is hex value of a counter.
@@ -1934,6 +1938,7 @@ Signalfd files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[signalfd]
 	sigmask:	0000000000000200
 
 where 'sigmask' is hex value of the signal mask associated
@@ -1949,6 +1954,7 @@ Epoll files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[eventpoll]
 	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
 
 where 'tfd' is a target file descriptor number in decimal form,
@@ -1968,6 +1974,7 @@ For inotify files the format is the following::
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:inotify
 	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
 
 where 'wd' is a watch descriptor in decimal form, i.e. a target file
@@ -1992,6 +1999,7 @@ For fanotify files the format is::
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[fanotify]
 	fanotify flags:10 event-flags:0
 	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
 	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2018,6 +2026,7 @@ Timerfd files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[timerfd]
 	clockid: 0
 	ticks: 0
 	settime flags: 01
@@ -2042,6 +2051,7 @@ DMA Buffer files
 	mnt_id:	9
 	ino:	63107
 	size:   32768
+	path:	/dmabuf:
 	count:  2
 	exp_name:  system-heap
 
diff --git a/fs/libfs.c b/fs/libfs.c
index 31b0ddf01c31..6911749b4da7 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1217,6 +1217,15 @@ void kfree_link(void *p)
 }
 EXPORT_SYMBOL(kfree_link);
 
+static const struct address_space_operations anon_aops = {
+	.dirty_folio	= noop_dirty_folio,
+};
+
+bool is_anon_inode(struct inode *inode)
+{
+	return inode->i_mapping->a_ops == &anon_aops;
+}
+
 struct inode *alloc_anon_inode(struct super_block *s)
 {
 	static const struct address_space_operations anon_aops = {
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 464bc3f55759..5bac79a2fa51 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -23,6 +23,7 @@ static int seq_show(struct seq_file *m, void *v)
 	struct files_struct *files = NULL;
 	int f_flags = 0, ret = -ENOENT;
 	struct file *file = NULL;
+	struct inode *inode = NULL;
 	struct task_struct *task;
 
 	task = get_proc_task(m->private);
@@ -54,11 +55,19 @@ static int seq_show(struct seq_file *m, void *v)
 	if (ret)
 		return ret;
 
+	inode = file_inode(file);
+
 	seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
 	seq_printf(m, "flags:\t0%o\n", f_flags);
 	seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
-	seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
-	seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
+	seq_printf(m, "ino:\t%lu\n", inode->i_ino);
+	seq_printf(m, "size:\t%lli\n", (long long)inode->i_size);
+
+	if (is_anon_inode(inode)) {
+		seq_puts(m, "path:\t");
+		seq_file_path(m, file, "\n");
+		seq_putc(m, '\n');
+	}
 
 	/* show_fd_locks() never deferences files so a stale value is safe */
 	show_fd_locks(m, file, files);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..73449e620b66 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3111,6 +3111,7 @@ extern void page_put_link(void *);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
 extern void kfree_link(void *);
+extern bool is_anon_inode(struct inode *inode);
 void generic_fillattr(struct user_namespace *, struct inode *, struct kstat *);
 void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2] procfs: Add 'path' to /proc/<pid>/fdinfo/
@ 2022-06-23 22:06   ` Kalesh Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-23 22:06 UTC (permalink / raw)
  To: ckoenig.leichtzumerken, christian.koenig, viro, hch,
	stephen.s.brennan, David.Laight
  Cc: Randy Dunlap, linux-doc, Jonathan Corbet,
	Christoph Anton Mitterer, kernel-team, Johannes Weiner, ilkos,
	linux-kernel, Colin Cross, Sumit Semwal, linaro-mm-sig,
	Paul Gortmaker, dri-devel, Kalesh Singh, linux-fsdevel,
	Andrew Morton, surenb, tjmercier, linux-media

In order to identify the type of memory a process has pinned through
its open fds, add the file path to fdinfo output. This allows
identifying memory types based on common prefixes:
  e.g. "/memfd...", "/dmabuf...", "/dev/ashmem...".

To be cautious, only expose the paths for anonymous inodes, and this
also avoids printing path names with strange characters.

Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
the same as /proc/<pid>/maps which also exposes the file path of
mappings; so the security permissions for accessing path is consistent
with that of /proc/<pid>/maps.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---

Changes in v2:
  - Only add path field for files with anon inodes.

Changes from rfc:
  - Split adding 'size' and 'path' into a separate patches, per Christian
  - Fix indentation (use tabs) in documentaion, per Randy

 Documentation/filesystems/proc.rst | 10 ++++++++++
 fs/libfs.c                         |  9 +++++++++
 fs/proc/fd.c                       | 13 +++++++++++--
 include/linux/fs.h                 |  1 +
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 779c05528e87..ca23a9af4845 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1907,6 +1907,9 @@ All locks associated with a file descriptor are shown in its fdinfo too::
 
     lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF
 
+Files with anonymous inodes have an additional 'path' field which represents
+the anonymous file path.
+
 The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags
 pair provide additional information particular to the objects they represent.
 
@@ -1920,6 +1923,7 @@ Eventfd files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[eventfd]
 	eventfd-count:	5a
 
 where 'eventfd-count' is hex value of a counter.
@@ -1934,6 +1938,7 @@ Signalfd files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[signalfd]
 	sigmask:	0000000000000200
 
 where 'sigmask' is hex value of the signal mask associated
@@ -1949,6 +1954,7 @@ Epoll files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[eventpoll]
 	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
 
 where 'tfd' is a target file descriptor number in decimal form,
@@ -1968,6 +1974,7 @@ For inotify files the format is the following::
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:inotify
 	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
 
 where 'wd' is a watch descriptor in decimal form, i.e. a target file
@@ -1992,6 +1999,7 @@ For fanotify files the format is::
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[fanotify]
 	fanotify flags:10 event-flags:0
 	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
 	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2018,6 +2026,7 @@ Timerfd files
 	mnt_id:	9
 	ino:	63107
 	size:   0
+	path:	anon_inode:[timerfd]
 	clockid: 0
 	ticks: 0
 	settime flags: 01
@@ -2042,6 +2051,7 @@ DMA Buffer files
 	mnt_id:	9
 	ino:	63107
 	size:   32768
+	path:	/dmabuf:
 	count:  2
 	exp_name:  system-heap
 
diff --git a/fs/libfs.c b/fs/libfs.c
index 31b0ddf01c31..6911749b4da7 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1217,6 +1217,15 @@ void kfree_link(void *p)
 }
 EXPORT_SYMBOL(kfree_link);
 
+static const struct address_space_operations anon_aops = {
+	.dirty_folio	= noop_dirty_folio,
+};
+
+bool is_anon_inode(struct inode *inode)
+{
+	return inode->i_mapping->a_ops == &anon_aops;
+}
+
 struct inode *alloc_anon_inode(struct super_block *s)
 {
 	static const struct address_space_operations anon_aops = {
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 464bc3f55759..5bac79a2fa51 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -23,6 +23,7 @@ static int seq_show(struct seq_file *m, void *v)
 	struct files_struct *files = NULL;
 	int f_flags = 0, ret = -ENOENT;
 	struct file *file = NULL;
+	struct inode *inode = NULL;
 	struct task_struct *task;
 
 	task = get_proc_task(m->private);
@@ -54,11 +55,19 @@ static int seq_show(struct seq_file *m, void *v)
 	if (ret)
 		return ret;
 
+	inode = file_inode(file);
+
 	seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
 	seq_printf(m, "flags:\t0%o\n", f_flags);
 	seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
-	seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
-	seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
+	seq_printf(m, "ino:\t%lu\n", inode->i_ino);
+	seq_printf(m, "size:\t%lli\n", (long long)inode->i_size);
+
+	if (is_anon_inode(inode)) {
+		seq_puts(m, "path:\t");
+		seq_file_path(m, file, "\n");
+		seq_putc(m, '\n');
+	}
 
 	/* show_fd_locks() never deferences files so a stale value is safe */
 	show_fd_locks(m, file, files);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..73449e620b66 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3111,6 +3111,7 @@ extern void page_put_link(void *);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
 extern void kfree_link(void *);
+extern bool is_anon_inode(struct inode *inode);
 void generic_fillattr(struct user_namespace *, struct inode *, struct kstat *);
 void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-23 22:06   ` Kalesh Singh
@ 2022-06-28 11:53     ` Brian Foster
  -1 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-28 11:53 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: ckoenig.leichtzumerken, christian.koenig, viro, hch,
	stephen.s.brennan, David.Laight, ilkos, tjmercier, surenb,
	kernel-team, Jonathan Corbet, Sumit Semwal, Andrew Morton,
	Johannes Weiner, Christoph Anton Mitterer, Paul Gortmaker,
	Mike Rapoport, Randy Dunlap, linux-kernel, linux-fsdevel,
	linux-doc, linux-media, dri-devel, linaro-mm-sig

On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> To be able to account the amount of memory a process is keeping pinned
> by open file descriptors add a 'size' field to fdinfo output.
> 
> dmabufs fds already expose a 'size' field for this reason, remove this
> and make it a common field for all fds. This allows tracking of
> other types of memory (e.g. memfd and ashmem in Android).
> 
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> 
> Changes in v2:
>   - Add Christian's Reviewed-by
> 
> Changes from rfc:
>   - Split adding 'size' and 'path' into a separate patches, per Christian
>   - Split fdinfo seq_printf into separate lines, per Christian
>   - Fix indentation (use tabs) in documentaion, per Randy
> 
>  Documentation/filesystems/proc.rst | 12 ++++++++++--
>  drivers/dma-buf/dma-buf.c          |  1 -
>  fs/proc/fd.c                       |  9 +++++----
>  3 files changed, 15 insertions(+), 7 deletions(-)
> 
...
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index 913bef0d2a36..464bc3f55759 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -54,10 +54,11 @@ static int seq_show(struct seq_file *m, void *v)
>  	if (ret)
>  		return ret;
>  
> -	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
> -		   (long long)file->f_pos, f_flags,
> -		   real_mount(file->f_path.mnt)->mnt_id,
> -		   file_inode(file)->i_ino);
> +	seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
> +	seq_printf(m, "flags:\t0%o\n", f_flags);
> +	seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
> +	seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
> +	seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);

Hi Kalesh,

Any reason not to use i_size_read() here?

Also not sure if it matters that much for your use case, but something
worth noting at least with shmem is that one can do something like:

# cat /proc/meminfo | grep Shmem:
Shmem:               764 kB
# xfs_io -fc "falloc -k 0 10m" ./file
# ls -alh file 
-rw-------. 1 root root 0 Jun 28 07:22 file
# stat file 
  File: file
  Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
# cat /proc/meminfo | grep Shmem:
Shmem:             11004 kB

... where the resulting memory usage isn't reflected in i_size (but is
is in i_blocks/bytes).

Brian

>  
>  	/* show_fd_locks() never deferences files so a stale value is safe */
>  	show_fd_locks(m, file, files);
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-28 11:53     ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-28 11:53 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: linux-doc, dri-devel, stephen.s.brennan, Paul Gortmaker,
	Sumit Semwal, Jonathan Corbet, Christoph Anton Mitterer, hch,
	kernel-team, linux-media, ckoenig.leichtzumerken, linaro-mm-sig,
	viro, surenb, tjmercier, Randy Dunlap, ilkos, linux-kernel,
	David.Laight, Johannes Weiner, linux-fsdevel, Andrew Morton,
	christian.koenig, Mike Rapoport

On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> To be able to account the amount of memory a process is keeping pinned
> by open file descriptors add a 'size' field to fdinfo output.
> 
> dmabufs fds already expose a 'size' field for this reason, remove this
> and make it a common field for all fds. This allows tracking of
> other types of memory (e.g. memfd and ashmem in Android).
> 
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> 
> Changes in v2:
>   - Add Christian's Reviewed-by
> 
> Changes from rfc:
>   - Split adding 'size' and 'path' into a separate patches, per Christian
>   - Split fdinfo seq_printf into separate lines, per Christian
>   - Fix indentation (use tabs) in documentaion, per Randy
> 
>  Documentation/filesystems/proc.rst | 12 ++++++++++--
>  drivers/dma-buf/dma-buf.c          |  1 -
>  fs/proc/fd.c                       |  9 +++++----
>  3 files changed, 15 insertions(+), 7 deletions(-)
> 
...
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index 913bef0d2a36..464bc3f55759 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -54,10 +54,11 @@ static int seq_show(struct seq_file *m, void *v)
>  	if (ret)
>  		return ret;
>  
> -	seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
> -		   (long long)file->f_pos, f_flags,
> -		   real_mount(file->f_path.mnt)->mnt_id,
> -		   file_inode(file)->i_ino);
> +	seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
> +	seq_printf(m, "flags:\t0%o\n", f_flags);
> +	seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
> +	seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
> +	seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);

Hi Kalesh,

Any reason not to use i_size_read() here?

Also not sure if it matters that much for your use case, but something
worth noting at least with shmem is that one can do something like:

# cat /proc/meminfo | grep Shmem:
Shmem:               764 kB
# xfs_io -fc "falloc -k 0 10m" ./file
# ls -alh file 
-rw-------. 1 root root 0 Jun 28 07:22 file
# stat file 
  File: file
  Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
# cat /proc/meminfo | grep Shmem:
Shmem:             11004 kB

... where the resulting memory usage isn't reflected in i_size (but is
is in i_blocks/bytes).

Brian

>  
>  	/* show_fd_locks() never deferences files so a stale value is safe */
>  	show_fd_locks(m, file, files);
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-28 11:53     ` Brian Foster
@ 2022-06-28 22:38       ` Kalesh Singh
  -1 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-28 22:38 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christian König, Christian König, Alexander Viro,
	Christoph Hellwig, Stephen Brennan, David.Laight, Ioannis Ilkos,
	T.J. Mercier, Suren Baghdasaryan, Cc: Android Kernel,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Paul Gortmaker, Mike Rapoport,
	Randy Dunlap, LKML, linux-fsdevel, open list:DOCUMENTATION,
	Linux Media Mailing List, DRI mailing list,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > To be able to account the amount of memory a process is keeping pinned
> > by open file descriptors add a 'size' field to fdinfo output.
> >
> > dmabufs fds already expose a 'size' field for this reason, remove this
> > and make it a common field for all fds. This allows tracking of
> > other types of memory (e.g. memfd and ashmem in Android).
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > ---
> >
> > Changes in v2:
> >   - Add Christian's Reviewed-by
> >
> > Changes from rfc:
> >   - Split adding 'size' and 'path' into a separate patches, per Christian
> >   - Split fdinfo seq_printf into separate lines, per Christian
> >   - Fix indentation (use tabs) in documentaion, per Randy
> >
> >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> >  drivers/dma-buf/dma-buf.c          |  1 -
> >  fs/proc/fd.c                       |  9 +++++----
> >  3 files changed, 15 insertions(+), 7 deletions(-)
> >
> ...
> > diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> > index 913bef0d2a36..464bc3f55759 100644
> > --- a/fs/proc/fd.c
> > +++ b/fs/proc/fd.c
> > @@ -54,10 +54,11 @@ static int seq_show(struct seq_file *m, void *v)
> >       if (ret)
> >               return ret;
> >
> > -     seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
> > -                (long long)file->f_pos, f_flags,
> > -                real_mount(file->f_path.mnt)->mnt_id,
> > -                file_inode(file)->i_ino);
> > +     seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
> > +     seq_printf(m, "flags:\t0%o\n", f_flags);
> > +     seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
> > +     seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
> > +     seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
>
> Hi Kalesh,
>
> Any reason not to use i_size_read() here?

Hi Brian.

Thanks for pointing this out. You are right, we should use
i_size_read() here. I'll update in the next version.

>
> Also not sure if it matters that much for your use case, but something
> worth noting at least with shmem is that one can do something like:
>
> # cat /proc/meminfo | grep Shmem:
> Shmem:               764 kB
> # xfs_io -fc "falloc -k 0 10m" ./file
> # ls -alh file
> -rw-------. 1 root root 0 Jun 28 07:22 file
> # stat file
>   File: file
>   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> # cat /proc/meminfo | grep Shmem:
> Shmem:             11004 kB
>
> ... where the resulting memory usage isn't reflected in i_size (but is
> is in i_blocks/bytes).

I tried a similar experiment a few times, but I don't see the same
results. In my case, there is not any change in shmem. IIUC the
fallocate is allocating the disk space not shared memory.

cat /proc/meminfo > meminfo.start
xfs_io -fc "falloc -k 0 50m" ./xfs_file
cat /proc/meminfo > meminfo.stop
tail -n +1 meminfo.st* | grep -i '==\|Shmem:'

==> meminfo.start <==
Shmem:               484 kB
==> meminfo.stop <==
Shmem:               484 kB

ls -lh xfs_file
-rw------- 1 root root 0 Jun 28 15:12 xfs_file

stat xfs_file
  File: xfs_file
  Size: 0               Blocks: 102400     IO Block: 4096   regular empty file

Thanks,
Kalesh

>
> Brian
>
> >
> >       /* show_fd_locks() never deferences files so a stale value is safe */
> >       show_fd_locks(m, file, files);
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-28 22:38       ` Kalesh Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-28 22:38 UTC (permalink / raw)
  To: Brian Foster
  Cc: open list:DOCUMENTATION, DRI mailing list, Stephen Brennan,
	Paul Gortmaker, Sumit Semwal, Jonathan Corbet,
	Christoph Anton Mitterer, Christoph Hellwig, Cc: Android Kernel,
	Linux Media Mailing List, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Alexander Viro,
	Suren Baghdasaryan, T.J. Mercier, Randy Dunlap, Ioannis Ilkos,
	LKML, David.Laight, Johannes Weiner, linux-fsdevel,
	Andrew Morton, Christian König, Mike Rapoport

On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > To be able to account the amount of memory a process is keeping pinned
> > by open file descriptors add a 'size' field to fdinfo output.
> >
> > dmabufs fds already expose a 'size' field for this reason, remove this
> > and make it a common field for all fds. This allows tracking of
> > other types of memory (e.g. memfd and ashmem in Android).
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > ---
> >
> > Changes in v2:
> >   - Add Christian's Reviewed-by
> >
> > Changes from rfc:
> >   - Split adding 'size' and 'path' into a separate patches, per Christian
> >   - Split fdinfo seq_printf into separate lines, per Christian
> >   - Fix indentation (use tabs) in documentaion, per Randy
> >
> >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> >  drivers/dma-buf/dma-buf.c          |  1 -
> >  fs/proc/fd.c                       |  9 +++++----
> >  3 files changed, 15 insertions(+), 7 deletions(-)
> >
> ...
> > diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> > index 913bef0d2a36..464bc3f55759 100644
> > --- a/fs/proc/fd.c
> > +++ b/fs/proc/fd.c
> > @@ -54,10 +54,11 @@ static int seq_show(struct seq_file *m, void *v)
> >       if (ret)
> >               return ret;
> >
> > -     seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
> > -                (long long)file->f_pos, f_flags,
> > -                real_mount(file->f_path.mnt)->mnt_id,
> > -                file_inode(file)->i_ino);
> > +     seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
> > +     seq_printf(m, "flags:\t0%o\n", f_flags);
> > +     seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
> > +     seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
> > +     seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
>
> Hi Kalesh,
>
> Any reason not to use i_size_read() here?

Hi Brian.

Thanks for pointing this out. You are right, we should use
i_size_read() here. I'll update in the next version.

>
> Also not sure if it matters that much for your use case, but something
> worth noting at least with shmem is that one can do something like:
>
> # cat /proc/meminfo | grep Shmem:
> Shmem:               764 kB
> # xfs_io -fc "falloc -k 0 10m" ./file
> # ls -alh file
> -rw-------. 1 root root 0 Jun 28 07:22 file
> # stat file
>   File: file
>   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> # cat /proc/meminfo | grep Shmem:
> Shmem:             11004 kB
>
> ... where the resulting memory usage isn't reflected in i_size (but is
> is in i_blocks/bytes).

I tried a similar experiment a few times, but I don't see the same
results. In my case, there is not any change in shmem. IIUC the
fallocate is allocating the disk space not shared memory.

cat /proc/meminfo > meminfo.start
xfs_io -fc "falloc -k 0 50m" ./xfs_file
cat /proc/meminfo > meminfo.stop
tail -n +1 meminfo.st* | grep -i '==\|Shmem:'

==> meminfo.start <==
Shmem:               484 kB
==> meminfo.stop <==
Shmem:               484 kB

ls -lh xfs_file
-rw------- 1 root root 0 Jun 28 15:12 xfs_file

stat xfs_file
  File: xfs_file
  Size: 0               Blocks: 102400     IO Block: 4096   regular empty file

Thanks,
Kalesh

>
> Brian
>
> >
> >       /* show_fd_locks() never deferences files so a stale value is safe */
> >       show_fd_locks(m, file, files);
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-28 22:38       ` Kalesh Singh
@ 2022-06-29 12:23         ` Brian Foster
  -1 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-29 12:23 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: Christian König, Christian König, Alexander Viro,
	Christoph Hellwig, Stephen Brennan, David.Laight, Ioannis Ilkos,
	T.J. Mercier, Suren Baghdasaryan, Cc: Android Kernel,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Paul Gortmaker, Mike Rapoport,
	Randy Dunlap, LKML, linux-fsdevel, open list:DOCUMENTATION,
	Linux Media Mailing List, DRI mailing list,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > To be able to account the amount of memory a process is keeping pinned
> > > by open file descriptors add a 'size' field to fdinfo output.
> > >
> > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > and make it a common field for all fds. This allows tracking of
> > > other types of memory (e.g. memfd and ashmem in Android).
> > >
> > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >
> > > Changes in v2:
> > >   - Add Christian's Reviewed-by
> > >
> > > Changes from rfc:
> > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > >   - Split fdinfo seq_printf into separate lines, per Christian
> > >   - Fix indentation (use tabs) in documentaion, per Randy
> > >
> > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > >  drivers/dma-buf/dma-buf.c          |  1 -
> > >  fs/proc/fd.c                       |  9 +++++----
> > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > >
...
> >
> > Also not sure if it matters that much for your use case, but something
> > worth noting at least with shmem is that one can do something like:
> >
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:               764 kB
> > # xfs_io -fc "falloc -k 0 10m" ./file
> > # ls -alh file
> > -rw-------. 1 root root 0 Jun 28 07:22 file
> > # stat file
> >   File: file
> >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:             11004 kB
> >
> > ... where the resulting memory usage isn't reflected in i_size (but is
> > is in i_blocks/bytes).
> 
> I tried a similar experiment a few times, but I don't see the same
> results. In my case, there is not any change in shmem. IIUC the
> fallocate is allocating the disk space not shared memory.
> 

Sorry, it was implied in my previous test was that I was running against
tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
both cases is as expected: the underlying blocks are allocated and the
inode size is unchanged.

What wasn't totally clear to me when I read this patch was 1. whether
tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
operation. The test above seems to confirm both, however, right? E.g., a
more detailed example:

# mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
# cat /proc/meminfo | grep Shmem:
Shmem:              5300 kB
# xfs_io -fc "falloc -k 0 1g" /tmp/file
# stat /tmp/file 
  File: /tmp/file
  Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
Device: 22h/34d Inode: 45          Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2022-06-29 08:04:01.301307154 -0400
Modify: 2022-06-29 08:04:01.301307154 -0400
Change: 2022-06-29 08:04:01.451312834 -0400
 Birth: 2022-06-29 08:04:01.301307154 -0400
# cat /proc/meminfo | grep Shmem:
Shmem:           1053876 kB
# rm -f /tmp/file 
# cat /proc/meminfo | grep Shmem:
Shmem:              5300 kB

So clearly this impacts Shmem.. was your test run against tmpfs or some
other (disk based) fs?

FWIW, I don't have any objection to exposing inode size if it's commonly
useful information. My feedback was more just an fyi that i_size doesn't
necessarily reflect underlying space consumption (whether it's memory or
disk space) in more generic cases, because it sounds like that is really
what you're after here. The opposite example to the above would be
something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
1TB inode size with zero additional shmem usage.

Brian

> cat /proc/meminfo > meminfo.start
> xfs_io -fc "falloc -k 0 50m" ./xfs_file
> cat /proc/meminfo > meminfo.stop
> tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> 
> ==> meminfo.start <==
> Shmem:               484 kB
> ==> meminfo.stop <==
> Shmem:               484 kB
> 
> ls -lh xfs_file
> -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> 
> stat xfs_file
>   File: xfs_file
>   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> 
> Thanks,
> Kalesh
> 
> >
> > Brian
> >
> > >
> > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > >       show_fd_locks(m, file, files);
> > > --
> > > 2.37.0.rc0.161.g10f37bed90-goog
> > >
> >
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-29 12:23         ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-29 12:23 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: open list:DOCUMENTATION, DRI mailing list, Stephen Brennan,
	Paul Gortmaker, Sumit Semwal, Jonathan Corbet,
	Christoph Anton Mitterer, Christoph Hellwig, Cc: Android Kernel,
	Linux Media Mailing List, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Alexander Viro,
	Suren Baghdasaryan, T.J. Mercier, Randy Dunlap, Ioannis Ilkos,
	LKML, David.Laight, Johannes Weiner, linux-fsdevel,
	Andrew Morton, Christian König, Mike Rapoport

On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > To be able to account the amount of memory a process is keeping pinned
> > > by open file descriptors add a 'size' field to fdinfo output.
> > >
> > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > and make it a common field for all fds. This allows tracking of
> > > other types of memory (e.g. memfd and ashmem in Android).
> > >
> > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >
> > > Changes in v2:
> > >   - Add Christian's Reviewed-by
> > >
> > > Changes from rfc:
> > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > >   - Split fdinfo seq_printf into separate lines, per Christian
> > >   - Fix indentation (use tabs) in documentaion, per Randy
> > >
> > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > >  drivers/dma-buf/dma-buf.c          |  1 -
> > >  fs/proc/fd.c                       |  9 +++++----
> > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > >
...
> >
> > Also not sure if it matters that much for your use case, but something
> > worth noting at least with shmem is that one can do something like:
> >
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:               764 kB
> > # xfs_io -fc "falloc -k 0 10m" ./file
> > # ls -alh file
> > -rw-------. 1 root root 0 Jun 28 07:22 file
> > # stat file
> >   File: file
> >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:             11004 kB
> >
> > ... where the resulting memory usage isn't reflected in i_size (but is
> > is in i_blocks/bytes).
> 
> I tried a similar experiment a few times, but I don't see the same
> results. In my case, there is not any change in shmem. IIUC the
> fallocate is allocating the disk space not shared memory.
> 

Sorry, it was implied in my previous test was that I was running against
tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
both cases is as expected: the underlying blocks are allocated and the
inode size is unchanged.

What wasn't totally clear to me when I read this patch was 1. whether
tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
operation. The test above seems to confirm both, however, right? E.g., a
more detailed example:

# mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
# cat /proc/meminfo | grep Shmem:
Shmem:              5300 kB
# xfs_io -fc "falloc -k 0 1g" /tmp/file
# stat /tmp/file 
  File: /tmp/file
  Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
Device: 22h/34d Inode: 45          Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2022-06-29 08:04:01.301307154 -0400
Modify: 2022-06-29 08:04:01.301307154 -0400
Change: 2022-06-29 08:04:01.451312834 -0400
 Birth: 2022-06-29 08:04:01.301307154 -0400
# cat /proc/meminfo | grep Shmem:
Shmem:           1053876 kB
# rm -f /tmp/file 
# cat /proc/meminfo | grep Shmem:
Shmem:              5300 kB

So clearly this impacts Shmem.. was your test run against tmpfs or some
other (disk based) fs?

FWIW, I don't have any objection to exposing inode size if it's commonly
useful information. My feedback was more just an fyi that i_size doesn't
necessarily reflect underlying space consumption (whether it's memory or
disk space) in more generic cases, because it sounds like that is really
what you're after here. The opposite example to the above would be
something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
1TB inode size with zero additional shmem usage.

Brian

> cat /proc/meminfo > meminfo.start
> xfs_io -fc "falloc -k 0 50m" ./xfs_file
> cat /proc/meminfo > meminfo.stop
> tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> 
> ==> meminfo.start <==
> Shmem:               484 kB
> ==> meminfo.stop <==
> Shmem:               484 kB
> 
> ls -lh xfs_file
> -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> 
> stat xfs_file
>   File: xfs_file
>   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> 
> Thanks,
> Kalesh
> 
> >
> > Brian
> >
> > >
> > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > >       show_fd_locks(m, file, files);
> > > --
> > > 2.37.0.rc0.161.g10f37bed90-goog
> > >
> >
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-29 12:23         ` Brian Foster
@ 2022-06-29 20:43           ` Kalesh Singh
  -1 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-29 20:43 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christian König, Christian König, Alexander Viro,
	Christoph Hellwig, Stephen Brennan, David.Laight, Ioannis Ilkos,
	T.J. Mercier, Suren Baghdasaryan, Cc: Android Kernel,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Paul Gortmaker, Mike Rapoport,
	Randy Dunlap, LKML, linux-fsdevel, open list:DOCUMENTATION,
	Linux Media Mailing List, DRI mailing list,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > >
> > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > To be able to account the amount of memory a process is keeping pinned
> > > > by open file descriptors add a 'size' field to fdinfo output.
> > > >
> > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > and make it a common field for all fds. This allows tracking of
> > > > other types of memory (e.g. memfd and ashmem in Android).
> > > >
> > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > ---
> > > >
> > > > Changes in v2:
> > > >   - Add Christian's Reviewed-by
> > > >
> > > > Changes from rfc:
> > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > >
> > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > >  fs/proc/fd.c                       |  9 +++++----
> > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > >
> ...
> > >
> > > Also not sure if it matters that much for your use case, but something
> > > worth noting at least with shmem is that one can do something like:
> > >
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:               764 kB
> > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > # ls -alh file
> > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > # stat file
> > >   File: file
> > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:             11004 kB
> > >
> > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > is in i_blocks/bytes).
> >
> > I tried a similar experiment a few times, but I don't see the same
> > results. In my case, there is not any change in shmem. IIUC the
> > fallocate is allocating the disk space not shared memory.
> >
>
> Sorry, it was implied in my previous test was that I was running against
> tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> both cases is as expected: the underlying blocks are allocated and the
> inode size is unchanged.
>
> What wasn't totally clear to me when I read this patch was 1. whether
> tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> operation. The test above seems to confirm both, however, right? E.g., a
> more detailed example:
>
> # mount | grep /tmp
> tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> # cat /proc/meminfo | grep Shmem:
> Shmem:              5300 kB
> # xfs_io -fc "falloc -k 0 1g" /tmp/file
> # stat /tmp/file
>   File: /tmp/file
>   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> Device: 22h/34d Inode: 45          Links: 1
> Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> Context: unconfined_u:object_r:user_tmp_t:s0
> Access: 2022-06-29 08:04:01.301307154 -0400
> Modify: 2022-06-29 08:04:01.301307154 -0400
> Change: 2022-06-29 08:04:01.451312834 -0400
>  Birth: 2022-06-29 08:04:01.301307154 -0400
> # cat /proc/meminfo | grep Shmem:
> Shmem:           1053876 kB
> # rm -f /tmp/file
> # cat /proc/meminfo | grep Shmem:
> Shmem:              5300 kB
>
> So clearly this impacts Shmem.. was your test run against tmpfs or some
> other (disk based) fs?

Hi Brian,

Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:

==> meminfo.start <==
Shmem:               572 kB
==> meminfo.stop <==
Shmem:             51688 kB

>
> FWIW, I don't have any objection to exposing inode size if it's commonly
> useful information. My feedback was more just an fyi that i_size doesn't
> necessarily reflect underlying space consumption (whether it's memory or
> disk space) in more generic cases, because it sounds like that is really
> what you're after here. The opposite example to the above would be
> something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> 1TB inode size with zero additional shmem usage.

From these cases, it seems the more generic way to do this is by
calculating the actual size consumed using the blocks. (i_blocks *
512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
the size consumed would be zero. Let me know if it sounds ok to you
and I can repost the updated version.

Thanks,
Kalesh

>
> Brian
>
> > cat /proc/meminfo > meminfo.start
> > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > cat /proc/meminfo > meminfo.stop
> > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> >
> > ==> meminfo.start <==
> > Shmem:               484 kB
> > ==> meminfo.stop <==
> > Shmem:               484 kB
> >
> > ls -lh xfs_file
> > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> >
> > stat xfs_file
> >   File: xfs_file
> >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> >
> > Thanks,
> > Kalesh
> >
> > >
> > > Brian
> > >
> > > >
> > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > >       show_fd_locks(m, file, files);
> > > > --
> > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > >
> > >
> >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-29 20:43           ` Kalesh Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-29 20:43 UTC (permalink / raw)
  To: Brian Foster
  Cc: open list:DOCUMENTATION, DRI mailing list, Stephen Brennan,
	Paul Gortmaker, Sumit Semwal, Jonathan Corbet,
	Christoph Anton Mitterer, Christoph Hellwig, Cc: Android Kernel,
	Linux Media Mailing List, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Alexander Viro,
	Suren Baghdasaryan, T.J. Mercier, Randy Dunlap, Ioannis Ilkos,
	LKML, David.Laight, Johannes Weiner, linux-fsdevel,
	Andrew Morton, Christian König, Mike Rapoport

On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > >
> > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > To be able to account the amount of memory a process is keeping pinned
> > > > by open file descriptors add a 'size' field to fdinfo output.
> > > >
> > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > and make it a common field for all fds. This allows tracking of
> > > > other types of memory (e.g. memfd and ashmem in Android).
> > > >
> > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > ---
> > > >
> > > > Changes in v2:
> > > >   - Add Christian's Reviewed-by
> > > >
> > > > Changes from rfc:
> > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > >
> > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > >  fs/proc/fd.c                       |  9 +++++----
> > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > >
> ...
> > >
> > > Also not sure if it matters that much for your use case, but something
> > > worth noting at least with shmem is that one can do something like:
> > >
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:               764 kB
> > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > # ls -alh file
> > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > # stat file
> > >   File: file
> > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:             11004 kB
> > >
> > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > is in i_blocks/bytes).
> >
> > I tried a similar experiment a few times, but I don't see the same
> > results. In my case, there is not any change in shmem. IIUC the
> > fallocate is allocating the disk space not shared memory.
> >
>
> Sorry, it was implied in my previous test was that I was running against
> tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> both cases is as expected: the underlying blocks are allocated and the
> inode size is unchanged.
>
> What wasn't totally clear to me when I read this patch was 1. whether
> tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> operation. The test above seems to confirm both, however, right? E.g., a
> more detailed example:
>
> # mount | grep /tmp
> tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> # cat /proc/meminfo | grep Shmem:
> Shmem:              5300 kB
> # xfs_io -fc "falloc -k 0 1g" /tmp/file
> # stat /tmp/file
>   File: /tmp/file
>   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> Device: 22h/34d Inode: 45          Links: 1
> Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> Context: unconfined_u:object_r:user_tmp_t:s0
> Access: 2022-06-29 08:04:01.301307154 -0400
> Modify: 2022-06-29 08:04:01.301307154 -0400
> Change: 2022-06-29 08:04:01.451312834 -0400
>  Birth: 2022-06-29 08:04:01.301307154 -0400
> # cat /proc/meminfo | grep Shmem:
> Shmem:           1053876 kB
> # rm -f /tmp/file
> # cat /proc/meminfo | grep Shmem:
> Shmem:              5300 kB
>
> So clearly this impacts Shmem.. was your test run against tmpfs or some
> other (disk based) fs?

Hi Brian,

Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:

==> meminfo.start <==
Shmem:               572 kB
==> meminfo.stop <==
Shmem:             51688 kB

>
> FWIW, I don't have any objection to exposing inode size if it's commonly
> useful information. My feedback was more just an fyi that i_size doesn't
> necessarily reflect underlying space consumption (whether it's memory or
> disk space) in more generic cases, because it sounds like that is really
> what you're after here. The opposite example to the above would be
> something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> 1TB inode size with zero additional shmem usage.

From these cases, it seems the more generic way to do this is by
calculating the actual size consumed using the blocks. (i_blocks *
512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
the size consumed would be zero. Let me know if it sounds ok to you
and I can repost the updated version.

Thanks,
Kalesh

>
> Brian
>
> > cat /proc/meminfo > meminfo.start
> > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > cat /proc/meminfo > meminfo.stop
> > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> >
> > ==> meminfo.start <==
> > Shmem:               484 kB
> > ==> meminfo.stop <==
> > Shmem:               484 kB
> >
> > ls -lh xfs_file
> > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> >
> > stat xfs_file
> >   File: xfs_file
> >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> >
> > Thanks,
> > Kalesh
> >
> > >
> > > Brian
> > >
> > > >
> > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > >       show_fd_locks(m, file, files);
> > > > --
> > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > >
> > >
> >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-29 20:43           ` Kalesh Singh
@ 2022-06-30 11:48             ` Brian Foster
  -1 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-30 11:48 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: Christian König, Christian König, Alexander Viro,
	Christoph Hellwig, Stephen Brennan, David.Laight, Ioannis Ilkos,
	T.J. Mercier, Suren Baghdasaryan, Cc: Android Kernel,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Paul Gortmaker, Mike Rapoport,
	Randy Dunlap, LKML, linux-fsdevel, open list:DOCUMENTATION,
	Linux Media Mailing List, DRI mailing list,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Wed, Jun 29, 2022 at 01:43:11PM -0700, Kalesh Singh wrote:
> On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > > To be able to account the amount of memory a process is keeping pinned
> > > > > by open file descriptors add a 'size' field to fdinfo output.
> > > > >
> > > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > > and make it a common field for all fds. This allows tracking of
> > > > > other types of memory (e.g. memfd and ashmem in Android).
> > > > >
> > > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > ---
> > > > >
> > > > > Changes in v2:
> > > > >   - Add Christian's Reviewed-by
> > > > >
> > > > > Changes from rfc:
> > > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > > >
> > > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > > >  fs/proc/fd.c                       |  9 +++++----
> > > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > > >
> > ...
> > > >
> > > > Also not sure if it matters that much for your use case, but something
> > > > worth noting at least with shmem is that one can do something like:
> > > >
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:               764 kB
> > > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > > # ls -alh file
> > > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > > # stat file
> > > >   File: file
> > > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:             11004 kB
> > > >
> > > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > > is in i_blocks/bytes).
> > >
> > > I tried a similar experiment a few times, but I don't see the same
> > > results. In my case, there is not any change in shmem. IIUC the
> > > fallocate is allocating the disk space not shared memory.
> > >
> >
> > Sorry, it was implied in my previous test was that I was running against
> > tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> > both cases is as expected: the underlying blocks are allocated and the
> > inode size is unchanged.
> >
> > What wasn't totally clear to me when I read this patch was 1. whether
> > tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> > operation. The test above seems to confirm both, however, right? E.g., a
> > more detailed example:
> >
> > # mount | grep /tmp
> > tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:              5300 kB
> > # xfs_io -fc "falloc -k 0 1g" /tmp/file
> > # stat /tmp/file
> >   File: /tmp/file
> >   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> > Device: 22h/34d Inode: 45          Links: 1
> > Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> > Context: unconfined_u:object_r:user_tmp_t:s0
> > Access: 2022-06-29 08:04:01.301307154 -0400
> > Modify: 2022-06-29 08:04:01.301307154 -0400
> > Change: 2022-06-29 08:04:01.451312834 -0400
> >  Birth: 2022-06-29 08:04:01.301307154 -0400
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:           1053876 kB
> > # rm -f /tmp/file
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:              5300 kB
> >
> > So clearly this impacts Shmem.. was your test run against tmpfs or some
> > other (disk based) fs?
> 
> Hi Brian,
> 
> Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:
> 
> ==> meminfo.start <==
> Shmem:               572 kB
> ==> meminfo.stop <==
> Shmem:             51688 kB
> 

Ok, makes sense.

> >
> > FWIW, I don't have any objection to exposing inode size if it's commonly
> > useful information. My feedback was more just an fyi that i_size doesn't
> > necessarily reflect underlying space consumption (whether it's memory or
> > disk space) in more generic cases, because it sounds like that is really
> > what you're after here. The opposite example to the above would be
> > something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> > 1TB inode size with zero additional shmem usage.
> 
> From these cases, it seems the more generic way to do this is by
> calculating the actual size consumed using the blocks. (i_blocks *
> 512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
> the size consumed would be zero. Let me know if it sounds ok to you
> and I can repost the updated version.
> 

That sounds a bit more useful to me if you're interested in space usage,
or at least I don't have a better idea for you. ;)

One thing to note is that I'm not sure whether all fs' use i_blocks
reliably. E.g., XFS populates stat->blocks via a separate block counter
in the XFS specific inode structure (see xfs_vn_getattr()). A bunch of
other fs' seem to touch it so perhaps that is just an outlier. You could
consider fixing that up, perhaps make a ->getattr() call to avoid it, or
just use the field directly if it's useful enough as is and there are no
other objections. Something to think about anyways..

Brian

> Thanks,
> Kalesh
> 
> >
> > Brian
> >
> > > cat /proc/meminfo > meminfo.start
> > > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > > cat /proc/meminfo > meminfo.stop
> > > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> > >
> > > ==> meminfo.start <==
> > > Shmem:               484 kB
> > > ==> meminfo.stop <==
> > > Shmem:               484 kB
> > >
> > > ls -lh xfs_file
> > > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> > >
> > > stat xfs_file
> > >   File: xfs_file
> > >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> > >
> > > Thanks,
> > > Kalesh
> > >
> > > >
> > > > Brian
> > > >
> > > > >
> > > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > > >       show_fd_locks(m, file, files);
> > > > > --
> > > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > > >
> > > >
> > >
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-30 11:48             ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-30 11:48 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: open list:DOCUMENTATION, DRI mailing list, Stephen Brennan,
	Paul Gortmaker, Sumit Semwal, Jonathan Corbet,
	Christoph Anton Mitterer, Christoph Hellwig, Cc: Android Kernel,
	Linux Media Mailing List, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Alexander Viro,
	Suren Baghdasaryan, T.J. Mercier, Randy Dunlap, Ioannis Ilkos,
	LKML, David.Laight, Johannes Weiner, linux-fsdevel,
	Andrew Morton, Christian König, Mike Rapoport

On Wed, Jun 29, 2022 at 01:43:11PM -0700, Kalesh Singh wrote:
> On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > > To be able to account the amount of memory a process is keeping pinned
> > > > > by open file descriptors add a 'size' field to fdinfo output.
> > > > >
> > > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > > and make it a common field for all fds. This allows tracking of
> > > > > other types of memory (e.g. memfd and ashmem in Android).
> > > > >
> > > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > ---
> > > > >
> > > > > Changes in v2:
> > > > >   - Add Christian's Reviewed-by
> > > > >
> > > > > Changes from rfc:
> > > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > > >
> > > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > > >  fs/proc/fd.c                       |  9 +++++----
> > > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > > >
> > ...
> > > >
> > > > Also not sure if it matters that much for your use case, but something
> > > > worth noting at least with shmem is that one can do something like:
> > > >
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:               764 kB
> > > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > > # ls -alh file
> > > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > > # stat file
> > > >   File: file
> > > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:             11004 kB
> > > >
> > > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > > is in i_blocks/bytes).
> > >
> > > I tried a similar experiment a few times, but I don't see the same
> > > results. In my case, there is not any change in shmem. IIUC the
> > > fallocate is allocating the disk space not shared memory.
> > >
> >
> > Sorry, it was implied in my previous test was that I was running against
> > tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> > both cases is as expected: the underlying blocks are allocated and the
> > inode size is unchanged.
> >
> > What wasn't totally clear to me when I read this patch was 1. whether
> > tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> > operation. The test above seems to confirm both, however, right? E.g., a
> > more detailed example:
> >
> > # mount | grep /tmp
> > tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:              5300 kB
> > # xfs_io -fc "falloc -k 0 1g" /tmp/file
> > # stat /tmp/file
> >   File: /tmp/file
> >   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> > Device: 22h/34d Inode: 45          Links: 1
> > Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> > Context: unconfined_u:object_r:user_tmp_t:s0
> > Access: 2022-06-29 08:04:01.301307154 -0400
> > Modify: 2022-06-29 08:04:01.301307154 -0400
> > Change: 2022-06-29 08:04:01.451312834 -0400
> >  Birth: 2022-06-29 08:04:01.301307154 -0400
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:           1053876 kB
> > # rm -f /tmp/file
> > # cat /proc/meminfo | grep Shmem:
> > Shmem:              5300 kB
> >
> > So clearly this impacts Shmem.. was your test run against tmpfs or some
> > other (disk based) fs?
> 
> Hi Brian,
> 
> Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:
> 
> ==> meminfo.start <==
> Shmem:               572 kB
> ==> meminfo.stop <==
> Shmem:             51688 kB
> 

Ok, makes sense.

> >
> > FWIW, I don't have any objection to exposing inode size if it's commonly
> > useful information. My feedback was more just an fyi that i_size doesn't
> > necessarily reflect underlying space consumption (whether it's memory or
> > disk space) in more generic cases, because it sounds like that is really
> > what you're after here. The opposite example to the above would be
> > something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> > 1TB inode size with zero additional shmem usage.
> 
> From these cases, it seems the more generic way to do this is by
> calculating the actual size consumed using the blocks. (i_blocks *
> 512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
> the size consumed would be zero. Let me know if it sounds ok to you
> and I can repost the updated version.
> 

That sounds a bit more useful to me if you're interested in space usage,
or at least I don't have a better idea for you. ;)

One thing to note is that I'm not sure whether all fs' use i_blocks
reliably. E.g., XFS populates stat->blocks via a separate block counter
in the XFS specific inode structure (see xfs_vn_getattr()). A bunch of
other fs' seem to touch it so perhaps that is just an outlier. You could
consider fixing that up, perhaps make a ->getattr() call to avoid it, or
just use the field directly if it's useful enough as is and there are no
other objections. Something to think about anyways..

Brian

> Thanks,
> Kalesh
> 
> >
> > Brian
> >
> > > cat /proc/meminfo > meminfo.start
> > > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > > cat /proc/meminfo > meminfo.stop
> > > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> > >
> > > ==> meminfo.start <==
> > > Shmem:               484 kB
> > > ==> meminfo.stop <==
> > > Shmem:               484 kB
> > >
> > > ls -lh xfs_file
> > > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> > >
> > > stat xfs_file
> > >   File: xfs_file
> > >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> > >
> > > Thanks,
> > > Kalesh
> > >
> > > >
> > > > Brian
> > > >
> > > > >
> > > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > > >       show_fd_locks(m, file, files);
> > > > > --
> > > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > > >
> > > >
> > >
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-30 11:48             ` Brian Foster
@ 2022-06-30 12:03               ` Brian Foster
  -1 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-30 12:03 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: Christian König, Christian König, Alexander Viro,
	Christoph Hellwig, Stephen Brennan, David.Laight, Ioannis Ilkos,
	T.J. Mercier, Suren Baghdasaryan, Cc: Android Kernel,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Paul Gortmaker, Mike Rapoport,
	Randy Dunlap, LKML, linux-fsdevel, open list:DOCUMENTATION,
	Linux Media Mailing List, DRI mailing list,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jun 30, 2022 at 07:48:46AM -0400, Brian Foster wrote:
> On Wed, Jun 29, 2022 at 01:43:11PM -0700, Kalesh Singh wrote:
> > On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
> > >
> > > On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > > > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > > > >
> > > > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > > > To be able to account the amount of memory a process is keeping pinned
> > > > > > by open file descriptors add a 'size' field to fdinfo output.
> > > > > >
> > > > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > > > and make it a common field for all fds. This allows tracking of
> > > > > > other types of memory (e.g. memfd and ashmem in Android).
> > > > > >
> > > > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > > ---
> > > > > >
> > > > > > Changes in v2:
> > > > > >   - Add Christian's Reviewed-by
> > > > > >
> > > > > > Changes from rfc:
> > > > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > > > >
> > > > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > > > >  fs/proc/fd.c                       |  9 +++++----
> > > > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > > > >
> > > ...
> > > > >
> > > > > Also not sure if it matters that much for your use case, but something
> > > > > worth noting at least with shmem is that one can do something like:
> > > > >
> > > > > # cat /proc/meminfo | grep Shmem:
> > > > > Shmem:               764 kB
> > > > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > > > # ls -alh file
> > > > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > > > # stat file
> > > > >   File: file
> > > > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > > > # cat /proc/meminfo | grep Shmem:
> > > > > Shmem:             11004 kB
> > > > >
> > > > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > > > is in i_blocks/bytes).
> > > >
> > > > I tried a similar experiment a few times, but I don't see the same
> > > > results. In my case, there is not any change in shmem. IIUC the
> > > > fallocate is allocating the disk space not shared memory.
> > > >
> > >
> > > Sorry, it was implied in my previous test was that I was running against
> > > tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> > > both cases is as expected: the underlying blocks are allocated and the
> > > inode size is unchanged.
> > >
> > > What wasn't totally clear to me when I read this patch was 1. whether
> > > tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> > > operation. The test above seems to confirm both, however, right? E.g., a
> > > more detailed example:
> > >
> > > # mount | grep /tmp
> > > tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:              5300 kB
> > > # xfs_io -fc "falloc -k 0 1g" /tmp/file
> > > # stat /tmp/file
> > >   File: /tmp/file
> > >   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> > > Device: 22h/34d Inode: 45          Links: 1
> > > Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> > > Context: unconfined_u:object_r:user_tmp_t:s0
> > > Access: 2022-06-29 08:04:01.301307154 -0400
> > > Modify: 2022-06-29 08:04:01.301307154 -0400
> > > Change: 2022-06-29 08:04:01.451312834 -0400
> > >  Birth: 2022-06-29 08:04:01.301307154 -0400
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:           1053876 kB
> > > # rm -f /tmp/file
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:              5300 kB
> > >
> > > So clearly this impacts Shmem.. was your test run against tmpfs or some
> > > other (disk based) fs?
> > 
> > Hi Brian,
> > 
> > Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:
> > 
> > ==> meminfo.start <==
> > Shmem:               572 kB
> > ==> meminfo.stop <==
> > Shmem:             51688 kB
> > 
> 
> Ok, makes sense.
> 
> > >
> > > FWIW, I don't have any objection to exposing inode size if it's commonly
> > > useful information. My feedback was more just an fyi that i_size doesn't
> > > necessarily reflect underlying space consumption (whether it's memory or
> > > disk space) in more generic cases, because it sounds like that is really
> > > what you're after here. The opposite example to the above would be
> > > something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> > > 1TB inode size with zero additional shmem usage.
> > 
> > From these cases, it seems the more generic way to do this is by
> > calculating the actual size consumed using the blocks. (i_blocks *
> > 512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
> > the size consumed would be zero. Let me know if it sounds ok to you
> > and I can repost the updated version.
> > 
> 
> That sounds a bit more useful to me if you're interested in space usage,
> or at least I don't have a better idea for you. ;)
> 
> One thing to note is that I'm not sure whether all fs' use i_blocks
> reliably. E.g., XFS populates stat->blocks via a separate block counter
> in the XFS specific inode structure (see xfs_vn_getattr()). A bunch of
> other fs' seem to touch it so perhaps that is just an outlier. You could
> consider fixing that up, perhaps make a ->getattr() call to avoid it, or
> just use the field directly if it's useful enough as is and there are no
> other objections. Something to think about anyways..
> 

Oh, I wonder if you're looking for similar "file rss" information this
series wants to collect/expose..?

https://lore.kernel.org/linux-fsdevel/20220624080444.7619-1-christian.koenig@amd.com/#r

Brian

> Brian
> 
> > Thanks,
> > Kalesh
> > 
> > >
> > > Brian
> > >
> > > > cat /proc/meminfo > meminfo.start
> > > > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > > > cat /proc/meminfo > meminfo.stop
> > > > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> > > >
> > > > ==> meminfo.start <==
> > > > Shmem:               484 kB
> > > > ==> meminfo.stop <==
> > > > Shmem:               484 kB
> > > >
> > > > ls -lh xfs_file
> > > > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> > > >
> > > > stat xfs_file
> > > >   File: xfs_file
> > > >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> > > >
> > > > Thanks,
> > > > Kalesh
> > > >
> > > > >
> > > > > Brian
> > > > >
> > > > > >
> > > > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > > > >       show_fd_locks(m, file, files);
> > > > > > --
> > > > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > > > >
> > > > >
> > > >
> > >
> > > --
> > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > >
> > 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-30 12:03               ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2022-06-30 12:03 UTC (permalink / raw)
  To: Kalesh Singh
  Cc: open list:DOCUMENTATION, DRI mailing list, Stephen Brennan,
	Paul Gortmaker, Sumit Semwal, Jonathan Corbet,
	Christoph Anton Mitterer, Christoph Hellwig, Cc: Android Kernel,
	Linux Media Mailing List, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Alexander Viro,
	Suren Baghdasaryan, T.J. Mercier, Randy Dunlap, Ioannis Ilkos,
	LKML, David.Laight, Johannes Weiner, linux-fsdevel,
	Andrew Morton, Christian König, Mike Rapoport

On Thu, Jun 30, 2022 at 07:48:46AM -0400, Brian Foster wrote:
> On Wed, Jun 29, 2022 at 01:43:11PM -0700, Kalesh Singh wrote:
> > On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
> > >
> > > On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > > > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > > > >
> > > > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > > > To be able to account the amount of memory a process is keeping pinned
> > > > > > by open file descriptors add a 'size' field to fdinfo output.
> > > > > >
> > > > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > > > and make it a common field for all fds. This allows tracking of
> > > > > > other types of memory (e.g. memfd and ashmem in Android).
> > > > > >
> > > > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > > ---
> > > > > >
> > > > > > Changes in v2:
> > > > > >   - Add Christian's Reviewed-by
> > > > > >
> > > > > > Changes from rfc:
> > > > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > > > >
> > > > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > > > >  fs/proc/fd.c                       |  9 +++++----
> > > > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > > > >
> > > ...
> > > > >
> > > > > Also not sure if it matters that much for your use case, but something
> > > > > worth noting at least with shmem is that one can do something like:
> > > > >
> > > > > # cat /proc/meminfo | grep Shmem:
> > > > > Shmem:               764 kB
> > > > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > > > # ls -alh file
> > > > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > > > # stat file
> > > > >   File: file
> > > > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > > > # cat /proc/meminfo | grep Shmem:
> > > > > Shmem:             11004 kB
> > > > >
> > > > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > > > is in i_blocks/bytes).
> > > >
> > > > I tried a similar experiment a few times, but I don't see the same
> > > > results. In my case, there is not any change in shmem. IIUC the
> > > > fallocate is allocating the disk space not shared memory.
> > > >
> > >
> > > Sorry, it was implied in my previous test was that I was running against
> > > tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> > > both cases is as expected: the underlying blocks are allocated and the
> > > inode size is unchanged.
> > >
> > > What wasn't totally clear to me when I read this patch was 1. whether
> > > tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> > > operation. The test above seems to confirm both, however, right? E.g., a
> > > more detailed example:
> > >
> > > # mount | grep /tmp
> > > tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:              5300 kB
> > > # xfs_io -fc "falloc -k 0 1g" /tmp/file
> > > # stat /tmp/file
> > >   File: /tmp/file
> > >   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> > > Device: 22h/34d Inode: 45          Links: 1
> > > Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> > > Context: unconfined_u:object_r:user_tmp_t:s0
> > > Access: 2022-06-29 08:04:01.301307154 -0400
> > > Modify: 2022-06-29 08:04:01.301307154 -0400
> > > Change: 2022-06-29 08:04:01.451312834 -0400
> > >  Birth: 2022-06-29 08:04:01.301307154 -0400
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:           1053876 kB
> > > # rm -f /tmp/file
> > > # cat /proc/meminfo | grep Shmem:
> > > Shmem:              5300 kB
> > >
> > > So clearly this impacts Shmem.. was your test run against tmpfs or some
> > > other (disk based) fs?
> > 
> > Hi Brian,
> > 
> > Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:
> > 
> > ==> meminfo.start <==
> > Shmem:               572 kB
> > ==> meminfo.stop <==
> > Shmem:             51688 kB
> > 
> 
> Ok, makes sense.
> 
> > >
> > > FWIW, I don't have any objection to exposing inode size if it's commonly
> > > useful information. My feedback was more just an fyi that i_size doesn't
> > > necessarily reflect underlying space consumption (whether it's memory or
> > > disk space) in more generic cases, because it sounds like that is really
> > > what you're after here. The opposite example to the above would be
> > > something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> > > 1TB inode size with zero additional shmem usage.
> > 
> > From these cases, it seems the more generic way to do this is by
> > calculating the actual size consumed using the blocks. (i_blocks *
> > 512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
> > the size consumed would be zero. Let me know if it sounds ok to you
> > and I can repost the updated version.
> > 
> 
> That sounds a bit more useful to me if you're interested in space usage,
> or at least I don't have a better idea for you. ;)
> 
> One thing to note is that I'm not sure whether all fs' use i_blocks
> reliably. E.g., XFS populates stat->blocks via a separate block counter
> in the XFS specific inode structure (see xfs_vn_getattr()). A bunch of
> other fs' seem to touch it so perhaps that is just an outlier. You could
> consider fixing that up, perhaps make a ->getattr() call to avoid it, or
> just use the field directly if it's useful enough as is and there are no
> other objections. Something to think about anyways..
> 

Oh, I wonder if you're looking for similar "file rss" information this
series wants to collect/expose..?

https://lore.kernel.org/linux-fsdevel/20220624080444.7619-1-christian.koenig@amd.com/#r

Brian

> Brian
> 
> > Thanks,
> > Kalesh
> > 
> > >
> > > Brian
> > >
> > > > cat /proc/meminfo > meminfo.start
> > > > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > > > cat /proc/meminfo > meminfo.stop
> > > > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> > > >
> > > > ==> meminfo.start <==
> > > > Shmem:               484 kB
> > > > ==> meminfo.stop <==
> > > > Shmem:               484 kB
> > > >
> > > > ls -lh xfs_file
> > > > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> > > >
> > > > stat xfs_file
> > > >   File: xfs_file
> > > >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> > > >
> > > > Thanks,
> > > > Kalesh
> > > >
> > > > >
> > > > > Brian
> > > > >
> > > > > >
> > > > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > > > >       show_fd_locks(m, file, files);
> > > > > > --
> > > > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > > > >
> > > > >
> > > >
> > >
> > > --
> > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > >
> > 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
  2022-06-30 12:03               ` Brian Foster
@ 2022-06-30 21:30                 ` Kalesh Singh
  -1 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-30 21:30 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christian König, Christian König, Alexander Viro,
	Christoph Hellwig, Stephen Brennan, David.Laight, Ioannis Ilkos,
	T.J. Mercier, Suren Baghdasaryan, Cc: Android Kernel,
	Jonathan Corbet, Sumit Semwal, Andrew Morton, Johannes Weiner,
	Christoph Anton Mitterer, Paul Gortmaker, Mike Rapoport,
	Randy Dunlap, LKML, linux-fsdevel, open list:DOCUMENTATION,
	Linux Media Mailing List, DRI mailing list,
	moderated list:DMA BUFFER SHARING FRAMEWORK

On Thu, Jun 30, 2022 at 5:03 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Thu, Jun 30, 2022 at 07:48:46AM -0400, Brian Foster wrote:
> > On Wed, Jun 29, 2022 at 01:43:11PM -0700, Kalesh Singh wrote:
> > > On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > > > > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > > > > To be able to account the amount of memory a process is keeping pinned
> > > > > > > by open file descriptors add a 'size' field to fdinfo output.
> > > > > > >
> > > > > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > > > > and make it a common field for all fds. This allows tracking of
> > > > > > > other types of memory (e.g. memfd and ashmem in Android).
> > > > > > >
> > > > > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > > > ---
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > >   - Add Christian's Reviewed-by
> > > > > > >
> > > > > > > Changes from rfc:
> > > > > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > > > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > > > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > > > > >
> > > > > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > > > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > > > > >  fs/proc/fd.c                       |  9 +++++----
> > > > > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > > > > >
> > > > ...
> > > > > >
> > > > > > Also not sure if it matters that much for your use case, but something
> > > > > > worth noting at least with shmem is that one can do something like:
> > > > > >
> > > > > > # cat /proc/meminfo | grep Shmem:
> > > > > > Shmem:               764 kB
> > > > > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > > > > # ls -alh file
> > > > > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > > > > # stat file
> > > > > >   File: file
> > > > > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > > > > # cat /proc/meminfo | grep Shmem:
> > > > > > Shmem:             11004 kB
> > > > > >
> > > > > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > > > > is in i_blocks/bytes).
> > > > >
> > > > > I tried a similar experiment a few times, but I don't see the same
> > > > > results. In my case, there is not any change in shmem. IIUC the
> > > > > fallocate is allocating the disk space not shared memory.
> > > > >
> > > >
> > > > Sorry, it was implied in my previous test was that I was running against
> > > > tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> > > > both cases is as expected: the underlying blocks are allocated and the
> > > > inode size is unchanged.
> > > >
> > > > What wasn't totally clear to me when I read this patch was 1. whether
> > > > tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> > > > operation. The test above seems to confirm both, however, right? E.g., a
> > > > more detailed example:
> > > >
> > > > # mount | grep /tmp
> > > > tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:              5300 kB
> > > > # xfs_io -fc "falloc -k 0 1g" /tmp/file
> > > > # stat /tmp/file
> > > >   File: /tmp/file
> > > >   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> > > > Device: 22h/34d Inode: 45          Links: 1
> > > > Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> > > > Context: unconfined_u:object_r:user_tmp_t:s0
> > > > Access: 2022-06-29 08:04:01.301307154 -0400
> > > > Modify: 2022-06-29 08:04:01.301307154 -0400
> > > > Change: 2022-06-29 08:04:01.451312834 -0400
> > > >  Birth: 2022-06-29 08:04:01.301307154 -0400
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:           1053876 kB
> > > > # rm -f /tmp/file
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:              5300 kB
> > > >
> > > > So clearly this impacts Shmem.. was your test run against tmpfs or some
> > > > other (disk based) fs?
> > >
> > > Hi Brian,
> > >
> > > Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:
> > >
> > > ==> meminfo.start <==
> > > Shmem:               572 kB
> > > ==> meminfo.stop <==
> > > Shmem:             51688 kB
> > >
> >
> > Ok, makes sense.
> >
> > > >
> > > > FWIW, I don't have any objection to exposing inode size if it's commonly
> > > > useful information. My feedback was more just an fyi that i_size doesn't
> > > > necessarily reflect underlying space consumption (whether it's memory or
> > > > disk space) in more generic cases, because it sounds like that is really
> > > > what you're after here. The opposite example to the above would be
> > > > something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> > > > 1TB inode size with zero additional shmem usage.
> > >
> > > From these cases, it seems the more generic way to do this is by
> > > calculating the actual size consumed using the blocks. (i_blocks *
> > > 512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
> > > the size consumed would be zero. Let me know if it sounds ok to you
> > > and I can repost the updated version.
> > >
> >
> > That sounds a bit more useful to me if you're interested in space usage,
> > or at least I don't have a better idea for you. ;)
> >
> > One thing to note is that I'm not sure whether all fs' use i_blocks
> > reliably. E.g., XFS populates stat->blocks via a separate block counter
> > in the XFS specific inode structure (see xfs_vn_getattr()). A bunch of
> > other fs' seem to touch it so perhaps that is just an outlier. You could
> > consider fixing that up, perhaps make a ->getattr() call to avoid it, or
> > just use the field directly if it's useful enough as is and there are no
> > other objections. Something to think about anyways..
> >

Hi Brian,

Thanks for pointing it out. Let me take a look into the xfs case.

>
> Oh, I wonder if you're looking for similar "file rss" information this
> series wants to collect/expose..?
>
> https://lore.kernel.org/linux-fsdevel/20220624080444.7619-1-christian.koenig@amd.com/#r

Christian's series seems to have some overlap with what we want to
achieve here. IIUC it exposes the information on the per process
granularity. Perhaps if that approach is agreed on, I think we can use
the file_rss() f_op to expose the  per file size in the fdinfo for the
cases where the i_blocks are unreliable.

Thanks,
Kalesh

>
> Brian
>
> > Brian
> >
> > > Thanks,
> > > Kalesh
> > >
> > > >
> > > > Brian
> > > >
> > > > > cat /proc/meminfo > meminfo.start
> > > > > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > > > > cat /proc/meminfo > meminfo.stop
> > > > > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> > > > >
> > > > > ==> meminfo.start <==
> > > > > Shmem:               484 kB
> > > > > ==> meminfo.stop <==
> > > > > Shmem:               484 kB
> > > > >
> > > > > ls -lh xfs_file
> > > > > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> > > > >
> > > > > stat xfs_file
> > > > >   File: xfs_file
> > > > >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> > > > >
> > > > > Thanks,
> > > > > Kalesh
> > > > >
> > > > > >
> > > > > > Brian
> > > > > >
> > > > > > >
> > > > > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > > > > >       show_fd_locks(m, file, files);
> > > > > > > --
> > > > > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > > --
> > > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > > >
> > >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/
@ 2022-06-30 21:30                 ` Kalesh Singh
  0 siblings, 0 replies; 20+ messages in thread
From: Kalesh Singh @ 2022-06-30 21:30 UTC (permalink / raw)
  To: Brian Foster
  Cc: open list:DOCUMENTATION, DRI mailing list, Stephen Brennan,
	Paul Gortmaker, Sumit Semwal, Jonathan Corbet,
	Christoph Anton Mitterer, Christoph Hellwig, Cc: Android Kernel,
	Linux Media Mailing List, Christian König,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Alexander Viro,
	Suren Baghdasaryan, T.J. Mercier, Randy Dunlap, Ioannis Ilkos,
	LKML, David.Laight, Johannes Weiner, linux-fsdevel,
	Andrew Morton, Christian König, Mike Rapoport

On Thu, Jun 30, 2022 at 5:03 AM Brian Foster <bfoster@redhat.com> wrote:
>
> On Thu, Jun 30, 2022 at 07:48:46AM -0400, Brian Foster wrote:
> > On Wed, Jun 29, 2022 at 01:43:11PM -0700, Kalesh Singh wrote:
> > > On Wed, Jun 29, 2022 at 5:23 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 28, 2022 at 03:38:02PM -0700, Kalesh Singh wrote:
> > > > > On Tue, Jun 28, 2022 at 4:54 AM Brian Foster <bfoster@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Jun 23, 2022 at 03:06:06PM -0700, Kalesh Singh wrote:
> > > > > > > To be able to account the amount of memory a process is keeping pinned
> > > > > > > by open file descriptors add a 'size' field to fdinfo output.
> > > > > > >
> > > > > > > dmabufs fds already expose a 'size' field for this reason, remove this
> > > > > > > and make it a common field for all fds. This allows tracking of
> > > > > > > other types of memory (e.g. memfd and ashmem in Android).
> > > > > > >
> > > > > > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > > > > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > > > > > > ---
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > >   - Add Christian's Reviewed-by
> > > > > > >
> > > > > > > Changes from rfc:
> > > > > > >   - Split adding 'size' and 'path' into a separate patches, per Christian
> > > > > > >   - Split fdinfo seq_printf into separate lines, per Christian
> > > > > > >   - Fix indentation (use tabs) in documentaion, per Randy
> > > > > > >
> > > > > > >  Documentation/filesystems/proc.rst | 12 ++++++++++--
> > > > > > >  drivers/dma-buf/dma-buf.c          |  1 -
> > > > > > >  fs/proc/fd.c                       |  9 +++++----
> > > > > > >  3 files changed, 15 insertions(+), 7 deletions(-)
> > > > > > >
> > > > ...
> > > > > >
> > > > > > Also not sure if it matters that much for your use case, but something
> > > > > > worth noting at least with shmem is that one can do something like:
> > > > > >
> > > > > > # cat /proc/meminfo | grep Shmem:
> > > > > > Shmem:               764 kB
> > > > > > # xfs_io -fc "falloc -k 0 10m" ./file
> > > > > > # ls -alh file
> > > > > > -rw-------. 1 root root 0 Jun 28 07:22 file
> > > > > > # stat file
> > > > > >   File: file
> > > > > >   Size: 0               Blocks: 20480      IO Block: 4096   regular empty file
> > > > > > # cat /proc/meminfo | grep Shmem:
> > > > > > Shmem:             11004 kB
> > > > > >
> > > > > > ... where the resulting memory usage isn't reflected in i_size (but is
> > > > > > is in i_blocks/bytes).
> > > > >
> > > > > I tried a similar experiment a few times, but I don't see the same
> > > > > results. In my case, there is not any change in shmem. IIUC the
> > > > > fallocate is allocating the disk space not shared memory.
> > > > >
> > > >
> > > > Sorry, it was implied in my previous test was that I was running against
> > > > tmpfs. So regardless of fs, the fallocate keep_size semantics shown in
> > > > both cases is as expected: the underlying blocks are allocated and the
> > > > inode size is unchanged.
> > > >
> > > > What wasn't totally clear to me when I read this patch was 1. whether
> > > > tmpfs refers to Shmem and 2. whether tmpfs allowed this sort of
> > > > operation. The test above seems to confirm both, however, right? E.g., a
> > > > more detailed example:
> > > >
> > > > # mount | grep /tmp
> > > > tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:              5300 kB
> > > > # xfs_io -fc "falloc -k 0 1g" /tmp/file
> > > > # stat /tmp/file
> > > >   File: /tmp/file
> > > >   Size: 0               Blocks: 2097152    IO Block: 4096   regular empty file
> > > > Device: 22h/34d Inode: 45          Links: 1
> > > > Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> > > > Context: unconfined_u:object_r:user_tmp_t:s0
> > > > Access: 2022-06-29 08:04:01.301307154 -0400
> > > > Modify: 2022-06-29 08:04:01.301307154 -0400
> > > > Change: 2022-06-29 08:04:01.451312834 -0400
> > > >  Birth: 2022-06-29 08:04:01.301307154 -0400
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:           1053876 kB
> > > > # rm -f /tmp/file
> > > > # cat /proc/meminfo | grep Shmem:
> > > > Shmem:              5300 kB
> > > >
> > > > So clearly this impacts Shmem.. was your test run against tmpfs or some
> > > > other (disk based) fs?
> > >
> > > Hi Brian,
> > >
> > > Thanks for clarifying. My issue was tmpfs not mounted at /tmp in my system:
> > >
> > > ==> meminfo.start <==
> > > Shmem:               572 kB
> > > ==> meminfo.stop <==
> > > Shmem:             51688 kB
> > >
> >
> > Ok, makes sense.
> >
> > > >
> > > > FWIW, I don't have any objection to exposing inode size if it's commonly
> > > > useful information. My feedback was more just an fyi that i_size doesn't
> > > > necessarily reflect underlying space consumption (whether it's memory or
> > > > disk space) in more generic cases, because it sounds like that is really
> > > > what you're after here. The opposite example to the above would be
> > > > something like an 'xfs_io -fc "truncate 1t" /tmp/file', which shows a
> > > > 1TB inode size with zero additional shmem usage.
> > >
> > > From these cases, it seems the more generic way to do this is by
> > > calculating the actual size consumed using the blocks. (i_blocks *
> > > 512). So in the latter example  'xfs_io -fc "truncate 1t" /tmp/file'
> > > the size consumed would be zero. Let me know if it sounds ok to you
> > > and I can repost the updated version.
> > >
> >
> > That sounds a bit more useful to me if you're interested in space usage,
> > or at least I don't have a better idea for you. ;)
> >
> > One thing to note is that I'm not sure whether all fs' use i_blocks
> > reliably. E.g., XFS populates stat->blocks via a separate block counter
> > in the XFS specific inode structure (see xfs_vn_getattr()). A bunch of
> > other fs' seem to touch it so perhaps that is just an outlier. You could
> > consider fixing that up, perhaps make a ->getattr() call to avoid it, or
> > just use the field directly if it's useful enough as is and there are no
> > other objections. Something to think about anyways..
> >

Hi Brian,

Thanks for pointing it out. Let me take a look into the xfs case.

>
> Oh, I wonder if you're looking for similar "file rss" information this
> series wants to collect/expose..?
>
> https://lore.kernel.org/linux-fsdevel/20220624080444.7619-1-christian.koenig@amd.com/#r

Christian's series seems to have some overlap with what we want to
achieve here. IIUC it exposes the information on the per process
granularity. Perhaps if that approach is agreed on, I think we can use
the file_rss() f_op to expose the  per file size in the fdinfo for the
cases where the i_blocks are unreliable.

Thanks,
Kalesh

>
> Brian
>
> > Brian
> >
> > > Thanks,
> > > Kalesh
> > >
> > > >
> > > > Brian
> > > >
> > > > > cat /proc/meminfo > meminfo.start
> > > > > xfs_io -fc "falloc -k 0 50m" ./xfs_file
> > > > > cat /proc/meminfo > meminfo.stop
> > > > > tail -n +1 meminfo.st* | grep -i '==\|Shmem:'
> > > > >
> > > > > ==> meminfo.start <==
> > > > > Shmem:               484 kB
> > > > > ==> meminfo.stop <==
> > > > > Shmem:               484 kB
> > > > >
> > > > > ls -lh xfs_file
> > > > > -rw------- 1 root root 0 Jun 28 15:12 xfs_file
> > > > >
> > > > > stat xfs_file
> > > > >   File: xfs_file
> > > > >   Size: 0               Blocks: 102400     IO Block: 4096   regular empty file
> > > > >
> > > > > Thanks,
> > > > > Kalesh
> > > > >
> > > > > >
> > > > > > Brian
> > > > > >
> > > > > > >
> > > > > > >       /* show_fd_locks() never deferences files so a stale value is safe */
> > > > > > >       show_fd_locks(m, file, files);
> > > > > > > --
> > > > > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > > --
> > > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > > >
> > >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-06-30 21:30 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-23 22:06 [PATCH v2 0/2] procfs: Add file path and size to /proc/<pid>/fdinfo Kalesh Singh
2022-06-23 22:06 ` Kalesh Singh
2022-06-23 22:06 ` [PATCH v2 1/2] procfs: Add 'size' to /proc/<pid>/fdinfo/ Kalesh Singh
2022-06-23 22:06   ` Kalesh Singh
2022-06-28 11:53   ` Brian Foster
2022-06-28 11:53     ` Brian Foster
2022-06-28 22:38     ` Kalesh Singh
2022-06-28 22:38       ` Kalesh Singh
2022-06-29 12:23       ` Brian Foster
2022-06-29 12:23         ` Brian Foster
2022-06-29 20:43         ` Kalesh Singh
2022-06-29 20:43           ` Kalesh Singh
2022-06-30 11:48           ` Brian Foster
2022-06-30 11:48             ` Brian Foster
2022-06-30 12:03             ` Brian Foster
2022-06-30 12:03               ` Brian Foster
2022-06-30 21:30               ` Kalesh Singh
2022-06-30 21:30                 ` Kalesh Singh
2022-06-23 22:06 ` [PATCH v2 2/2] procfs: Add 'path' " Kalesh Singh
2022-06-23 22:06   ` Kalesh Singh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.