* Per file OOM badness
@ 2022-05-31  9:59 ` Christian König
From: Christian König @ 2022-05-31  9:59 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

Hello everyone, 

To summarize the issue I'm trying to address here: Processes can allocate
resources through a file descriptor without being held responsible for them.

This is especially problematic for the DRM graphics driver subsystem. Modern
games tend to allocate huge amounts of system memory through the DRM drivers
to make it accessible to GPU rendering.

But this problem also exists outside of the DRM subsystem and is trivial to
exploit. See the following simple example using memfd_create():

         fd = memfd_create("test", 0);
         while (1)
                 write(fd, page, 4096);

Compile this and you can bring down any standard desktop system within
seconds.
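
For reference, a complete stand-alone version of the above could look like
this (untested sketch; the original snippet only shows the loop):

        #define _GNU_SOURCE
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
                static char page[4096];
                int fd = memfd_create("test", 0);

                if (fd < 0)
                        return 1;
                for (;;)
                        /* every write appends another page to the memfd */
                        write(fd, page, sizeof(page));
        }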

The background is that the OOM killer will kill every process in the system
except the one which holds the only reference to the memory allocated by the
memfd.

These problems have been brought up on the mailing list multiple times now
[1][2][3], but without any final conclusion on how to address them. Since
file descriptors are considered shared, a process cannot directly be held
accountable for allocations made through them. On top of that, file
descriptors can also easily move between processes.

So instead of trying to account the allocated memory to a specific process,
this patch set adds a callback to struct file_operations which the OOM killer
can use to query the OOM badness of a file reference. This badness is then
divided by the file_count, so that every process using a shmem file, DMA-buf
or DRM driver gets its equal share of the OOM badness. For example, a buffer
of 4096 pages referenced by two processes adds 2048 pages of badness to each
of them.

Callbacks are then implemented for the two core users (memfd and DMA-buf)
as well as 72 DRM based graphics drivers.

The result is that the OOM killer can now much better judge whether a process
is worth killing to free up memory. This results in quite a bit better system
stability in OOM situations, especially while running games.

The only other possibility I can see would be to change the accounting of
resources whenever references to the file structure change, but this would
mean quite some additional overhead for a rather common operation.

Additionally, I think trying to limit device driver allocations using cgroups
is orthogonal to this effort. While cgroups are very useful, they work with
per-process limits and try to enforce a collaborative model of memory
management, while the OOM killer enforces a competitive model.

Please comment and/or review. We have had this problem flying around for
years now and are now at a point where we finally need to find a solution for
it.

Regards,
Christian.

[1] https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
[2] https://lkml.org/lkml/2018/1/18/543
[3] https://lkml.org/lkml/2021/2/4/799



* [PATCH 01/13] fs: add OOM badness callback to file_operations struct
@ 2022-05-31  9:59   ` Christian König
From: Christian König @ 2022-05-31  9:59 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

This allows file_operations implementations to specify an additional
badness for the OOM killer when they allocate memory on behalf of
userspace.

This badness is per file because file descriptors, and therefore the
references to the allocated memory, can migrate between processes.

For easier debugging this also adds printing of the per-file OOM badness
to fdinfo inside procfs.
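
With this, reading the fdinfo of such a file descriptor shows the new field.
The PID, fd number and values below are hypothetical; only the oom_badness
line is added by this patch:

        $ cat /proc/1234/fdinfo/4
        pos:    0
        flags:  02100002
        mnt_id: 24
        ino:    1042
        oom_badness:    262144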

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 fs/proc/fd.c       | 4 ++++
 include/linux/fs.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 172c86270b31..d1905c05cb3a 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -59,6 +59,10 @@ static int seq_show(struct seq_file *m, void *v)
 		   real_mount(file->f_path.mnt)->mnt_id,
 		   file_inode(file)->i_ino);
 
+	if (file->f_op->oom_badness)
+		seq_printf(m, "oom_badness:\t%lu\n",
+			   file->f_op->oom_badness(file));
+
 	/* show_fd_locks() never deferences files so a stale value is safe */
 	show_fd_locks(m, file, files);
 	if (seq_has_overflowed(m))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbde95387a23..d5222543aeb0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1995,6 +1995,7 @@ struct file_operations {
 				   struct file *file_out, loff_t pos_out,
 				   loff_t len, unsigned int remap_flags);
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
+	long (*oom_badness)(struct file *);
 } __randomize_layout;
 
 struct inode_operations {
-- 
2.25.1


* [PATCH 02/13] oom: take per file badness into account
@ 2022-05-31  9:59   ` Christian König
From: Christian König @ 2022-05-31  9:59 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Try to make better decisions about which process to kill based on the
per-file OOM badness. For this, the per-file OOM badness is queried from
every file which supports it and divided by the number of references to
that file structure.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 mm/oom_kill.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 49d7df39b02d..8a4d05e9568b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -52,6 +52,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/oom.h>
 
+#include <linux/fdtable.h>
+
 int sysctl_panic_on_oom;
 int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
@@ -189,6 +191,19 @@ static bool should_dump_unreclaim_slab(void)
 	return (global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B) > nr_lru);
 }
 
+/* Sum up how many resources are bound by the files opened. */
+static int oom_file_badness(const void *points, struct file *file, unsigned n)
+{
+	long badness;
+
+	if (!file->f_op->oom_badness)
+		return 0;
+
+	badness = file->f_op->oom_badness(file);
+	*((long *)points) += DIV_ROUND_UP(badness, file_count(file));
+	return 0;
+}
+
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
@@ -229,6 +244,12 @@ long oom_badness(struct task_struct *p, unsigned long totalpages)
 	 */
 	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
 		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+
+	/*
+	 * Add how much memory a task uses in opened files, e.g. device drivers.
+	 */
+	iterate_fd(p->files, 0, oom_file_badness, &points);
+
 	task_unlock(p);
 
 	/* Normalize to oom_score_adj units */
-- 
2.25.1


* [PATCH 03/13] mm: shmem: provide oom badness for shmem files
@ 2022-05-31  9:59   ` Christian König
From: Christian König @ 2022-05-31  9:59 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This gives the OOM killer an additional hint about which processes are
referencing shmem files that potentially have no other accounting for them.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 mm/shmem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 4b2fea33158e..a4ad92a16968 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
 	return inflated_addr;
 }
 
+static long shmem_oom_badness(struct file *file)
+{
+	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
+}
+
 #ifdef CONFIG_NUMA
 static int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
 {
@@ -3780,6 +3785,7 @@ EXPORT_SYMBOL(shmem_aops);
 static const struct file_operations shmem_file_operations = {
 	.mmap		= shmem_mmap,
 	.get_unmapped_area = shmem_get_unmapped_area,
+	.oom_badness	= shmem_oom_badness,
 #ifdef CONFIG_TMPFS
 	.llseek		= shmem_file_llseek,
 	.read_iter	= shmem_file_read_iter,
-- 
2.25.1


* [PATCH 04/13] dma-buf: provide oom badness for DMA-buf files
@ 2022-05-31  9:59   ` Christian König
From: Christian König @ 2022-05-31  9:59 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

For now just return the size of the DMA-buf in pages as the badness in an
OOM situation. That should probably be extended to be under the control of
the exporter in the future.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/dma-buf/dma-buf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index a2f9a1815e38..bdd4e8767cd3 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -388,6 +388,12 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
 	spin_unlock(&dmabuf->name_lock);
 }
 
+static long dma_buf_oom_badness(struct file *file)
+{
+	/* TODO: This should probably be controlled by a flag */
+	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_file_release,
 	.mmap		= dma_buf_mmap_internal,
@@ -396,6 +402,7 @@ static const struct file_operations dma_buf_fops = {
 	.unlocked_ioctl	= dma_buf_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
 	.show_fdinfo	= dma_buf_show_fdinfo,
+	.oom_badness	= dma_buf_oom_badness,
 };
 
 /*
-- 
2.25.1


* [PATCH 05/13] drm/gem: adjust per file OOM badness on handling buffers
@ 2022-05-31  9:59   ` Christian König
From: Christian König @ 2022-05-31  9:59 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Large amounts of VRAM are usually not CPU accessible, so they are not mapped
into the process' address space. But since the device drivers usually support
swapping buffers from VRAM to system memory, we can still run into an out of
memory situation when userspace starts to allocate too much.

This patch gives the OOM killer another hint which process is
holding references to memory resources.

A GEM helper is provided and automatically used for all drivers using the
DEFINE_DRM_GEM_FOPS() and DEFINE_DRM_GEM_CMA_FOPS() macros.
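
As a rough illustration (the driver name "foo" below is made up), a driver
that already uses the standard macro picks the callback up without any
further changes:

        /* Expands to a file_operations instance that now also sets
         * .oom_badness = drm_oom_badness.
         */
        DEFINE_DRM_GEM_FOPS(foo_gem_fops);

        static const struct drm_driver foo_drm_driver = {
                /* ... other driver callbacks ... */
                .fops = &foo_gem_fops,
        };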

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/drm_file.c       | 19 +++++++++++++++++++
 drivers/gpu/drm/drm_gem.c        |  5 +++++
 include/drm/drm_file.h           |  9 +++++++++
 include/drm/drm_gem.h            |  1 +
 include/drm/drm_gem_cma_helper.h |  1 +
 5 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index ed25168619fc..1959a5b7029e 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -1049,3 +1049,22 @@ unsigned long drm_get_unmapped_area(struct file *file,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 EXPORT_SYMBOL_GPL(drm_get_unmapped_area);
 #endif /* CONFIG_MMU */
+
+
+/**
+ * drm_oom_badness() - get oom badness for struct drm_file
+ * @f: struct drm_file to get the badness from
+ *
+ * Return how many pages are allocated for this client.
+ */
+long drm_oom_badness(struct file *f)
+{
+
+	struct drm_file *file_priv = f->private_data;
+
+	if (file_priv)
+		return atomic_long_read(&file_priv->f_oom_badness);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_oom_badness);
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index eb0c2d041f13..768b28b198cd 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -256,6 +256,7 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
 	drm_gem_remove_prime_handles(obj, file_priv);
 	drm_vma_node_revoke(&obj->vma_node, file_priv);
 
+	atomic_long_sub(obj->size >> PAGE_SHIFT, &file_priv->f_oom_badness);
 	drm_gem_object_handle_put_unlocked(obj);
 
 	return 0;
@@ -291,6 +292,8 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
 	idr_remove(&filp->object_idr, handle);
 	spin_unlock(&filp->table_lock);
 
+	atomic_long_sub(obj->size >> PAGE_SHIFT, &filp->f_oom_badness);
+
 	return 0;
 }
 EXPORT_SYMBOL(drm_gem_handle_delete);
@@ -399,6 +402,8 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
 	}
 
 	*handlep = handle;
+
+	atomic_long_add(obj->size >> PAGE_SHIFT, &file_priv->f_oom_badness);
 	return 0;
 
 err_revoke:
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index e0a73a1e2df7..5926766d79f0 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -366,6 +366,13 @@ struct drm_file {
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
 #endif
+
+	/**
+	 * @f_oom_badness:
+	 *
+	 * How many pages are allocated through this driver connection.
+	 */
+	atomic_long_t		f_oom_badness;
 };
 
 /**
@@ -430,4 +437,6 @@ unsigned long drm_get_unmapped_area(struct file *file,
 #endif /* CONFIG_MMU */
 
 
+long drm_oom_badness(struct file *f);
+
 #endif /* _DRM_FILE_H_ */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 9d7c61a122dc..0adf8c2f62e8 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -338,6 +338,7 @@ struct drm_gem_object {
 		.read		= drm_read,\
 		.llseek		= noop_llseek,\
 		.mmap		= drm_gem_mmap,\
+		.oom_badness	= drm_oom_badness,\
 	}
 
 void drm_gem_object_release(struct drm_gem_object *obj);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index fbda4ce5d5fb..455ce1aa6d2c 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -273,6 +273,7 @@ unsigned long drm_gem_cma_get_unmapped_area(struct file *filp,
 		.read		= drm_read,\
 		.llseek		= noop_llseek,\
 		.mmap		= drm_gem_mmap,\
+		.oom_badness	= drm_oom_badness,\
 		DRM_GEM_CMA_UNMAPPED_AREA_FOPS \
 	}
 
-- 
2.25.1


* [PATCH 06/13] drm/gma500: use drm_oom_badness
@ 2022-05-31 10:00   ` Christian König
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/gma500/psb_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/gma500/psb_drv.c b/drivers/gpu/drm/gma500/psb_drv.c
index 1d8744f3e702..d5ab4e081b53 100644
--- a/drivers/gpu/drm/gma500/psb_drv.c
+++ b/drivers/gpu/drm/gma500/psb_drv.c
@@ -513,6 +513,7 @@ static const struct file_operations psb_gem_fops = {
 	.mmap = drm_gem_mmap,
 	.poll = drm_poll,
 	.read = drm_read,
+	.oom_badness = drm_oom_badness,
 };
 
 static const struct drm_driver driver = {
-- 
2.25.1


* [PATCH 07/13] drm/amdgpu: Use drm_oom_badness for amdgpu
@ 2022-05-31 10:00   ` Christian König
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ebd37fb19cdb..9d6e57c93d3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2651,8 +2651,9 @@ static const struct file_operations amdgpu_driver_kms_fops = {
 	.compat_ioctl = amdgpu_kms_compat_ioctl,
 #endif
 #ifdef CONFIG_PROC_FS
-	.show_fdinfo = amdgpu_show_fdinfo
+	.show_fdinfo = amdgpu_show_fdinfo,
 #endif
+	.oom_badness = drm_oom_badness,
 };
 
 int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv)
-- 
2.25.1


* [PATCH 08/13] drm/radeon: use drm_oom_badness
@ 2022-05-31 10:00   ` Christian König
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 956c72b5aa33..7e7308c096d8 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -550,6 +550,7 @@ static const struct file_operations radeon_driver_kms_fops = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl = radeon_kms_compat_ioctl,
 #endif
+	.oom_badness = drm_oom_badness,
 };
 
 static const struct drm_ioctl_desc radeon_ioctls_kms[] = {
-- 
2.25.1


* [PATCH 09/13] drm/i915: use drm_oom_badness
  2022-05-31  9:59 ` [Nouveau] " Christian König
@ 2022-05-31 10:00   ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/i915/i915_driver.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 3ffb617d75c9..f9676a5b8aeb 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1748,6 +1748,7 @@ static const struct file_operations i915_driver_fops = {
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo = i915_drm_client_fdinfo,
 #endif
+	.oom_badness = drm_oom_badness,
 };
 
 static int
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH 10/13] drm/nouveau: use drm_oom_badness
  2022-05-31  9:59 ` [Nouveau] " Christian König
@ 2022-05-31 10:00   ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 561309d447e0..5439b6938455 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -1218,6 +1218,7 @@ nouveau_driver_fops = {
 	.compat_ioctl = nouveau_compat_ioctl,
 #endif
 	.llseek = noop_llseek,
+	.oom_badness = drm_oom_badness,
 };
 
 static struct drm_driver
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH 11/13] drm/omap: use drm_oom_badness
  2022-05-31  9:59 ` [Nouveau] " Christian König
@ 2022-05-31 10:00   ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/omapdrm/omap_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c b/drivers/gpu/drm/omapdrm/omap_drv.c
index eaf67b9e5f12..ca2c484f48ca 100644
--- a/drivers/gpu/drm/omapdrm/omap_drv.c
+++ b/drivers/gpu/drm/omapdrm/omap_drv.c
@@ -684,6 +684,7 @@ static const struct file_operations omapdriver_fops = {
 	.poll = drm_poll,
 	.read = drm_read,
 	.llseek = noop_llseek,
+	.oom_badness = drm_oom_badness,
 };
 
 static const struct drm_driver omap_drm_driver = {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH 12/13] drm/vmwgfx: use drm_oom_badness
  2022-05-31  9:59 ` [Nouveau] " Christian König
@ 2022-05-31 10:00   ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 01a5b47e95f9..e447e8ae29be 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1577,6 +1577,7 @@ static const struct file_operations vmwgfx_driver_fops = {
 #endif
 	.llseek = noop_llseek,
 	.get_unmapped_area = vmw_get_unmapped_area,
+	.oom_badness = drm_oom_badness,
 };
 
 static const struct drm_driver driver = {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH 13/13] drm/tegra: use drm_oom_badness
  2022-05-31  9:59 ` [Nouveau] " Christian König
@ 2022-05-31 10:00   ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-05-31 10:00 UTC (permalink / raw)
  To: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm
  Cc: christian.koenig, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

This allows the OOM killer to make a better decision about which process to reap.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/tegra/drm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 9464f522e257..89ea4f658815 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -803,6 +803,7 @@ static const struct file_operations tegra_drm_fops = {
 	.read = drm_read,
 	.compat_ioctl = drm_compat_ioctl,
 	.llseek = noop_llseek,
+	.oom_badness = drm_oom_badness,
 };
 
 static int tegra_drm_context_cleanup(int id, void *p, void *data)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: Per file OOM badness
  2022-05-31  9:59 ` [Nouveau] " Christian König
@ 2022-05-31 22:00   ` Alex Deucher
  -1 siblings, 0 replies; 145+ messages in thread
From: Alex Deucher @ 2022-05-31 22:00 UTC (permalink / raw)
  To: Christian König, Maling list - DRI developers
  Cc: linux-media, LKML, Intel Graphics Development, amd-gfx list,
	nouveau, linux-tegra, Linux-Fsdevel, linux-mm, Andrey Grodzovsky,
	Hugh Dickens, Alexander Viro, Daniel Vetter, Deucher, Alexander,
	Andrew Morton, Christian Koenig

+ dri-devel

On Tue, May 31, 2022 at 6:00 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Hello everyone,
>
> To summarize the issue I'm trying to address here: Processes can allocate
> resources through a file descriptor without being held responsible for it.
>
> Especially for the DRM graphics driver subsystem this is rather
> problematic. Modern games tend to allocate huge amounts of system memory
> through the DRM drivers to make it accessible to GPU rendering.
>
> But even outside of the DRM subsystem this problem exists and it is
> trivial to exploit. See the following simple example of
> using memfd_create():
>
>          fd = memfd_create("test", 0);
>          while (1)
>                  write(fd, page, 4096);
>
> Compile this and you can bring down any standard desktop system within
> seconds.
>
> The background is that the OOM killer will kill every processes in the
> system, but just not the one which holds the only reference to the memory
> allocated by the memfd.
>
> Those problems where brought up on the mailing list multiple times now
> [1][2][3], but without any final conclusion how to address them. Since
> file descriptors are considered shared the process can not directly held
> accountable for allocations made through them. Additional to that file
> descriptors can also easily move between processes as well.
>
> So what this patch set does is to instead of trying to account the
> allocated memory to a specific process it adds a callback to struct
> file_operations which the OOM killer can use to query the specific OOM
> badness of this file reference. This badness is then divided by the
> file_count, so that every process using a shmem file, DMA-buf or DRM
> driver will get it's equal amount of OOM badness.
>
> Callbacks are then implemented for the two core users (memfd and DMA-buf)
> as well as 72 DRM based graphics drivers.
>
> The result is that the OOM killer can now much better judge if a process
> is worth killing to free up memory. Resulting a quite a bit better system
> stability in OOM situations, especially while running games.
>
> The only other possibility I can see would be to change the accounting of
> resources whenever references to the file structure change, but this would
> mean quite some additional overhead for a rather common operation.
>
> Additionally I think trying to limit device driver allocations using
> cgroups is orthogonal to this effort. While cgroups is very useful, it
> works on per process limits and tries to enforce a collaborative model on
> memory management while the OOM killer enforces a competitive model.
>
> Please comment and/or review, we have that problem flying around for years
> now and are not at a point where we finally need to find a solution for
> this.
>
> Regards,
> Christian.
>
> [1] https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> [2] https://lkml.org/lkml/2018/1/18/543
> [3] https://lkml.org/lkml/2021/2/4/799
>
>

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-05-31  9:59   ` [Nouveau] " Christian König
@ 2022-06-09  9:18     ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-09  9:18 UTC (permalink / raw)
  To: Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, christian.koenig,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Tue 31-05-22 11:59:57, Christian König wrote:
> This gives the OOM killer an additional hint which processes are
> referencing shmem files with potentially no other accounting for them.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  mm/shmem.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4b2fea33158e..a4ad92a16968 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>  	return inflated_addr;
>  }
>  
> +static long shmem_oom_badness(struct file *file)
> +{
> +	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
> +}

This doesn't really represent the in memory size of the file, does it?
Also the memcg oom handling could be considerably skewed if the file was
shared between more memcgs.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09  9:18     ` [Intel-gfx] " Michal Hocko
@ 2022-06-09 12:16       ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-09 12:16 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

Am 09.06.22 um 11:18 schrieb Michal Hocko:
> On Tue 31-05-22 11:59:57, Christian König wrote:
>> This gives the OOM killer an additional hint which processes are
>> referencing shmem files with potentially no other accounting for them.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   mm/shmem.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 4b2fea33158e..a4ad92a16968 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>>   	return inflated_addr;
>>   }
>>   
>> +static long shmem_oom_badness(struct file *file)
>> +{
>> +	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
>> +}
> This doesn't really represent the in memory size of the file, does it?

Well, the file could be partially or fully swapped out as anonymous
memory, or the address space could be only sparsely populated, but even
then just using the file size as OOM badness sounded like the most
straightforward approach to me.

What could happen is that the file is also mmaped and we double account.
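
Just to illustrate the alternative, a rough sketch of counting only what
is currently resident (plus what shmem has pushed out to swap) could look
like this; it is not part of this series, and the field usage here is an
assumption on my side:

	static long shmem_oom_badness_resident(struct file *file)
	{
		struct inode *inode = file_inode(file);

		/* pages currently in the page cache plus pages shmem has
		 * swapped out; holes in the file are not counted */
		return inode->i_mapping->nrpages +
		       READ_ONCE(SHMEM_I(inode)->swapped);
	}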

> Also the memcg oom handling could be considerably skewed if the file was
> shared between more memcgs.

Yes, and that's one of the reasons why I didn't touch the memcg with
this and only affected the classic OOM killer.

Thanks for the comments,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 12:16       ` [Nouveau] " Christian König
@ 2022-06-09 12:57         ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-09 12:57 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Thu 09-06-22 14:16:56, Christian König wrote:
> Am 09.06.22 um 11:18 schrieb Michal Hocko:
> > On Tue 31-05-22 11:59:57, Christian König wrote:
> > > This gives the OOM killer an additional hint which processes are
> > > referencing shmem files with potentially no other accounting for them.
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > ---
> > >   mm/shmem.c | 6 ++++++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index 4b2fea33158e..a4ad92a16968 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
> > >   	return inflated_addr;
> > >   }
> > > +static long shmem_oom_badness(struct file *file)
> > > +{
> > > +	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
> > > +}
> > This doesn't really represent the in memory size of the file, does it?
> 
> Well, the file could be partially or fully swapped out as anonymous memory,
> or the address space could be only sparsely populated, but even then just
> using the file size as OOM badness sounded like the most straightforward
> approach to me.

It covers holes as well, right?

> What could happen is that the file is also mmaped and we double account.
> 
> > Also the memcg oom handling could be considerably skewed if the file was
> > shared between more memcgs.
> 
> Yes, and that's one of the reasons why I didn't touch the memcg with this
> and only affected the classic OOM killer.

oom_badness is for all oom handlers, including memcg. Maybe I have
misread an earlier patch but I do not see anything specific to global
oom handling.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 12:57         ` [Intel-gfx] " Michal Hocko
@ 2022-06-09 14:10           ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-09 14:10 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

Am 09.06.22 um 14:57 schrieb Michal Hocko:
> On Thu 09-06-22 14:16:56, Christian König wrote:
>> Am 09.06.22 um 11:18 schrieb Michal Hocko:
>>> On Tue 31-05-22 11:59:57, Christian König wrote:
>>>> This gives the OOM killer an additional hint which processes are
>>>> referencing shmem files with potentially no other accounting for them.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>    mm/shmem.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 4b2fea33158e..a4ad92a16968 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>>>>    	return inflated_addr;
>>>>    }
>>>> +static long shmem_oom_badness(struct file *file)
>>>> +{
>>>> +	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
>>>> +}
>>> This doesn't really represent the in memory size of the file, does it?
>> Well the file could be partially or fully swapped out as anonymous memory or
>> the address space only sparse populated, but even then just using the file
>> size as OOM badness sounded like the most straightforward approach to me.
> It covers hole as well, right?

Yes, exactly.

>
>> What could happen is that the file is also mmaped and we double account.
>>
>>> Also the memcg oom handling could be considerably skewed if the file was
>>> shared between more memcgs.
>> Yes, and that's one of the reasons why I didn't touched the memcg by this
>> and only affected the classic OOM killer.
> oom_badness is for all oom handlers, including memcg. Maybe I have
> misread an earlier patch but I do not see anything specific to global
> oom handling.

As far as I can see the oom_badness() function is only used in oom_kill.c and in procfs to return the oom score. Did I missed something?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread
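
A small userspace sketch of the procfs side mentioned above: the value in
/proc/<pid>/oom_score is derived from oom_badness(), so any per-file
contribution would become visible there. The helper name is made up for the
example.

        #include <stdio.h>

        /* Read the kernel's current badness estimate for a process.  The
         * /proc/<pid>/oom_score value is computed from oom_badness(), which
         * is where a per-file callback would feed in. */
        static long read_oom_score(int pid)
        {
                char path[64];
                long score = -1;
                FILE *f;

                snprintf(path, sizeof(path), "/proc/%d/oom_score", pid);
                f = fopen(path, "r");
                if (!f)
                        return -1;
                if (fscanf(f, "%ld", &score) != 1)
                        score = -1;
                fclose(f);
                return score;
        }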

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 14:10           ` [Nouveau] " Christian König
  (?)
  (?)
@ 2022-06-09 14:21             ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-09 14:21 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Thu 09-06-22 16:10:33, Christian König wrote:
> Am 09.06.22 um 14:57 schrieb Michal Hocko:
> > On Thu 09-06-22 14:16:56, Christian König wrote:
> > > Am 09.06.22 um 11:18 schrieb Michal Hocko:
> > > > On Tue 31-05-22 11:59:57, Christian König wrote:
> > > > > This gives the OOM killer an additional hint which processes are
> > > > > referencing shmem files with potentially no other accounting for them.
> > > > > 
> > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > ---
> > > > >    mm/shmem.c | 6 ++++++
> > > > >    1 file changed, 6 insertions(+)
> > > > > 
> > > > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > > > index 4b2fea33158e..a4ad92a16968 100644
> > > > > --- a/mm/shmem.c
> > > > > +++ b/mm/shmem.c
> > > > > @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
> > > > >    	return inflated_addr;
> > > > >    }
> > > > > +static long shmem_oom_badness(struct file *file)
> > > > > +{
> > > > > +	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
> > > > > +}
> > > > This doesn't really represent the in memory size of the file, does it?
> > > Well the file could be partially or fully swapped out as anonymous memory or
> > > the address space only sparse populated, but even then just using the file
> > > size as OOM badness sounded like the most straightforward approach to me.
> > It covers hole as well, right?
> 
> Yes, exactly.

So let's say I have a huge sparse shmem file. I will get killed because
the oom_badness of such a file would be large as well...

> > > What could happen is that the file is also mmaped and we double account.
> > > 
> > > > Also the memcg oom handling could be considerably skewed if the file was
> > > > shared between more memcgs.
> > > Yes, and that's one of the reasons why I didn't touched the memcg by this
> > > and only affected the classic OOM killer.
> > oom_badness is for all oom handlers, including memcg. Maybe I have
> > misread an earlier patch but I do not see anything specific to global
> > oom handling.
> 
> As far as I can see the oom_badness() function is only used in
> oom_kill.c and in procfs to return the oom score. Did I missed
> something?

oom_kill.c implements most of the oom killer functionality. Memcg oom
killing is a part of that. Have a look at select_bad_process.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread
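
For reference, the flow Michal points to looks roughly like the following, a
simplified paraphrase of mm/oom_kill.c rather than a verbatim copy: for a
memcg OOM the candidates come from the cgroup's task list, so anything added
to oom_badness() influences memcg OOM decisions as well as global ones.

        /*
         * Simplified sketch: memcg OOMs walk the tasks of the cgroup, global
         * OOMs walk all processes, and both paths evaluate candidates via
         * oom_evaluate_task(), which ends up calling oom_badness().
         */
        static void select_bad_process(struct oom_control *oc)
        {
                if (is_memcg_oom(oc)) {
                        mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc);
                } else {
                        struct task_struct *p;

                        rcu_read_lock();
                        for_each_process(p)
                                if (oom_evaluate_task(p, oc))
                                        break;
                        rcu_read_unlock();
                }
        }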

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 14:21             ` [Intel-gfx] " Michal Hocko
  (?)
  (?)
@ 2022-06-09 14:29               ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-09 14:29 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

Am 09.06.22 um 16:21 schrieb Michal Hocko:
> On Thu 09-06-22 16:10:33, Christian König wrote:
>> Am 09.06.22 um 14:57 schrieb Michal Hocko:
>>> On Thu 09-06-22 14:16:56, Christian König wrote:
>>>> Am 09.06.22 um 11:18 schrieb Michal Hocko:
>>>>> On Tue 31-05-22 11:59:57, Christian König wrote:
>>>>>> This gives the OOM killer an additional hint which processes are
>>>>>> referencing shmem files with potentially no other accounting for them.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>> ---
>>>>>>     mm/shmem.c | 6 ++++++
>>>>>>     1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>>>> index 4b2fea33158e..a4ad92a16968 100644
>>>>>> --- a/mm/shmem.c
>>>>>> +++ b/mm/shmem.c
>>>>>> @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>>>>>>     	return inflated_addr;
>>>>>>     }
>>>>>> +static long shmem_oom_badness(struct file *file)
>>>>>> +{
>>>>>> +	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
>>>>>> +}
>>>>> This doesn't really represent the in memory size of the file, does it?
>>>> Well the file could be partially or fully swapped out as anonymous memory or
>>>> the address space only sparse populated, but even then just using the file
>>>> size as OOM badness sounded like the most straightforward approach to me.
>>> It covers hole as well, right?
>> Yes, exactly.
> So let's say I have a huge sparse shmem file. I will get killed because
> the oom_badness of such a file would be large as well...

Yes, correct. But offhand I don't see how we could improve that accounting.

>>>> What could happen is that the file is also mmaped and we double account.
>>>>
>>>>> Also the memcg oom handling could be considerably skewed if the file was
>>>>> shared between more memcgs.
>>>> Yes, and that's one of the reasons why I didn't touched the memcg by this
>>>> and only affected the classic OOM killer.
>>> oom_badness is for all oom handlers, including memcg. Maybe I have
>>> misread an earlier patch but I do not see anything specific to global
>>> oom handling.
>> As far as I can see the oom_badness() function is only used in
>> oom_kill.c and in procfs to return the oom score. Did I missed
>> something?
> oom_kill.c implements most of the oom killer functionality. Memcg oom
> killing is a part of that. Have a look at select_bad_process.

Ah! So mem_cgroup_scan_tasks() calls oom_evaluate_task for each task in 
the control group.

Thanks for pointing that out, that was absolutely not obvious to me.

Is that a show stopper? How should we address this?

Christian.


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 14:29               ` [Nouveau] " Christian König
  (?)
  (?)
@ 2022-06-09 15:07                 ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-09 15:07 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Thu 09-06-22 16:29:46, Christian König wrote:
[...]
> Is that a show stopper? How should we address this?

This is a hard problem to deal with and I am not sure this simple
solution is really a good fit. Not only because of the memcg side of
things. I have my doubts that sparse files handling is ok as well.

I do realize this is a long term problem and there is a demand for some
solution at least. I am not sure how to deal with shared resources
myself. The best approximation I can come up with is to limit the scope
of the damage into a memcg context. One idea I was playing with (but
never convinced myself it is really worth it) is to allow a new mode of
the oom victim selection for the global oom event. It would be an opt-in
and the victim would be selected from the biggest leaf memcg (or kill
the whole memcg if it has group_oom configured).

That would address at least some of the accounting issue because charges
are better tracked than per process memory consumption. It is a crude
and ugly hack and it doesn't solve the underlying problem as shared
resources are not guaranteed to be freed when processes die but maybe it
would be just slightly better than the existing scheme which is clearly
lagging behind existing userspace.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread
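
The "group_oom" configuration mentioned above corresponds to the existing
cgroup v2 memory.oom.group knob; when enabled, an OOM kill in that memcg
takes down the whole group rather than a single task. A minimal sketch of
enabling it, with a made-up cgroup path:

        #include <stdio.h>

        /* Enable whole-group OOM kills for a cgroup, e.g.
         * enable_group_oom("/sys/fs/cgroup/mygame").  Path is an example. */
        static int enable_group_oom(const char *cgroup_path)
        {
                char path[256];
                FILE *f;

                snprintf(path, sizeof(path), "%s/memory.oom.group", cgroup_path);
                f = fopen(path, "w");
                if (!f)
                        return -1;
                fputs("1\n", f);
                fclose(f);
                return 0;
        }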

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 14:21             ` [Intel-gfx] " Michal Hocko
  (?)
  (?)
@ 2022-06-09 15:19               ` Felix Kuehling
  -1 siblings, 0 replies; 145+ messages in thread
From: Felix Kuehling @ 2022-06-09 15:19 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky


Am 2022-06-09 um 10:21 schrieb Michal Hocko:
> On Thu 09-06-22 16:10:33, Christian König wrote:
>> Am 09.06.22 um 14:57 schrieb Michal Hocko:
>>> On Thu 09-06-22 14:16:56, Christian König wrote:
>>>> Am 09.06.22 um 11:18 schrieb Michal Hocko:
>>>>> On Tue 31-05-22 11:59:57, Christian König wrote:
>>>>>> This gives the OOM killer an additional hint which processes are
>>>>>> referencing shmem files with potentially no other accounting for them.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>> ---
>>>>>>     mm/shmem.c | 6 ++++++
>>>>>>     1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>>>> index 4b2fea33158e..a4ad92a16968 100644
>>>>>> --- a/mm/shmem.c
>>>>>> +++ b/mm/shmem.c
>>>>>> @@ -2179,6 +2179,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
>>>>>>     	return inflated_addr;
>>>>>>     }
>>>>>> +static long shmem_oom_badness(struct file *file)
>>>>>> +{
>>>>>> +	return i_size_read(file_inode(file)) >> PAGE_SHIFT;
>>>>>> +}
>>>>> This doesn't really represent the in memory size of the file, does it?
>>>> Well the file could be partially or fully swapped out as anonymous memory or
>>>> the address space only sparse populated, but even then just using the file
>>>> size as OOM badness sounded like the most straightforward approach to me.
>>> It covers hole as well, right?
>> Yes, exactly.
> So let's say I have a huge sparse shmem file. I will get killed because
> the oom_badness of such a file would be large as well...

Would killing processes free shmem files, though? Aren't those 
persistent anyway? In that case, shmem files should not contribute to 
oom_badness at all.

I guess a special case would be files that were removed from the 
filesystem but are still open in some processes.

Regards,
   Felix


>
>>>> What could happen is that the file is also mmaped and we double account.
>>>>
>>>>> Also the memcg oom handling could be considerably skewed if the file was
>>>>> shared between more memcgs.
>>>> Yes, and that's one of the reasons why I didn't touched the memcg by this
>>>> and only affected the classic OOM killer.
>>> oom_badness is for all oom handlers, including memcg. Maybe I have
>>> misread an earlier patch but I do not see anything specific to global
>>> oom handling.
>> As far as I can see the oom_badness() function is only used in
>> oom_kill.c and in procfs to return the oom score. Did I missed
>> something?
> oom_kill.c implements most of the oom killer functionality. Memcg oom
> killing is a part of that. Have a look at select_bad_process.
>

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 15:19               ` [Nouveau] " Felix Kuehling
  (?)
  (?)
@ 2022-06-09 15:22                 ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-09 15:22 UTC (permalink / raw)
  To: Felix Kuehling, Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

Am 09.06.22 um 17:19 schrieb Felix Kuehling:
>
> Am 2022-06-09 um 10:21 schrieb Michal Hocko:
>> On Thu 09-06-22 16:10:33, Christian König wrote:
>>> Am 09.06.22 um 14:57 schrieb Michal Hocko:
>>>> On Thu 09-06-22 14:16:56, Christian König wrote:
>>>>> Am 09.06.22 um 11:18 schrieb Michal Hocko:
>>>>>> On Tue 31-05-22 11:59:57, Christian König wrote:
>>>>>>> This gives the OOM killer an additional hint which processes are
>>>>>>> referencing shmem files with potentially no other accounting for 
>>>>>>> them.
>>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>> ---
>>>>>>>     mm/shmem.c | 6 ++++++
>>>>>>>     1 file changed, 6 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>>>>> index 4b2fea33158e..a4ad92a16968 100644
>>>>>>> --- a/mm/shmem.c
>>>>>>> +++ b/mm/shmem.c
>>>>>>> @@ -2179,6 +2179,11 @@ unsigned long 
>>>>>>> shmem_get_unmapped_area(struct file *file,
>>>>>>>         return inflated_addr;
>>>>>>>     }
>>>>>>> +static long shmem_oom_badness(struct file *file)
>>>>>>> +{
>>>>>>> +    return i_size_read(file_inode(file)) >> PAGE_SHIFT;
>>>>>>> +}
>>>>>> This doesn't really represent the in memory size of the file, 
>>>>>> does it?
>>>>> Well the file could be partially or fully swapped out as anonymous 
>>>>> memory or
>>>>> the address space only sparse populated, but even then just using 
>>>>> the file
>>>>> size as OOM badness sounded like the most straightforward approach 
>>>>> to me.
>>>> It covers hole as well, right?
>>> Yes, exactly.
>> So let's say I have a huge sparse shmem file. I will get killed because
>> the oom_badness of such a file would be large as well...
>
> Would killing processes free shmem files, though? Aren't those 
> persistent anyway? In that case, shmem files should not contribute to 
> oom_badness at all.

At least for the memfd_create() case they do, yes.

Those files were never part of any filesystem in the first place, so by 
killing all the processes referencing them you can indeed free the memory 
locked by them.

Regards,
Christian.

>
> I guess a special case would be files that were removed from the 
> filesystem but are still open in some processes.
>
> Regards,
>   Felix
>
>
>>
>>>>> What could happen is that the file is also mmaped and we double 
>>>>> account.
>>>>>
>>>>>> Also the memcg oom handling could be considerably skewed if the 
>>>>>> file was
>>>>>> shared between more memcgs.
>>>>> Yes, and that's one of the reasons why I didn't touched the memcg 
>>>>> by this
>>>>> and only affected the classic OOM killer.
>>>> oom_badness is for all oom handlers, including memcg. Maybe I have
>>>> misread an earlier patch but I do not see anything specific to global
>>>> oom handling.
>>> As far as I can see the oom_badness() function is only used in
>>> oom_kill.c and in procfs to return the oom score. Did I missed
>>> something?
>> oom_kill.c implements most of the oom killer functionality. Memcg oom
>> killing is a part of that. Have a look at select_bad_process.
>>


^ permalink raw reply	[flat|nested] 145+ messages in thread
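
To make the memfd point above concrete, a small userspace sketch (sizes and
names are arbitrary): the shmem pages backing a memfd stay allocated as long
as any process still holds a reference to the file, and become reclaimable
only once the last reference is gone, which is exactly the memory the OOM
killer could get back by picking the right victims.

        #define _GNU_SOURCE
        #include <sys/mman.h>
        #include <string.h>
        #include <unistd.h>

        int main(void)
        {
                size_t sz = 64UL << 20;            /* 64 MiB, arbitrary */
                int fd = memfd_create("example", 0);
                char *p;

                if (fd < 0)
                        return 1;
                ftruncate(fd, sz);
                p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                if (p == MAP_FAILED)
                        return 1;
                memset(p, 0x55, sz);               /* touch pages so shmem allocates them */
                munmap(p, sz);

                if (fork() == 0) {                 /* child inherits the fd */
                        sleep(30);                 /* keeps the shmem pages alive */
                        _exit(0);
                }
                close(fd);  /* parent's reference gone; pages persist until the child exits */
                return 0;
        }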

* Re: [Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
@ 2022-06-09 15:22                 ` Christian König
  0 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-09 15:22 UTC (permalink / raw)
  To: Felix Kuehling, Michal Hocko, Christian König
  Cc: andrey.grodzovsky, linux-mm, nouveau, intel-gfx, hughd,
	linux-kernel, amd-gfx, linux-fsdevel, viro, daniel, linux-tegra,
	alexander.deucher, akpm, linux-media

Am 09.06.22 um 17:19 schrieb Felix Kuehling:
>
> Am 2022-06-09 um 10:21 schrieb Michal Hocko:
>> On Thu 09-06-22 16:10:33, Christian König wrote:
>>> Am 09.06.22 um 14:57 schrieb Michal Hocko:
>>>> On Thu 09-06-22 14:16:56, Christian König wrote:
>>>>> Am 09.06.22 um 11:18 schrieb Michal Hocko:
>>>>>> On Tue 31-05-22 11:59:57, Christian König wrote:
>>>>>>> This gives the OOM killer an additional hint which processes are
>>>>>>> referencing shmem files with potentially no other accounting for 
>>>>>>> them.
>>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>> ---
>>>>>>>     mm/shmem.c | 6 ++++++
>>>>>>>     1 file changed, 6 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>>>>> index 4b2fea33158e..a4ad92a16968 100644
>>>>>>> --- a/mm/shmem.c
>>>>>>> +++ b/mm/shmem.c
>>>>>>> @@ -2179,6 +2179,11 @@ unsigned long 
>>>>>>> shmem_get_unmapped_area(struct file *file,
>>>>>>>         return inflated_addr;
>>>>>>>     }
>>>>>>> +static long shmem_oom_badness(struct file *file)
>>>>>>> +{
>>>>>>> +    return i_size_read(file_inode(file)) >> PAGE_SHIFT;
>>>>>>> +}
>>>>>> This doesn't really represent the in memory size of the file, 
>>>>>> does it?
>>>>> Well the file could be partially or fully swapped out as anonymous 
>>>>> memory or
>>>>> the address space only sparse populated, but even then just using 
>>>>> the file
>>>>> size as OOM badness sounded like the most straightforward approach 
>>>>> to me.
>>>> It covers hole as well, right?
>>> Yes, exactly.
>> So let's say I have a huge sparse shmem file. I will get killed because
>> the oom_badness of such a file would be large as well...
>
> Would killing processes free shmem files, though? Aren't those 
> persistent anyway? In that case, shmem files should not contribute to 
> oom_badness at all.

At least for the memfd_create() case they do, yes.

Those files were never part of any filesystem in the first place, so by 
killing all the process referencing them you can indeed free the memory 
locked by them.

Regards,
Christian.

>
> I guess a special case would be files that were removed from the 
> filesystem but are still open in some processes.
>
> Regards,
>   Felix
>
>
>>
>>>>> What could happen is that the file is also mmaped and we double 
>>>>> account.
>>>>>
>>>>>> Also the memcg oom handling could be considerably skewed if the 
>>>>>> file was
>>>>>> shared between more memcgs.
>>>>> Yes, and that's one of the reasons why I didn't touched the memcg 
>>>>> by this
>>>>> and only affected the classic OOM killer.
>>>> oom_badness is for all oom handlers, including memcg. Maybe I have
>>>> misread an earlier patch but I do not see anything specific to global
>>>> oom handling.
>>> As far as I can see the oom_badness() function is only used in
>>> oom_kill.c and in procfs to return the oom score. Did I miss
>>> something?
>> oom_kill.c implements most of the oom killer functionality. Memcg oom
>> killing is a part of that. Have a look at select_bad_process.
>>


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 15:22                 ` [Nouveau] " Christian König
@ 2022-06-09 15:54                   ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-09 15:54 UTC (permalink / raw)
  To: Christian König
  Cc: Felix Kuehling, Christian König, linux-media, linux-kernel,
	intel-gfx, amd-gfx, nouveau, linux-tegra, linux-fsdevel,
	linux-mm, alexander.deucher, daniel, viro, akpm, hughd,
	andrey.grodzovsky

On Thu 09-06-22 17:22:14, Christian König wrote:
[...]
> Those files were never part of any filesystem in the first place, so by
> killing all the processes referencing them you can indeed free the memory
> locked by them.

Yes, this would require the oom killer to understand that all processes
referencing that file are killed. Theoretically possible, but I am not
sure it is a feasible solution.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-09 15:07                 ` [Intel-gfx] " Michal Hocko
@ 2022-06-10 10:58                   ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-10 10:58 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

Am 09.06.22 um 17:07 schrieb Michal Hocko:
> On Thu 09-06-22 16:29:46, Christian König wrote:
> [...]
>> Is that a show stopper? How should we address this?
> This is a hard problem to deal with and I am not sure this simple
> solution is really a good fit. Not only because of the memcg side of
> things. I have my doubts that sparse files handling is ok as well.

Well, I didn't claim that this would be easy; we just need to start 
somewhere.

Regarding the sparse file handling, how about using 
file->f_mapping->nrpages as badness for shmem files?

That should give us the real number of pages allocated through this 
shmem file and gracefully handles sparse files.
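
Just as a sketch of what I mean (untested, not part of the posted 
series), the callback from patch 3 would then become something like:

         static long shmem_oom_badness(struct file *file)
         {
                 /* pages actually present in the mapping, so holes in
                  * a sparse file no longer inflate the badness */
                 return file->f_mapping->nrpages;
         }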

> I do realize this is a long term problem and there is a demand for some
> solution at least. I am not sure how to deal with shared resources
> myself. The best approximation I can come up with is to limit the scope
> of the damage into a memcg context. One idea I was playing with (but
> never convinced myself it is really a worth) is to allow a new mode of
> the oom victim selection for the global oom event. It would be an opt in
> and the victim would be selected from the biggest leaf memcg (or kill
> the whole memcg if it has group_oom configured.
>
> That would address at least some of the accounting issue because charges
> are better tracked than per process memory consumption. It is a crude
> and ugly hack and it doesn't solve the underlying problem as shared
> resources are not guaranteed to be freed when processes die but maybe it
> would be just slightly better than the existing scheme which is clearly
> lacking behind existing userspace.

Well, what is so bad about the approach of giving each process holding a 
reference to some shared memory its equal share of the badness, even when 
the processes belong to different memory control groups?
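
Just to make that sharing concrete, the badness calculation would do 
roughly something like this (a sketch, not the exact code from the 
series):

         /* every process holding a reference to the file is charged
          * an equal fraction of the file's badness */
         if (file->f_op->oom_badness)
                 points += file->f_op->oom_badness(file) /
                           file_count(file);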

If you really think that this would be a hard problem for upstreaming, we 
could just as well keep the behavior for memcg as it is for now. We would 
just need to adjust the parameters to oom_badness() a bit.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-10 10:58                   ` [Nouveau] " Christian König
@ 2022-06-10 11:44                     ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-10 11:44 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Fri 10-06-22 12:58:53, Christian König wrote:
> Am 09.06.22 um 17:07 schrieb Michal Hocko:
> > On Thu 09-06-22 16:29:46, Christian König wrote:
> > [...]
> > > Is that a show stopper? How should we address this?
> > This is a hard problem to deal with and I am not sure this simple
> > solution is really a good fit. Not only because of the memcg side of
> > things. I have my doubts that sparse files handling is ok as well.
> 
> Well, I didn't claim that this would be easy; we just need to start
> somewhere.
> 
> Regarding the sparse file handling, how about using file->f_mapping->nrpages
> as badness for shmem files?
> 
> That should give us the real number of pages allocated through this shmem
> file and gracefully handles sparse files.

Yes, this would be a better approximation.

> > I do realize this is a long term problem and there is a demand for some
> > solution at least. I am not sure how to deal with shared resources
> > myself. The best approximation I can come up with is to limit the scope
> > of the damage into a memcg context. One idea I was playing with (but
> > never convinced myself it is really a worth) is to allow a new mode of
> > the oom victim selection for the global oom event.

And just for clarity: I have mentioned the global oom event here, but the
concept could be extended to the per-memcg oom killer as well.

> > It would be an opt in
> > and the victim would be selected from the biggest leaf memcg (or kill
> > the whole memcg if it has group_oom configured.
> > 
> > That would address at least some of the accounting issue because charges
> > are better tracked than per process memory consumption. It is a crude
> > and ugly hack and it doesn't solve the underlying problem as shared
> > resources are not guaranteed to be freed when processes die but maybe it
> > would be just slightly better than the existing scheme which is clearly
> > lacking behind existing userspace.
> 
> Well, what is so bad about the approach of giving each process holding a
> reference to some shared memory its equal share of the badness, even when the
> processes belong to different memory control groups?

I am not claiming this is wrong per se. It is just an approximation and
it can surely be wrong in some cases (e.g. in those workloads where the
share memory is mostly owned by one process while the shared content is
consumed by many).

The primary question is whether it actually helps much or what kind of
scenarios it can help with and whether we can actually do better for
those. Also do not forget that shared file memory is not the only thing
to care about. What about the kernel memory used on behalf of processes?

Just consider the above mentioned memcg driven model. It doesn't really
require chasing specific files and doing some arbitrary math to share the
responsibility. It has a clear accounting and responsibility model.

It shares the same underlying problem that the oom killing is not
resource aware and therefore there is no guarantee that memory really
gets freed.  But it allows sane configurations where shared resources do
not cross memcg boundaries at least. With that in mind and oom_cgroup
semantics you can get at least some semi-sane guarantees. Is it
perfect? No, by no means. But I would expect it to be more predictable.
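
Just for illustration (paths and numbers are made up), the kind of 
containment I have in mind is what cgroup v2 group oom handling already 
allows today:

         # put the game and everything it spawns into its own leaf
         # cgroup and let the OOM killer take the whole group down
         mkdir /sys/fs/cgroup/game
         echo 1 > /sys/fs/cgroup/game/memory.oom.group
         echo 8G > /sys/fs/cgroup/game/memory.max    # optional hard cap
         echo $GAME_PID > /sys/fs/cgroup/game/cgroup.procs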

Maybe we can come up with a saner model, but just going with per file
stats sounds like a hard to predict and debug approach to me. OOM
killing is a very disruptive operation and having random tasks killed
just because they have mapped a few pages from a shared resource sounds
like a terrible thing to debug and explain to users.
 
> If you really think that this would be a hard problem for upstreaming, we
> could just as well keep the behavior for memcg as it is for now. We would just
> need to adjust the parameters to oom_badness() a bit.

Say we ignore the memcg side of things for now. How does it help long
term? Special casing the global oom is not all that hard but any future
change would very likely be disruptive with some semantic implications
AFAICS.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-10 11:44                     ` [Intel-gfx] " Michal Hocko
@ 2022-06-10 12:17                       ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-10 12:17 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

Am 10.06.22 um 13:44 schrieb Michal Hocko:
> On Fri 10-06-22 12:58:53, Christian König wrote:
> [SNIP]
>>> I do realize this is a long term problem and there is a demand for some
>>> solution at least. I am not sure how to deal with shared resources
>>> myself. The best approximation I can come up with is to limit the scope
>>> of the damage into a memcg context. One idea I was playing with (but
>>> never convinced myself it is really a worth) is to allow a new mode of
>>> the oom victim selection for the global oom event.
> And just for clarity: I have mentioned the global oom event here, but the
> concept could be extended to the per-memcg oom killer as well.

Then what exactly do you mean by "limiting the scope of the damage"? 
Because that doesn't make sense without memcg.

>>> It would be an opt in
>>> and the victim would be selected from the biggest leaf memcg (or kill
>>> the whole memcg if it has group_oom configured.
>>>
>>> That would address at least some of the accounting issue because charges
>>> are better tracked than per process memory consumption. It is a crude
>>> and ugly hack and it doesn't solve the underlying problem as shared
>>> resources are not guaranteed to be freed when processes die but maybe it
>>> would be just slightly better than the existing scheme which is clearly
>>> lacking behind existing userspace.
>> Well, what is so bad about the approach of giving each process holding a
>> reference to some shared memory its equal share of the badness, even when the
>> processes belong to different memory control groups?
> I am not claiming this is wrong per se. It is just an approximation and
> it can surely be wrong in some cases (e.g. in those workloads where the
> shared memory is mostly owned by one process while the shared content is
> consumed by many).

Yeah, completely agree. Basically we can only make an educated guess.

The key point is that we should make the most educated guess we can and not 
just try to randomly kill something until we hit the right target. 
That's essentially what's happening today.

> The primary question is whether it actually helps much or what kind of
> scenarios it can help with and whether we can actually do better for
> those.

Well, it does help massively with a standard Linux desktop and GPU 
workloads (e.g. games).

What currently happens is that when games allocate textures, for example, 
the memory for them is not accounted against the game. Instead it's usually 
the display server (X or Wayland) that most of the shared resources are 
accounted to, because it needs to compose a desktop from them and usually 
also mmaps them for fallback CPU operations.

So what happens when a game over-allocates texture resources is that 
your whole desktop restarts because the compositor is killed. This 
obviously also kills the game, but it would be much nicer if we were 
more selective here.

For hardware rendering, DMA-buf and GPU drivers are used, but for the 
software fallback, shmem files are what is used under the hood as far as I 
know. And the underlying problem is the same for both.

> Also do not forget that shared file memory is not the only thing
> to care about. What about the kernel memory used on behalf of processes?

Yeah, I'm aware of that as well. But at least inside the GPU drivers we 
try to keep that in a reasonable ratio.

> Just consider the above mentioned memcg driven model. It doesn't really
> require chasing specific files and doing some arbitrary math to share the
> responsibility. It has a clear accounting and responsibility model.

Ok, how does that work then?

> It shares the same underlying problem that the oom killing is not
> resource aware and therefore there is no guarantee that memory really
> gets freed.  But it allows sane configurations where shared resources do
> not cross memcg boundaries at least. With that in mind and oom_cgroup
> semantics you can get at least some semi-sane guarantees. Is it
> perfect? No, by no means. But I would expect it to be more predictable.
>
> Maybe we can come up with a saner model, but just going with per file
> stats sounds like a hard to predict and debug approach to me. OOM
> killing is a very disruptive operation and having random tasks killed
> just because they have mapped a few pages from a shared resource sounds
> like a terrible thing to debug and explain to users.

Well, to be honest, I think it's much saner than what we do today.

As I said, you can currently bring any Linux system down within seconds, 
and that's basically a perfect denial-of-service attack.

>> If you really think that this would be a hard problem for upstreaming, we
>> could just as well keep the behavior for memcg as it is for now. We would just
>> need to adjust the parameters to oom_badness() a bit.
> Say we ignore the memcg side of things for now. How does it help long
> term? Special casing the global oom is not all that hard but any future
> change would very likely be disruptive with some semantic implications
> AFAICS.

What else can we do? I mean the desktop instability we are facing is 
really massive.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-10 12:17                       ` [Nouveau] " Christian König
@ 2022-06-10 14:16                         ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-10 14:16 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Fri 10-06-22 14:17:27, Christian König wrote:
> On 10.06.22 at 13:44, Michal Hocko wrote:
> > On Fri 10-06-22 12:58:53, Christian König wrote:
> > [SNIP]
> > > > I do realize this is a long term problem and there is a demand for some
> > > > solution at least. I am not sure how to deal with shared resources
> > > > myself. The best approximation I can come up with is to limit the scope
> > > > of the damage into a memcg context. One idea I was playing with (but
> > > > never convinced myself it is really a worth) is to allow a new mode of
> > > > the oom victim selection for the global oom event.
> > And just for the clarity. I have mentioned global oom event here but the
> > concept could be extended to per-memcg oom killer as well.
> 
> Then what exactly do you mean with "limiting the scope of the damage"? Cause
> that doesn't make sense without memcg.

What I meant to say is to apply the damage-control scheme not only to the
global oom situation (on a global shortage of memory) but also to the
memcg oom situation (when the hard limit on a hierarchy is reached).

[...]
> > The primary question is whether it actually helps much or what kind of
> > scenarios it can help with and whether we can actually do better for
> > those.
> 
> Well, it does help massively with a standard Linux desktop and GPU workloads
> (e.g. games).
> 
> See what currently happens is that when games allocate for example textures
> the memory for that is not accounted against that game. Instead it's usually
> the display server (X or Wayland) which most of the shared resources
> accounts to because it needs to compose a desktop from it and usually also
> mmaps it for fallback CPU operations.

Let me try to understand some more. So the game (or the entity that should
be responsible for the resource) doesn't really allocate the memory but
relies on somebody else (from the memcg perspective living in a different
resource domain - i.e. a different memcg) to do that on its behalf.
Correct? If that is the case then it certainly does not fit into the
memcg model.
I am not really sure there is any reasonable model where you cannot
really tell who is responsible for the resource.

>> So what happens when a game over-allocates texture resources is that your
>> whole desktop restarts because the compositor is killed. This obviously also
>> kills the game, but it would be much nicer if we would be more selective
>> here.
> 
> For hardware rendering DMA-buf and GPU drivers are used, but for the
> software fallback shmem files is what is used under the hood as far as I
> know. And the underlying problem is the same for both.

For shmem files the end user of the buffer can preallocate and so own
the buffer and be accounted for it.
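
Roughly like this from the consumer side (just a sketch; it assumes 'fd' is 
a shmem-backed file such as a memfd received from the producer, and error 
handling is omitted):

/* Preallocate the backing pages in this process's context, so this
 * process's memcg - not the producer's - is charged for them. */
#define _GNU_SOURCE
#include <fcntl.h>

static int own_buffer(int fd, off_t size)
{
        return fallocate(fd, 0, 0, size);
}
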
> 
> > Also do not forget that shared file memory is not the only thing
> > to care about. What about the kernel memory used on behalf of processes?
> 
> Yeah, I'm aware of that as well. But at least inside the GPU drivers we try
> to keep that in a reasonable ratio.
> 
> > Just consider the above mentioned memcg driven model. It doesn't really
> > require to chase specific files and do some arbitrary math to share the
> > responsibility. It has a clear accounting and responsibility model.
> 
> Ok, how does that work then?

The memory is accounted to whoever faults that memory in or to the
allocating context if that is a kernel memory (in most situations).
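
That rule is easy to see from userspace with a small sketch (error handling 
omitted): if, during the first sleep, the child is moved into its own 
cgroup, the 64 MiB later shows up in that cgroup's memory.current (cgroup 
v2) rather than in the cgroup of the parent that created the fd.

/* The parent creates the memfd but never touches it; the child faults the
 * pages in, so the child's memcg is the one that gets charged. */
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        size_t sz = 64UL << 20;                 /* 64 MiB */
        int fd = memfd_create("buffer", 0);

        ftruncate(fd, sz);

        if (fork() == 0) {
                sleep(5);                       /* time to move this PID into its own cgroup */
                char *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);

                memset(p, 0xaa, sz);            /* faulting context is charged */
                sleep(60);
                return 0;
        }
        wait(NULL);
        return 0;
}
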
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-10 14:16                         ` [Intel-gfx] " Michal Hocko
@ 2022-06-11  8:06                           ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-11  8:06 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

On 10.06.22 at 16:16, Michal Hocko wrote:
> [...]
>>> The primary question is whether it actually helps much or what kind of
>>> scenarios it can help with and whether we can actually do better for
>>> those.
>> Well, it does help massively with a standard Linux desktop and GPU workloads
>> (e.g. games).
>>
>> See what currently happens is that when games allocate for example textures
>> the memory for that is not accounted against that game. Instead it's usually
>> the display server (X or Wayland) which most of the shared resources
>> accounts to because it needs to compose a desktop from it and usually also
>> mmaps it for fallback CPU operations.
> Let me try to understand some more. So the game (or the entity that should
> be responsible for the resource) doesn't really allocate the memory but
> relies on somebody else (from the memcg perspective living in a different
> resource domain - i.e. a different memcg) to do that on its behalf.
> Correct? If that is the case then it certainly does not fit into the
> memcg model.

More or less: yes, that is one possible use case.  But we could leave 
that one out since it is not the primary use case.

What happens more often is that 99% of the resources are only allocated per 
process, but around 1% are shared with somebody else.

But see the two comments below for a better description of the problem I'm 
facing.

> I am not really sure there is any reasonable model where you cannot
> really tell who is responsible for the resource.

Well it would be fine with me to leave out those 1% of resources shared 
with different memcgs.

What breaks my neck are those 99% which are allocated by a game and 
could potentially be shared but are most of the time not.

>> So what happens when a game over-allocates texture resources is that your
>> whole desktop restarts because the compositor is killed. This obviously also
>> kills the game, but it would be much nicer if we would be more selective
>> here.
>>
>> For hardware rendering DMA-buf and GPU drivers are used, but for the
>> software fallback shmem files is what is used under the hood as far as I
>> know. And the underlying problem is the same for both.
> For shmem files the end user of the buffer can preallocate and so own
> the buffer and be accounted for it.

The problem is just that it can easily happen that one process is 
allocating the resource and a different one freeing it.

So just imagine the following example: a process opens an X window, gets a 
reference to the handle of the buffer backing this window for drawing, 
tells X to close the window again and then a bit later closes the buffer 
handle.

In this example the X server would be charged allocating the buffer and 
the client (which is most likely in a different memcg group) is charged 
freeing it.

I could of course add something to struct page to track which memcg (or 
process) it was charged against, but extending struct page is most 
likely a no-go.

Alternatively, I could try to track the "owner" of a buffer (e.g. a shmem 
file), but then it can happen that one process creates the object and 
another one writes to it and actually allocates the memory.

>>> Also do not forget that shared file memory is not the only thing
>>> to care about. What about the kernel memory used on behalf of processes?
>> Yeah, I'm aware of that as well. But at least inside the GPU drivers we try
>> to keep that in a reasonable ratio.
>>
>>> Just consider the above mentioned memcg driven model. It doesn't really
>>> require to chase specific files and do some arbitrary math to share the
>>> responsibility. It has a clear accounting and responsibility model.
>> Ok, how does that work then?
> The memory is accounted to whoever faults that memory in or to the
> allocating context if that is a kernel memory (in most situations).

That's what I had in mind as well. Problem with this approach is that 
file descriptors are currently not informed that they are shared between 
processes.

So to make this work we would need something like attach/detach to 
process in struct file_operations.
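
Something like this, purely hypothetical since neither callback exists in 
struct file_operations today:

/* Hypothetical sketch only: per-task notifications on a file, so charges
 * could follow the processes that actually hold the file. */
struct file;
struct task_struct;

struct file_badness_ops {
        /* task gained a reference to the file (fd install, SCM_RIGHTS, ...) */
        void (*attach)(struct file *file, struct task_struct *task);
        /* task dropped its last reference to the file */
        void (*detach)(struct file *file, struct task_struct *task);
};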

And as I noted, this happens rather often. For example a game which 
renders 120 frames per second needs to transfer 120 buffers per second 
between client and X.

So this is not something which can be allowed to take a lot of time, and 
the file descriptor tracking structures in the Linux kernel are not made 
for this either.

I think for now I will try something like this specific to DRM drivers. 
That doesn't solve the shmem file problem, but it at least gives me 
something at hand for the accelerated Linux desktop case.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-11  8:06                           ` [Nouveau] " Christian König
@ 2022-06-13  7:45                             ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-13  7:45 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Sat 11-06-22 10:06:18, Christian König wrote:
> On 10.06.22 at 16:16, Michal Hocko wrote:
[...]
> > > So what happens when a game over-allocates texture resources is that your
> > > whole desktop restarts because the compositor is killed. This obviously also
> > > kills the game, but it would be much nicer if we would be more selective
> > > here.
> > > 
> > > For hardware rendering DMA-buf and GPU drivers are used, but for the
> > > software fallback shmem files is what is used under the hood as far as I
> > > know. And the underlying problem is the same for both.
> > For shmem files the end user of the buffer can preallocate and so own
> > the buffer and be accounted for it.
> 
> The problem is just that it can easily happen that one process is allocating
> the resource and a different one freeing it.
> 
> So just imagine the following example: a process opens an X window, gets a
> reference to the handle of the buffer backing this window for drawing, tells X
> to close the window again and then a bit later closes the buffer handle.
> 
> In this example the X server would be charged allocating the buffer and the
> client (which is most likely in a different memcg group) is charged freeing
> it.

Thanks for the clarification.

> I could of course add something to struct page to track which memcg (or
> process) it was charged against, but extending struct page is most likely a
> no-go.

Struct page already maintains its memcg: the one which has charged it, and
it will stay constant throughout the allocation lifetime (cgroup v1
has a concept of charge migration but this hasn't been adopted in
v2).

We have a concept of active_memcg which allows charging against a
different memcg than the allocating context. From your example above I
do not think this is really usable for the described use case, as X is
not aware of where the request comes from?
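
For reference, this is roughly how that mechanism is used from kernel code 
(an illustrative fragment, not standalone; it assumes the caller already 
knows and has pinned the target memcg, which is exactly the part that is 
unclear in the X/compositor case):

#include <linux/memcontrol.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>

/* Charge an allocation to 'target' instead of the current task's memcg. */
static void *alloc_on_behalf_of(struct mem_cgroup *target, size_t size)
{
        struct mem_cgroup *old = set_active_memcg(target);
        void *buf = kzalloc(size, GFP_KERNEL_ACCOUNT);

        set_active_memcg(old);
        return buf;
}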

> Alternatively, I could try to track the "owner" of a buffer (e.g. a shmem
> file), but then it can happen that one process creates the object and
> another one writes to it and actually allocates the memory.

If you can enforce that the owner is really responsible for the
allocation then all should be fine. That would require MAP_POPULATE-like
semantics and I suspect this is not really feasible with the existing
userspace. It would certainly be hard to enforce for bad players.
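
On the consumer side that would look something like this (a userspace 
sketch; 'fd' is assumed to be a shared shmem-backed file such as a memfd, 
and error handling is omitted):

#define _GNU_SOURCE
#include <sys/mman.h>

/* MAP_POPULATE pre-faults the whole mapping at mmap() time, so the pages
 * are charged to the mapping (consuming) process right away instead of to
 * whoever happens to touch them first. */
static void *map_and_own(int fd, size_t size)
{
        return mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_POPULATE, fd, 0);
}
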
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-11  8:06                           ` [Nouveau] " Christian König
@ 2022-06-13  9:08                             ` Michel Dänzer
  -1 siblings, 0 replies; 145+ messages in thread
From: Michel Dänzer @ 2022-06-13  9:08 UTC (permalink / raw)
  To: Christian König, Michal Hocko, Christian König
  Cc: andrey.grodzovsky, linux-mm, nouveau, intel-gfx, hughd,
	linux-kernel, amd-gfx, linux-fsdevel, viro, daniel, linux-tegra,
	alexander.deucher, akpm, linux-media

On 2022-06-11 10:06, Christian König wrote:
> On 10.06.22 at 16:16, Michal Hocko wrote:
>> [...]
>>>> Just consider the above mentioned memcg driven model. It doesn't really
>>>> require to chase specific files and do some arbitrary math to share the
>>>> responsibility. It has a clear accounting and responsibility model.
>>> Ok, how does that work then?
>> The memory is accounted to whoever faults that memory in or to the
>> allocating context if that is a kernel memory (in most situations).
> 
> That's what I had in mind as well. Problem with this approach is that file descriptors are currently not informed that they are shared between processes.
> 
> So to make this work we would need something like attach/detach to process in struct file_operations.
> 
> And as I noted, this happens rather often. For example a game which renders 120 frames per second needs to transfer 120 buffers per second between client and X.

FWIW, in the steady state, the game will cycle between a small (generally 2-5) set of buffers. The game will not cause new buffers to be exported & imported for every frame.

In general, I'd expect dma-buf export & import to happen relatively rarely, e.g. when a window is opened or resized.


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-13  9:08                             ` Michel Dänzer
@ 2022-06-13  9:11                               ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-13  9:11 UTC (permalink / raw)
  To: Michel Dänzer, Christian König, Michal Hocko
  Cc: andrey.grodzovsky, linux-mm, nouveau, intel-gfx, hughd,
	linux-kernel, amd-gfx, linux-fsdevel, viro, daniel, linux-tegra,
	alexander.deucher, akpm, linux-media

Am 13.06.22 um 11:08 schrieb Michel Dänzer:
> On 2022-06-11 10:06, Christian König wrote:
>> Am 10.06.22 um 16:16 schrieb Michal Hocko:
>>> [...]
>>>>> Just consider the above mentioned memcg driven model. It doesn't really
>>>>> require to chase specific files and do some arbitrary math to share the
>>>>> responsibility. It has a clear accounting and responsibility model.
>>>> Ok, how does that work then?
>>> The memory is accounted to whoever faults that memory in or to the
>>> allocating context if that is a kernel memory (in most situations).
>> That's what I had in mind as well. Problem with this approach is that file descriptors are currently not informed that they are shared between processes.
>>
>> So to make this work we would need something like attach/detach to process in struct file_operations.
>>
>> And as I noted, this happens rather often. For example a game which renders 120 frames per second needs to transfer 120 buffers per second between client and X.
> FWIW, in the steady state, the game will cycle between a small (generally 2-5) set of buffers. The game will not cause new buffers to be exported & imported for every frame.
>
> In general, I'd expect dma-buf export & import to happen relatively rarely, e.g. when a window is opened or resized.

Yeah, on a normal Linux desktop. Just unfortunately not on Android :)

Anyway, even if this only happens at game start, we can't go over all 
the processes/fds and check where a DMA-buf is open in order to account 
it against each process.

We would need to add callbacks for this to make it work even halfway reliably.
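
Very roughly, something along these lines is what I mean; nothing like this 
exists today, the structure and callback names below are made up purely for 
illustration:

        /*
         * Hypothetical sketch only: hooks that would tell a file about
         * the tasks using it. Neither this ops structure nor the
         * callbacks exist in the kernel today.
         */
        struct file;
        struct task_struct;

        struct file_task_notify_ops {
                /* a task gained a reference (open, dup, fd passing, fork) */
                void (*attach)(struct file *file, struct task_struct *task);
                /* a task dropped its last reference (close, exit) */
                void (*detach)(struct file *file, struct task_struct *task);
        };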

Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-13  7:45                             ` [Intel-gfx] " Michal Hocko
@ 2022-06-13 11:50                               ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-13 11:50 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

Am 13.06.22 um 09:45 schrieb Michal Hocko:
> On Sat 11-06-22 10:06:18, Christian König wrote:
>> Am 10.06.22 um 16:16 schrieb Michal Hocko:
> [...]
>> I could of course add something to struct page to track which memcg (or
>> process) it was charged against, but extending struct page is most likely a
>> no-go.
> Struct page already maintains its memcg: the one which has charged it, and
> it will stay constant throughout the allocation lifetime (cgroup v1
> has a concept of charge migration but this hasn't been adopted in
> v2).
>
> We have a concept of active_memcg which allows charging against a
> different memcg than the allocating context. From your example above I
> do not think this is really usable for the described use case, as X is
> not aware where the request comes from?

Well X/Wayland is aware, but not the underlying kernel drivers.

If X/Wayland wanted to forward this information to the kernel we 
would need to extend the existing UAPI quite a bit. And that of course 
doesn't help us at all with existing desktops.
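
Just to illustrate the kernel side: if the driver somehow knew the target 
memcg, the remote charging itself would be simple (rough sketch; getting 
hold of target_memcg is exactly the part that would need the new UAPI):

        /*
         * Rough sketch: charge a driver allocation to a remote memcg.
         * set_active_memcg() and __GFP_ACCOUNT exist today; target_memcg
         * is assumed to come from somewhere, which is the unsolved part.
         */
        struct mem_cgroup *old_memcg;
        struct page *page;

        old_memcg = set_active_memcg(target_memcg);
        page = alloc_page(GFP_KERNEL | __GFP_ACCOUNT);
        set_active_memcg(old_memcg);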

>> Alternatively I could try to track the "owner" of a buffer (e.g. a shmem
>> file), but then it can happen that one process creates the object and
>> another one is writing to it and actually allocating the memory.
> If you can enforce that the owner is really responsible for the
> allocation then all should be fine. That would require MAP_POPULATE like
> semantic and I suspect this is not really feasible with the existing
> userspace. It would be certainly hard to enforce for bad players.

I've tried this today and the result was: "BUG: Bad rss-counter state 
mm:000000008751d9ff type:MM_FILEPAGES val:-571286".

The problem is once more that files are not informed when the process 
clones. So what happened is that somebody called fork() with an 
mm_struct I've accounted my pages to. The result is just that we messed 
up the rss_stats and got the "BUG..." above.

The key difference between normal allocated pages and the resources here 
is just that we are not bound to an mm_struct in any way.

I could potentially add a dummy VMA to the mm_struct, but to be 
honest I think that this would just be an absolute hack.

So I'm running out of ideas for how to fix this, except for adding this per 
file oom badness like I proposed.
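
For reference, the rough shape of that hook (simplified sketch, not 
necessarily the exact signature from the patches): a file reports how much 
memory would be freed if it went away, e.g. for shmem:

        /* Simplified sketch of a per-file badness callback; the exact
         * signature in the actual patches may differ. */
        long (*oom_badness)(struct file *file); /* new member in struct file_operations */

        /* shmem example: number of pages currently backing the file */
        static long shmem_oom_badness(struct file *file)
        {
                return file_inode(file)->i_mapping->nrpages;
        }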

Regards,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-13 11:50                               ` [Nouveau] " Christian König
@ 2022-06-13 12:11                                 ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-13 12:11 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Mon 13-06-22 13:50:28, Christian König wrote:
> Am 13.06.22 um 09:45 schrieb Michal Hocko:
> > On Sat 11-06-22 10:06:18, Christian König wrote:
> > > Am 10.06.22 um 16:16 schrieb Michal Hocko:
[...]
> > > Alternatively I could try to track the "owner" of a buffer (e.g. a shmem
> > > file), but then it can happen that one process creates the object and
> > > another one is writing to it and actually allocating the memory.
> > If you can enforce that the owner is really responsible for the
> > allocation then all should be fine. That would require MAP_POPULATE like
> > semantic and I suspect this is not really feasible with the existing
> > userspace. It would be certainly hard to enforce for bad players.
> 
> I've tried this today and the result was: "BUG: Bad rss-counter state
> mm:000000008751d9ff type:MM_FILEPAGES val:-571286".
> 
> The problem is once more that files are not informed when the process
> clones. So what happened is that somebody called fork() with an mm_struct
> I've accounted my pages to. The result is just that we messed up the
> rss_stats and got the "BUG..." above.
> 
> The key difference between normal allocated pages and the resources here is
> just that we are not bound to an mm_struct in any way.

It is not really clear to me what exactly you have tried.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [Nouveau] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-13 12:11                                 ` [Intel-gfx] " Michal Hocko
@ 2022-06-13 12:55                                   ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-13 12:55 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: andrey.grodzovsky, linux-mm, nouveau, intel-gfx, hughd,
	linux-kernel, amd-gfx, linux-fsdevel, viro, daniel, linux-tegra,
	alexander.deucher, akpm, linux-media

Am 13.06.22 um 14:11 schrieb Michal Hocko:
> [SNIP]
>>>> Alternatively I could try to track the "owner" of a buffer (e.g. a shmem
>>>> file), but then it can happen that one process creates the object and
>>>> another one is writing to it and actually allocating the memory.
>>> If you can enforce that the owner is really responsible for the
>>> allocation then all should be fine. That would require MAP_POPULATE like
>>> semantic and I suspect this is not really feasible with the existing
>>> userspace. It would be certainly hard to enforce for bad players.
>> I've tried this today and the result was: "BUG: Bad rss-counter state
>> mm:000000008751d9ff type:MM_FILEPAGES val:-571286".
>>
>> The problem is once more that files are not informed when the process
>> clones. So what happened is that somebody called fork() with an mm_struct
>> I've accounted my pages to. The result is just that we messed up the
>> rss_stats and got the "BUG..." above.
>>
>> The key difference between normal allocated pages and the resources here is
>> just that we are not bound to an mm_struct in any way.
> It is not really clear to me what exactly you have tried.

I've tried to track the "owner" of a driver connection by keeping a 
reference to the mm_struct which created this connection inside our file 
private, and then using add_mm_counter() to account all the allocations of 
the driver to this mm_struct.
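
In rough pseudo-code the experiment looked something like this (simplified; 
the structure and function names are made up, add_mm_counter()/mmgrab() are 
the existing APIs):

        struct my_driver_file_priv {            /* made-up name */
                struct mm_struct *owner_mm;
        };

        /* on open: remember which mm created the connection */
        static void my_driver_set_owner(struct my_driver_file_priv *priv)
        {
                priv->owner_mm = current->mm;
                mmgrab(priv->owner_mm);         /* keep the mm_struct alive */
        }

        /* whenever the driver allocates (nr = 1) or frees (nr = -1)
         * a page for this connection */
        static void my_driver_account_page(struct my_driver_file_priv *priv, long nr)
        {
                add_mm_counter(priv->owner_mm, MM_FILEPAGES, nr);
        }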

This works to the extent that now the right process is killed in an OOM 
situation. The problem with this approach is that the driver is not 
informed about operations like fork() or clone(), so what happens is 
that after a fork()/clone() we have an unbalanced rss-counter.

Let me maybe get back to the initial question: We have resources which 
are not related to the virtual address space of a process, how should we 
tell the OOM killer about them?

Thanks for all the input so far,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [Intel-gfx] [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-13 12:55                                   ` [Intel-gfx] " Christian König
@ 2022-06-13 14:11                                     ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-13 14:11 UTC (permalink / raw)
  To: Christian König
  Cc: andrey.grodzovsky, linux-mm, nouveau, intel-gfx, hughd,
	linux-kernel, amd-gfx, linux-fsdevel, viro, linux-tegra,
	alexander.deucher, akpm, Christian König, linux-media

On Mon 13-06-22 14:55:54, Christian König wrote:
> Am 13.06.22 um 14:11 schrieb Michal Hocko:
> > [SNIP]
> > > > > Alternatively I could try to track the "owner" of a buffer (e.g. a shmem
> > > > > file), but then it can happen that one process creates the object and
> > > > > another one is writing to it and actually allocating the memory.
> > > > If you can enforce that the owner is really responsible for the
> > > > allocation then all should be fine. That would require MAP_POPULATE like
> > > > semantic and I suspect this is not really feasible with the existing
> > > > userspace. It would be certainly hard to enforce for bad players.
> > > I've tried this today and the result was: "BUG: Bad rss-counter state
> > > mm:000000008751d9ff type:MM_FILEPAGES val:-571286".
> > > 
> > > The problem is once more that files are not informed when the process
> > > clones. So what happened is that somebody called fork() with an mm_struct
> > > I've accounted my pages to. The result is just that we messed up the
> > > rss_stats and got the "BUG..." above.
> > > 
> > > The key difference between normal allocated pages and the resources here is
> > > just that we are not bound to an mm_struct in any way.
> > It is not really clear to me what exactly you have tried.
> 
> I've tried to track the "owner" of a driver connection by keeping a
> reference to the mm_struct who created this connection inside our file
> private and then use add_mm_counter() to account all the allocations of the
> driver to this mm_struct.
> 
> This works to the extend that now the right process is killed in an OOM
> situation. The problem with this approach is that the driver is not informed
> about operations like fork() or clone(), so what happens is that after a
> fork()/clone() we have an unbalanced rss-counter.

Yes, I do not think you can make per-process accounting work without a
concept of per-process ownership.

> Let me maybe get back to the initial question: We have resources which are
> not related to the virtual address space of a process, how should we tell
> the OOM killer about them?

I would say memcg, but we have discussed this already...

I do not think that exposing a resource (in the form of a counter
or something like that) is sufficient. The existing oom killer
implementation is heavily process-centric (with the memcg extension for
grouping but not changing the overall design in principle). If you
want to make it aware of resources which are not directly accounted to
processes then a new implementation is necessary IMHO. You would need
to evaluate those resources and kill all the tasks that can hold on to that
resource.

This is also the reason why I am not really a fan of the per-file
badness: it adds a notion of a resource that is not process-bound
in general, so it will add all sorts of weird runtime corner cases which
are impossible to anticipate [*]. Maybe that will work in some scenarios,
but it is definitely not something to be done by default without users opting
into that and being aware of the consequences.

There have been discussions that the existing oom implementation cannot
fit all potential use cases, so maybe we need to finally decide to use a
pluggable, BPF-able etc. architecture to allow implementations that fit
specific needs.

[*] I know it is not directly related but it is kinda similar. In the past
we used to have heuristics that considered work done as a resource, i.e.
kill younger processes preferably to reduce the damage. This has turned
out to have very unpredictable behavior and caused many complaints from
users. The situation improved when the selection was based solely on
rss. This has its own cons of course, but at least they are predictable.
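
For reference, the current per-task score boils down to roughly this
(simplified from mm/oom_kill.c):

        /* simplified: rss + swap entries + page tables, shifted by oom_score_adj */
        points = get_mm_rss(p->mm) +
                 get_mm_counter(p->mm, MM_SWAPENTS) +
                 mm_pgtables_bytes(p->mm) / PAGE_SIZE;
        points += (long)p->signal->oom_score_adj * totalpages / 1000;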
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-13 14:11                                     ` Michal Hocko
@ 2022-06-15 12:35                                       ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-15 12:35 UTC (permalink / raw)
  To: Michal Hocko, Christian König
  Cc: linux-media, linux-kernel, intel-gfx, amd-gfx, nouveau,
	linux-tegra, linux-fsdevel, linux-mm, alexander.deucher, daniel,
	viro, akpm, hughd, andrey.grodzovsky

On 13.06.22 at 16:11, Michal Hocko wrote:
> [SNIP]
>> Let me maybe get back to the initial question: We have resources which are
>> not related to the virtual address space of a process, how should we tell
>> the OOM killer about them?
> I would say memcg, but we have discussed this already...

Well memcg is at least closer to the requirements than the classic 
mm_struct accounting.

It won't work for really shared buffers, but if that's what it takes to
find a doable solution for the remaining 99% then I can live with
that.

> I do not think that exposing a resource (in the form of a counter
> or something like that) is sufficient. The existing oom killer
> implementation is heavily process centric (with the memcg extension for
> grouping, but not changing the overall design in principle). If you
> want to make it aware of resources which are not directly accounted to
> processes then a new implementation is necessary IMHO. You would need
> to evaluate those resources and kill all the tasks that can hold on to
> that resource.

Well the OOM killer is process centric because processes are what you 
can kill.

Even the classic mm_struct based accounting includes MM_SHMEMPAGES in
the badness. So accounting shared resources as badness to make a
decision is nothing new here.
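
Roughly, that classic calculation looks like the following (a simplified
sketch of get_mm_rss() and the oom_badness() core; the exact terms vary
between kernel versions):

#include <linux/mm.h>

static unsigned long approx_rss(struct mm_struct *mm)
{
        return get_mm_counter(mm, MM_FILEPAGES) +
               get_mm_counter(mm, MM_ANONPAGES) +
               get_mm_counter(mm, MM_SHMEMPAGES);  /* shmem counts too */
}

/* Higher score means "kill this task first". */
static unsigned long approx_badness(struct mm_struct *mm)
{
        return approx_rss(mm) +
               get_mm_counter(mm, MM_SWAPENTS) +
               mm_pgtables_bytes(mm) / PAGE_SIZE;
}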

The difference is that this time the badness doesn't come from the 
memory management subsystem, but rather from the I/O subsystem.

> This is also the reason why I am not really a fan of the per file
> badness: it adds a notion of a resource that is not process bound in
> general, so it will add all sorts of weird runtime corner cases which
> are impossible to anticipate [*]. Maybe that will work in some
> scenarios, but it is definitely not something to be done by default
> without users opting into it and being aware of the consequences.

Would a kernel command line option to control the behavior be helpful here?
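
Something as simple as an opt-in boot parameter would do, along the lines
of the sketch below (the parameter name and the flag are purely
hypothetical, just to make the question concrete):

#include <linux/init.h>
#include <linux/types.h>

static bool oom_file_badness_enabled;   /* default off */

static int __init oom_file_badness_setup(char *str)
{
        oom_file_badness_enabled = true;
        return 1;
}
__setup("oom_file_badness", oom_file_badness_setup);

/* The OOM killer would then only consult the per-file badness callback
 * when oom_file_badness_enabled is set. */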

> There have been discussions that the existing oom implementation cannot
> fit all potential use cases, so maybe we need to finally decide to use a
> pluggable, BPF-able, etc. architecture to allow implementations that fit
> specific needs.

Yeah, BPF came to my mind as well. But I need to talk with our experts
on that topic first.

When the OOM killer runs, allocating more memory is pretty much a no-go,
and I'm not sure what the requirements of running a BPF program to
compute the badness are.

> [*] I know it is not directly related but kinda similar. In the past
> we used to have heuristics that considered work done as a resource,
> i.e. preferably kill younger processes to reduce the damage. This
> turned out to have very unpredictable behavior and drew many complaints
> from users. The situation improved when the selection was based solely
> on rss. That has its own cons of course, but at least they are
> predictable.

Good to know, thanks.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-15 12:35                                       ` [Nouveau] " Christian König
@ 2022-06-15 13:15                                         ` Michal Hocko
  -1 siblings, 0 replies; 145+ messages in thread
From: Michal Hocko @ 2022-06-15 13:15 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On Wed 15-06-22 14:35:22, Christian König wrote:
[...]
> Even the classic mm_struct based accounting includes MM_SHMEMPAGES in the
> badness. So accounting shared resources as badness to make a decision is
> nothing new here.

Yeah, it is nothing really new, but that also doesn't mean it is an
example worth following, as this doesn't really work currently. Also
please note that MM_SHMEMPAGES is counting at least something process
specific, as those pages are mapped into the process (and with enough
wishful thinking unmapping can drop the last reference and actually
free something up). With generic per-file memory this is even more
detached from the process.

> The difference is that this time the badness doesn't come from the memory
> management subsystem, but rather from the I/O subsystem.
> 
> > This is also the reason why I am not really a fan of the per file
> > badness: it adds a notion of a resource that is not process bound in
> > general, so it will add all sorts of weird runtime corner cases which
> > are impossible to anticipate [*]. Maybe that will work in some
> > scenarios, but it is definitely not something to be done by default
> > without users opting into it and being aware of the consequences.
> 
> Would a kernel command line option to control the behavior be helpful here?

I am not sure what the proper way to control that would be so that it
stays future extensible. The kernel command line is certainly an option,
but if we want to extend that to a module-like or eBPF interface then it
wouldn't stand the test of time very well.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH 03/13] mm: shmem: provide oom badness for shmem files
  2022-06-15 13:15                                         ` [Intel-gfx] " Michal Hocko
@ 2022-06-15 14:24                                           ` Christian König
  -1 siblings, 0 replies; 145+ messages in thread
From: Christian König @ 2022-06-15 14:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christian König, linux-media, linux-kernel, intel-gfx,
	amd-gfx, nouveau, linux-tegra, linux-fsdevel, linux-mm,
	alexander.deucher, daniel, viro, akpm, hughd, andrey.grodzovsky

On 15.06.22 at 15:15, Michal Hocko wrote:
> On Wed 15-06-22 14:35:22, Christian König wrote:
> [...]
>> Even the classic mm_struct based accounting includes MM_SHMEMPAGES in the
>> badness. So accounting shared resources as badness to make a decision is
>> nothing new here.
> Yeah, it is nothing really new, but that also doesn't mean it is an
> example worth following, as this doesn't really work currently. Also
> please note that MM_SHMEMPAGES is counting at least something process
> specific, as those pages are mapped into the process (and with enough
> wishful thinking unmapping can drop the last reference and actually
> free something up). With generic per-file memory this is even more
> detached from the process.

But this is exactly the use case here. See, I do have the 1% which is
shared between processes, but for 99% of the allocations only one
process has a reference to them.

So that wishful thinking that we can drop the last reference when we 
kill this specific process is perfectly justified.

It may be that this doesn't fit all use cases for shmem files, but it
certainly does for DRM and DMA-buf.

>> The difference is that this time the badness doesn't come from the memory
>> management subsystem, but rather from the I/O subsystem.
>>
>>> This is also the reason why I am not really a fan of the per file
>>> badness: it adds a notion of a resource that is not process bound in
>>> general, so it will add all sorts of weird runtime corner cases which
>>> are impossible to anticipate [*]. Maybe that will work in some
>>> scenarios, but it is definitely not something to be done by default
>>> without users opting into it and being aware of the consequences.
>> Would a kernel command line option to control the behavior be helpful here?
> I am not sure what the proper way to control that would be so that it
> stays future extensible. The kernel command line is certainly an option,
> but if we want to extend that to a module-like or eBPF interface then it
> wouldn't stand the test of time very well.

Well, kernel command line options are not really meant to be stable, are
they?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 145+ messages in thread

end of thread

Thread overview: 145+ messages
2022-05-31  9:59 Per file OOM badness Christian König
2022-05-31  9:59 ` Christian König
2022-05-31  9:59 ` [Nouveau] " Christian König
2022-05-31  9:59 ` [PATCH 01/13] fs: add OOM badness callback to file_operatrations struct Christian König
2022-05-31  9:59   ` Christian König
2022-05-31  9:59   ` [Nouveau] " Christian König
2022-05-31  9:59 ` [PATCH 02/13] oom: take per file badness into account Christian König
2022-05-31  9:59   ` Christian König
2022-05-31  9:59   ` [Nouveau] " Christian König
2022-05-31  9:59 ` [PATCH 03/13] mm: shmem: provide oom badness for shmem files Christian König
2022-05-31  9:59   ` Christian König
2022-05-31  9:59   ` [Nouveau] " Christian König
2022-06-09  9:18   ` Michal Hocko
2022-06-09  9:18     ` [Nouveau] " Michal Hocko
2022-06-09  9:18     ` Michal Hocko
2022-06-09  9:18     ` [Intel-gfx] " Michal Hocko
2022-06-09 12:16     ` Christian König
2022-06-09 12:16       ` Christian König
2022-06-09 12:16       ` [Intel-gfx] " Christian König
2022-06-09 12:16       ` [Nouveau] " Christian König
2022-06-09 12:57       ` Michal Hocko
2022-06-09 12:57         ` [Nouveau] " Michal Hocko
2022-06-09 12:57         ` Michal Hocko
2022-06-09 12:57         ` [Intel-gfx] " Michal Hocko
2022-06-09 14:10         ` Christian König
2022-06-09 14:10           ` Christian König
2022-06-09 14:10           ` [Nouveau] " Christian König
2022-06-09 14:21           ` Michal Hocko
2022-06-09 14:21             ` [Nouveau] " Michal Hocko
2022-06-09 14:21             ` Michal Hocko
2022-06-09 14:21             ` [Intel-gfx] " Michal Hocko
2022-06-09 14:29             ` Christian König
2022-06-09 14:29               ` Christian König
2022-06-09 14:29               ` [Intel-gfx] " Christian König
2022-06-09 14:29               ` [Nouveau] " Christian König
2022-06-09 15:07               ` Michal Hocko
2022-06-09 15:07                 ` [Nouveau] " Michal Hocko
2022-06-09 15:07                 ` Michal Hocko
2022-06-09 15:07                 ` [Intel-gfx] " Michal Hocko
2022-06-10 10:58                 ` Christian König
2022-06-10 10:58                   ` Christian König
2022-06-10 10:58                   ` [Nouveau] " Christian König
2022-06-10 11:44                   ` Michal Hocko
2022-06-10 11:44                     ` [Nouveau] " Michal Hocko
2022-06-10 11:44                     ` Michal Hocko
2022-06-10 11:44                     ` [Intel-gfx] " Michal Hocko
2022-06-10 12:17                     ` Christian König
2022-06-10 12:17                       ` Christian König
2022-06-10 12:17                       ` [Intel-gfx] " Christian König
2022-06-10 12:17                       ` [Nouveau] " Christian König
2022-06-10 14:16                       ` Michal Hocko
2022-06-10 14:16                         ` [Nouveau] " Michal Hocko
2022-06-10 14:16                         ` Michal Hocko
2022-06-10 14:16                         ` [Intel-gfx] " Michal Hocko
2022-06-11  8:06                         ` Christian König
2022-06-11  8:06                           ` Christian König
2022-06-11  8:06                           ` [Intel-gfx] " Christian König
2022-06-11  8:06                           ` [Nouveau] " Christian König
2022-06-13  7:45                           ` Michal Hocko
2022-06-13  7:45                             ` [Nouveau] " Michal Hocko
2022-06-13  7:45                             ` Michal Hocko
2022-06-13  7:45                             ` [Intel-gfx] " Michal Hocko
2022-06-13 11:50                             ` Christian König
2022-06-13 11:50                               ` Christian König
2022-06-13 11:50                               ` [Intel-gfx] " Christian König
2022-06-13 11:50                               ` [Nouveau] " Christian König
2022-06-13 12:11                               ` Michal Hocko
2022-06-13 12:11                                 ` [Nouveau] " Michal Hocko
2022-06-13 12:11                                 ` Michal Hocko
2022-06-13 12:11                                 ` [Intel-gfx] " Michal Hocko
2022-06-13 12:55                                 ` [Nouveau] " Christian König
2022-06-13 12:55                                   ` Christian König
2022-06-13 12:55                                   ` Christian König
2022-06-13 12:55                                   ` [Intel-gfx] " Christian König
2022-06-13 14:11                                   ` Michal Hocko
2022-06-13 14:11                                     ` [Nouveau] " Michal Hocko
2022-06-13 14:11                                     ` Michal Hocko
2022-06-13 14:11                                     ` Michal Hocko
2022-06-15 12:35                                     ` Christian König
2022-06-15 12:35                                       ` Christian König
2022-06-15 12:35                                       ` [Intel-gfx] " Christian König
2022-06-15 12:35                                       ` [Nouveau] " Christian König
2022-06-15 13:15                                       ` Michal Hocko
2022-06-15 13:15                                         ` [Nouveau] " Michal Hocko
2022-06-15 13:15                                         ` Michal Hocko
2022-06-15 13:15                                         ` [Intel-gfx] " Michal Hocko
2022-06-15 14:24                                         ` Christian König
2022-06-15 14:24                                           ` Christian König
2022-06-15 14:24                                           ` [Intel-gfx] " Christian König
2022-06-15 14:24                                           ` [Nouveau] " Christian König
2022-06-13  9:08                           ` Michel Dänzer
2022-06-13  9:08                             ` [Nouveau] " Michel Dänzer
2022-06-13  9:08                             ` [Intel-gfx] " Michel Dänzer
2022-06-13  9:08                             ` Michel Dänzer
2022-06-13  9:11                             ` Christian König
2022-06-13  9:11                               ` Christian König
2022-06-13  9:11                               ` [Intel-gfx] " Christian König
2022-06-13  9:11                               ` [Nouveau] " Christian König
2022-06-09 15:19             ` Felix Kuehling
2022-06-09 15:19               ` Felix Kuehling
2022-06-09 15:19               ` [Intel-gfx] " Felix Kuehling
2022-06-09 15:19               ` [Nouveau] " Felix Kuehling
2022-06-09 15:22               ` Christian König
2022-06-09 15:22                 ` Christian König
2022-06-09 15:22                 ` [Intel-gfx] " Christian König
2022-06-09 15:22                 ` [Nouveau] " Christian König
2022-06-09 15:54                 ` Michal Hocko
2022-06-09 15:54                   ` [Nouveau] " Michal Hocko
2022-06-09 15:54                   ` Michal Hocko
2022-06-09 15:54                   ` [Intel-gfx] " Michal Hocko
2022-05-31  9:59 ` [PATCH 04/13] dma-buf: provide oom badness for DMA-buf files Christian König
2022-05-31  9:59   ` Christian König
2022-05-31  9:59   ` [Nouveau] " Christian König
2022-05-31  9:59 ` [PATCH 05/13] drm/gem: adjust per file OOM badness on handling buffers Christian König
2022-05-31  9:59   ` Christian König
2022-05-31  9:59   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 06/13] drm/gma500: use drm_oom_badness Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 07/13] drm/amdgpu: Use drm_oom_badness for amdgpu Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 08/13] drm/radeon: use drm_oom_badness Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 09/13] drm/i915: " Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 10/13] drm/nouveau: " Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 11/13] drm/omap: " Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 12/13] drm/vmwgfx: " Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 10:00 ` [PATCH 13/13] drm/tegra: " Christian König
2022-05-31 10:00   ` Christian König
2022-05-31 10:00   ` [Nouveau] " Christian König
2022-05-31 22:00 ` Per file OOM badness Alex Deucher
2022-05-31 22:00   ` Alex Deucher
2022-05-31 22:00   ` [Intel-gfx] " Alex Deucher
2022-05-31 22:00   ` Alex Deucher
2022-05-31 22:00   ` [Nouveau] " Alex Deucher
