linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints
@ 2019-06-16  8:58 Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 1/5] mm: rename madvise_core to madvise_common Oleksandr Natalenko
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2019-06-16  8:58 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, linux-mm, linux-api

Hi, Minchan.

This is a set of commits based on our discussion on your submission [1].

First 2 implement minor suggestions just for you to not forget to take
them into account.

uio.h inclusion was needed for me to be able to compile your series
successfully. Also please note I had to enable "Transparent Hugepage
Support" as well as "Enable idle page tracking" options, otherwise the
build failed. I guess this can be addressed by you better since the
errors are introduced with MADV_COLD introduction.

Last 2 commits are the actual KSM hints enablement. The first one
implements additional check for the case where the mmap_sem is taken for
write, and the second one just allows KSM hints to be used by the remote
interface.

I'm not Cc'ing else anyone except two mailing lists to not distract
people unnecessarily. If you are fine with this addition, please use it
for your next iteration of process_madvise(), and then you'll Cc all the
people needed.

Thanks.

[1] https://lore.kernel.org/lkml/20190531064313.193437-1-minchan@kernel.org/

Oleksandr Natalenko (5):
  mm: rename madvise_core to madvise_common
  mm: revert madvise_inject_error line split
  mm: include uio.h to madvise.c
  mm/madvise: employ mmget_still_valid for write lock
  mm/madvise: allow KSM hints for remote API

 mm/madvise.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH NOTFORMERGE 1/5] mm: rename madvise_core to madvise_common
  2019-06-16  8:58 [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
@ 2019-06-16  8:58 ` Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 2/5] mm: revert madvise_inject_error line split Oleksandr Natalenko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2019-06-16  8:58 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, linux-mm, linux-api

"core" usually means something very different within the kernel land,
thus lets just follow the way it is handled in mutexes, rw_semaphores
etc and name common things as "_common".

Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
---
 mm/madvise.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 94d782097afd..edb7184f665c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -998,7 +998,7 @@ process_madvise_behavior_valid(int behavior)
 }
 
 /*
- * madvise_core - request behavior hint to address range of the target process
+ * madvise_common - request behavior hint to address range of the target process
  *
  * @task: task_struct got behavior hint, not giving the hint
  * @mm: mm_struct got behavior hint, not giving the hint
@@ -1009,7 +1009,7 @@ process_madvise_behavior_valid(int behavior)
  * @task could be a zombie leader if it calls sys_exit so accessing mm_struct
  * via task->mm is prohibited. Please use @mm insetad of task->mm.
  */
-static int madvise_core(struct task_struct *task, struct mm_struct *mm,
+static int madvise_common(struct task_struct *task, struct mm_struct *mm,
 			unsigned long start, size_t len_in, int behavior)
 {
 	unsigned long end, tmp;
@@ -1132,7 +1132,7 @@ static int pr_madvise_copy_param(struct pr_madvise_param __user *u_param,
 	return ret;
 }
 
-static int process_madvise_core(struct task_struct *tsk, struct mm_struct *mm,
+static int process_madvise_common(struct task_struct *tsk, struct mm_struct *mm,
 				int *behaviors,
 				struct iov_iter *iter,
 				const struct iovec *range_vec,
@@ -1144,7 +1144,7 @@ static int process_madvise_core(struct task_struct *tsk, struct mm_struct *mm,
 	for (i = 0; i < riovcnt && iov_iter_count(iter); i++) {
 		err = -EINVAL;
 		if (process_madvise_behavior_valid(behaviors[i]))
-			err = madvise_core(tsk, mm,
+			err = madvise_common(tsk, mm,
 				(unsigned long)range_vec[i].iov_base,
 				range_vec[i].iov_len, behaviors[i]);
 
@@ -1220,7 +1220,7 @@ static int process_madvise_core(struct task_struct *tsk, struct mm_struct *mm,
  */
 SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 {
-	return madvise_core(current, current->mm, start, len_in, behavior);
+	return madvise_common(current, current->mm, start, len_in, behavior);
 }
 
 
@@ -1252,7 +1252,7 @@ SYSCALL_DEFINE3(process_madvise, int, pidfd,
 
 	/*
 	 * We don't support cookie to gaurantee address space atomicity yet.
-	 * Once we implment cookie, process_madvise_core need to hold mmap_sme
+	 * Once we implment cookie, process_madvise_common need to hold mmap_sme
 	 * during entire operation to guarantee atomicity.
 	 */
 	if (params.cookie != 0)
@@ -1316,7 +1316,7 @@ SYSCALL_DEFINE3(process_madvise, int, pidfd,
 		goto release_task;
 	}
 
-	ret = process_madvise_core(task, mm, behaviors, &iter, iov_r, nr_elem);
+	ret = process_madvise_common(task, mm, behaviors, &iter, iov_r, nr_elem);
 	mmput(mm);
 release_task:
 	put_task_struct(task);
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH NOTFORMERGE 2/5] mm: revert madvise_inject_error line split
  2019-06-16  8:58 [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 1/5] mm: rename madvise_core to madvise_common Oleksandr Natalenko
@ 2019-06-16  8:58 ` Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 3/5] mm: include uio.h to madvise.c Oleksandr Natalenko
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2019-06-16  8:58 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, linux-mm, linux-api

Just to highlight it after our conversation.

Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
---
 mm/madvise.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index edb7184f665c..70aeb54f3e1c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1041,8 +1041,7 @@ static int madvise_common(struct task_struct *task, struct mm_struct *mm,
 
 #ifdef CONFIG_MEMORY_FAILURE
 	if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
-		return madvise_inject_error(behavior,
-					start, start + len_in);
+		return madvise_inject_error(behavior, start, start + len_in);
 #endif
 
 	write = madvise_need_mmap_write(behavior);
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH NOTFORMERGE 3/5] mm: include uio.h to madvise.c
  2019-06-16  8:58 [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 1/5] mm: rename madvise_core to madvise_common Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 2/5] mm: revert madvise_inject_error line split Oleksandr Natalenko
@ 2019-06-16  8:58 ` Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 4/5] mm/madvise: employ mmget_still_valid for write lock Oleksandr Natalenko
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2019-06-16  8:58 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, linux-mm, linux-api

I couldn't compile it w/o this header.

Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
---
 mm/madvise.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 70aeb54f3e1c..9755340da157 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -25,6 +25,7 @@
 #include <linux/swapops.h>
 #include <linux/shmem_fs.h>
 #include <linux/mmu_notifier.h>
+#include <linux/uio.h>
 
 #include <asm/tlb.h>
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH NOTFORMERGE 4/5] mm/madvise: employ mmget_still_valid for write lock
  2019-06-16  8:58 [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
                   ` (2 preceding siblings ...)
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 3/5] mm: include uio.h to madvise.c Oleksandr Natalenko
@ 2019-06-16  8:58 ` Oleksandr Natalenko
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 5/5] mm/madvise: allow KSM hints for remote API Oleksandr Natalenko
  2019-12-10 10:49 ` [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
  5 siblings, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2019-06-16  8:58 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, linux-mm, linux-api

Do the very same trick as we already do since 04f5866e41fb. KSM hints
will require locking mmap_sem for write since they modify vm_flags, so
for remote KSM hinting this additional check is needed.

Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
---
 mm/madvise.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 9755340da157..84f899b1b6da 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1049,6 +1049,8 @@ static int madvise_common(struct task_struct *task, struct mm_struct *mm,
 	if (write) {
 		if (down_write_killable(&mm->mmap_sem))
 			return -EINTR;
+		if (current->mm != mm && !mmget_still_valid(mm))
+			goto skip_mm;
 	} else {
 		down_read(&mm->mmap_sem);
 	}
@@ -1099,6 +1101,7 @@ static int madvise_common(struct task_struct *task, struct mm_struct *mm,
 	}
 out:
 	blk_finish_plug(&plug);
+skip_mm:
 	if (write)
 		up_write(&mm->mmap_sem);
 	else
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH NOTFORMERGE 5/5] mm/madvise: allow KSM hints for remote API
  2019-06-16  8:58 [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
                   ` (3 preceding siblings ...)
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 4/5] mm/madvise: employ mmget_still_valid for write lock Oleksandr Natalenko
@ 2019-06-16  8:58 ` Oleksandr Natalenko
  2019-12-10 10:49 ` [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
  5 siblings, 0 replies; 8+ messages in thread
From: Oleksandr Natalenko @ 2019-06-16  8:58 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, linux-mm, linux-api

It all began with the fact that KSM works only on memory that is marked
by madvise(). And the only way to get around that is to either:

  * use LD_PRELOAD; or
  * patch the kernel with something like UKSM or PKSM.

(i skip ptrace can of worms here intentionally)

To overcome this restriction, lets employ a new remote madvise API. This
can be used by some small userspace helper daemon that will do auto-KSM
job for us.

I think of two major consumers of remote KSM hints:

  * hosts, that run containers, especially similar ones and especially in
    a trusted environment, sharing the same runtime like Node.js;

  * heavy applications, that can be run in multiple instances, not
    limited to opensource ones like Firefox, but also those that cannot be
    modified since they are binary-only and, maybe, statically linked.

Speaking of statistics, more numbers can be found in the very first
submission, that is related to this one [1]. For my current setup with
two Firefox instances I get 100 to 200 MiB saved for the second instance
depending on the amount of tabs.

1 FF instance with 15 tabs:

   $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
   410

2 FF instances, second one has 12 tabs (all the tabs are different):

   $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
   592

At the very moment I do not have specific numbers for containerised
workload, but those should be comparable in case the containers share
similar/same runtime.

[1] https://lore.kernel.org/patchwork/patch/1012142/

Signed-off-by: Oleksandr Natalenko <oleksandr@redhat.com>
---
 mm/madvise.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 84f899b1b6da..e8f9c49794a3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -991,6 +991,8 @@ process_madvise_behavior_valid(int behavior)
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
+	case MADV_MERGEABLE:
+	case MADV_UNMERGEABLE:
 		return true;
 
 	default:
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints
  2019-06-16  8:58 [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
                   ` (4 preceding siblings ...)
  2019-06-16  8:58 ` [PATCH NOTFORMERGE 5/5] mm/madvise: allow KSM hints for remote API Oleksandr Natalenko
@ 2019-12-10 10:49 ` Oleksandr Natalenko
  2019-12-11  1:35   ` Minchan Kim
  5 siblings, 1 reply; 8+ messages in thread
From: Oleksandr Natalenko @ 2019-12-10 10:49 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, linux-mm, linux-api

Hello, Minchan.

On Sun, Jun 16, 2019 at 10:58:30AM +0200, Oleksandr Natalenko wrote:
> This is a set of commits based on our discussion on your submission [1].
> 
> First 2 implement minor suggestions just for you to not forget to take
> them into account.
> 
> uio.h inclusion was needed for me to be able to compile your series
> successfully. Also please note I had to enable "Transparent Hugepage
> Support" as well as "Enable idle page tracking" options, otherwise the
> build failed. I guess this can be addressed by you better since the
> errors are introduced with MADV_COLD introduction.
> 
> Last 2 commits are the actual KSM hints enablement. The first one
> implements additional check for the case where the mmap_sem is taken for
> write, and the second one just allows KSM hints to be used by the remote
> interface.
> 
> I'm not Cc'ing else anyone except two mailing lists to not distract
> people unnecessarily. If you are fine with this addition, please use it
> for your next iteration of process_madvise(), and then you'll Cc all the
> people needed.
> 
> Thanks.
> 
> [1] https://lore.kernel.org/lkml/20190531064313.193437-1-minchan@kernel.org/
> 
> Oleksandr Natalenko (5):
>   mm: rename madvise_core to madvise_common
>   mm: revert madvise_inject_error line split
>   mm: include uio.h to madvise.c
>   mm/madvise: employ mmget_still_valid for write lock
>   mm/madvise: allow KSM hints for remote API
> 
>  mm/madvise.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> -- 
> 2.22.0
> 

This is a gentle ping. Are you still planning to submit process_madvise() solution?

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints
  2019-12-10 10:49 ` [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
@ 2019-12-11  1:35   ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2019-12-11  1:35 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: linux-kernel, linux-mm, linux-api

Hi Oleksandr,

On Tue, Dec 10, 2019 at 11:49:39AM +0100, Oleksandr Natalenko wrote:
> Hello, Minchan.
> 
> On Sun, Jun 16, 2019 at 10:58:30AM +0200, Oleksandr Natalenko wrote:
> > This is a set of commits based on our discussion on your submission [1].
> > 
> > First 2 implement minor suggestions just for you to not forget to take
> > them into account.
> > 
> > uio.h inclusion was needed for me to be able to compile your series
> > successfully. Also please note I had to enable "Transparent Hugepage
> > Support" as well as "Enable idle page tracking" options, otherwise the
> > build failed. I guess this can be addressed by you better since the
> > errors are introduced with MADV_COLD introduction.
> > 
> > Last 2 commits are the actual KSM hints enablement. The first one
> > implements additional check for the case where the mmap_sem is taken for
> > write, and the second one just allows KSM hints to be used by the remote
> > interface.
> > 
> > I'm not Cc'ing else anyone except two mailing lists to not distract
> > people unnecessarily. If you are fine with this addition, please use it
> > for your next iteration of process_madvise(), and then you'll Cc all the
> > people needed.
> > 
> > Thanks.
> > 
> > [1] https://lore.kernel.org/lkml/20190531064313.193437-1-minchan@kernel.org/
> > 
> > Oleksandr Natalenko (5):
> >   mm: rename madvise_core to madvise_common
> >   mm: revert madvise_inject_error line split
> >   mm: include uio.h to madvise.c
> >   mm/madvise: employ mmget_still_valid for write lock
> >   mm/madvise: allow KSM hints for remote API
> > 
> >  mm/madvise.c | 23 ++++++++++++++---------
> >  1 file changed, 14 insertions(+), 9 deletions(-)
> > 
> > -- 
> > 2.22.0
> > 
> 
> This is a gentle ping. Are you still planning to submit process_madvise() solution?

I'm really sorry for being slow progress.
I am stuck with internal stuff of company.
I will do best effort to send it within one or two weeks.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-12-11  1:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-16  8:58 [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
2019-06-16  8:58 ` [PATCH NOTFORMERGE 1/5] mm: rename madvise_core to madvise_common Oleksandr Natalenko
2019-06-16  8:58 ` [PATCH NOTFORMERGE 2/5] mm: revert madvise_inject_error line split Oleksandr Natalenko
2019-06-16  8:58 ` [PATCH NOTFORMERGE 3/5] mm: include uio.h to madvise.c Oleksandr Natalenko
2019-06-16  8:58 ` [PATCH NOTFORMERGE 4/5] mm/madvise: employ mmget_still_valid for write lock Oleksandr Natalenko
2019-06-16  8:58 ` [PATCH NOTFORMERGE 5/5] mm/madvise: allow KSM hints for remote API Oleksandr Natalenko
2019-12-10 10:49 ` [PATCH NOTFORMERGE 0/5] Extend remote madvise API to KSM hints Oleksandr Natalenko
2019-12-11  1:35   ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).