linux-mm.kvack.org archive mirror
* [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
@ 2019-09-13 21:11 Lucian Adrian Grijincu
  2019-09-13 21:17 ` [Potential Spoof] " Roman Gushchin
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Lucian Adrian Grijincu @ 2019-09-13 21:11 UTC (permalink / raw)
  To: Lucian Adrian Grijincu, linux-mm, Souptick Joarder
  Cc: linux-kernel, Michal Hocko, Andrew Morton, Rik van Riel, Roman Gushchin

As pages are faulted in, MLOCK_ONFAULT correctly updates
/proc/self/smaps, but doesn't update /proc/meminfo's Mlocked field.

- Before this patch, /proc/meminfo fields didn't change as pages were faulted in:

= Start =
/proc/meminfo
Unevictable:       10128 kB
Mlocked:           10132 kB
= Creating testfile =

= after mlock2(MLOCK_ONFAULT) =
/proc/meminfo
Unevictable:       10128 kB
Mlocked:           10132 kB
/proc/self/smaps
7f8714000000-7f8754000000 rw-s 00000000 08:04 50857050   /root/testfile
Locked:                0 kB

= after reading half of the file =
/proc/meminfo
Unevictable:       10128 kB
Mlocked:           10132 kB
/proc/self/smaps
7f8714000000-7f8754000000 rw-s 00000000 08:04 50857050   /root/testfile
Locked:           524288 kB

= after reading the entire the file =
/proc/meminfo
Unevictable:       10128 kB
Mlocked:           10132 kB
/proc/self/smaps
7f8714000000-7f8754000000 rw-s 00000000 08:04 50857050   /root/testfile
Locked:          1048576 kB

= after munmap =
/proc/meminfo
Unevictable:       10128 kB
Mlocked:           10132 kB
/proc/self/smaps

- After: /proc/meminfo fields are properly updated as pages are touched:

= Start =
/proc/meminfo
Unevictable:          60 kB
Mlocked:              60 kB
= Creating testfile =

= after mlock2(MLOCK_ONFAULT) =
/proc/meminfo
Unevictable:          60 kB
Mlocked:              60 kB
/proc/self/smaps
7f2b9c600000-7f2bdc600000 rw-s 00000000 08:04 63045798   /root/testfile
Locked:                0 kB

= after reading half of the file =
/proc/meminfo
Unevictable:      524220 kB
Mlocked:          524220 kB
/proc/self/smaps
7f2b9c600000-7f2bdc600000 rw-s 00000000 08:04 63045798   /root/testfile
Locked:           524288 kB

= after reading the entire the file =
/proc/meminfo
Unevictable:     1048496 kB
Mlocked:         1048508 kB
/proc/self/smaps
7f2b9c600000-7f2bdc600000 rw-s 00000000 08:04 63045798   /root/testfile
Locked:          1048576 kB

= after munmap =
/proc/meminfo
Unevictable:         176 kB
Mlocked:              60 kB
/proc/self/smaps

Repro code.
---

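#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01 /* fallback for libc headers that lack it */
#endif

/* Raw syscall wrapper, for libc versions that do not provide mlock2(). */
int mlock2wrap(const void* addr, size_t len, int flags) {
  return syscall(SYS_mlock2, addr, len, flags);
}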

void smaps() {
  char smapscmd[1000];
  snprintf(
      smapscmd,
      sizeof(smapscmd) - 1,
      "grep testfile -A 20 /proc/%d/smaps | grep -E '(testfile|Locked)'",
      getpid());
  printf("/proc/self/smaps\n");
  fflush(stdout);
  system(smapscmd);
}

void meminfo() {
  const char* meminfocmd = "grep -E '(Mlocked|Unevictable)' /proc/meminfo";
  printf("/proc/meminfo\n");
  fflush(stdout);
  system(meminfocmd);
}

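/* Run a call that returns 0 on success; print the error and exit otherwise. */
#define PCHECK(call)                                \
  {                                                 \
    int rc = (call);                                \
    if (rc != 0) {                                  \
      printf("error %d %s\n", rc, strerror(errno)); \
      exit(1);                                      \
    }                                               \
  }
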
int main(int argc, char* argv[]) {
  printf("= Start =\n");
  meminfo();

  printf("= Creating testfile =\n");
  size_t size = 1 << 30; // 1 GiB
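  int fd = open("testfile", O_CREAT | O_RDWR, 0666);
  if (fd < 0) {
    perror("open");
    exit(1);
  }
  {
    void* buf = calloc(1, size); // zero-filled, so the file contents are defined
    if (buf == NULL || write(fd, buf, size) != (ssize_t)size) {
      perror("write");
      exit(1);
    }
    free(buf);
  }
  int ret = 0;
  void* addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (addr == MAP_FAILED) {
    perror("mmap");
    exit(1);
  }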

  if (argc > 1) {
    PCHECK(mlock2wrap(addr, size, MLOCK_ONFAULT));
    printf("= after mlock2(MLOCK_ONFAULT) =\n");
    meminfo();
    smaps();

    for (size_t i = 0; i < size / 2; i += 4096) {
      ret += ((char*)addr)[i];
    }
    printf("= after reading half of the file =\n");
    meminfo();
    smaps();

    for (size_t i = 0; i < size; i += 4096) {
      ret += ((char*)addr)[i];
    }
    printf("= after reading the entire the file =\n");
    meminfo();
    smaps();

  } else {
    PCHECK(mlock(addr, size));
    printf("= after mlock =\n");
    meminfo();
    smaps();
  }

  PCHECK(munmap(addr, size));
  printf("= after munmap =\n");
  meminfo();
  smaps();

  return ret;
}

---
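The repro above shells out to grep to read the Locked field. For anyone who prefers to check it in-process, here is a minimal sketch (not part of the patch) that sums the Locked: values of every mapping whose smaps header line mentions a given path. The function name smaps_locked_kb and the 512-byte line limit are assumptions made for illustration only.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sum the "Locked:" values (in kB) of all mappings in smaps-style `text`
 * whose header line contains `path`. Hypothetical helper, illustration only.
 * Lines longer than 511 bytes are truncated before being examined.
 */
long smaps_locked_kb(const char *text, const char *path)
{
	long total = 0;
	int in_match = 0;
	const char *line = text;

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		char buf[512];
		size_t n = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
		size_t tok;

		memcpy(buf, line, n);
		buf[n] = '\0';

		/* Field lines start with "Key:"; mapping headers start with
		 * an address range, so their first token has no colon. */
		tok = strcspn(buf, " \t");
		if (tok > 0 && buf[tok - 1] != ':')
			in_match = strstr(buf, path) != NULL;
		else if (in_match && strncmp(buf, "Locked:", 7) == 0)
			total += strtol(buf + 7, NULL, 10);

		line = end ? end + 1 : line + len;
	}
	return total;
}
```

Fed the contents of /proc/self/smaps and the string "testfile", this should report the same number the grep pipeline prints, assuming the file is mapped only once.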

Signed-off-by: Lucian Adrian Grijincu <lucian@fb.com>
Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
---
 mm/memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index e0c232fe81d9..55da24f33bc4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3311,6 +3311,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 	} else {
 		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page, false);
+		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(page))
+			mlock_vma_page(page);
 	}
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
-- 
2.17.1




* Re: [Potential Spoof] [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-13 21:11 [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT Lucian Adrian Grijincu
@ 2019-09-13 21:17 ` Roman Gushchin
  2019-09-16 11:35 ` Michal Hocko
  2019-09-16 15:26 ` Kirill A. Shutemov
  2 siblings, 0 replies; 9+ messages in thread
From: Roman Gushchin @ 2019-09-13 21:17 UTC (permalink / raw)
  To: Lucian Grijincu
  Cc: linux-mm, Souptick Joarder, linux-kernel, Michal Hocko,
	Andrew Morton, Rik van Riel

On Fri, Sep 13, 2019 at 02:11:19PM -0700, Lucian Adrian Grijincu wrote:
> As pages are faulted in MLOCK_ONFAULT correctly updates
> /proc/self/smaps, but doesn't update /proc/meminfo's Mlocked field.
>
> [...]
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e0c232fe81d9..55da24f33bc4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3311,6 +3311,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>  	} else {
>  		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
>  		page_add_file_rmap(page, false);
> +		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(page))
> +			mlock_vma_page(page);

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!



* Re: [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-13 21:11 [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT Lucian Adrian Grijincu
  2019-09-13 21:17 ` [Potential Spoof] " Roman Gushchin
@ 2019-09-16 11:35 ` Michal Hocko
  2019-09-16 21:34   ` Lucian Grijincu
  2019-09-16 15:26 ` Kirill A. Shutemov
  2 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2019-09-16 11:35 UTC (permalink / raw)
  To: Lucian Adrian Grijincu
  Cc: linux-mm, Souptick Joarder, linux-kernel, Andrew Morton,
	Rik van Riel, Roman Gushchin, Hugh Dickins

[Cc Hugh]

On Fri 13-09-19 14:11:19, Lucian Adrian Grijincu wrote:
> As pages are faulted in MLOCK_ONFAULT correctly updates
> /proc/self/smaps, but doesn't update /proc/meminfo's Mlocked field.
>
> [...]
>
> Signed-off-by: Lucian Adrian Grijincu <lucian@fb.com>
> Acked-by: Souptick Joarder <jrdr.linux@gmail.com>

Fixes: b0f205c2a308 ("mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage")

I am not sure a backport to stable is needed, because imprecise
accounting is not critical. IIRC, pages should eventually get accounted
under memory pressure, when there is an attempt to unmap them.

Btw. the changelog could benefit from more details on the issue and
a description of the fix. The reproducer is really nice but it doesn't
explain the maze of the mlock accounting and why only file-backed
memory has a problem.

> ---
>  mm/memory.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e0c232fe81d9..55da24f33bc4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3311,6 +3311,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>  	} else {
>  		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
>  		page_add_file_rmap(page, false);
> +		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(page))
> +			mlock_vma_page(page);
>  	}
>  	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

I dunno. Handling it here in alloc_set_pte sounds a bit weird to me.
Although we already do mlock for CoW pages there, I thought this was more
of an exception.
Is there any real reason why this cannot be done in the standard #PF
path? finish_fault for example?
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-13 21:11 [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT Lucian Adrian Grijincu
  2019-09-13 21:17 ` [Potential Spoof] " Roman Gushchin
  2019-09-16 11:35 ` Michal Hocko
@ 2019-09-16 15:26 ` Kirill A. Shutemov
  2019-09-17 10:15   ` Michal Hocko
  2019-09-17 11:37   ` Kirill A. Shutemov
  2 siblings, 2 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2019-09-16 15:26 UTC (permalink / raw)
  To: Lucian Adrian Grijincu
  Cc: linux-mm, Souptick Joarder, linux-kernel, Michal Hocko,
	Andrew Morton, Rik van Riel, Roman Gushchin

On Fri, Sep 13, 2019 at 02:11:19PM -0700, Lucian Adrian Grijincu wrote:
> As pages are faulted in MLOCK_ONFAULT correctly updates
> /proc/self/smaps, but doesn't update /proc/meminfo's Mlocked field.

I don't think there's anything wrong with this behaviour. It is okay to
keep the page on an evictable LRU list (and not account it to NR_MLOCKED).
Some pages, like partly mapped THPs, will never be on the unevictable LRU;
others will be found by vmscan later.

So, it's not a bug per se.

That said, we should probably try to put pages on the unevictable LRU
sooner rather than later.

> [...]
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e0c232fe81d9..55da24f33bc4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3311,6 +3311,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>  	} else {
>  		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
>  		page_add_file_rmap(page, false);
> +		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(page))
> +			mlock_vma_page(page);

Why do you only do this for file pages?

>  	}
>  	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
>  
> -- 
> 2.17.1
> 
> 

-- 
 Kirill A. Shutemov



* Re: [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-16 11:35 ` Michal Hocko
@ 2019-09-16 21:34   ` Lucian Grijincu
  0 siblings, 0 replies; 9+ messages in thread
From: Lucian Grijincu @ 2019-09-16 21:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Souptick Joarder, linux-kernel, Andrew Morton,
	Rik van Riel, Roman Gushchin, Hugh Dickins



> On 9/16/19, 04:35, "Michal Hocko" <mhocko@kernel.org> wrote:
>     > diff --git a/mm/memory.c b/mm/memory.c
>     > index e0c232fe81d9..55da24f33bc4 100644
>     > --- a/mm/memory.c
>     > +++ b/mm/memory.c
>     > @@ -3311,6 +3311,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>     >  	} else {
>     >  		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
>     >  		page_add_file_rmap(page, false);
>     > +		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(page))
>     > +			mlock_vma_page(page);
>     >  	}
>     >  	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
    
>     I dunno. Handling it here in alloc_set_pte sounds a bit weird to me.
>     Although we already do mlock for CoW pages there, I thought this was more
>     of an exception.
>     Is there any real reason why this cannot be done in the standard #PF
>     path? finish_fault for example?

alloc_set_pte is called from finish_fault: https://github.com/torvalds/linux/blob/v5.2/mm/memory.c#L3400

   vm_fault_t finish_fault(struct vm_fault *vmf)
     ...
	if (!ret)
		ret = alloc_set_pte(vmf, vmf->memcg, page);

and inside alloc_set_pte one of the branches of the if-clause already handles mlocked pages:
https://github.com/torvalds/linux/blob/v5.2/mm/memory.c#L3348-L3356

I added it to the else-branch as that seemed like the least intrusive change, but I will move this to finish_fault, probably like this (after I'm done testing):

   vm_fault_t finish_fault(struct vm_fault *vmf)
    ...
        if (!ret)
                ret = alloc_set_pte(vmf, vmf->memcg, page);
+       if (!ret && (vmf->vma->vm_flags & VM_LOCKED) && !PageTransCompound(page))
+                       mlock_vma_page(page);
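To make the condition in the snippet above easier to reason about, here is a small userspace model of it. This is only a sketch: should_mlock_on_fault is a hypothetical name, the VM_LOCKED value is assumed to match include/linux/mm.h in the v5.x tree, and the boolean stands in for PageTransCompound(page).

```c
#include <stdbool.h>

/* Assumption: same value as VM_LOCKED in include/linux/mm.h (v5.x). */
#define VM_LOCKED 0x00002000UL

/*
 * Userspace model of the proposed check, not kernel code.
 * A page is mlocked at fault time only if the VMA is locked and the
 * page is not part of a transparent huge (compound) page.
 */
bool should_mlock_on_fault(unsigned long vm_flags, bool trans_compound)
{
	return (vm_flags & VM_LOCKED) && !trans_compound;
}
```

The v3 hunk writes the same test without the inner parentheses; since & binds tighter than && in C, both spellings evaluate identically.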

Thanks for the review and suggestions!

--
Lucian



* Re: [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-16 15:26 ` Kirill A. Shutemov
@ 2019-09-17 10:15   ` Michal Hocko
  2019-09-17 11:35     ` Kirill A. Shutemov
  2019-09-17 11:37   ` Kirill A. Shutemov
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2019-09-17 10:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Lucian Adrian Grijincu, linux-mm, Souptick Joarder, linux-kernel,
	Andrew Morton, Rik van Riel, Roman Gushchin

On Mon 16-09-19 18:26:19, Kirill A. Shutemov wrote:
> On Fri, Sep 13, 2019 at 02:11:19PM -0700, Lucian Adrian Grijincu wrote:
> > As pages are faulted in MLOCK_ONFAULT correctly updates
> > /proc/self/smaps, but doesn't update /proc/meminfo's Mlocked field.
> 
> I don't think there's anything wrong with this behaviour. It is okay to
> keep the page on an evictable LRU list (and not account it to NR_MLOCKED).

evictable list is an implementation detail. Having an overview of the
amount of mlocked pages can be important. Lazy accounting makes this
fuzzier and harder for admins to monitor.

Sure, it is not a bug to panic about, but it certainly makes the life of
poor admins harder.
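As an illustration of the kind of monitoring meant here, a minimal sketch of reading the Mlocked value from /proc/meminfo in C. The helper names meminfo_field_kb and read_meminfo_kb are made up for this example, and the 8 KiB buffer is an assumption about the size of the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value in kB of `key` (e.g. "Mlocked") from meminfo-style
 * `text` ("Key:   123 kB" lines), or -1 if the key is absent.
 * Hypothetical helper, illustration only.
 */
long meminfo_field_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match "key:" only at the start of a line. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read /proc/meminfo and look up one field; returns -1 on any error. */
long read_meminfo_kb(const char *key)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return meminfo_field_kb(buf, key);
}
```

A monitoring tool could poll read_meminfo_kb("Mlocked") and compare it against the per-process Locked totals; with lazy accounting the two can disagree until reclaim catches up.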

If there is a pathological THP behavior possible then we should look
into that as well.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-17 10:15   ` Michal Hocko
@ 2019-09-17 11:35     ` Kirill A. Shutemov
  2019-09-23  8:50       ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2019-09-17 11:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Lucian Adrian Grijincu, linux-mm, Souptick Joarder, linux-kernel,
	Andrew Morton, Rik van Riel, Roman Gushchin

On Tue, Sep 17, 2019 at 12:15:19PM +0200, Michal Hocko wrote:
> On Mon 16-09-19 18:26:19, Kirill A. Shutemov wrote:
> > On Fri, Sep 13, 2019 at 02:11:19PM -0700, Lucian Adrian Grijincu wrote:
> > > As pages are faulted in MLOCK_ONFAULT correctly updates
> > > /proc/self/smaps, but doesn't update /proc/meminfo's Mlocked field.
> > 
> > I don't think there's anything wrong with this behaviour. It is okay to
> > keep the page on an evictable LRU list (and not account it to NR_MLOCKED).
> 
> evictable list is an implementation detail. Having an overview about an

s/evictable/unevictable/

> amount of mlocked pages can be important. Lazy accounting makes this
> more fuzzy and harder for admins to monitor.
> 
> Sure it is not a bug to panic about but it certainly makes life of poor
> admins harder.

Good luck with making mlock accounting exact :P

For a start, try to sanely handle trylock_page() failure under ptl while
dealing with FOLL_MLOCK.

> If there is a pathological THP behavior possible then we should look
> into that as well.

There's nothing pathological about THP behaviour. See "MLOCKING
Transparent Huge Pages" section in Documentation/vm/unevictable-lru.rst.

-- 
 Kirill A. Shutemov



* Re: [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-16 15:26 ` Kirill A. Shutemov
  2019-09-17 10:15   ` Michal Hocko
@ 2019-09-17 11:37   ` Kirill A. Shutemov
  1 sibling, 0 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2019-09-17 11:37 UTC (permalink / raw)
  To: Lucian Adrian Grijincu
  Cc: linux-mm, Souptick Joarder, linux-kernel, Michal Hocko,
	Andrew Morton, Rik van Riel, Roman Gushchin

On Mon, Sep 16, 2019 at 06:26:19PM +0300, Kirill A. Shutemov wrote:
> > ---
> >  mm/memory.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index e0c232fe81d9..55da24f33bc4 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3311,6 +3311,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> >  	} else {
> >  		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
> >  		page_add_file_rmap(page, false);
> > +		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(page))
> > +			mlock_vma_page(page);
> 
> Why do you only do this for file pages?

Because file pages are locked already, right?

-- 
 Kirill A. Shutemov



* Re: [PATCH v3] mm: memory: fix /proc/meminfo reporting for MLOCK_ONFAULT
  2019-09-17 11:35     ` Kirill A. Shutemov
@ 2019-09-23  8:50       ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2019-09-23  8:50 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Lucian Adrian Grijincu, linux-mm, Souptick Joarder, linux-kernel,
	Andrew Morton, Rik van Riel, Roman Gushchin

On Tue 17-09-19 14:35:50, Kirill A. Shutemov wrote:
> On Tue, Sep 17, 2019 at 12:15:19PM +0200, Michal Hocko wrote:
> > On Mon 16-09-19 18:26:19, Kirill A. Shutemov wrote:
> > > On Fri, Sep 13, 2019 at 02:11:19PM -0700, Lucian Adrian Grijincu wrote:
> > > > As pages are faulted in MLOCK_ONFAULT correctly updates
> > > > /proc/self/smaps, but doesn't update /proc/meminfo's Mlocked field.
> > > 
> > > I don't think there's anything wrong with this behaviour. It is okay to
> > > keep the page on an evictable LRU list (and not account it to NR_MLOCKED).
> > 
> > evictable list is an implementation detail. Having an overview about an
> 
> s/evictable/unevictable/
> 
> > amount of mlocked pages can be important. Lazy accounting makes this
> > more fuzzy and harder for admins to monitor.
> > 
> > Sure it is not a bug to panic about but it certainly makes life of poor
> > admins harder.
> 
> Good luck with making mlock accounting exact :P

I didn't say exact. All I am saying is that the more imprecise it is,
the harder it is for an admin to make any sense of the value.

> For start, try to handle sanely trylock_page() failure under ptl while
> dealing with FOLL_MLOCK.

There are likely cases when accounting is problematic/impossible. But
those should be a minority.
 
> > If there is a pathological THP behavior possible then we should look
> > into that as well.
> 
> There's nothing pathological about THP behaviour. See "MLOCKING
> Transparent Huge Pages" section in Documentation/vm/unevictable-lru.rst.

Thanks, this documentation helps. I was worried there was something more
going on.
-- 
Michal Hocko
SUSE Labs


