All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-23 13:12 ` Robert Ho
  0 siblings, 0 replies; 23+ messages in thread
From: Robert Ho @ 2016-09-23 13:12 UTC (permalink / raw)
  To: pbonzini, akpm, mhocko, oleg, dan.j.williams, dave.hansen
  Cc: guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler, robert.hu

From: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Recently, Redhat reported that nvml test suite failed on QEMU/KVM,
more detailed info please refer to:
   https://bugzilla.redhat.com/show_bug.cgi?id=1365721

Actually, this bug is not only for NVDIMM/DAX but also for any other file
systems. This simple test case abstracted from nvml can easily reproduce
this bug in common environment:

-------------------------- testcase.c -----------------------------

int
is_pmem_proc(const void *addr, size_t len)
{
        const char *caddr = addr;

        FILE *fp;
        if ((fp = fopen("/proc/self/smaps", "r")) == NULL) {
                printf("!/proc/self/smaps");
                return 0;
        }

        int retval = 0;         /* assume false until proven otherwise */
        char line[PROCMAXLEN];  /* for fgets() */
        char *lo = NULL;        /* beginning of current range in smaps file */
        char *hi = NULL;        /* end of current range in smaps file */
        int needmm = 0;         /* looking for mm flag for current range */
        while (fgets(line, PROCMAXLEN, fp) != NULL) {
                static const char vmflags[] = "VmFlags:";
                static const char mm[] = " wr";

                /* check for range line */
                if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
                        if (needmm) {
                                /* last range matched, but no mm flag found */
                                printf("never found mm flag.\n");
                                break;
                        } else if (caddr < lo) {
                                /* never found the range for caddr */
                                printf("#######no match for addr %p.\n", caddr);
                                break;
                        } else if (caddr < hi) {
                                /* start address is in this range */
                                size_t rangelen = (size_t)(hi - caddr);

                                /* remember that matching has started */
                                needmm = 1;

                                /* calculate remaining range to search for */
                                if (len > rangelen) {
                                        len -= rangelen;
                                        caddr += rangelen;
                                        printf("matched %zu bytes in range "
                                                "%p-%p, %zu left over.\n",
                                                        rangelen, lo, hi, len);
                                } else {
                                        len = 0;
                                        printf("matched all bytes in range "
                                                        "%p-%p.\n", lo, hi);
                                }
                        }
                } else if (needmm && strncmp(line, vmflags,
                                        sizeof(vmflags) - 1) == 0) {
                        if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
                                printf("mm flag found.\n");
                                if (len == 0) {
                                        /* entire range matched */
                                        retval = 1;
                                        break;
                                }
                                needmm = 0;     /* saw what was needed */
                        } else {
                                /* mm flag not set for some or all of range */
                                printf("range has no mm flag.\n");
                                break;
                        }
                }
        }

        fclose(fp);

        printf("returning %d.\n", retval);
        return retval;
}

void *Addr;
size_t Size;

/*
 * worker -- the work each thread performs
 */
static void *
worker(void *arg)
{
        int *ret = (int *)arg;
        *ret =  is_pmem_proc(Addr, Size);
        return NULL;
}

int main(int argc, char *argv[])
{
        if (argc <  2 || argc > 3) {
                printf("usage: %s file [env].\n", argv[0]);
                return -1;
        }

        int fd = open(argv[1], O_RDWR);

        struct stat stbuf;
        fstat(fd, &stbuf);

        Size = stbuf.st_size;
        Addr = mmap(0, stbuf.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);

        close(fd);

        pthread_t threads[NTHREAD];
        int ret[NTHREAD];

        /* kick off NTHREAD threads */
        for (int i = 0; i < NTHREAD; i++)
                pthread_create(&threads[i], NULL, worker, &ret[i]);

        /* wait for all the threads to complete */
        for (int i = 0; i < NTHREAD; i++)
                pthread_join(threads[i], NULL);

        /* verify that all the threads return the same value */
        for (int i = 1; i < NTHREAD; i++) {
                if (ret[0] != ret[i]) {
                        printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
                                ret[0], ret[i]);
                }
        }

        printf("%d", ret[0]);
        return 0;
}

It failed as some threads can not find the memory region in
"/proc/self/smaps" which is allocated in the main process

It is caused by proc fs which uses 'file->version' to indicate the VMA that
is the last one has already been handled by read() system call. When the
next read() issues, it uses the 'version' to find the VMA, then the next
VMA is what we want to handle, the related code is as follows:

        if (last_addr) {
                vma = find_vma(mm, last_addr);
                if (vma && (vma = m_next_vma(priv, vma)))
                        return vma;
        }

However, VMA will be lost if the last VMA is gone, e.g:

The process VMA list is A->B->C->D

CPU 0                                  CPU 1
read() system call
   handle VMA B
   version = B
return to userspace

                                   unmap VMA B

issue read() again to continue to get
the region info
   find_vma(version) will get VMA C
   m_next_vma(C) will get VMA D
   handle D
   !!! VMA C is lost !!!

In order to fix this bug, we make 'file->version' indicate the end address
of current VMA

Changelog:
v3:
	Thank Michal's pointing. Fix the incompletion of v2's fixing:
"/proc/<pid>/smaps will report counters for the full vma range while
the header (aka show_map_vma) will report shorter (non-overlapping) range"
    Add description in Documentation/filesystems/proc.txt, regarding maps,
smaps reading's guaruntees.

v2:
Thanks to Dave Hansen's comments, this version fixes the issue in v1 that
same virtual address range may be outputted twice, e.g:

Take two example VMAs:

	vma-A: (0x1000 -> 0x2000)
	vma-B: (0x2000 -> 0x3000)

read() #1: prints vma-A, sets m->version=0x2000

Now, merge A/B to make C:

	vma-C: (0x1000 -> 0x3000)

read() #2: find_vma(m->version=0x2000), returns vma-C, prints vma-C

The user will see two VMAs in their output:

	A: 0x1000->0x2000
	C: 0x1000->0x3000

Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Robert Hu <robert.hu@intel.com>
---
 fs/proc/task_mmu.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f6fa99e..97abcf7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
 static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
 {
 	if (m->count < m->size)	/* vma is copied successfully */
-		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
+		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
 }
 
 static void *m_start(struct seq_file *m, loff_t *ppos)
@@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 
 	if (last_addr) {
 		vma = find_vma(mm, last_addr);
-		if (vma && (vma = m_next_vma(priv, vma)))
+		if (vma)
 			return vma;
 	}
 
 	m->version = 0;
 	if (pos < mm->map_count) {
 		for (vma = mm->mmap; pos; pos--) {
-			m->version = vma->vm_start;
+			m->version = vma->vm_end;
 			vma = vma->vm_next;
 		}
 		return vma;
@@ -293,7 +293,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 	vm_flags_t flags = vma->vm_flags;
 	unsigned long ino = 0;
 	unsigned long long pgoff = 0;
-	unsigned long start, end;
+	unsigned long end, start = m->version;
 	dev_t dev = 0;
 	const char *name = NULL;
 
@@ -304,8 +304,13 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}
 
+	/*
+	 * the region [0, m->version) has already been handled, do not
+	 * handle it doubly.
+	 */
+	start = max(vma->vm_start, start);
+
 	/* We don't show the stack guard page in /proc/maps */
-	start = vma->vm_start;
 	if (stack_guard_page_start(vma, start))
 		start += PAGE_SIZE;
 	end = vma->vm_end;
@@ -786,7 +791,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   "KernelPageSize: %8lu kB\n"
 		   "MMUPageSize:    %8lu kB\n"
 		   "Locked:         %8lu kB\n",
-		   (vma->vm_end - vma->vm_start) >> 10,
+		   (vma->vm_end - max(vma->vm_start, m->version)) >> 10,
 		   mss.resident >> 10,
 		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
 		   mss.shared_clean  >> 10,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-23 13:12 ` Robert Ho
  0 siblings, 0 replies; 23+ messages in thread
From: Robert Ho @ 2016-09-23 13:12 UTC (permalink / raw)
  To: pbonzini, akpm, mhocko, oleg, dan.j.williams, dave.hansen
  Cc: guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler, robert.hu

From: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Recently, Redhat reported that nvml test suite failed on QEMU/KVM,
more detailed info please refer to:
   https://bugzilla.redhat.com/show_bug.cgi?id=1365721

Actually, this bug is not only for NVDIMM/DAX but also for any other file
systems. This simple test case abstracted from nvml can easily reproduce
this bug in common environment:

-------------------------- testcase.c -----------------------------

int
is_pmem_proc(const void *addr, size_t len)
{
        const char *caddr = addr;

        FILE *fp;
        if ((fp = fopen("/proc/self/smaps", "r")) == NULL) {
                printf("!/proc/self/smaps");
                return 0;
        }

        int retval = 0;         /* assume false until proven otherwise */
        char line[PROCMAXLEN];  /* for fgets() */
        char *lo = NULL;        /* beginning of current range in smaps file */
        char *hi = NULL;        /* end of current range in smaps file */
        int needmm = 0;         /* looking for mm flag for current range */
        while (fgets(line, PROCMAXLEN, fp) != NULL) {
                static const char vmflags[] = "VmFlags:";
                static const char mm[] = " wr";

                /* check for range line */
                if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
                        if (needmm) {
                                /* last range matched, but no mm flag found */
                                printf("never found mm flag.\n");
                                break;
                        } else if (caddr < lo) {
                                /* never found the range for caddr */
                                printf("#######no match for addr %p.\n", caddr);
                                break;
                        } else if (caddr < hi) {
                                /* start address is in this range */
                                size_t rangelen = (size_t)(hi - caddr);

                                /* remember that matching has started */
                                needmm = 1;

                                /* calculate remaining range to search for */
                                if (len > rangelen) {
                                        len -= rangelen;
                                        caddr += rangelen;
                                        printf("matched %zu bytes in range "
                                                "%p-%p, %zu left over.\n",
                                                        rangelen, lo, hi, len);
                                } else {
                                        len = 0;
                                        printf("matched all bytes in range "
                                                        "%p-%p.\n", lo, hi);
                                }
                        }
                } else if (needmm && strncmp(line, vmflags,
                                        sizeof(vmflags) - 1) == 0) {
                        if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
                                printf("mm flag found.\n");
                                if (len == 0) {
                                        /* entire range matched */
                                        retval = 1;
                                        break;
                                }
                                needmm = 0;     /* saw what was needed */
                        } else {
                                /* mm flag not set for some or all of range */
                                printf("range has no mm flag.\n");
                                break;
                        }
                }
        }

        fclose(fp);

        printf("returning %d.\n", retval);
        return retval;
}

void *Addr;
size_t Size;

/*
 * worker -- the work each thread performs
 */
static void *
worker(void *arg)
{
        int *ret = (int *)arg;
        *ret =  is_pmem_proc(Addr, Size);
        return NULL;
}

int main(int argc, char *argv[])
{
        if (argc <  2 || argc > 3) {
                printf("usage: %s file [env].\n", argv[0]);
                return -1;
        }

        int fd = open(argv[1], O_RDWR);

        struct stat stbuf;
        fstat(fd, &stbuf);

        Size = stbuf.st_size;
        Addr = mmap(0, stbuf.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);

        close(fd);

        pthread_t threads[NTHREAD];
        int ret[NTHREAD];

        /* kick off NTHREAD threads */
        for (int i = 0; i < NTHREAD; i++)
                pthread_create(&threads[i], NULL, worker, &ret[i]);

        /* wait for all the threads to complete */
        for (int i = 0; i < NTHREAD; i++)
                pthread_join(threads[i], NULL);

        /* verify that all the threads return the same value */
        for (int i = 1; i < NTHREAD; i++) {
                if (ret[0] != ret[i]) {
                        printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
                                ret[0], ret[i]);
                }
        }

        printf("%d", ret[0]);
        return 0;
}

It failed as some threads can not find the memory region in
"/proc/self/smaps" which is allocated in the main process

It is caused by proc fs which uses 'file->version' to indicate the VMA that
is the last one has already been handled by read() system call. When the
next read() issues, it uses the 'version' to find the VMA, then the next
VMA is what we want to handle, the related code is as follows:

        if (last_addr) {
                vma = find_vma(mm, last_addr);
                if (vma && (vma = m_next_vma(priv, vma)))
                        return vma;
        }

However, VMA will be lost if the last VMA is gone, e.g:

The process VMA list is A->B->C->D

CPU 0                                  CPU 1
read() system call
   handle VMA B
   version = B
return to userspace

                                   unmap VMA B

issue read() again to continue to get
the region info
   find_vma(version) will get VMA C
   m_next_vma(C) will get VMA D
   handle D
   !!! VMA C is lost !!!

In order to fix this bug, we make 'file->version' indicate the end address
of current VMA

Changelog:
v3:
	Thank Michal's pointing. Fix the incompletion of v2's fixing:
"/proc/<pid>/smaps will report counters for the full vma range while
the header (aka show_map_vma) will report shorter (non-overlapping) range"
    Add description in Documentation/filesystems/proc.txt, regarding maps,
smaps reading's guaruntees.

v2:
Thanks to Dave Hansen's comments, this version fixes the issue in v1 that
same virtual address range may be outputted twice, e.g:

Take two example VMAs:

	vma-A: (0x1000 -> 0x2000)
	vma-B: (0x2000 -> 0x3000)

read() #1: prints vma-A, sets m->version=0x2000

Now, merge A/B to make C:

	vma-C: (0x1000 -> 0x3000)

read() #2: find_vma(m->version=0x2000), returns vma-C, prints vma-C

The user will see two VMAs in their output:

	A: 0x1000->0x2000
	C: 0x1000->0x3000

Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Robert Hu <robert.hu@intel.com>
---
 fs/proc/task_mmu.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f6fa99e..97abcf7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
 static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
 {
 	if (m->count < m->size)	/* vma is copied successfully */
-		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
+		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
 }
 
 static void *m_start(struct seq_file *m, loff_t *ppos)
@@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 
 	if (last_addr) {
 		vma = find_vma(mm, last_addr);
-		if (vma && (vma = m_next_vma(priv, vma)))
+		if (vma)
 			return vma;
 	}
 
 	m->version = 0;
 	if (pos < mm->map_count) {
 		for (vma = mm->mmap; pos; pos--) {
-			m->version = vma->vm_start;
+			m->version = vma->vm_end;
 			vma = vma->vm_next;
 		}
 		return vma;
@@ -293,7 +293,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 	vm_flags_t flags = vma->vm_flags;
 	unsigned long ino = 0;
 	unsigned long long pgoff = 0;
-	unsigned long start, end;
+	unsigned long end, start = m->version;
 	dev_t dev = 0;
 	const char *name = NULL;
 
@@ -304,8 +304,13 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}
 
+	/*
+	 * the region [0, m->version) has already been handled, do not
+	 * handle it doubly.
+	 */
+	start = max(vma->vm_start, start);
+
 	/* We don't show the stack guard page in /proc/maps */
-	start = vma->vm_start;
 	if (stack_guard_page_start(vma, start))
 		start += PAGE_SIZE;
 	end = vma->vm_end;
@@ -786,7 +791,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   "KernelPageSize: %8lu kB\n"
 		   "MMUPageSize:    %8lu kB\n"
 		   "Locked:         %8lu kB\n",
-		   (vma->vm_end - vma->vm_start) >> 10,
+		   (vma->vm_end - max(vma->vm_start, m->version)) >> 10,
 		   mss.resident >> 10,
 		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
 		   mss.shared_clean  >> 10,
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/2] Documentation/filesystems/proc.txt: Add more description for maps/smaps
  2016-09-23 13:12 ` Robert Ho
@ 2016-09-23 13:12   ` Robert Ho
  -1 siblings, 0 replies; 23+ messages in thread
From: Robert Ho @ 2016-09-23 13:12 UTC (permalink / raw)
  To: pbonzini, akpm, mhocko, oleg, dan.j.williams, dave.hansen
  Cc: guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler, robert.hu

Add some more description on the limitations for smaps/maps readings, as well
as some guaruntees we can make.

Signed-off-by: Robert Ho <robert.hu@intel.com>
---
 Documentation/filesystems/proc.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 68080ad..90eabc7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -515,6 +515,14 @@ be vanished or the reverse -- new added.
 This file is only present if the CONFIG_MMU kernel configuration option is
 enabled.
 
+Note: for both /proc/PID/maps and /proc/PID/smaps readings, it's
+possible in race conditions, that the mappings printed may not be that
+up-to-date, because during each read walking, the task's mappings may have
+changed, this typically happens in multithread cases. But anyway in each single
+read these can be guarunteed: 1) the mapped addresses doesn't go backward; 2) no
+overlaps 3) if there is something at a given vaddr during the entirety of the
+life of the smaps/maps walk, there will be some output for it.
+
 The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
 bits on both physical and virtual pages associated with a process, and the
 soft-dirty bit on pte (see Documentation/vm/soft-dirty.txt for details).
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/2] Documentation/filesystems/proc.txt: Add more description for maps/smaps
@ 2016-09-23 13:12   ` Robert Ho
  0 siblings, 0 replies; 23+ messages in thread
From: Robert Ho @ 2016-09-23 13:12 UTC (permalink / raw)
  To: pbonzini, akpm, mhocko, oleg, dan.j.williams, dave.hansen
  Cc: guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler, robert.hu

Add some more description on the limitations for smaps/maps readings, as well
as some guaruntees we can make.

Signed-off-by: Robert Ho <robert.hu@intel.com>
---
 Documentation/filesystems/proc.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 68080ad..90eabc7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -515,6 +515,14 @@ be vanished or the reverse -- new added.
 This file is only present if the CONFIG_MMU kernel configuration option is
 enabled.
 
+Note: for both /proc/PID/maps and /proc/PID/smaps readings, it's
+possible in race conditions, that the mappings printed may not be that
+up-to-date, because during each read walking, the task's mappings may have
+changed, this typically happens in multithread cases. But anyway in each single
+read these can be guarunteed: 1) the mapped addresses doesn't go backward; 2) no
+overlaps 3) if there is something at a given vaddr during the entirety of the
+life of the smaps/maps walk, there will be some output for it.
+
 The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
 bits on both physical and virtual pages associated with a process, and the
 soft-dirty bit on pte (see Documentation/vm/soft-dirty.txt for details).
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-23 13:12 ` Robert Ho
@ 2016-09-23 13:50   ` Michal Hocko
  -1 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-23 13:50 UTC (permalink / raw)
  To: Robert Ho
  Cc: pbonzini, akpm, oleg, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 21:12:33, Robert Ho wrote:
> From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> 
> Recently, Redhat reported that nvml test suite failed on QEMU/KVM,
> more detailed info please refer to:
>    https://bugzilla.redhat.com/show_bug.cgi?id=1365721
> 
> Actually, this bug is not only for NVDIMM/DAX but also for any other file
> systems. This simple test case abstracted from nvml can easily reproduce
> this bug in common environment:
> 
> -------------------------- testcase.c -----------------------------
> 
> int
> is_pmem_proc(const void *addr, size_t len)
> {
>         const char *caddr = addr;
> 
>         FILE *fp;
>         if ((fp = fopen("/proc/self/smaps", "r")) == NULL) {
>                 printf("!/proc/self/smaps");
>                 return 0;
>         }
> 
>         int retval = 0;         /* assume false until proven otherwise */
>         char line[PROCMAXLEN];  /* for fgets() */
>         char *lo = NULL;        /* beginning of current range in smaps file */
>         char *hi = NULL;        /* end of current range in smaps file */
>         int needmm = 0;         /* looking for mm flag for current range */
>         while (fgets(line, PROCMAXLEN, fp) != NULL) {
>                 static const char vmflags[] = "VmFlags:";
>                 static const char mm[] = " wr";
> 
>                 /* check for range line */
>                 if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
>                         if (needmm) {
>                                 /* last range matched, but no mm flag found */
>                                 printf("never found mm flag.\n");
>                                 break;
>                         } else if (caddr < lo) {
>                                 /* never found the range for caddr */
>                                 printf("#######no match for addr %p.\n", caddr);
>                                 break;
>                         } else if (caddr < hi) {
>                                 /* start address is in this range */
>                                 size_t rangelen = (size_t)(hi - caddr);
> 
>                                 /* remember that matching has started */
>                                 needmm = 1;
> 
>                                 /* calculate remaining range to search for */
>                                 if (len > rangelen) {
>                                         len -= rangelen;
>                                         caddr += rangelen;
>                                         printf("matched %zu bytes in range "
>                                                 "%p-%p, %zu left over.\n",
>                                                         rangelen, lo, hi, len);
>                                 } else {
>                                         len = 0;
>                                         printf("matched all bytes in range "
>                                                         "%p-%p.\n", lo, hi);
>                                 }
>                         }
>                 } else if (needmm && strncmp(line, vmflags,
>                                         sizeof(vmflags) - 1) == 0) {
>                         if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
>                                 printf("mm flag found.\n");
>                                 if (len == 0) {
>                                         /* entire range matched */
>                                         retval = 1;
>                                         break;
>                                 }
>                                 needmm = 0;     /* saw what was needed */
>                         } else {
>                                 /* mm flag not set for some or all of range */
>                                 printf("range has no mm flag.\n");
>                                 break;
>                         }
>                 }
>         }
> 
>         fclose(fp);
> 
>         printf("returning %d.\n", retval);
>         return retval;
> }
> 
> void *Addr;
> size_t Size;
> 
> /*
>  * worker -- the work each thread performs
>  */
> static void *
> worker(void *arg)
> {
>         int *ret = (int *)arg;
>         *ret =  is_pmem_proc(Addr, Size);
>         return NULL;
> }
> 
> int main(int argc, char *argv[])
> {
>         if (argc <  2 || argc > 3) {
>                 printf("usage: %s file [env].\n", argv[0]);
>                 return -1;
>         }
> 
>         int fd = open(argv[1], O_RDWR);
> 
>         struct stat stbuf;
>         fstat(fd, &stbuf);
> 
>         Size = stbuf.st_size;
>         Addr = mmap(0, stbuf.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> 
>         close(fd);
> 
>         pthread_t threads[NTHREAD];
>         int ret[NTHREAD];
> 
>         /* kick off NTHREAD threads */
>         for (int i = 0; i < NTHREAD; i++)
>                 pthread_create(&threads[i], NULL, worker, &ret[i]);
> 
>         /* wait for all the threads to complete */
>         for (int i = 0; i < NTHREAD; i++)
>                 pthread_join(threads[i], NULL);
> 
>         /* verify that all the threads return the same value */
>         for (int i = 1; i < NTHREAD; i++) {
>                 if (ret[0] != ret[i]) {
>                         printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
>                                 ret[0], ret[i]);
>                 }
>         }
> 
>         printf("%d", ret[0]);
>         return 0;
> }
> 
> It failed as some threads can not find the memory region in
> "/proc/self/smaps" which is allocated in the main process
> 
> It is caused by proc fs which uses 'file->version' to indicate the VMA that
> is the last one has already been handled by read() system call. When the
> next read() issues, it uses the 'version' to find the VMA, then the next
> VMA is what we want to handle, the related code is as follows:
> 
>         if (last_addr) {
>                 vma = find_vma(mm, last_addr);
>                 if (vma && (vma = m_next_vma(priv, vma)))
>                         return vma;
>         }
> 
> However, VMA will be lost if the last VMA is gone, e.g:
> 
> The process VMA list is A->B->C->D
> 
> CPU 0                                  CPU 1
> read() system call
>    handle VMA B
>    version = B
> return to userspace
> 
>                                    unmap VMA B
> 
> issue read() again to continue to get
> the region info
>    find_vma(version) will get VMA C
>    m_next_vma(C) will get VMA D
>    handle D
>    !!! VMA C is lost !!!
> 
> In order to fix this bug, we make 'file->version' indicate the end address
> of current VMA
> 
> Changelog:
> v3:
> 	Thank Michal's pointing. Fix the incompletion of v2's fixing:
> "/proc/<pid>/smaps will report counters for the full vma range while
> the header (aka show_map_vma) will report shorter (non-overlapping) range"
>     Add description in Documentation/filesystems/proc.txt, regarding maps,
> smaps reading's guaruntees.
> 
> v2:
> Thanks to Dave Hansen's comments, this version fixes the issue in v1 that
> same virtual address range may be outputted twice, e.g:
> 
> Take two example VMAs:
> 
> 	vma-A: (0x1000 -> 0x2000)
> 	vma-B: (0x2000 -> 0x3000)
> 
> read() #1: prints vma-A, sets m->version=0x2000
> 
> Now, merge A/B to make C:
> 
> 	vma-C: (0x1000 -> 0x3000)
> 
> read() #2: find_vma(m->version=0x2000), returns vma-C, prints vma-C
> 
> The user will see two VMAs in their output:
> 
> 	A: 0x1000->0x2000
> 	C: 0x1000->0x3000
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> Signed-off-by: Robert Hu <robert.hu@intel.com>
> ---
>  fs/proc/task_mmu.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f6fa99e..97abcf7 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
>  static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
>  {
>  	if (m->count < m->size)	/* vma is copied successfully */
> -		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
> +		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
>  }
>  
>  static void *m_start(struct seq_file *m, loff_t *ppos)
> @@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
>  
>  	if (last_addr) {
>  		vma = find_vma(mm, last_addr);
> -		if (vma && (vma = m_next_vma(priv, vma)))
> +		if (vma)
>  			return vma;
>  	}
>  
>  	m->version = 0;
>  	if (pos < mm->map_count) {
>  		for (vma = mm->mmap; pos; pos--) {
> -			m->version = vma->vm_start;
> +			m->version = vma->vm_end;
>  			vma = vma->vm_next;
>  		}
>  		return vma;
> @@ -293,7 +293,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
>  	vm_flags_t flags = vma->vm_flags;
>  	unsigned long ino = 0;
>  	unsigned long long pgoff = 0;
> -	unsigned long start, end;
> +	unsigned long end, start = m->version;
>  	dev_t dev = 0;
>  	const char *name = NULL;
>  
> @@ -304,8 +304,13 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
>  		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
>  	}
>  
> +	/*
> +	 * the region [0, m->version) has already been handled, do not
> +	 * handle it doubly.
> +	 */
> +	start = max(vma->vm_start, start);
> +
>  	/* We don't show the stack guard page in /proc/maps */
> -	start = vma->vm_start;
>  	if (stack_guard_page_start(vma, start))
>  		start += PAGE_SIZE;
>  	end = vma->vm_end;
> @@ -786,7 +791,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   "KernelPageSize: %8lu kB\n"
>  		   "MMUPageSize:    %8lu kB\n"
>  		   "Locked:         %8lu kB\n",
> -		   (vma->vm_end - vma->vm_start) >> 10,
> +		   (vma->vm_end - max(vma->vm_start, m->version)) >> 10,
>  		   mss.resident >> 10,
>  		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
>  		   mss.shared_clean  >> 10,

What about other m_next users? E.g. show_numa_map (I didn't go and check
further) will operate on vm_start and therefor might provide misleading
results. So again, while this is definitely far from ideal, or goofy if
you will, but isn't it really saner to just tell that partial reads are
more basically undefined rather than having something half baked?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-23 13:50   ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-23 13:50 UTC (permalink / raw)
  To: Robert Ho
  Cc: pbonzini, akpm, oleg, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 21:12:33, Robert Ho wrote:
> From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> 
> Recently, Redhat reported that nvml test suite failed on QEMU/KVM,
> more detailed info please refer to:
>    https://bugzilla.redhat.com/show_bug.cgi?id=1365721
> 
> Actually, this bug is not only for NVDIMM/DAX but also for any other file
> systems. This simple test case abstracted from nvml can easily reproduce
> this bug in common environment:
> 
> -------------------------- testcase.c -----------------------------
> 
> int
> is_pmem_proc(const void *addr, size_t len)
> {
>         const char *caddr = addr;
> 
>         FILE *fp;
>         if ((fp = fopen("/proc/self/smaps", "r")) == NULL) {
>                 printf("!/proc/self/smaps");
>                 return 0;
>         }
> 
>         int retval = 0;         /* assume false until proven otherwise */
>         char line[PROCMAXLEN];  /* for fgets() */
>         char *lo = NULL;        /* beginning of current range in smaps file */
>         char *hi = NULL;        /* end of current range in smaps file */
>         int needmm = 0;         /* looking for mm flag for current range */
>         while (fgets(line, PROCMAXLEN, fp) != NULL) {
>                 static const char vmflags[] = "VmFlags:";
>                 static const char mm[] = " wr";
> 
>                 /* check for range line */
>                 if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
>                         if (needmm) {
>                                 /* last range matched, but no mm flag found */
>                                 printf("never found mm flag.\n");
>                                 break;
>                         } else if (caddr < lo) {
>                                 /* never found the range for caddr */
>                                 printf("#######no match for addr %p.\n", caddr);
>                                 break;
>                         } else if (caddr < hi) {
>                                 /* start address is in this range */
>                                 size_t rangelen = (size_t)(hi - caddr);
> 
>                                 /* remember that matching has started */
>                                 needmm = 1;
> 
>                                 /* calculate remaining range to search for */
>                                 if (len > rangelen) {
>                                         len -= rangelen;
>                                         caddr += rangelen;
>                                         printf("matched %zu bytes in range "
>                                                 "%p-%p, %zu left over.\n",
>                                                         rangelen, lo, hi, len);
>                                 } else {
>                                         len = 0;
>                                         printf("matched all bytes in range "
>                                                         "%p-%p.\n", lo, hi);
>                                 }
>                         }
>                 } else if (needmm && strncmp(line, vmflags,
>                                         sizeof(vmflags) - 1) == 0) {
>                         if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
>                                 printf("mm flag found.\n");
>                                 if (len == 0) {
>                                         /* entire range matched */
>                                         retval = 1;
>                                         break;
>                                 }
>                                 needmm = 0;     /* saw what was needed */
>                         } else {
>                                 /* mm flag not set for some or all of range */
>                                 printf("range has no mm flag.\n");
>                                 break;
>                         }
>                 }
>         }
> 
>         fclose(fp);
> 
>         printf("returning %d.\n", retval);
>         return retval;
> }
> 
> void *Addr;
> size_t Size;
> 
> /*
>  * worker -- the work each thread performs
>  */
> static void *
> worker(void *arg)
> {
>         int *ret = (int *)arg;
>         *ret =  is_pmem_proc(Addr, Size);
>         return NULL;
> }
> 
> int main(int argc, char *argv[])
> {
>         if (argc <  2 || argc > 3) {
>                 printf("usage: %s file [env].\n", argv[0]);
>                 return -1;
>         }
> 
>         int fd = open(argv[1], O_RDWR);
> 
>         struct stat stbuf;
>         fstat(fd, &stbuf);
> 
>         Size = stbuf.st_size;
>         Addr = mmap(0, stbuf.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> 
>         close(fd);
> 
>         pthread_t threads[NTHREAD];
>         int ret[NTHREAD];
> 
>         /* kick off NTHREAD threads */
>         for (int i = 0; i < NTHREAD; i++)
>                 pthread_create(&threads[i], NULL, worker, &ret[i]);
> 
>         /* wait for all the threads to complete */
>         for (int i = 0; i < NTHREAD; i++)
>                 pthread_join(threads[i], NULL);
> 
>         /* verify that all the threads return the same value */
>         for (int i = 1; i < NTHREAD; i++) {
>                 if (ret[0] != ret[i]) {
>                         printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
>                                 ret[0], ret[i]);
>                 }
>         }
> 
>         printf("%d", ret[0]);
>         return 0;
> }
> 
> It failed as some threads can not find the memory region in
> "/proc/self/smaps" which is allocated in the main process
> 
> It is caused by proc fs which uses 'file->version' to indicate the VMA that
> is the last one has already been handled by read() system call. When the
> next read() issues, it uses the 'version' to find the VMA, then the next
> VMA is what we want to handle, the related code is as follows:
> 
>         if (last_addr) {
>                 vma = find_vma(mm, last_addr);
>                 if (vma && (vma = m_next_vma(priv, vma)))
>                         return vma;
>         }
> 
> However, VMA will be lost if the last VMA is gone, e.g:
> 
> The process VMA list is A->B->C->D
> 
> CPU 0                                  CPU 1
> read() system call
>    handle VMA B
>    version = B
> return to userspace
> 
>                                    unmap VMA B
> 
> issue read() again to continue to get
> the region info
>    find_vma(version) will get VMA C
>    m_next_vma(C) will get VMA D
>    handle D
>    !!! VMA C is lost !!!
> 
> In order to fix this bug, we make 'file->version' indicate the end address
> of current VMA
> 
> Changelog:
> v3:
> 	Thank Michal's pointing. Fix the incompletion of v2's fixing:
> "/proc/<pid>/smaps will report counters for the full vma range while
> the header (aka show_map_vma) will report shorter (non-overlapping) range"
>     Add description in Documentation/filesystems/proc.txt, regarding maps,
> smaps reading's guaruntees.
> 
> v2:
> Thanks to Dave Hansen's comments, this version fixes the issue in v1 that
> same virtual address range may be outputted twice, e.g:
> 
> Take two example VMAs:
> 
> 	vma-A: (0x1000 -> 0x2000)
> 	vma-B: (0x2000 -> 0x3000)
> 
> read() #1: prints vma-A, sets m->version=0x2000
> 
> Now, merge A/B to make C:
> 
> 	vma-C: (0x1000 -> 0x3000)
> 
> read() #2: find_vma(m->version=0x2000), returns vma-C, prints vma-C
> 
> The user will see two VMAs in their output:
> 
> 	A: 0x1000->0x2000
> 	C: 0x1000->0x3000
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> Signed-off-by: Robert Hu <robert.hu@intel.com>
> ---
>  fs/proc/task_mmu.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f6fa99e..97abcf7 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
>  static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
>  {
>  	if (m->count < m->size)	/* vma is copied successfully */
> -		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
> +		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
>  }
>  
>  static void *m_start(struct seq_file *m, loff_t *ppos)
> @@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
>  
>  	if (last_addr) {
>  		vma = find_vma(mm, last_addr);
> -		if (vma && (vma = m_next_vma(priv, vma)))
> +		if (vma)
>  			return vma;
>  	}
>  
>  	m->version = 0;
>  	if (pos < mm->map_count) {
>  		for (vma = mm->mmap; pos; pos--) {
> -			m->version = vma->vm_start;
> +			m->version = vma->vm_end;
>  			vma = vma->vm_next;
>  		}
>  		return vma;
> @@ -293,7 +293,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
>  	vm_flags_t flags = vma->vm_flags;
>  	unsigned long ino = 0;
>  	unsigned long long pgoff = 0;
> -	unsigned long start, end;
> +	unsigned long end, start = m->version;
>  	dev_t dev = 0;
>  	const char *name = NULL;
>  
> @@ -304,8 +304,13 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
>  		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
>  	}
>  
> +	/*
> +	 * the region [0, m->version) has already been handled, do not
> +	 * handle it doubly.
> +	 */
> +	start = max(vma->vm_start, start);
> +
>  	/* We don't show the stack guard page in /proc/maps */
> -	start = vma->vm_start;
>  	if (stack_guard_page_start(vma, start))
>  		start += PAGE_SIZE;
>  	end = vma->vm_end;
> @@ -786,7 +791,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   "KernelPageSize: %8lu kB\n"
>  		   "MMUPageSize:    %8lu kB\n"
>  		   "Locked:         %8lu kB\n",
> -		   (vma->vm_end - vma->vm_start) >> 10,
> +		   (vma->vm_end - max(vma->vm_start, m->version)) >> 10,
>  		   mss.resident >> 10,
>  		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
>  		   mss.shared_clean  >> 10,

What about other m_next users? E.g. show_numa_map (I didn't go and check
further) will operate on vm_start and therefor might provide misleading
results. So again, while this is definitely far from ideal, or goofy if
you will, but isn't it really saner to just tell that partial reads are
more basically undefined rather than having something half baked?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-23 13:12 ` Robert Ho
@ 2016-09-23 13:56   ` Oleg Nesterov
  -1 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2016-09-23 13:56 UTC (permalink / raw)
  To: Robert Ho
  Cc: pbonzini, akpm, mhocko, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On 09/23, Robert Ho wrote:
>
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
>  static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
>  {
>  	if (m->count < m->size)	/* vma is copied successfully */
> -		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
> +		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
>  }

OK.

>  static void *m_start(struct seq_file *m, loff_t *ppos)
> @@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
>  
>  	if (last_addr) {
>  		vma = find_vma(mm, last_addr);
> -		if (vma && (vma = m_next_vma(priv, vma)))
> +		if (vma)
>  			return vma;
>  	}

I think we can simplify this patch. And imo make it better. How about

	if (last_addr) {
		vma = find_vma(mm, last_addr - 1);
		if (vma && vma->vm_start <= last_addr)
			vma = m_next_vma(priv, vma);
		if (vma)
			return vma;
	}

?

This way we do not need other changes in show_map_vma(), and the same vma
won't be reported twice (as 2 different vma's) if it grows in between.

Oleg.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-23 13:56   ` Oleg Nesterov
  0 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2016-09-23 13:56 UTC (permalink / raw)
  To: Robert Ho
  Cc: pbonzini, akpm, mhocko, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On 09/23, Robert Ho wrote:
>
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
>  static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
>  {
>  	if (m->count < m->size)	/* vma is copied successfully */
> -		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
> +		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
>  }

OK.

>  static void *m_start(struct seq_file *m, loff_t *ppos)
> @@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
>  
>  	if (last_addr) {
>  		vma = find_vma(mm, last_addr);
> -		if (vma && (vma = m_next_vma(priv, vma)))
> +		if (vma)
>  			return vma;
>  	}

I think we can simplify this patch. And imo make it better. How about

	if (last_addr) {
		vma = find_vma(mm, last_addr - 1);
		if (vma && vma->vm_start <= last_addr)
			vma = m_next_vma(priv, vma);
		if (vma)
			return vma;
	}

?

This way we do not need other changes in show_map_vma(), and the same vma
won't be reported twice (as 2 different vma's) if it grows in between.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-23 13:50   ` Michal Hocko
@ 2016-09-23 14:39     ` Michal Hocko
  -1 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-23 14:39 UTC (permalink / raw)
  To: Robert Ho
  Cc: pbonzini, akpm, oleg, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 15:50:51, Michal Hocko wrote:
> On Fri 23-09-16 21:12:33, Robert Ho wrote:
[...]
> > @@ -786,7 +791,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
> >  		   "KernelPageSize: %8lu kB\n"
> >  		   "MMUPageSize:    %8lu kB\n"
> >  		   "Locked:         %8lu kB\n",
> > -		   (vma->vm_end - vma->vm_start) >> 10,
> > +		   (vma->vm_end - max(vma->vm_start, m->version)) >> 10,
> >  		   mss.resident >> 10,
> >  		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
> >  		   mss.shared_clean  >> 10,

And forgot to mention that this is not sufficient either. You also need
to restrict the pte walk to get sane numbers...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-23 14:39     ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-23 14:39 UTC (permalink / raw)
  To: Robert Ho
  Cc: pbonzini, akpm, oleg, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 15:50:51, Michal Hocko wrote:
> On Fri 23-09-16 21:12:33, Robert Ho wrote:
[...]
> > @@ -786,7 +791,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
> >  		   "KernelPageSize: %8lu kB\n"
> >  		   "MMUPageSize:    %8lu kB\n"
> >  		   "Locked:         %8lu kB\n",
> > -		   (vma->vm_end - vma->vm_start) >> 10,
> > +		   (vma->vm_end - max(vma->vm_start, m->version)) >> 10,
> >  		   mss.resident >> 10,
> >  		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
> >  		   mss.shared_clean  >> 10,

And forgot to mention that this is not sufficient either. You also need
to restrict the pte walk to get sane numbers...
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-23 13:56   ` Oleg Nesterov
@ 2016-09-23 14:53     ` Michal Hocko
  -1 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-23 14:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Robert Ho, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> On 09/23, Robert Ho wrote:
> >
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
> >  static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
> >  {
> >  	if (m->count < m->size)	/* vma is copied successfully */
> > -		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
> > +		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
> >  }
> 
> OK.
> 
> >  static void *m_start(struct seq_file *m, loff_t *ppos)
> > @@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
> >  
> >  	if (last_addr) {
> >  		vma = find_vma(mm, last_addr);
> > -		if (vma && (vma = m_next_vma(priv, vma)))
> > +		if (vma)
> >  			return vma;
> >  	}
> 
> I think we can simplify this patch. And imo make it better. How about

it is certainly less subtle because it doesn't report "sub-vmas".

> 	if (last_addr) {
> 		vma = find_vma(mm, last_addr - 1);
> 		if (vma && vma->vm_start <= last_addr)
> 			vma = m_next_vma(priv, vma);
> 		if (vma)
> 			return vma;
> 	}

we would still miss a VMA if the last one got shrunk/split but at least
it would provide monotonic results. So definitely an improvement but
I guess we really want to document that only full reads provide a
consistent (at some moment in time) output.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-23 14:53     ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-23 14:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Robert Ho, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> On 09/23, Robert Ho wrote:
> >
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
> >  static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
> >  {
> >  	if (m->count < m->size)	/* vma is copied successfully */
> > -		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
> > +		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
> >  }
> 
> OK.
> 
> >  static void *m_start(struct seq_file *m, loff_t *ppos)
> > @@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
> >  
> >  	if (last_addr) {
> >  		vma = find_vma(mm, last_addr);
> > -		if (vma && (vma = m_next_vma(priv, vma)))
> > +		if (vma)
> >  			return vma;
> >  	}
> 
> I think we can simplify this patch. And imo make it better. How about

it is certainly less subtle because it doesn't report "sub-vmas".

> 	if (last_addr) {
> 		vma = find_vma(mm, last_addr - 1);
> 		if (vma && vma->vm_start <= last_addr)
> 			vma = m_next_vma(priv, vma);
> 		if (vma)
> 			return vma;
> 	}

we would still miss a VMA if the last one got shrunk/split but at least
it would provide monotonic results. So definitely an improvement but
I guess we really want to document that only full reads provide a
consistent (at some moment in time) output.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-23 14:53     ` Michal Hocko
@ 2016-09-23 15:53       ` Oleg Nesterov
  -1 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2016-09-23 15:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Robert Ho, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On 09/23, Michal Hocko wrote:
>
> On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > 
> > I think we can simplify this patch. And imo make it better. How about
> 
> it is certainly less subtle because it doesn't report "sub-vmas".
> 
> > 	if (last_addr) {
> > 		vma = find_vma(mm, last_addr - 1);
> > 		if (vma && vma->vm_start <= last_addr)
> > 			vma = m_next_vma(priv, vma);
> > 		if (vma)
> > 			return vma;
> > 	}
> 
> we would still miss a VMA if the last one got shrunk/split

Not sure I understand what you mean... If the last one was split
we probably should not report the new vma. Nevermind, in any case
yes, sure, this can't "fix" other corner cases.

> So definitely an improvement but
> I guess we really want to document that only full reads provide a
> consistent (at some moment in time) output.

or all the threads were stopped. Agreed. And again, this applies to
any file in /proc.

Oleg.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-23 15:53       ` Oleg Nesterov
  0 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2016-09-23 15:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Robert Ho, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On 09/23, Michal Hocko wrote:
>
> On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > 
> > I think we can simplify this patch. And imo make it better. How about
> 
> it is certainly less subtle because it doesn't report "sub-vmas".
> 
> > 	if (last_addr) {
> > 		vma = find_vma(mm, last_addr - 1);
> > 		if (vma && vma->vm_start <= last_addr)
> > 			vma = m_next_vma(priv, vma);
> > 		if (vma)
> > 			return vma;
> > 	}
> 
> we would still miss a VMA if the last one got shrunk/split

Not sure I understand what you mean... If the last one was split
we probably should not report the new vma. Nevermind, in any case
yes, sure, this can't "fix" other corner cases.

> So definitely an improvement but
> I guess we really want to document that only full reads provide a
> consistent (at some moment in time) output.

or all the threads were stopped. Agreed. And again, this applies to
any file in /proc.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] Documentation/filesystems/proc.txt: Add more description for maps/smaps
  2016-09-23 13:12   ` Robert Ho
@ 2016-09-23 16:06     ` Dave Hansen
  -1 siblings, 0 replies; 23+ messages in thread
From: Dave Hansen @ 2016-09-23 16:06 UTC (permalink / raw)
  To: Robert Ho, pbonzini, akpm, mhocko, oleg, dan.j.williams
  Cc: guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On 09/23/2016 06:12 AM, Robert Ho wrote:
> +Note: for both /proc/PID/maps and /proc/PID/smaps readings, it's
> +possible in race conditions, that the mappings printed may not be that
> +up-to-date, because during each read walking, the task's mappings may have
> +changed, this typically happens in multithread cases. But anyway in each single
> +read these can be guarunteed: 1) the mapped addresses doesn't go backward; 2) no
> +overlaps 3) if there is something at a given vaddr during the entirety of the
> +life of the smaps/maps walk, there will be some output for it.

Could we spuce this description up a bit?  Perhaps:

Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy.
This typically manifests when doing partial reads of these files while
the memory map is being modified.  Despite the races, we do provide the
following guarantees:
1) The mapped addresses never go backwards, which implies no two
   regions will ever overlap.
2) If there is something at a given vaddr during the entirety of the
   life of the smaps/maps walk, there will be some output for it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] Documentation/filesystems/proc.txt: Add more description for maps/smaps
@ 2016-09-23 16:06     ` Dave Hansen
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Hansen @ 2016-09-23 16:06 UTC (permalink / raw)
  To: Robert Ho, pbonzini, akpm, mhocko, oleg, dan.j.williams
  Cc: guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On 09/23/2016 06:12 AM, Robert Ho wrote:
> +Note: for both /proc/PID/maps and /proc/PID/smaps readings, it's
> +possible in race conditions, that the mappings printed may not be that
> +up-to-date, because during each read walking, the task's mappings may have
> +changed, this typically happens in multithread cases. But anyway in each single
> +read these can be guarunteed: 1) the mapped addresses doesn't go backward; 2) no
> +overlaps 3) if there is something at a given vaddr during the entirety of the
> +life of the smaps/maps walk, there will be some output for it.

Could we spuce this description up a bit?  Perhaps:

Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy.
This typically manifests when doing partial reads of these files while
the memory map is being modified.  Despite the races, we do provide the
following guarantees:
1) The mapped addresses never go backwards, which implies no two
   regions will ever overlap.
2) If there is something at a given vaddr during the entirety of the
   life of the smaps/maps walk, there will be some output for it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-23 15:53       ` Oleg Nesterov
@ 2016-09-26  8:46         ` Michal Hocko
  -1 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-26  8:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Robert Ho, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 17:53:51, Oleg Nesterov wrote:
> On 09/23, Michal Hocko wrote:
> >
> > On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > > 
> > > I think we can simplify this patch. And imo make it better. How about
> > 
> > it is certainly less subtle because it doesn't report "sub-vmas".
> > 
> > > 	if (last_addr) {
> > > 		vma = find_vma(mm, last_addr - 1);
> > > 		if (vma && vma->vm_start <= last_addr)
> > > 			vma = m_next_vma(priv, vma);
> > > 		if (vma)
> > > 			return vma;
> > > 	}
> > 
> > we would still miss a VMA if the last one got shrunk/split
> 
> Not sure I understand what you mean... If the last one was split
> we probably should not report the new vma.

Right, VMA split is less of a problem. I meant to say that if the
last_vma->vm_end got lower for whatever reason then we could miss a VMA
right after. We actually might want to display such a VMA because it
could be a completely new one. We just do not know whether it is a
former split with enlarged VMA or a completely new one

[      old VMA     ]   Hole       [   VMA    ]
[ old VMA   ][  New VMa    ]      [   VMA    ]

> Nevermind, in any case yes, sure, this can't "fix" other corner cases.

Agreed, or at least I do not see an easy way for that.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-26  8:46         ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-26  8:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Robert Ho, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri 23-09-16 17:53:51, Oleg Nesterov wrote:
> On 09/23, Michal Hocko wrote:
> >
> > On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > > 
> > > I think we can simplify this patch. And imo make it better. How about
> > 
> > it is certainly less subtle because it doesn't report "sub-vmas".
> > 
> > > 	if (last_addr) {
> > > 		vma = find_vma(mm, last_addr - 1);
> > > 		if (vma && vma->vm_start <= last_addr)
> > > 			vma = m_next_vma(priv, vma);
> > > 		if (vma)
> > > 			return vma;
> > > 	}
> > 
> > we would still miss a VMA if the last one got shrunk/split
> 
> Not sure I understand what you mean... If the last one was split
> we probably should not report the new vma.

Right, VMA split is less of a problem. I meant to say that if the
last_vma->vm_end got lower for whatever reason then we could miss a VMA
right after. We actually might want to display such a VMA because it
could be a completely new one. We just do not know whether it is a
former split with enlarged VMA or a completely new one

[      old VMA     ]   Hole       [   VMA    ]
[ old VMA   ][  New VMa    ]      [   VMA    ]

> Nevermind, in any case yes, sure, this can't "fix" other corner cases.

Agreed, or at least I do not see an easy way for that.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] Documentation/filesystems/proc.txt: Add more description for maps/smaps
  2016-09-23 16:06     ` Dave Hansen
  (?)
@ 2016-09-28  2:27     ` Robert Hu
  -1 siblings, 0 replies; 23+ messages in thread
From: Robert Hu @ 2016-09-28  2:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Robert Ho, pbonzini, akpm, mhocko, oleg, dan.j.williams,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Fri, 2016-09-23 at 09:06 -0700, Dave Hansen wrote:
> On 09/23/2016 06:12 AM, Robert Ho wrote:
> > +Note: for both /proc/PID/maps and /proc/PID/smaps readings, it's
> > +possible in race conditions, that the mappings printed may not be that
> > +up-to-date, because during each read walking, the task's mappings may have
> > +changed, this typically happens in multithread cases. But anyway in each single
> > +read these can be guarunteed: 1) the mapped addresses doesn't go backward; 2) no
> > +overlaps 3) if there is something at a given vaddr during the entirety of the
> > +life of the smaps/maps walk, there will be some output for it.
> 
> Could we spuce this description up a bit?  Perhaps:
> 
> Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy.
> This typically manifests when doing partial reads of these files while
> the memory map is being modified.  Despite the races, we do provide the
> following guarantees:
> 1) The mapped addresses never go backwards, which implies no two
>    regions will ever overlap.
> 2) If there is something at a given vaddr during the entirety of the
>    life of the smaps/maps walk, there will be some output for it.
Sure. Thanks Dave for helping make it more concise and correct.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-23 14:53     ` Michal Hocko
  (?)
  (?)
@ 2016-09-29 13:05     ` Robert Hu
  -1 siblings, 0 replies; 23+ messages in thread
From: Robert Hu @ 2016-09-29 13:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Oleg Nesterov, Robert Ho, pbonzini, akpm, dan.j.williams,
	dave.hansen, guangrong.xiao, gleb, mtosatti, kvm, linux-kernel,
	stefanha, yuhuang, linux-mm, ross.zwisler

On Fri, 2016-09-23 at 16:53 +0200, Michal Hocko wrote:
> On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > On 09/23, Robert Ho wrote:
> > >
> > > --- a/fs/proc/task_mmu.c
> > > +++ b/fs/proc/task_mmu.c
> > > @@ -147,7 +147,7 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > >  static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
> > >  {
> > >  	if (m->count < m->size)	/* vma is copied successfully */
> > > -		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
> > > +		m->version = m_next_vma(m->private, vma) ? vma->vm_end : -1UL;
> > >  }
> > 
> > OK.
> > 
> > >  static void *m_start(struct seq_file *m, loff_t *ppos)
> > > @@ -176,14 +176,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
> > >  
> > >  	if (last_addr) {
> > >  		vma = find_vma(mm, last_addr);
> > > -		if (vma && (vma = m_next_vma(priv, vma)))
> > > +		if (vma)
> > >  			return vma;
> > >  	}
> > 
> > I think we can simplify this patch. And imo make it better. How about
> 
> it is certainly less subtle because it doesn't report "sub-vmas".
> 
> > 	if (last_addr) {
> > 		vma = find_vma(mm, last_addr - 1);
> > 		if (vma && vma->vm_start <= last_addr)
> > 			vma = m_next_vma(priv, vma);
> > 		if (vma)
> > 			return vma;
> > 	}
> 
> we would still miss a VMA if the last one got shrunk/split but at least
> it would provide monotonic results. So definitely an improvement but
> I guess we really want to document that only full reads provide a
> consistent (at some moment in time) output.

Indeed an improvement. I prefer Oleg's approach as well.
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-26  8:46         ` Michal Hocko
  (?)
@ 2016-09-29 13:14         ` Robert Hu
  2016-09-29 13:42             ` Michal Hocko
  -1 siblings, 1 reply; 23+ messages in thread
From: Robert Hu @ 2016-09-29 13:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Oleg Nesterov, Robert Ho, pbonzini, akpm, dan.j.williams,
	dave.hansen, guangrong.xiao, gleb, mtosatti, kvm, linux-kernel,
	stefanha, yuhuang, linux-mm, ross.zwisler

On Mon, 2016-09-26 at 10:46 +0200, Michal Hocko wrote:
> On Fri 23-09-16 17:53:51, Oleg Nesterov wrote:
> > On 09/23, Michal Hocko wrote:
> > >
> > > On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > > > 
> > > > I think we can simplify this patch. And imo make it better. How about
> > > 
> > > it is certainly less subtle because it doesn't report "sub-vmas".
> > > 
> > > > 	if (last_addr) {
> > > > 		vma = find_vma(mm, last_addr - 1);
> > > > 		if (vma && vma->vm_start <= last_addr)
> > > > 			vma = m_next_vma(priv, vma);
> > > > 		if (vma)
> > > > 			return vma;
> > > > 	}
> > > 
> > > we would still miss a VMA if the last one got shrunk/split
> > 
> > Not sure I understand what you mean... If the last one was split
> > we probably should not report the new vma.
> 
> Right, VMA split is less of a problem. I meant to say that if the
> last_vma->vm_end got lower for whatever reason then we could miss a VMA
> right after. We actually might want to display such a VMA because it
> could be a completely new one. We just do not know whether it is a
> former split with enlarged VMA or a completely new one
> 
> [      old VMA     ]   Hole       [   VMA    ]
> [ old VMA   ][  New VMa    ]      [   VMA    ]

This is indeed possible. But I see this is like the last_vma enlargement
case. I suggest we accept such missing, as we accept the enlargement
part of last_vma is not printed.

How about we set such target: 1) no duplicate print; 2) no old vma
missing (unless it's unmapped); 3) monotonic printing.
We accept those newly added/changed parts between 2 partial reads is not
printed.

How about above suggestion? If you, Dave, Oleg and others accept it,
then Oleg's improvement can achieve it, I think.
> 
> > Nevermind, in any case yes, sure, this can't "fix" other corner cases.
> 
> Agreed, or at least I do not see an easy way for that.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
  2016-09-29 13:14         ` Robert Hu
@ 2016-09-29 13:42             ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-29 13:42 UTC (permalink / raw)
  To: robert.hu
  Cc: Oleg Nesterov, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Thu 29-09-16 21:14:40, Robert Hu wrote:
> On Mon, 2016-09-26 at 10:46 +0200, Michal Hocko wrote:
> > On Fri 23-09-16 17:53:51, Oleg Nesterov wrote:
> > > On 09/23, Michal Hocko wrote:
> > > >
> > > > On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > > > > 
> > > > > I think we can simplify this patch. And imo make it better. How about
> > > > 
> > > > it is certainly less subtle because it doesn't report "sub-vmas".
> > > > 
> > > > > 	if (last_addr) {
> > > > > 		vma = find_vma(mm, last_addr - 1);
> > > > > 		if (vma && vma->vm_start <= last_addr)
> > > > > 			vma = m_next_vma(priv, vma);
> > > > > 		if (vma)
> > > > > 			return vma;
> > > > > 	}
> > > > 
> > > > we would still miss a VMA if the last one got shrunk/split
> > > 
> > > Not sure I understand what you mean... If the last one was split
> > > we probably should not report the new vma.
> > 
> > Right, VMA split is less of a problem. I meant to say that if the
> > last_vma->vm_end got lower for whatever reason then we could miss a VMA
> > right after. We actually might want to display such a VMA because it
> > could be a completely new one. We just do not know whether it is a
> > former split with enlarged VMA or a completely new one
> > 
> > [      old VMA     ]   Hole       [   VMA    ]
> > [ old VMA   ][  New VMa    ]      [   VMA    ]
> 
> This is indeed possible. But I see this is like the last_vma enlargement
> case. I suggest we accept such missing, as we accept the enlargement
> part of last_vma is not printed.
> 
> How about we set such target:

0) consistent output can be achieved only in the single read call

> 1) no duplicate print; 2) no old vma
> missing (unless it's unmapped); 3) monotonic printing.
> We accept those newly added/changed parts between 2 partial reads is not
> printed.

OK
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps
@ 2016-09-29 13:42             ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2016-09-29 13:42 UTC (permalink / raw)
  To: robert.hu
  Cc: Oleg Nesterov, pbonzini, akpm, dan.j.williams, dave.hansen,
	guangrong.xiao, gleb, mtosatti, kvm, linux-kernel, stefanha,
	yuhuang, linux-mm, ross.zwisler

On Thu 29-09-16 21:14:40, Robert Hu wrote:
> On Mon, 2016-09-26 at 10:46 +0200, Michal Hocko wrote:
> > On Fri 23-09-16 17:53:51, Oleg Nesterov wrote:
> > > On 09/23, Michal Hocko wrote:
> > > >
> > > > On Fri 23-09-16 15:56:36, Oleg Nesterov wrote:
> > > > > 
> > > > > I think we can simplify this patch. And imo make it better. How about
> > > > 
> > > > it is certainly less subtle because it doesn't report "sub-vmas".
> > > > 
> > > > > 	if (last_addr) {
> > > > > 		vma = find_vma(mm, last_addr - 1);
> > > > > 		if (vma && vma->vm_start <= last_addr)
> > > > > 			vma = m_next_vma(priv, vma);
> > > > > 		if (vma)
> > > > > 			return vma;
> > > > > 	}
> > > > 
> > > > we would still miss a VMA if the last one got shrunk/split
> > > 
> > > Not sure I understand what you mean... If the last one was split
> > > we probably should not report the new vma.
> > 
> > Right, VMA split is less of a problem. I meant to say that if the
> > last_vma->vm_end got lower for whatever reason then we could miss a VMA
> > right after. We actually might want to display such a VMA because it
> > could be a completely new one. We just do not know whether it is a
> > former split with enlarged VMA or a completely new one
> > 
> > [      old VMA     ]   Hole       [   VMA    ]
> > [ old VMA   ][  New VMa    ]      [   VMA    ]
> 
> This is indeed possible. But I see this is like the last_vma enlargement
> case. I suggest we accept such missing, as we accept the enlargement
> part of last_vma is not printed.
> 
> How about we set such target:

0) consistent output can be achieved only in the single read call

> 1) no duplicate print; 2) no old vma
> missing (unless it's unmapped); 3) monotonic printing.
> We accept those newly added/changed parts between 2 partial reads is not
> printed.

OK
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-09-29 13:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-23 13:12 [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps Robert Ho
2016-09-23 13:12 ` Robert Ho
2016-09-23 13:12 ` [PATCH v3 2/2] Documentation/filesystems/proc.txt: Add more description for maps/smaps Robert Ho
2016-09-23 13:12   ` Robert Ho
2016-09-23 16:06   ` Dave Hansen
2016-09-23 16:06     ` Dave Hansen
2016-09-28  2:27     ` Robert Hu
2016-09-23 13:50 ` [PATCH v3 1/2] mm, proc: Fix region lost in /proc/self/smaps Michal Hocko
2016-09-23 13:50   ` Michal Hocko
2016-09-23 14:39   ` Michal Hocko
2016-09-23 14:39     ` Michal Hocko
2016-09-23 13:56 ` Oleg Nesterov
2016-09-23 13:56   ` Oleg Nesterov
2016-09-23 14:53   ` Michal Hocko
2016-09-23 14:53     ` Michal Hocko
2016-09-23 15:53     ` Oleg Nesterov
2016-09-23 15:53       ` Oleg Nesterov
2016-09-26  8:46       ` Michal Hocko
2016-09-26  8:46         ` Michal Hocko
2016-09-29 13:14         ` Robert Hu
2016-09-29 13:42           ` Michal Hocko
2016-09-29 13:42             ` Michal Hocko
2016-09-29 13:05     ` Robert Hu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.