linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -v7 0/2] Fixing the issue with memory-mapped file times
@ 2008-01-22  0:32 Anton Salikhmetov
  2008-01-22  0:32 ` [PATCH -v7 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  0:32 UTC (permalink / raw)
  To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb,
	miklos, r.e.wolff, hidave.darkstar, hch

This is the seventh version of my solution for the bug #2645:

http://bugzilla.kernel.org/show_bug.cgi?id=2645

Since the previous version, the following has changed: based on
Linus' comment, SMP-safe PTE update implemented.

Discussions, which followed my past submissions, showed that it was
tempting to approach this problem using very different assumptions.
However, many such designs have proved to be incomplete or inefficient.

Taking into account the obvious complexity of this problem, I prepared a
design document, the purpose of which is twofold. First, it summarizes
previous attempts to resolve the ctime/mtime issue. Second, it attempts
to prove that what the patches do is necessary and sufficient for mtime
and ctime update provided that we start from a certain sane set of
requirements. The design document is available via the following link:

http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40

For the seventh version, comprehensive performance testing was performed.
The results of performance tests, including numbers, are available here:

http://bugzilla.kernel.org/show_bug.cgi?id=2645#c43

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH -v7 1/2] Massive code cleanup of sys_msync()
  2008-01-22  0:32 [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Anton Salikhmetov
@ 2008-01-22  0:32 ` Anton Salikhmetov
  2008-01-22  0:32 ` [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files Anton Salikhmetov
  2008-01-22  1:34 ` [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Jesper Juhl
  2 siblings, 0 replies; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  0:32 UTC (permalink / raw)
  To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb,
	miklos, r.e.wolff, hidave.darkstar, hch

Use the PAGE_ALIGN() macro instead of "manual" alignment.
Improve readability of the loop, which traverses the process
memory regions. Make code more symmetric and possibly boost
performance on some RISC CPUs by moving variable assignments.

Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com>
---
 mm/msync.c |   77 ++++++++++++++++++++++++++++--------------------------------
 1 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/mm/msync.c b/mm/msync.c
index 144a757..a4de868 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -1,85 +1,83 @@
 /*
- *	linux/mm/msync.c
+ * The msync() system call.
  *
- * Copyright (C) 1994-1999  Linus Torvalds
+ * Copyright (C) 1994-1999 Linus Torvalds
+ * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com>
  */
 
-/*
- * The msync() system call.
- */
+#include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
-#include <linux/file.h>
-#include <linux/syscalls.h>
 #include <linux/sched.h>
+#include <linux/syscalls.h>
 
 /*
  * MS_SYNC syncs the entire file - including mappings.
  *
  * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
- * Nor does it marks the relevant pages dirty (it used to up to 2.6.17).
+ * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
  * Now it doesn't do anything, since dirty pages are properly tracked.
  *
- * The application may now run fsync() to
- * write out the dirty pages and wait on the writeout and check the result.
- * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
- * async writeout immediately.
+ * The application may now run fsync() to write out the dirty pages and
+ * wait on the writeout and check the result. Or the application may run
+ * fadvise(FADV_DONTNEED) against the fd to start async writeout immediately.
  * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to
  * applications.
  */
 asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
 {
 	unsigned long end;
-	struct mm_struct *mm = current->mm;
+	int error, unmapped_error;
 	struct vm_area_struct *vma;
-	int unmapped_error = 0;
-	int error = -EINVAL;
+	struct mm_struct *mm;
 
+	error = -EINVAL;
 	if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
 		goto out;
 	if (start & ~PAGE_MASK)
 		goto out;
 	if ((flags & MS_ASYNC) && (flags & MS_SYNC))
 		goto out;
+
 	error = -ENOMEM;
-	len = (len + ~PAGE_MASK) & PAGE_MASK;
+	len = PAGE_ALIGN(len);
 	end = start + len;
 	if (end < start)
 		goto out;
-	error = 0;
+
+	error = unmapped_error = 0;
 	if (end == start)
 		goto out;
+
 	/*
 	 * If the interval [start,end) covers some unmapped address ranges,
 	 * just ignore them, but return -ENOMEM at the end.
 	 */
+	mm = current->mm;
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, start);
-	for (;;) {
+	do {
 		struct file *file;
 
-		/* Still start < end. */
 		error = -ENOMEM;
 		if (!vma)
-			goto out_unlock;
-		/* Here start < vma->vm_end. */
+			break;
 		if (start < vma->vm_start) {
 			start = vma->vm_start;
 			if (start >= end)
-				goto out_unlock;
-			unmapped_error = -ENOMEM;
-		}
-		/* Here vma->vm_start <= start < vma->vm_end. */
-		if ((flags & MS_INVALIDATE) &&
-				(vma->vm_flags & VM_LOCKED)) {
-			error = -EBUSY;
-			goto out_unlock;
+				break;
+			unmapped_error = error;
 		}
-		file = vma->vm_file;
+
+		error = -EBUSY;
+		if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED))
+			break;
+
+		error = 0;
 		start = vma->vm_end;
-		if ((flags & MS_SYNC) && file &&
-				(vma->vm_flags & VM_SHARED)) {
+		file = vma->vm_file;
+		if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
 			get_file(file);
 			up_read(&mm->mmap_sem);
 			error = do_fsync(file, 0);
@@ -88,16 +86,13 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
 				goto out;
 			down_read(&mm->mmap_sem);
 			vma = find_vma(mm, start);
-		} else {
-			if (start >= end) {
-				error = 0;
-				goto out_unlock;
-			}
-			vma = vma->vm_next;
+			continue;
 		}
-	}
-out_unlock:
+
+		vma = vma->vm_next;
+	} while (start < end);
 	up_read(&mm->mmap_sem);
+
 out:
-	return error ? : unmapped_error;
+	return error ? error : unmapped_error;
 }
-- 
1.4.4.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  0:32 [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Anton Salikhmetov
  2008-01-22  0:32 ` [PATCH -v7 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
@ 2008-01-22  0:32 ` Anton Salikhmetov
  2008-01-22  1:40   ` Jesper Juhl
                     ` (2 more replies)
  2008-01-22  1:34 ` [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Jesper Juhl
  2 siblings, 3 replies; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  0:32 UTC (permalink / raw)
  To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb,
	miklos, r.e.wolff, hidave.darkstar, hch

http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40

Update file times at write references to memory-mapped files.
Force file times update at the next write reference after
calling the msync() system call with the MS_ASYNC flag.

Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com>
---
 mm/memory.c |    6 ++++++
 mm/msync.c  |   57 ++++++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6dd1cd8..4b0144b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1670,6 +1670,9 @@ gotten:
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (dirty_page) {
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
+
 		/*
 		 * Yes, Virginia, this is actually required to prevent a race
 		 * with clear_page_dirty_for_io() from clearing the page dirty
@@ -2343,6 +2346,9 @@ out_unlocked:
 	if (anon)
 		page_cache_release(vmf.page);
 	else if (dirty_page) {
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
+
 		set_page_dirty_balance(dirty_page, page_mkwrite);
 		put_page(dirty_page);
 	}
diff --git a/mm/msync.c b/mm/msync.c
index a4de868..394130d 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com>
  */
 
+#include <asm/tlbflush.h>
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
@@ -13,11 +14,37 @@
 #include <linux/syscalls.h>
 
 /*
+ * Scan the PTEs for pages belonging to the VMA and mark them read-only.
+ * It will force a pagefault on the next write access.
+ */
+static void vma_wrprotect(struct vm_area_struct *vma)
+{
+	unsigned long addr;
+
+	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+		spinlock_t *ptl;
+		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
+		pud_t *pud = pud_offset(pgd, addr);
+		pmd_t *pmd = pmd_offset(pud, addr);
+		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+
+		if (pte_dirty(*pte) && pte_write(*pte)) {
+			pte_t entry = ptep_clear_flush(vma, addr, pte);
+
+			entry = pte_wrprotect(entry);
+			set_pte_at(vma->vm_mm, addr, pte, entry);
+		}
+		pte_unmap_unlock(pte, ptl);
+	}
+}
+
+/*
  * MS_SYNC syncs the entire file - including mappings.
  *
- * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
- * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
- * Now it doesn't do anything, since dirty pages are properly tracked.
+ * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
+ * read-only by calling vma_wrprotect(). This is needed to catch the next
+ * write reference to the mapped region and update the file times
+ * accordingly.
  *
  * The application may now run fsync() to write out the dirty pages and
  * wait on the writeout and check the result. Or the application may run
@@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
 		error = 0;
 		start = vma->vm_end;
 		file = vma->vm_file;
-		if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
-			get_file(file);
-			up_read(&mm->mmap_sem);
-			error = do_fsync(file, 0);
-			fput(file);
-			if (error || start >= end)
-				goto out;
-			down_read(&mm->mmap_sem);
-			vma = find_vma(mm, start);
-			continue;
+		if (file && (vma->vm_flags & VM_SHARED)) {
+			if (flags & MS_ASYNC)
+				vma_wrprotect(vma);
+			if (flags & MS_SYNC) {
+				get_file(file);
+				up_read(&mm->mmap_sem);
+				error = do_fsync(file, 0);
+				fput(file);
+				if (error || start >= end)
+					goto out;
+				down_read(&mm->mmap_sem);
+				vma = find_vma(mm, start);
+				continue;
+			}
 		}
 
 		vma = vma->vm_next;
-- 
1.4.4.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 0/2] Fixing the issue with memory-mapped file times
  2008-01-22  0:32 [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Anton Salikhmetov
  2008-01-22  0:32 ` [PATCH -v7 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
  2008-01-22  0:32 ` [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files Anton Salikhmetov
@ 2008-01-22  1:34 ` Jesper Juhl
  2008-01-22  1:40   ` Anton Salikhmetov
  2 siblings, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2008-01-22  1:34 UTC (permalink / raw)
  To: Anton Salikhmetov
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> This is the seventh version of my solution for the bug #2645:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=2645
>
> Since the previous version, the following has changed: based on
> Linus' comment, SMP-safe PTE update implemented.
>
> Discussions, which followed my past submissions, showed that it was
> tempting to approach this problem using very different assumptions.
> However, many such designs have proved to be incomplete or inefficient.
>
> Taking into account the obvious complexity of this problem, I prepared a
> design document, the purpose of which is twofold. First, it summarizes
> previous attempts to resolve the ctime/mtime issue. Second, it attempts
> to prove that what the patches do is necessary and sufficient for mtime
> and ctime update provided that we start from a certain sane set of
> requirements. The design document is available via the following link:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
>
> For the seventh version, comprehensive performance testing was performed.
> The results of performance tests, including numbers, are available here:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=2645#c43
>

Hi Anton,

I applied your patches here and as far as my own test programs go,
these patches solve the previously observed problems I saw with mtime
not getting updated.

Thank you very much for so persistently working on these long standing issues.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  0:32 ` [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files Anton Salikhmetov
@ 2008-01-22  1:40   ` Jesper Juhl
  2008-01-22  1:51     ` Anton Salikhmetov
  2008-01-22  2:16   ` Linus Torvalds
  2008-01-22  4:39   ` Andi Kleen
  2 siblings, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2008-01-22  1:40 UTC (permalink / raw)
  To: Anton Salikhmetov
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

Some very pedantic nitpicking below;

On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
>
> Update file times at write references to memory-mapped files.
> Force file times update at the next write reference after
> calling the msync() system call with the MS_ASYNC flag.
>
> Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com>
> ---
>  mm/memory.c |    6 ++++++
>  mm/msync.c  |   57 ++++++++++++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dd1cd8..4b0144b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1670,6 +1670,9 @@ gotten:
>  unlock:
>         pte_unmap_unlock(page_table, ptl);
>         if (dirty_page) {
> +               if (vma->vm_file)
> +                       file_update_time(vma->vm_file);
> +
>                 /*
>                  * Yes, Virginia, this is actually required to prevent a race
>                  * with clear_page_dirty_for_io() from clearing the page dirty
> @@ -2343,6 +2346,9 @@ out_unlocked:
>         if (anon)
>                 page_cache_release(vmf.page);
>         else if (dirty_page) {
> +               if (vma->vm_file)
> +                       file_update_time(vma->vm_file);
> +
>                 set_page_dirty_balance(dirty_page, page_mkwrite);
>                 put_page(dirty_page);
>         }
> diff --git a/mm/msync.c b/mm/msync.c
> index a4de868..394130d 100644
> --- a/mm/msync.c
> +++ b/mm/msync.c
> @@ -5,6 +5,7 @@
>   * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com>
>   */
>
> +#include <asm/tlbflush.h>
>  #include <linux/file.h>
>  #include <linux/fs.h>
>  #include <linux/mm.h>
> @@ -13,11 +14,37 @@
>  #include <linux/syscalls.h>
>
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +       unsigned long addr;
> +
> +       for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {

I know it's not the common "Linux Kernel way", but 'addr' could be
made to have just 'for' scope here according to C99;

       for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
addr += PAGE_SIZE) {


> +               spinlock_t *ptl;
> +               pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +               pud_t *pud = pud_offset(pgd, addr);
> +               pmd_t *pmd = pmd_offset(pud, addr);
> +               pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> +
> +               if (pte_dirty(*pte) && pte_write(*pte)) {
> +                       pte_t entry = ptep_clear_flush(vma, addr, pte);
> +
> +                       entry = pte_wrprotect(entry);
> +                       set_pte_at(vma->vm_mm, addr, pte, entry);
> +               }
> +               pte_unmap_unlock(pte, ptl);
> +       }
> +}
> +
> +/*
>   * MS_SYNC syncs the entire file - including mappings.
>   *
> - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> - * Now it doesn't do anything, since dirty pages are properly tracked.

I think keeping some version of the "up to ..." comments makes sense.
It documents that we previously had different behaviour.

> + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> + * read-only by calling vma_wrprotect(). This is needed to catch the next
> + * write reference to the mapped region and update the file times
> + * accordingly.
>   *
>   * The application may now run fsync() to write out the dirty pages and
>   * wait on the writeout and check the result. Or the application may run
> @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
>                 error = 0;
>                 start = vma->vm_end;
>                 file = vma->vm_file;
> -               if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
> -                       get_file(file);
> -                       up_read(&mm->mmap_sem);
> -                       error = do_fsync(file, 0);
> -                       fput(file);
> -                       if (error || start >= end)
> -                               goto out;
> -                       down_read(&mm->mmap_sem);
> -                       vma = find_vma(mm, start);
> -                       continue;
> +               if (file && (vma->vm_flags & VM_SHARED)) {
> +                       if (flags & MS_ASYNC)
> +                               vma_wrprotect(vma);
> +                       if (flags & MS_SYNC) {

"else if" ??

> +                               get_file(file);
> +                               up_read(&mm->mmap_sem);
> +                               error = do_fsync(file, 0);
> +                               fput(file);
> +                               if (error || start >= end)
> +                                       goto out;
> +                               down_read(&mm->mmap_sem);
> +                               vma = find_vma(mm, start);
> +                               continue;
> +                       }
>                 }
>
>                 vma = vma->vm_next;

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 0/2] Fixing the issue with memory-mapped file times
  2008-01-22  1:34 ` [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Jesper Juhl
@ 2008-01-22  1:40   ` Anton Salikhmetov
  0 siblings, 0 replies; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  1:40 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> > This is the seventh version of my solution for the bug #2645:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645
> >
> > Since the previous version, the following has changed: based on
> > Linus' comment, SMP-safe PTE update implemented.
> >
> > Discussions, which followed my past submissions, showed that it was
> > tempting to approach this problem using very different assumptions.
> > However, many such designs have proved to be incomplete or inefficient.
> >
> > Taking into account the obvious complexity of this problem, I prepared a
> > design document, the purpose of which is twofold. First, it summarizes
> > previous attempts to resolve the ctime/mtime issue. Second, it attempts
> > to prove that what the patches do is necessary and sufficient for mtime
> > and ctime update provided that we start from a certain sane set of
> > requirements. The design document is available via the following link:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
> >
> > For the seventh version, comprehensive performance testing was performed.
> > The results of performance tests, including numbers, are available here:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c43
> >
>
> Hi Anton,
>
> I applied your patches here and as far as my own test programs go,
> these patches solve the previously observed problems I saw with mtime
> not getting updated.
>
> Thank you very much for so persistently working on these long standing issues.

Thank you very much for testing these patches!

>
> --
> Jesper Juhl <jesper.juhl@gmail.com>
> Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please      http://www.expita.com/nomime.html
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  1:40   ` Jesper Juhl
@ 2008-01-22  1:51     ` Anton Salikhmetov
  2008-01-22  1:54       ` Jesper Juhl
  2008-01-22  2:07       ` Anton Salikhmetov
  0 siblings, 2 replies; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  1:51 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> Some very pedantic nitpicking below;
>
> On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
> >
> > Update file times at write references to memory-mapped files.
> > Force file times update at the next write reference after
> > calling the msync() system call with the MS_ASYNC flag.
> >
> > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com>
> > ---
> >  mm/memory.c |    6 ++++++
> >  mm/msync.c  |   57 ++++++++++++++++++++++++++++++++++++++++++++-------------
> >  2 files changed, 50 insertions(+), 13 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 6dd1cd8..4b0144b 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1670,6 +1670,9 @@ gotten:
> >  unlock:
> >         pte_unmap_unlock(page_table, ptl);
> >         if (dirty_page) {
> > +               if (vma->vm_file)
> > +                       file_update_time(vma->vm_file);
> > +
> >                 /*
> >                  * Yes, Virginia, this is actually required to prevent a race
> >                  * with clear_page_dirty_for_io() from clearing the page dirty
> > @@ -2343,6 +2346,9 @@ out_unlocked:
> >         if (anon)
> >                 page_cache_release(vmf.page);
> >         else if (dirty_page) {
> > +               if (vma->vm_file)
> > +                       file_update_time(vma->vm_file);
> > +
> >                 set_page_dirty_balance(dirty_page, page_mkwrite);
> >                 put_page(dirty_page);
> >         }
> > diff --git a/mm/msync.c b/mm/msync.c
> > index a4de868..394130d 100644
> > --- a/mm/msync.c
> > +++ b/mm/msync.c
> > @@ -5,6 +5,7 @@
> >   * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com>
> >   */
> >
> > +#include <asm/tlbflush.h>
> >  #include <linux/file.h>
> >  #include <linux/fs.h>
> >  #include <linux/mm.h>
> > @@ -13,11 +14,37 @@
> >  #include <linux/syscalls.h>
> >
> >  /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > +       unsigned long addr;
> > +
> > +       for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
>
> I know it's not the common "Linux Kernel way", but 'addr' could be
> made to have just 'for' scope here according to C99;

I believe that the C89 style is more common for the Linux kernel, so
I've used the out-of-scope variable declaration.

>
>        for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
> addr += PAGE_SIZE) {
>
>
> > +               spinlock_t *ptl;
> > +               pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > +               pud_t *pud = pud_offset(pgd, addr);
> > +               pmd_t *pmd = pmd_offset(pud, addr);
> > +               pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > +
> > +               if (pte_dirty(*pte) && pte_write(*pte)) {
> > +                       pte_t entry = ptep_clear_flush(vma, addr, pte);
> > +
> > +                       entry = pte_wrprotect(entry);
> > +                       set_pte_at(vma->vm_mm, addr, pte, entry);
> > +               }
> > +               pte_unmap_unlock(pte, ptl);
> > +       }
> > +}
> > +
> > +/*
> >   * MS_SYNC syncs the entire file - including mappings.
> >   *
> > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> > - * Now it doesn't do anything, since dirty pages are properly tracked.
>
> I think keeping some version of the "up to ..." comments makes sense.
> It documents that we previously had different behaviour.

Earlier I had a request to remove any "changelog-style" comments from the code.

>
> > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> > + * read-only by calling vma_wrprotect(). This is needed to catch the next
> > + * write reference to the mapped region and update the file times
> > + * accordingly.
> >   *
> >   * The application may now run fsync() to write out the dirty pages and
> >   * wait on the writeout and check the result. Or the application may run
> > @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
> >                 error = 0;
> >                 start = vma->vm_end;
> >                 file = vma->vm_file;
> > -               if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
> > -                       get_file(file);
> > -                       up_read(&mm->mmap_sem);
> > -                       error = do_fsync(file, 0);
> > -                       fput(file);
> > -                       if (error || start >= end)
> > -                               goto out;
> > -                       down_read(&mm->mmap_sem);
> > -                       vma = find_vma(mm, start);
> > -                       continue;
> > +               if (file && (vma->vm_flags & VM_SHARED)) {
> > +                       if (flags & MS_ASYNC)
> > +                               vma_wrprotect(vma);
> > +                       if (flags & MS_SYNC) {
>
> "else if" ??

The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
did not use the "else-if" here. Moreover, this function itself checks
that they never come together.

>
> > +                               get_file(file);
> > +                               up_read(&mm->mmap_sem);
> > +                               error = do_fsync(file, 0);
> > +                               fput(file);
> > +                               if (error || start >= end)
> > +                                       goto out;
> > +                               down_read(&mm->mmap_sem);
> > +                               vma = find_vma(mm, start);
> > +                               continue;
> > +                       }
> >                 }
> >
> >                 vma = vma->vm_next;
>
> --
> Jesper Juhl <jesper.juhl@gmail.com>
> Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please      http://www.expita.com/nomime.html
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  1:51     ` Anton Salikhmetov
@ 2008-01-22  1:54       ` Jesper Juhl
  2008-01-22  1:57         ` Anton Salikhmetov
  2008-01-22  2:07       ` Anton Salikhmetov
  1 sibling, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2008-01-22  1:54 UTC (permalink / raw)
  To: Anton Salikhmetov
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> 2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
...
> > > +               if (file && (vma->vm_flags & VM_SHARED)) {
> > > +                       if (flags & MS_ASYNC)
> > > +                               vma_wrprotect(vma);
> > > +                       if (flags & MS_SYNC) {
> >
> > "else if" ??
>
> The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> did not use the "else-if" here. Moreover, this function itself checks
> that they never come together.
>

I would say that them being mutually exclusive would be a reason *for*
using "else-if" here.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  1:54       ` Jesper Juhl
@ 2008-01-22  1:57         ` Anton Salikhmetov
  2008-01-22  2:18           ` Jesper Juhl
  0 siblings, 1 reply; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  1:57 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> > 2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> > > Some very pedantic nitpicking below;
> > >
> > > On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> ...
> > > > +               if (file && (vma->vm_flags & VM_SHARED)) {
> > > > +                       if (flags & MS_ASYNC)
> > > > +                               vma_wrprotect(vma);
> > > > +                       if (flags & MS_SYNC) {
> > >
> > > "else if" ??
> >
> > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > did not use the "else-if" here. Moreover, this function itself checks
> > that they never come together.
> >
>
> I would say that them being mutually exclusive would be a reason *for*
> using "else-if" here.

This check is performed by the sys_msync() function itself in its very
beginning.

We don't need to check it later.

>
> --
> Jesper Juhl <jesper.juhl@gmail.com>
> Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please      http://www.expita.com/nomime.html
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  1:51     ` Anton Salikhmetov
  2008-01-22  1:54       ` Jesper Juhl
@ 2008-01-22  2:07       ` Anton Salikhmetov
  2008-01-22  2:16         ` Jesper Juhl
  1 sibling, 1 reply; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  2:07 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

2008/1/22, Anton Salikhmetov <salikhmetov@gmail.com>:
> 2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> > Some very pedantic nitpicking below;
> >
> > On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=2645#c40
> > >
> > > Update file times at write references to memory-mapped files.
> > > Force file times update at the next write reference after
> > > calling the msync() system call with the MS_ASYNC flag.
> > >
> > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com>
> > > ---
> > >  mm/memory.c |    6 ++++++
> > >  mm/msync.c  |   57 ++++++++++++++++++++++++++++++++++++++++++++-------------
> > >  2 files changed, 50 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 6dd1cd8..4b0144b 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1670,6 +1670,9 @@ gotten:
> > >  unlock:
> > >         pte_unmap_unlock(page_table, ptl);
> > >         if (dirty_page) {
> > > +               if (vma->vm_file)
> > > +                       file_update_time(vma->vm_file);
> > > +
> > >                 /*
> > >                  * Yes, Virginia, this is actually required to prevent a race
> > >                  * with clear_page_dirty_for_io() from clearing the page dirty
> > > @@ -2343,6 +2346,9 @@ out_unlocked:
> > >         if (anon)
> > >                 page_cache_release(vmf.page);
> > >         else if (dirty_page) {
> > > +               if (vma->vm_file)
> > > +                       file_update_time(vma->vm_file);
> > > +
> > >                 set_page_dirty_balance(dirty_page, page_mkwrite);
> > >                 put_page(dirty_page);
> > >         }
> > > diff --git a/mm/msync.c b/mm/msync.c
> > > index a4de868..394130d 100644
> > > --- a/mm/msync.c
> > > +++ b/mm/msync.c
> > > @@ -5,6 +5,7 @@
> > >   * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com>
> > >   */
> > >
> > > +#include <asm/tlbflush.h>
> > >  #include <linux/file.h>
> > >  #include <linux/fs.h>
> > >  #include <linux/mm.h>
> > > @@ -13,11 +14,37 @@
> > >  #include <linux/syscalls.h>
> > >
> > >  /*
> > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > > + * It will force a pagefault on the next write access.
> > > + */
> > > +static void vma_wrprotect(struct vm_area_struct *vma)
> > > +{
> > > +       unsigned long addr;
> > > +
> > > +       for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> >
> > I know it's not the common "Linux Kernel way", but 'addr' could be
> > made to have just 'for' scope here according to C99;
>
> I believe that the C89 style is more common for the Linux kernel, so
> I've used the out-of-scope variable declaration.
>
> >
> >        for (unsigned long addr = vma->vm_start; addr < vma->vm_end;
> > addr += PAGE_SIZE) {

By the way, if we're talking "pedantic", then:

>>>

debian:/tmp$ cat c.c
void f()
{
       for (unsigned long i = 0; i < 10; i++)
               continue;
}
debian:/tmp$ gcc -c -pedantic c.c
c.c: In function 'f':
c.c:3: error: 'for' loop initial declaration used outside C99 mode
debian:/tmp$

<<<

No pun intended :)

> >
> >
> > > +               spinlock_t *ptl;
> > > +               pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > > +               pud_t *pud = pud_offset(pgd, addr);
> > > +               pmd_t *pmd = pmd_offset(pud, addr);
> > > +               pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> > > +
> > > +               if (pte_dirty(*pte) && pte_write(*pte)) {
> > > +                       pte_t entry = ptep_clear_flush(vma, addr, pte);
> > > +
> > > +                       entry = pte_wrprotect(entry);
> > > +                       set_pte_at(vma->vm_mm, addr, pte, entry);
> > > +               }
> > > +               pte_unmap_unlock(pte, ptl);
> > > +       }
> > > +}
> > > +
> > > +/*
> > >   * MS_SYNC syncs the entire file - including mappings.
> > >   *
> > > - * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
> > > - * Nor does it mark the relevant pages dirty (it used to up to 2.6.17).
> > > - * Now it doesn't do anything, since dirty pages are properly tracked.
> >
> > I think keeping some version of the "up to ..." comments makes sense.
> > It documents that we previously had different behaviour.
>
> Earlier I had a request to remove any "changelog-style" comments from the code.
>
> >
> > > + * MS_ASYNC does not start I/O. Instead, it marks the relevant pages
> > > + * read-only by calling vma_wrprotect(). This is needed to catch the next
> > > + * write reference to the mapped region and update the file times
> > > + * accordingly.
> > >   *
> > >   * The application may now run fsync() to write out the dirty pages and
> > >   * wait on the writeout and check the result. Or the application may run
> > > @@ -77,16 +104,20 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags)
> > >                 error = 0;
> > >                 start = vma->vm_end;
> > >                 file = vma->vm_file;
> > > -               if (file && (vma->vm_flags & VM_SHARED) && (flags & MS_SYNC)) {
> > > -                       get_file(file);
> > > -                       up_read(&mm->mmap_sem);
> > > -                       error = do_fsync(file, 0);
> > > -                       fput(file);
> > > -                       if (error || start >= end)
> > > -                               goto out;
> > > -                       down_read(&mm->mmap_sem);
> > > -                       vma = find_vma(mm, start);
> > > -                       continue;
> > > +               if (file && (vma->vm_flags & VM_SHARED)) {
> > > +                       if (flags & MS_ASYNC)
> > > +                               vma_wrprotect(vma);
> > > +                       if (flags & MS_SYNC) {
> >
> > "else if" ??
>
> The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> did not use the "else-if" here. Moreover, this function itself checks
> that they never come together.
>
> >
> > > +                               get_file(file);
> > > +                               up_read(&mm->mmap_sem);
> > > +                               error = do_fsync(file, 0);
> > > +                               fput(file);
> > > +                               if (error || start >= end)
> > > +                                       goto out;
> > > +                               down_read(&mm->mmap_sem);
> > > +                               vma = find_vma(mm, start);
> > > +                               continue;
> > > +                       }
> > >                 }
> > >
> > >                 vma = vma->vm_next;
> >
> > --
> > Jesper Juhl <jesper.juhl@gmail.com>
> > Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> > Plain text mails only, please      http://www.expita.com/nomime.html
> >
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  2:07       ` Anton Salikhmetov
@ 2008-01-22  2:16         ` Jesper Juhl
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Juhl @ 2008-01-22  2:16 UTC (permalink / raw)
  To: Anton Salikhmetov
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> 2008/1/22, Anton Salikhmetov <salikhmetov@gmail.com>:
> > 2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> > > Some very pedantic nitpicking below;
> > >
...
>
> By the way, if we're talking "pedantic", then:
>
> >>>
>
> debian:/tmp$ cat c.c
> void f()
> {
>        for (unsigned long i = 0; i < 10; i++)
>                continue;
> }
> debian:/tmp$ gcc -c -pedantic c.c
> c.c: In function 'f':
> c.c:3: error: 'for' loop initial declaration used outside C99 mode
> debian:/tmp$
>

Well, I just wrote the way I'd have writen the loop, I know it's not
the common kernel style.  Just offering my feedback/input :)


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  0:32 ` [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files Anton Salikhmetov
  2008-01-22  1:40   ` Jesper Juhl
@ 2008-01-22  2:16   ` Linus Torvalds
  2008-01-22  2:39     ` Anton Salikhmetov
  2008-01-22  4:39   ` Andi Kleen
  2 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2008-01-22  2:16 UTC (permalink / raw)
  To: Anton Salikhmetov
  Cc: linux-mm, jakob, Linux Kernel Mailing List, valdis.kletnieks,
	riel, ksm, staubach, jesper.juhl, a.p.zijlstra, Andrew Morton,
	protasnb, miklos, r.e.wolff, hidave.darkstar, hch



On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
>  
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This is extremely expensive over bigger areas, especially sparsely mapped 
ones (it does all the lookups for all four levels over and over and over 
again for eachg page).

I think Peter Zijlstra posted a version that uses the regular kind of 
nested loop (with inline functions to keep the thing nice and clean), 
which gets rid of that.

[ The sad/funny part is that this is all how we *used* to do msync(), back 
  in the days: we're literally going back to the "pre-cleanup" logic. See 
  commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup" 
  for details ]

Quite frankly, I really think you might be better off just doing a

	git revert 204ec841fbea3e5138168edbc3a76d46747cc987

and working from there! I just checked, and it still reverts cleanly, and 
you'd end up with a nice code-base that (a) has gotten years of testing 
and (b) already has the looping-over-the-pagetables code.

			Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  1:57         ` Anton Salikhmetov
@ 2008-01-22  2:18           ` Jesper Juhl
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Juhl @ 2008-01-22  2:18 UTC (permalink / raw)
  To: Anton Salikhmetov
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, torvalds, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> 2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> > On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> > > 2008/1/22, Jesper Juhl <jesper.juhl@gmail.com>:
> > > > Some very pedantic nitpicking below;
> > > >
> > > > On 22/01/2008, Anton Salikhmetov <salikhmetov@gmail.com> wrote:
> > ...
> > > > > +               if (file && (vma->vm_flags & VM_SHARED)) {
> > > > > +                       if (flags & MS_ASYNC)
> > > > > +                               vma_wrprotect(vma);
> > > > > +                       if (flags & MS_SYNC) {
> > > >
> > > > "else if" ??
> > >
> > > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > > did not use the "else-if" here. Moreover, this function itself checks
> > > that they never come together.
> > >
> >
> > I would say that them being mutually exclusive would be a reason *for*
> > using "else-if" here.
>
> This check is performed by the sys_msync() function itself in its very
> beginning.
>
> We don't need to check it later.
>

Sure, it's just that, to me, using 'else-if' makes it explicit that
the two are mutually exclusive. Using "if (...), if (...)" doesn't.
Maybe it's just me, but I feel that 'else-if' here better shows the
intention...  No big deal.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  2:16   ` Linus Torvalds
@ 2008-01-22  2:39     ` Anton Salikhmetov
  2008-01-22  8:52       ` Miklos Szeredi
  0 siblings, 1 reply; 16+ messages in thread
From: Anton Salikhmetov @ 2008-01-22  2:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, jakob, Linux Kernel Mailing List, valdis.kletnieks,
	riel, ksm, staubach, jesper.juhl, a.p.zijlstra, Andrew Morton,
	protasnb, miklos, r.e.wolff, hidave.darkstar, hch

2008/1/22, Linus Torvalds <torvalds@linux-foundation.org>:
>
>
> On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
> >
> >  /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > +     unsigned long addr;
> > +
> > +     for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > +             spinlock_t *ptl;
> > +             pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > +             pud_t *pud = pud_offset(pgd, addr);
> > +             pmd_t *pmd = pmd_offset(pud, addr);
> > +             pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>
> This is extremely expensive over bigger areas, especially sparsely mapped
> ones (it does all the lookups for all four levels over and over and over
> again for eachg page).
>
> I think Peter Zijlstra posted a version that uses the regular kind of
> nested loop (with inline functions to keep the thing nice and clean),
> which gets rid of that.

Thanks for your feedback, Linus!

I will use Peter Zijlstra's version of such an operation in my next
patch series.

>
> [ The sad/funny part is that this is all how we *used* to do msync(), back
>   in the days: we're literally going back to the "pre-cleanup" logic. See
>   commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup"
>   for details ]
>
> Quite frankly, I really think you might be better off just doing a
>
>         git revert 204ec841fbea3e5138168edbc3a76d46747cc987
>
> and working from there! I just checked, and it still reverts cleanly, and
> you'd end up with a nice code-base that (a) has gotten years of testing
> and (b) already has the looping-over-the-pagetables code.
>
>                         Linus
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  0:32 ` [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files Anton Salikhmetov
  2008-01-22  1:40   ` Jesper Juhl
  2008-01-22  2:16   ` Linus Torvalds
@ 2008-01-22  4:39   ` Andi Kleen
  2 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2008-01-22  4:39 UTC (permalink / raw)
  To: Anton Salikhmetov
  Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, jesper.juhl, torvalds

Anton Salikhmetov <salikhmetov@gmail.com> writes:

You should probably put your design document somewhere in Documentation
with a patch.

> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This means on i386 with highmem ptes you will map/flush tlb/unmap each
PTE individually. You will do 512 times as much work as really needed
per PTE leaf page.

The performance critical address space walkers use a different design
pattern that avoids this.

> +		if (pte_dirty(*pte) && pte_write(*pte)) {
> +			pte_t entry = ptep_clear_flush(vma, addr, pte);

Flushing TLBs unbatched can also be very expensive because if the MM is
shared by several CPUs you'll have a inter-processor interrupt for 
each iteration. They are quite costly even on smaller systems.

It would be better if you did a single flush_tlb_range() at the end.
This means on x86 this will currently always do a full flush, but that's
still better than really slowing down in the heavily multithreaded case.

-Andi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
  2008-01-22  2:39     ` Anton Salikhmetov
@ 2008-01-22  8:52       ` Miklos Szeredi
  0 siblings, 0 replies; 16+ messages in thread
From: Miklos Szeredi @ 2008-01-22  8:52 UTC (permalink / raw)
  To: salikhmetov
  Cc: torvalds, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel,
	ksm, staubach, jesper.juhl, a.p.zijlstra, akpm, protasnb, miklos,
	r.e.wolff, hidave.darkstar, hch

> > >
> > >  /*
> > > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > > + * It will force a pagefault on the next write access.
> > > + */
> > > +static void vma_wrprotect(struct vm_area_struct *vma)
> > > +{
> > > +     unsigned long addr;
> > > +
> > > +     for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > > +             spinlock_t *ptl;
> > > +             pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > > +             pud_t *pud = pud_offset(pgd, addr);
> > > +             pmd_t *pmd = pmd_offset(pud, addr);
> > > +             pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> >
> > This is extremely expensive over bigger areas, especially sparsely mapped
> > ones (it does all the lookups for all four levels over and over and over
> > again for eachg page).
> >
> > I think Peter Zijlstra posted a version that uses the regular kind of
> > nested loop (with inline functions to keep the thing nice and clean),
> > which gets rid of that.
> 
> Thanks for your feedback, Linus!
> 
> I will use Peter Zijlstra's version of such an operation in my next
> patch series.

But note, that those functions iterate over all the vmas for the given
page range, not just the one msync was performed on.  This might get
even more expensive, if the file is mapped lots of times.

The old version, that Linus was referring to, needs some modification
as well, because it doesn't write protect the ptes, just marks them
clean.

Miklos

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2008-01-22  8:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-22  0:32 [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Anton Salikhmetov
2008-01-22  0:32 ` [PATCH -v7 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
2008-01-22  0:32 ` [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files Anton Salikhmetov
2008-01-22  1:40   ` Jesper Juhl
2008-01-22  1:51     ` Anton Salikhmetov
2008-01-22  1:54       ` Jesper Juhl
2008-01-22  1:57         ` Anton Salikhmetov
2008-01-22  2:18           ` Jesper Juhl
2008-01-22  2:07       ` Anton Salikhmetov
2008-01-22  2:16         ` Jesper Juhl
2008-01-22  2:16   ` Linus Torvalds
2008-01-22  2:39     ` Anton Salikhmetov
2008-01-22  8:52       ` Miklos Szeredi
2008-01-22  4:39   ` Andi Kleen
2008-01-22  1:34 ` [PATCH -v7 0/2] Fixing the issue with memory-mapped file times Jesper Juhl
2008-01-22  1:40   ` Anton Salikhmetov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).