linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC][PATCH] vm_operation to avoid pagefault/inval race
@ 2003-05-17 18:21 Daniel Phillips
  2003-05-17 19:49 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Phillips @ 2003-05-17 18:21 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Andrew Morton, Christoph Hellwig, linux-mm, linux-kernel

Please don't take lack of response for lack of interest.  The generic issue 
here is "what are the vfs changes needed to support cross-host mmap?".  You 
defined the problem nicely.

>
> [...]
>
> 5.	One way or another, life is now hard.

Indeed.  In brief, ->nopage just doesn't provide adequate coverage to support 
the cross-host lock.

> One solution would be for the distributed filesystem to hold
> onto a lock or semaphore upon return from the nopage function.
> The problem is that there is no way to determine (in a timely
> fashion) when it safe to release this lock or semaphore.
>
> The attached patch addresses this by adding a nopagedone
> function for when do_no_page() exits.  The filesystem may then
> drop the lock or semaphore in this nopagedone function.
>
> Thoughts?  Is there some other existing way to get this done?

There is.  One way is to make all of do_no_page a hook, and clearly this is 
more generic than what you proposed, since it covers your hook and the rest 
can be done with library calls.  Once you've gone there, the next question to 
ask is "what use is the existing ->nopage" hook, and the answer is: none, 
really.  The existing usage of ->nopage can be replaced by ->do_no_page plus 
library code, and the only problem is, we have to change pretty well every 
filesystem in and out of tree.  So that gets a little, em, interesting from 
the 2.6.0 point of view, which is why I cc'd Andrew on this.  Christoph has 
also expressed interest in this, which explains the other cc.

Any clustered filesystem that wants to support posix mmap is going to need 
this hook, so the sooner we hash this out, the better.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread
* [RFC][PATCH] vm_operation to avoid pagefault/inval race
@ 2003-05-13 20:53 Paul E. McKenney
  2003-05-17 16:06 ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2003-05-13 20:53 UTC (permalink / raw)
  To: linux-kernel, linux-mm, akpm; +Cc: mjbligh

This patch adds a vm_operations_struct function pointer that allows
networked and distributed filesystems to avoid a race between a
pagefault on an mmap and an invalidation request from some other
node.  The race goes as follows:

1.	A user process on node A accesses a portion of a mapped
	file, resulting in a page fault.  The pagefault handler
	invokes the corresponding nopage function, which reads
	the page into memory.

2.	A user process on node B writes to the same portion of
	the file (either via mmap or write()), therefore sending
	node A an invalidation request to node A.

3.	Node A receives this invalidate request, and dutifully
	invalidates all mmaps.  Except for the one that has
	not yet been fully mapped by step 1.

4.	Node A then executes the rest of do_no_page(), entering
	the now-invalid page into the PTEs.

5.	One way or another, life is now hard.

One solution would be for the distributed filesystem to hold
onto a lock or semaphore upon return from the nopage function.
The problem is that there is no way to determine (in a timely
fashion) when it safe to release this lock or semaphore.

The attached patch addresses this by adding a nopagedone
function for when do_no_page() exits.  The filesystem may then
drop the lock or semaphore in this nopagedone function.

Thoughts?  Is there some other existing way to get this done?

					Thanx, Paul


diff -urN -X dontdiff linux-2.5.69/include/linux/mm.h linux-2.5.69.stmmap/include/linux/mm.h
--- linux-2.5.69/include/linux/mm.h	Sun May  4 16:53:00 2003
+++ linux-2.5.69.stmmap/include/linux/mm.h	Fri May  9 09:30:37 2003
@@ -134,6 +134,7 @@
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
 	struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int unused);
+	void (*nopagedone)(struct vm_area_struct * area, unsigned long address, int status);
 	int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
 };
 
diff -urN -X dontdiff linux-2.5.69/mm/memory.c linux-2.5.69.stmmap/mm/memory.c
--- linux-2.5.69/mm/memory.c	Sun May  4 16:53:14 2003
+++ linux-2.5.69.stmmap/mm/memory.c	Fri May  9 17:04:09 2003
@@ -1426,6 +1487,9 @@
 	ret = VM_FAULT_OOM;
 out:
 	pte_chain_free(pte_chain);
+	if (vma->vm_ops && vma->vm_ops->nopagedone) {
+		vma->vm_ops->nopagedone(vma, address & PAGE_MASK, ret);
+	}
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-05-23 17:32 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-17 18:21 [RFC][PATCH] vm_operation to avoid pagefault/inval race Daniel Phillips
2003-05-17 19:49 ` Andrew Morton
2003-05-20  1:23   ` Paul E. McKenney
2003-05-20  8:11     ` Andrew Morton
2003-05-23 14:35       ` [RFC][PATCH] Avoid vmtruncate/mmap-page-fault race Paul E. McKenney
2003-05-23 16:21         ` Hugh Dickins
2003-05-23 17:10           ` Daniel Phillips
2003-05-23 17:47             ` Hugh Dickins
  -- strict thread matches above, loose matches on Subject: below --
2003-05-13 20:53 [RFC][PATCH] vm_operation to avoid pagefault/inval race Paul E. McKenney
2003-05-17 16:06 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).