linux-kernel.vger.kernel.org archive mirror
* Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
@ 2000-11-05  5:51 bsuparna
  2000-11-06  3:14 ` David S. Miller
  2000-11-06 18:42 ` tytso
  0 siblings, 2 replies; 9+ messages in thread
From: bsuparna @ 2000-11-05  5:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andi Kleen, kanoj

The vma list lock can nest with i_shared_lock, as per Kanoj Sarcar's
   note on mem-mgmt locks (Documentation/vm/locking), and currently the
   vma list lock == mm->page_table_lock. (By "vma list lock" I mean the
   vmlist_access/modify_lock macros.) But there appears to be an
   inconsistency in the hierarchy of these two locks.
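
   For reference, the vmlist lock macros at this point are just aliases
   for the page_table_lock. From memory, the definitions in
   include/linux/mm.h are roughly as follows (a sketch, not a verbatim
   copy):

      #define vmlist_access_lock(mm)     spin_lock(&mm->page_table_lock)
      #define vmlist_access_unlock(mm)   spin_unlock(&mm->page_table_lock)
      #define vmlist_modify_lock(mm)     spin_lock(&mm->page_table_lock)
      #define vmlist_modify_unlock(mm)   spin_unlock(&mm->page_table_lock)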

   Looking at the mmap code, it appears that the vma list lock
   (i.e. the page_table_lock right now) is to be acquired first:
   insert_vm_struct, which acquires i_shared_lock internally, is
   called under the page_table_lock/vma list lock.
   Elsewhere, in madvise, I see a similar hierarchy.
   In the unmap code, care has been taken not to hold these locks
   simultaneously.

   However, in the vmtruncate code, it looks like the hierarchy is
   reversed: there, the i_shared_lock is acquired in order to traverse
   the i_mmap list, and inside the loop it calls zap_page_range, which
   acquires the page_table_lock.

   This is odd. Isn't there a possibility of a deadlock if mmap and
   truncation of the same file happen simultaneously (on an SMP) ?
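
   To make the inversion concrete, here is a rough sketch of the two
   lock orders as I read them (just the ordering, not the actual code):

      /* mmap path, e.g. do_mmap_pgoff -> insert_vm_struct: */
      spin_lock(&mm->page_table_lock);        /* lock A          */
      spin_lock(&mapping->i_shared_lock);     /* ... then lock B */

      /* vmtruncate path, mm/memory.c: */
      spin_lock(&mapping->i_shared_lock);     /* lock B          */
      spin_lock(&mm->page_table_lock);        /* ... then lock A */

   If one CPU runs the first path while another runs the second for the
   same file and mm, each can end up spinning on the lock the other
   holds: the classic AB-BA deadlock.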

  I'm wondering if this could be a side effect of doubling up the
  page_table_lock as the vma list lock ?

  Or have I missed something ?

  [I have checked up to 2.4-test10-pre5.]

  I had put this query up on linux-mm as part of a much larger mail, but
  haven't received any response yet, so I thought I would put up a more
  focused query this time.

  Regards
  Suparna

  Suparna Bhattacharya
  Systems Software Group, IBM Global Services, India
  E-mail : bsuparna@in.ibm.com
  Phone : 91-80-5267117, Extn : 2525


>List:     linux-mm
>Subject:  Re: Updated Linux 2.4 Status/TODO List (from the ALS show)
>From:     Kanoj Sarcar <kanoj@google.engr.sgi.com>
>Date:     2000-10-13 18:19:06

>>
>>
>> On Thu, 12 Oct 2000, David S. Miller wrote:
>> >
>> >    page_table_lock is supposed to protect normal page table activity
>> >    (like what's done in page fault handler) from swapping out.
>> >    However, grabbing this lock in swap-out code is completely missing!
>> >
>> > Audrey, vmlist_access_{un,}lock == unlocking/locking page_table_lock.
>>
>> Yeah, it's an easy mistake to make.
>>
>> I've made it myself - grepping for page_table_lock and coming up empty
>> in places where I expected it to be.
>>
>> In fact, if somebody sends me patches to remove the "vmlist_access_lock()"
>> stuff completely, and replace them with explicit page_table_lock things,
>> I'll apply it pretty much immediately. I don't like information hiding,
>> and right now that's the only thing that the vmlist_access_lock() stuff
>> is doing.

>Linus,
>
>I came up with the vmlist_access_lock/vmlist_modify_lock names early in
>2.3. The reasoning behind that was that in most places, the "vmlist
>lock" was being taken to protect the vmlist chain, vma_t fields or
>mm_struct fields. The fact that, implementation-wise, this lock could
>be the same as the page_table_lock was a good idea that you suggested.
>
>Nevertheless, the name was chosen to indicate what type of things it was
>guarding. For example, in the future, you might actually have a different
>(possibly sleeping) lock to guard the vma chain etc., but still have a
>spin lock for the page_table_lock (no, I don't want to be drawn into a
>discussion of why this might be needed right now). Some of this is
>mentioned in Documentation/vm/locking.
>
>Just thought I would mention, in case you don't recollect some of this
>history. Of course, I understand the "information hiding" part.
>
>Kanoj
>



* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
  2000-11-05  5:51 Oddness in i_shared_lock and page_table_lock nesting hierarchies ? bsuparna
@ 2000-11-06  3:14 ` David S. Miller
  2000-11-06 18:42 ` tytso
  1 sibling, 0 replies; 9+ messages in thread
From: David S. Miller @ 2000-11-06  3:14 UTC (permalink / raw)
  To: bsuparna; +Cc: linux-kernel, ak, kanoj

   From: bsuparna@in.ibm.com
   Date: 	Sun, 5 Nov 2000 11:21:05 +0530

      However, in the vmtruncate code, it looks like the hierarchy is
      reversed.

It is a well-known bug amongst gurus :-) I sent a linux24 bug addition
to Ted Ts'o a week or so ago, but apparently he dropped it.

Ted, I'm resending this below, please add it to the linux24 list
thanks.

X-Coding-System: undecided-unix
Date: Fri, 13 Oct 2000 17:36:04 -0700
From: "David S. Miller" <davem@redhat.com>
To: tytso@mit.edu
Subject: New BUG for todo list


This bug will essentially hard-hang an SMP system if triggered.  Linus
and I have both known about it for some time now; the fix is
straightforward, just nobody has coded it up and tested it yet.

The problem, basically, is that mm/memory.c:vmtruncate() violates the
lock acquisition ordering rules when both mm->page_table_lock and
mapping->i_shared_lock must be acquired.  All other instances (in
the mmap/munmap syscalls, for example) acquire the page_table_lock and
then the i_shared_lock.  vmtruncate(), on the other hand, acquires
i_shared_lock first and then page_table_lock.

Essentially I would describe this in the TODO list as:

	vmtruncate() violates page_table_lock/i_shared_lock
	acquisition ordering rules leading to deadlock

The fix is to actually keep vmtruncate() as it is, and change the
ordering rules for the rest of the kernel to follow vmtruncate()'s
lock ordering.  Linus agreed with me on this because the more natural
data inspection flow is to go from the object to the mappings of that
object.
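
Schematically, the ordering being standardized on is the one
vmtruncate() already uses, going from the object to its mappings (a
rough sketch of the traversal, not the literal code):

	spin_lock(&mapping->i_shared_lock);	/* the object (file) */
	for (vma = mapping->i_mmap; vma; vma = vma->vm_next_share) {
		/* ... then each mm that maps the object */
		spin_lock(&vma->vm_mm->page_table_lock);
		/* unmap the truncated range */
		spin_unlock(&vma->vm_mm->page_table_lock);
	}
	spin_unlock(&mapping->i_shared_lock);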

I'm going to try and work on this change this weekend, but I want it
to be in the bug list so that it _is_ accounted for.

Later,
David S. Miller
davem@redhat.com



* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
  2000-11-05  5:51 Oddness in i_shared_lock and page_table_lock nesting hierarchies ? bsuparna
  2000-11-06  3:14 ` David S. Miller
@ 2000-11-06 18:42 ` tytso
  1 sibling, 0 replies; 9+ messages in thread
From: tytso @ 2000-11-06 18:42 UTC (permalink / raw)
  To: davem; +Cc: bsuparna, linux-kernel, ak, kanoj

   Date: 	Sun, 5 Nov 2000 19:14:29 -0800
   From: "David S. Miller" <davem@redhat.com>

   It is a well-known bug amongst gurus :-) I sent a linux24 bug addition
   to Ted Ts'o a week or so ago, but apparently he dropped it.

I got it, but I thought it was fixed before I had a chance to add it to
the bug list.  I got confused by one of Linus's descriptions of fixes to
the test10-pre* series.

Sorry; my bad.  I'll get it added to the list.

						- Ted


* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
@ 2000-11-10  9:29 bsuparna
  0 siblings, 0 replies; 9+ messages in thread
From: bsuparna @ 2000-11-10  9:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: aviro, linux-kernel


>Let me save you some time, below is the fix I sent to
>Linus this evening:

Yes it does :-) Thanks.

I had started making changes along similar lines, but as I realized
how many places it would affect, I got a little diverted trying to see
if there was a way to do it without bringing the locks out of the
common insert routines: whether, through a careful sequencing of steps,
it was possible to avoid holding i_shared_lock and page_table_lock
simultaneously (other than in vmtruncate, of course), since we really
need to serialize only with truncate and the page stealer (the mm
semaphore takes care of most other cases). Unmap does that now, as it
crosses multiple objects (unlike the other cases, where luckily we have
just one object at a time, which makes it possible to bring the
i_shared_lock to the top).

But that gets complicated, and it is hard to verify the correctness of
such an approach or to generalize it. So your fix looks like a natural
and consistent way to do this.

Now I can get back to what I was originally working on when I
hit this :-)

- Suparna



  Suparna Bhattacharya
  Systems Software Group, IBM Global Services, India
  E-mail : bsuparna@in.ibm.com
  Phone : 91-80-5267117, Extn : 2525



* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
  2000-11-10  6:55 aprasad
@ 2000-11-10  7:34 ` David S. Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David S. Miller @ 2000-11-10  7:34 UTC (permalink / raw)
  To: aprasad; +Cc: bsuparna, linux-kernel

   From: aprasad@in.ibm.com
   Date: 	Fri, 10 Nov 2000 12:25:24 +0530

   This link might be useful:
   http://mail.nl.linux.org/linux-mm/2000-07/msg00038.html

This talks about a completely different bug.

We are talking here about inconsistent ordering of lock acquisition
when both mapping->i_shared_lock and mm->page_table_lock must be
held simultaneously.

The thread you quoted merely mentions the issue that mm->rss
modifications are not always properly protected by the
page_table_lock.

There were no previous public discussions of the
i_shared_lock/page_table_lock deadlock problem.

Later,
David S. Miller
davem@redhat.com

* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
@ 2000-11-10  6:55 aprasad
  2000-11-10  7:34 ` David S. Miller
  0 siblings, 1 reply; 9+ messages in thread
From: aprasad @ 2000-11-10  6:55 UTC (permalink / raw)
  To: bsuparna; +Cc: linux-kernel

>I was looking into the vmm code and trying to work out exactly how to
>fix this, and there are some questions in my mind - mainly a few cases
>(involving multiple vma updates) where I'm not sure of the cleanest way
>to tackle them.
>But before I bother anyone with those, I thought I ought to go through
>the earlier discussions that you had while coming up with what the fix
>should be. Maybe you've already gone over this once.
>Could you point me to those ? Somehow I haven't been successful in
>locating them.

This link might be useful:
http://mail.nl.linux.org/linux-mm/2000-07/msg00038.html

Anil Kumar Prasad,
Hardware Design,
IBM Global Services,
Pune,
INDIA
Phone:6349724 , 4007117 extn:1310



* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
  2000-11-09 12:16 bsuparna
@ 2000-11-10  6:44 ` David S. Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David S. Miller @ 2000-11-10  6:44 UTC (permalink / raw)
  To: bsuparna; +Cc: aviro, linux-kernel

   From: bsuparna@in.ibm.com
   Date: 	Thu, 9 Nov 2000 17:46:53 +0530

     I was looking into the vmm code and trying to work out exactly
   how to fix this

Let me save you some time; below is the fix I sent to
Linus this evening:

diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/include/linux/mm.h linux/include/linux/mm.h
--- vanilla/linux/include/linux/mm.h	Thu Nov  9 21:43:46 2000
+++ linux/include/linux/mm.h	Thu Nov  9 20:16:04 2000
@@ -417,8 +417,11 @@
 extern void swapin_readahead(swp_entry_t);
 
 /* mmap.c */
+extern void lock_vma_mappings(struct vm_area_struct *);
+extern void unlock_vma_mappings(struct vm_area_struct *);
 extern void merge_segments(struct mm_struct *, unsigned long, unsigned long);
 extern void insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
+extern void __insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
 extern void build_mmap_avl(struct mm_struct *);
 extern void exit_mmap(struct mm_struct *);
 extern unsigned long get_unmapped_area(unsigned long, unsigned long);
diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/filemap.c linux/mm/filemap.c
--- vanilla/linux/mm/filemap.c	Thu Nov  9 21:43:46 2000
+++ linux/mm/filemap.c	Thu Nov  9 20:08:24 2000
@@ -1813,11 +1813,13 @@
 	get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = end;
-	insert_vm_struct(current->mm, n);
+	__insert_vm_struct(current->mm, n);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
@@ -1837,10 +1839,12 @@
 	get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_end = start;
-	insert_vm_struct(current->mm, n);
+	__insert_vm_struct(current->mm, n);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
@@ -1870,15 +1874,17 @@
 		vma->vm_ops->open(left);
 		vma->vm_ops->open(right);
 	}
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = start;
 	vma->vm_end = end;
 	setup_read_behavior(vma, behavior);
 	vma->vm_raend = 0;
-	insert_vm_struct(current->mm, left);
-	insert_vm_struct(current->mm, right);
+	__insert_vm_struct(current->mm, left);
+	__insert_vm_struct(current->mm, right);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mlock.c linux/mm/mlock.c
--- vanilla/linux/mm/mlock.c	Fri Oct 13 12:10:30 2000
+++ linux/mm/mlock.c	Thu Nov  9 17:47:00 2000
@@ -36,11 +36,13 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = end;
-	insert_vm_struct(current->mm, n);
+	__insert_vm_struct(current->mm, n);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
@@ -61,10 +63,12 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_end = start;
-	insert_vm_struct(current->mm, n);
+	__insert_vm_struct(current->mm, n);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
@@ -96,15 +100,17 @@
 		vma->vm_ops->open(left);
 		vma->vm_ops->open(right);
 	}
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = start;
 	vma->vm_end = end;
 	vma->vm_flags = newflags;
 	vma->vm_raend = 0;
-	insert_vm_struct(current->mm, left);
-	insert_vm_struct(current->mm, right);
+	__insert_vm_struct(current->mm, left);
+	__insert_vm_struct(current->mm, right);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mmap.c linux/mm/mmap.c
--- vanilla/linux/mm/mmap.c	Fri Oct 13 12:10:30 2000
+++ linux/mm/mmap.c	Thu Nov  9 17:53:02 2000
@@ -67,7 +67,7 @@
 }
 
 /* Remove one vm structure from the inode's i_mapping address space. */
-static inline void remove_shared_vm_struct(struct vm_area_struct *vma)
+static inline void __remove_shared_vm_struct(struct vm_area_struct *vma)
 {
 	struct file * file = vma->vm_file;
 
@@ -75,14 +75,41 @@
 		struct inode *inode = file->f_dentry->d_inode;
 		if (vma->vm_flags & VM_DENYWRITE)
 			atomic_inc(&inode->i_writecount);
-		spin_lock(&inode->i_mapping->i_shared_lock);
 		if(vma->vm_next_share)
 			vma->vm_next_share->vm_pprev_share = vma->vm_pprev_share;
 		*vma->vm_pprev_share = vma->vm_next_share;
-		spin_unlock(&inode->i_mapping->i_shared_lock);
 	}
 }
 
+static inline void remove_shared_vm_struct(struct vm_area_struct *vma)
+{
+	lock_vma_mappings(vma);
+	__remove_shared_vm_struct(vma);
+	unlock_vma_mappings(vma);
+}
+
+void lock_vma_mappings(struct vm_area_struct *vma)
+{
+	struct address_space *mapping;
+
+	mapping = NULL;
+	if (vma->vm_file)
+		mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+	if (mapping)
+		spin_lock(&mapping->i_shared_lock);
+}
+
+void unlock_vma_mappings(struct vm_area_struct *vma)
+{
+	struct address_space *mapping;
+
+	mapping = NULL;
+	if (vma->vm_file)
+		mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+	if (mapping)
+		spin_unlock(&mapping->i_shared_lock);
+}
+
 /*
  *  sys_brk() for the most part doesn't need the global kernel
  *  lock, except when an application is doing something nasty
@@ -316,13 +343,22 @@
 	 * after the call.  Save the values we need now ...
 	 */
 	flags = vma->vm_flags;
-	addr = vma->vm_start; /* can addr have changed?? */
+
+	/* Can addr have changed??
+	 *
+	 * Answer: Yes, several device drivers can do it in their
+	 *         f_op->mmap method. -DaveM
+	 */
+	addr = vma->vm_start;
+
+	lock_vma_mappings(vma);
 	spin_lock(&mm->page_table_lock);
-	insert_vm_struct(mm, vma);
+	__insert_vm_struct(mm, vma);
 	if (correct_wcount)
 		atomic_inc(&file->f_dentry->d_inode->i_writecount);
 	merge_segments(mm, vma->vm_start, vma->vm_end);
 	spin_unlock(&mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	
 	mm->total_vm += len >> PAGE_SHIFT;
 	if (flags & VM_LOCKED) {
@@ -534,10 +570,12 @@
 	/* Work out to one of the ends. */
 	if (end == area->vm_end) {
 		area->vm_end = addr;
+		lock_vma_mappings(area);
 		spin_lock(&mm->page_table_lock);
 	} else if (addr == area->vm_start) {
 		area->vm_pgoff += (end - area->vm_start) >> PAGE_SHIFT;
 		area->vm_start = end;
+		lock_vma_mappings(area);
 		spin_lock(&mm->page_table_lock);
 	} else {
 	/* Unmapping a hole: area->vm_start < addr <= end < area->vm_end */
@@ -560,12 +598,18 @@
 		if (mpnt->vm_ops && mpnt->vm_ops->open)
 			mpnt->vm_ops->open(mpnt);
 		area->vm_end = addr;	/* Truncate area */
+
+		/* Because mpnt->vm_file == area->vm_file this locks
+		 * things correctly.
+		 */
+		lock_vma_mappings(area);
 		spin_lock(&mm->page_table_lock);
-		insert_vm_struct(mm, mpnt);
+		__insert_vm_struct(mm, mpnt);
 	}
 
-	insert_vm_struct(mm, area);
+	__insert_vm_struct(mm, area);
 	spin_unlock(&mm->page_table_lock);
+	unlock_vma_mappings(area);
 	return extra;
 }
 
@@ -811,10 +855,12 @@
 	flags = vma->vm_flags;
 	addr = vma->vm_start;
 
+	lock_vma_mappings(vma);
 	spin_lock(&mm->page_table_lock);
-	insert_vm_struct(mm, vma);
+	__insert_vm_struct(mm, vma);
 	merge_segments(mm, vma->vm_start, vma->vm_end);
 	spin_unlock(&mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	
 	mm->total_vm += len >> PAGE_SHIFT;
 	if (flags & VM_LOCKED) {
@@ -877,9 +923,10 @@
 }
 
 /* Insert vm structure into process list sorted by address
- * and into the inode's i_mmap ring.
+ * and into the inode's i_mmap ring.  If vm_file is non-NULL
+ * then the i_shared_lock must be held here.
  */
-void insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
+void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
 {
 	struct vm_area_struct **pprev;
 	struct file * file;
@@ -916,15 +963,20 @@
 			head = &mapping->i_mmap_shared;
       
 		/* insert vmp into inode's share list */
-		spin_lock(&mapping->i_shared_lock);
 		if((vmp->vm_next_share = *head) != NULL)
 			(*head)->vm_pprev_share = &vmp->vm_next_share;
 		*head = vmp;
 		vmp->vm_pprev_share = head;
-		spin_unlock(&mapping->i_shared_lock);
 	}
 }
 
+void insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vmp)
+{
+	lock_vma_mappings(vmp);
+	__insert_vm_struct(mm, vmp);
+	unlock_vma_mappings(vmp);
+}
+
 /* Merge the list of memory segments if possible.
  * Redundant vm_area_structs are freed.
  * This assumes that the list is ordered by address.
@@ -986,11 +1038,13 @@
 			mpnt->vm_pgoff += (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
 			mpnt->vm_start = mpnt->vm_end;
 			spin_unlock(&mm->page_table_lock);
+			unlock_vma_mappings(mpnt);
 			mpnt->vm_ops->close(mpnt);
+			lock_vma_mappings(mpnt);
 			spin_lock(&mm->page_table_lock);
 		}
 		mm->map_count--;
-		remove_shared_vm_struct(mpnt);
+		__remove_shared_vm_struct(mpnt);
 		if (mpnt->vm_file)
 			fput(mpnt->vm_file);
 		kmem_cache_free(vm_area_cachep, mpnt);
diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mprotect.c linux/mm/mprotect.c
--- vanilla/linux/mm/mprotect.c	Wed Oct 18 14:25:46 2000
+++ linux/mm/mprotect.c	Thu Nov  9 17:53:43 2000
@@ -118,11 +118,13 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (end - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = end;
-	insert_vm_struct(current->mm, n);
+	__insert_vm_struct(current->mm, n);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
@@ -145,10 +147,12 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_end = start;
-	insert_vm_struct(current->mm, n);
+	__insert_vm_struct(current->mm, n);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
@@ -179,6 +183,7 @@
 		vma->vm_ops->open(left);
 		vma->vm_ops->open(right);
 	}
+	lock_vma_mappings(vma);
 	spin_lock(&vma->vm_mm->page_table_lock);
 	vma->vm_pgoff += (start - vma->vm_start) >> PAGE_SHIFT;
 	vma->vm_start = start;
@@ -186,9 +191,10 @@
 	vma->vm_flags = newflags;
 	vma->vm_raend = 0;
 	vma->vm_page_prot = prot;
-	insert_vm_struct(current->mm, left);
-	insert_vm_struct(current->mm, right);
+	__insert_vm_struct(current->mm, left);
+	__insert_vm_struct(current->mm, right);
 	spin_unlock(&vma->vm_mm->page_table_lock);
+	unlock_vma_mappings(vma);
 	return 0;
 }
 
diff -u --recursive --new-file --exclude=CVS --exclude=.cvsignore vanilla/linux/mm/mremap.c linux/mm/mremap.c
--- vanilla/linux/mm/mremap.c	Wed Oct 18 14:25:46 2000
+++ linux/mm/mremap.c	Thu Nov  9 17:53:57 2000
@@ -141,10 +141,12 @@
 				get_file(new_vma->vm_file);
 			if (new_vma->vm_ops && new_vma->vm_ops->open)
 				new_vma->vm_ops->open(new_vma);
+			lock_vma_mappings(vma);
 			spin_lock(&current->mm->page_table_lock);
-			insert_vm_struct(current->mm, new_vma);
+			__insert_vm_struct(current->mm, new_vma);
 			merge_segments(current->mm, new_vma->vm_start, new_vma->vm_end);
 			spin_unlock(&current->mm->page_table_lock);
+			unlock_vma_mappings(vma);
 			do_munmap(current->mm, addr, old_len);
 			current->mm->total_vm += new_len >> PAGE_SHIFT;
 			if (new_vma->vm_flags & VM_LOCKED) {
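
In short, the calling convention the patch establishes at every call
site (file-side lock first, matching vmtruncate()'s order) is:

	lock_vma_mappings(vma);		/* mapping->i_shared_lock, if any */
	spin_lock(&mm->page_table_lock);
	__insert_vm_struct(mm, vma);	/* caller holds both locks */
	spin_unlock(&mm->page_table_lock);
	unlock_vma_mappings(vma);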

* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
@ 2000-11-09 12:16 bsuparna
  2000-11-10  6:44 ` David S. Miller
  0 siblings, 1 reply; 9+ messages in thread
From: bsuparna @ 2000-11-09 12:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: torvalds, aviro, linux-kernel


 > The fix we agreed on back then is easy (make rest of kernel match
 > vmtruncate()'s locking order), it just takes time to implement and
 > test.

  David,

  I was looking into the vmm code and trying to work out exactly how to
  fix this, and there are some questions in my mind - mainly a few cases
  (involving multiple vma updates) where I'm not sure of the cleanest
  way to tackle them.
  But before I bother anyone with those, I thought I ought to go through
  the earlier discussions that you had while coming up with what the fix
  should be. Maybe you've already gone over this once.
  Could you point me to those ? Somehow I haven't been successful in
  locating them.

  Regards
  Suparna


  Suparna Bhattacharya
  Systems Software Group, IBM Global Services, India
  E-mail : bsuparna@in.ibm.com
  Phone : 91-80-5267117, Extn : 2525





* Re: Oddness in i_shared_lock and page_table_lock nesting hierarchies ?
@ 2000-11-06 13:50 bsuparna
  0 siblings, 0 replies; 9+ messages in thread
From: bsuparna @ 2000-11-06 13:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, kanoj, torvalds


 Thanks.
 That sounds like a reasonable approach to take.

  Regards
 - Suparna

  Suparna Bhattacharya
  Systems Software Group, IBM Global Services, India
  E-mail : bsuparna@in.ibm.com
  Phone : 91-80-5267117, Extn : 2525


"David S. Miller" <davem@redhat.com> on 11/06/2000 08:44:29 AM

Please respond to "David S. Miller" <davem@redhat.com>

To:   Suparna Bhattacharya/India/IBM@IBMIN
cc:   linux-kernel@vger.kernel.org, ak@suse.de, kanoj@google.engr.sgi.com
Subject:  Re: Oddness in i_shared_lock and page_table_lock nesting
      hierarchies ?






Thread overview: 9+ messages
2000-11-05  5:51 Oddness in i_shared_lock and page_table_lock nesting hierarchies ? bsuparna
2000-11-06  3:14 ` David S. Miller
2000-11-06 18:42 ` tytso
2000-11-06 13:50 bsuparna
2000-11-09 12:16 bsuparna
2000-11-10  6:44 ` David S. Miller
2000-11-10  6:55 aprasad
2000-11-10  7:34 ` David S. Miller
2000-11-10  9:29 bsuparna
