linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: BUG FIX?: mm->rss is modified in some places without holding the  page_table_lock
       [not found] ` <Pine.LNX.4.21.0010130114090.13322-100000@neo.local>
@ 2000-11-03 11:33   ` David S. Miller
  2000-11-03 14:56     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: David S. Miller @ 2000-11-03 11:33 UTC (permalink / raw)
  To: tytso; +Cc: davej, torvalds, linux-kernel, linux-mm

   Date: 	Fri, 3 Nov 2000 06:39:22 -0500
   From: tytso@mit.edu

   Given that we don't have a 64-bit atomic_t type, what do people
   think of Davej's patch?  (attached, below)

Broken, in 9 out of 10 places where he adds page_table_lock
acquisitions, this lock is already held --> instant deadlock.

This report is complicated by the fact that people were forgetting
that vmlist_*_{lock,unlock}(mm) was actually just spin_{lock,unlock}
on mm->page_table_lock.  I fixed that already by removing the dumb
vmlist locking macros which were causing all of this confusion.

Later,
David S. Miller
davem@redhat.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* BUG FIX?: mm->rss is modified in some places without holding the  page_table_lock
       [not found] ` <Pine.LNX.4.21.0010130114090.13322-100000@neo.local>
@ 2000-11-03 11:39 tytso
       [not found] ` <Pine.LNX.4.21.0010130114090.13322-100000@neo.local>
  0 siblings, 1 reply; 5+ messages in thread
From: tytso @ 2000-11-03 11:39 UTC (permalink / raw)
  To: davej; +Cc: torvalds, linux-kernel, linux-mm


I'm in the process of updating the 2.4 bug list for test10, and as near
as I can tell, there was a lot of discussion about two different ways of
fixing this bug:

> 9. To Do
>     * mm->rss is modified in some places without holding the
>       page_table_lock (sct)

To wit, either making mm->rss an atomic_t (which doesn't quite work for
some architectures which need it to be 64 bits), or putting in some
locks as Dave Jones suggested.

As far as I can tell, neither patch has gone in, and discussion seems to
have dropped in the past two weeks or so.  An examination of the kernel
seems to indicate that neither fixed has been taken.  What's the status
of this?  Given that we don't have a 64-bit atomic_t type, what do
people think of Davej's patch?  (attached, below)

						- Ted

				    
diff -urN --exclude-from=/home/davej/.exclude linux/mm/memory.c linux.dj/mm/memory.c
--- linux/mm/memory.c	Sat Sep 16 00:51:21 2000
+++ linux.dj/mm/memory.c	Wed Oct 11 23:41:10 2000
@@ -370,7 +370,6 @@
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
-	spin_unlock(&mm->page_table_lock);
 	/*
 	 * Update rss for the mm_struct (not necessarily current->mm)
 	 * Notice that rss is an unsigned long.
@@ -379,6 +378,7 @@
 		mm->rss -= freed;
 	else
 		mm->rss = 0;
+	spin_unlock(&mm->page_table_lock);
 }
 
 
@@ -1074,7 +1074,9 @@
 		flush_icache_page(vma, page);
 	}
 
+	spin_lock(&mm->page_table_lock);
 	mm->rss++;
+	spin_unlock(&mm->page_table_lock);
 
 	pte = mk_pte(page, vma->vm_page_prot);
 
@@ -1113,7 +1115,9 @@
 			return -1;
 		clear_user_highpage(page, addr);
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
+		spin_lock(&mm->page_table_lock);
 		mm->rss++;
+		spin_unlock(&mm->page_table_lock);
 		flush_page_to_ram(page);
 	}
 	set_pte(page_table, entry);
@@ -1152,7 +1156,9 @@
 		return 0;
 	if (new_page == NOPAGE_OOM)
 		return -1;
+	spin_lock(&mm->page_table_lock);
 	++mm->rss;
+	spin_unlock(&mm->page_table_lock);
 	/*
 	 * This silly early PAGE_DIRTY setting removes a race
 	 * due to the bad i386 page protection. But it's valid
diff -urN --exclude-from=/home/davej/.exclude linux/mm/mmap.c linux.dj/mm/mmap.c
--- linux/mm/mmap.c	Tue Aug 29 20:41:12 2000
+++ linux.dj/mm/mmap.c	Wed Oct 11 23:48:30 2000
@@ -844,7 +844,9 @@
 	vmlist_modify_lock(mm);
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
 	vmlist_modify_unlock(mm);
+	spin_lock (&mm->page_table_lock);
 	mm->rss = 0;
+	spin_unlock (&mm->page_table_lock);
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
 	while (mpnt) {
diff -urN --exclude-from=/home/davej/.exclude linux/mm/vmscan.c linux.dj/mm/vmscan.c
--- linux/mm/vmscan.c	Mon Oct  2 20:02:20 2000
+++ linux.dj/mm/vmscan.c	Wed Oct 11 23:46:01 2000
@@ -102,7 +102,9 @@
 		set_pte(page_table, swp_entry_to_pte(entry));
 drop_pte:
 		UnlockPage(page);
+		spin_lock (&mm->page_table_lock);
 		mm->rss--;
+		spin_unlock (&mm->page_table_lock);
 		flush_tlb_page(vma, address);
 		deactivate_page(page);
 		page_cache_release(page);
@@ -170,7 +172,9 @@
 		struct file *file = vma->vm_file;
 		if (file) get_file(file);
 		pte_clear(page_table);
+		spin_lock (&mm->page_table_lock);
 		mm->rss--;
+		spin_unlock (&mm->page_table_lock);
 		flush_tlb_page(vma, address);
 		vmlist_access_unlock(mm);
 		error = swapout(page, file);
@@ -202,7 +206,9 @@
 	add_to_swap_cache(page, entry);
 
 	/* Put the swap entry into the pte after the page is in swapcache */
+	spin_lock (&mm->page_table_lock);
 	mm->rss--;
+	spin_unlock (&mm->page_table_lock);
 	set_pte(page_table, swp_entry_to_pte(entry));
 	flush_tlb_page(vma, address);
 	vmlist_access_unlock(mm);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BUG FIX?: mm->rss is modified in some places without holding the  page_table_lock
  2000-11-03 14:56     ` Theodore Y. Ts'o
@ 2000-11-03 14:51       ` David S. Miller
  2000-11-04 23:37         ` Rasmus Andersen
  0 siblings, 1 reply; 5+ messages in thread
From: David S. Miller @ 2000-11-03 14:51 UTC (permalink / raw)
  To: tytso; +Cc: davej, torvalds, linux-kernel, linux-mm

   Date: 	Fri, 3 Nov 2000 09:56:15 -0500
   From: "Theodore Y. Ts'o" <tytso@MIT.EDU>

   Are you saying that the original bug report may not actually be a
   problem?  Is ms->rss actually protected in _all_ of the right
   places, but people got confused because of the syntactic sugar?

I don't know if all of them are ok, most are.

Someone would need to do the analysis.  David's patch could be used as
a good starting point.  A quick glance at mm/memory.c shows these
spots need to be fixed:

1) End of zap_page_range()
2) Middle of do_swap_page
3) do_anonymous_page
4) do_no_page

Later,
David S. Miller
davem@redhat.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BUG FIX?: mm->rss is modified in some places without holding the  page_table_lock
  2000-11-03 11:33   ` David S. Miller
@ 2000-11-03 14:56     ` Theodore Y. Ts'o
  2000-11-03 14:51       ` David S. Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Y. Ts'o @ 2000-11-03 14:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: davej, torvalds, linux-kernel, linux-mm

   Date: Fri, 3 Nov 2000 03:33:37 -0800
   From: "David S. Miller" <davem@redhat.com>

      Given that we don't have a 64-bit atomic_t type, what do people
      think of Davej's patch?  (attached, below)

   Broken, in 9 out of 10 places where he adds page_table_lock
   acquisitions, this lock is already held --> instant deadlock.

   This report is complicated by the fact that people were forgetting
   that vmlist_*_{lock,unlock}(mm) was actually just spin_{lock,unlock}
   on mm->page_table_lock.  I fixed that already by removing the dumb
   vmlist locking macros which were causing all of this confusion.

Are you saying that the original bug report may not actually be a
problem?  Is ms->rss actually protected in _all_ of the right places, but
people got confused because of the syntactic sugar?

						- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: BUG FIX?: mm->rss is modified in some places without holding the  page_table_lock
  2000-11-03 14:51       ` David S. Miller
@ 2000-11-04 23:37         ` Rasmus Andersen
  0 siblings, 0 replies; 5+ messages in thread
From: Rasmus Andersen @ 2000-11-04 23:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: tytso, davej, torvalds, linux-kernel, linux-mm

On Fri, Nov 03, 2000 at 06:51:05AM -0800, David S. Miller wrote:
>    Are you saying that the original bug report may not actually be a
>    problem?  Is ms->rss actually protected in _all_ of the right
>    places, but people got confused because of the syntactic sugar?
> 
> I don't know if all of them are ok, most are.
> 

Would this do? This is a subset of Davej's patch. I also noted that
fs/{exec.c,binfmt_aout.c,binfmt_elf.c} modifies rss without holding
the lock. I think exec.c needs it, but am at a loss whether the 
binfmt_* does too. The second patch below adds the lock to fs/exec.c.

Comments?

diff -ura linux-240-t10-clean/mm/memory.c linux/mm/memory.c
--- linux-240-t10-clean/mm/memory.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/memory.c	Sun Nov  5 00:13:59 2000
@@ -369,7 +369,6 @@
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
-	spin_unlock(&mm->page_table_lock);
 	/*
 	 * Update rss for the mm_struct (not necessarily current->mm)
 	 * Notice that rss is an unsigned long.
@@ -378,6 +377,7 @@
 		mm->rss -= freed;
 	else
 		mm->rss = 0;
+	spin_unlock(&mm->page_table_lock);
 }
 
 
@@ -1074,7 +1074,9 @@
 		flush_icache_page(vma, page);
 	}
 
+	spin_lock(&mm->page_table_lock);
 	mm->rss++;
+	spin_unlock(&mm->page_table_lock);
 
 	pte = mk_pte(page, vma->vm_page_prot);
 
@@ -1113,7 +1115,9 @@
 			return -1;
 		clear_user_highpage(page, addr);
 		entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
+		spin_lock(&mm->page_table_lock);
 		mm->rss++;
+		spin_unlock(&mm->page_table_lock);
 		flush_page_to_ram(page);
 	}
 	set_pte(page_table, entry);
@@ -1152,7 +1156,9 @@
 		return 0;
 	if (new_page == NOPAGE_OOM)
 		return -1;
+	spin_lock(&mm->page_table_lock);
 	++mm->rss;
+	spin_unlock(&mm->page_table_lock);
 	/*
 	 * This silly early PAGE_DIRTY setting removes a race
 	 * due to the bad i386 page protection. But it's valid
diff -ura linux-240-t10-clean/mm/mmap.c linux/mm/mmap.c
--- linux-240-t10-clean/mm/mmap.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/mmap.c	Sat Nov  4 23:53:49 2000
@@ -843,8 +843,8 @@
 	spin_lock(&mm->page_table_lock);
 	mpnt = mm->mmap;
 	mm->mmap = mm->mmap_avl = mm->mmap_cache = NULL;
-	spin_unlock(&mm->page_table_lock);
 	mm->rss = 0;
+	spin_unlock(&mm->page_table_lock);
 	mm->total_vm = 0;
 	mm->locked_vm = 0;
 	while (mpnt) {
diff -ura linux-240-t10-clean/mm/swapfile.c linux/mm/swapfile.c
--- linux-240-t10-clean/mm/swapfile.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/swapfile.c	Sun Nov  5 00:19:15 2000
@@ -231,7 +231,9 @@
 	set_pte(dir, pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
 	swap_free(entry);
 	get_page(page);
+	spin_lock(&vma->vm_mm->page_table_lock);
 	++vma->vm_mm->rss;
+	spin_unlock(&vma->vm_mm->page_table_lock);
 }
 
 static inline void unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
diff -ura linux-240-t10-clean/mm/vmscan.c linux/mm/vmscan.c
--- linux-240-t10-clean/mm/vmscan.c	Sat Nov  4 23:27:17 2000
+++ linux/mm/vmscan.c	Sun Nov  5 00:19:48 2000
@@ -95,7 +95,9 @@
 		set_pte(page_table, swp_entry_to_pte(entry));
 drop_pte:
 		UnlockPage(page);
+		spin_lock(&mm->page_table_lock);
 		mm->rss--;
+		spin_unlock(&mm->page_table_lock);
 		flush_tlb_page(vma, address);
 		deactivate_page(page);
 		page_cache_release(page);



Second patch:

--- linux-240-t10-clean/fs/exec.c	Sat Nov  4 23:27:14 2000
+++ linux/fs/exec.c	Sat Nov  4 23:55:37 2000
@@ -324,7 +324,9 @@
 		struct page *page = bprm->page[i];
 		if (page) {
 			bprm->page[i] = NULL;
+			spin_lock(mm->page_table_lock);
 			current->mm->rss++;
+			spin_unlock(mm->page_table_lock);
 			put_dirty_page(current,page,stack_base);
 		}
 		stack_base += PAGE_SIZE;

-- 
Regards,
        Rasmus(rasmus@jaquet.dk)

Duct tape is like the force; it has a light side and a dark side, and
it holds the universe together.
  -- Anonymous
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2000-11-04 22:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-11-03 11:39 BUG FIX?: mm->rss is modified in some places without holding the page_table_lock tytso
     [not found] ` <Pine.LNX.4.21.0010130114090.13322-100000@neo.local>
2000-11-03 11:33   ` David S. Miller
2000-11-03 14:56     ` Theodore Y. Ts'o
2000-11-03 14:51       ` David S. Miller
2000-11-04 23:37         ` Rasmus Andersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).