linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org
Cc: "Steinar H. Gunderson" <sgunderson@bigfoot.com>
Subject: [PATCH] Remove softlockup from invalidate_mapping_pages.
Date: Thu, 20 Apr 2006 16:29:55 +1000	[thread overview]
Message-ID: <1060420062955.7727@suse.de> (raw)
In-Reply-To: 20060420160549.7637.patches@notabene

The following patch fixes a problem with invalidate_mapping_pages.
Please look at the patch description and then come back here, because
there are some things I don't understand which you might be able to
help me with.
...

Thanks. 
I have had 2 reports of softlockups in this code apparently related to md. 
md calls invalidate_bdev on all the component block devices that it is 
building into an array.  These block devices are very likely to have just one
page in their mapping, right near the end as mdadm will have read the 
superblock which lives near the end.

However, I cannot see why the page would be locked.
Being locked for read is very unlikely because mdadm would have already
read the superblock.  I guess locked for read-ahead might be possible
(I assume readahead does lock the pages) but as only one or maybe two reads 
are a performed by mdadm, not much readahead should be generated.

Being locked for write also seems unlikely as if mdadm were to write
(which is fairly unlikely but not impossible) it would fsync() straight
away so by the time that it comes to assemble the array, all io
should have finished.

So that is (1) - I don't see why the page would be locked.

And (2) - I have a report (on linux-raid) of a soft-lockup which
lasted 76 seconds! 
Now if the device was 100Gig, that is 25million page addresses or 
3microseconds per loop. Is that at all likely for this loop - it 
does take and drop a spinlock but could that come close to a
few thousand cycles?

And the processor in this case was a dual-core amd64 - with SMP enabled.
I can imaging a long lockup on a uniprocessor, but if a second processor
core is free to unlock the page when the IO (Whatever it is) completes,
a 76 second delay would be unexpected.

The original bug report can be found at
  http://marc.theaimsgroup.com/?l=linux-raid&m=114550096908177&w=2

Finally (3) - I think that invalidate_mapping_pages should probably
have a cond_resched() call in it, except that drop_pagecache_sb in
fs/drop_caches.c calls it with the "inode_lock" spinlock held, which
would be bad.  Would it be safe (or could it be made safe) to drop and
regain the lock around that call?

Comments welcome, but in any case I think the patch is needed.

Thanks for your time,
NeilBrown




### Comments for Changeset

If invalidate_mapping_pages is called to invalidate a very large
mapping (e.g. a very large block device) and if the only active page
in that device is near the end  (or at least, at a very large  index),
such as, say, the superblock of an md array, and if that page
happens to be locked when invalidate_mapping_pages is called,
then
  pagevec_lookup will return this page and
  as it is locked, 'next' will be incremented and pagevec_lookup
  will be called again. and again. and again.
  while we count from 0 upto a very large number.

We should really always set 'next' to 'page->index+1' before going
around the loop again, not just if the page isn't locked.


Cc: "Steinar H. Gunderson" <sgunderson@bigfoot.com>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./mm/truncate.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff ./mm/truncate.c~current~ ./mm/truncate.c
--- ./mm/truncate.c~current~	2006-04-20 15:27:22.000000000 +1000
+++ ./mm/truncate.c	2006-04-20 15:38:20.000000000 +1000
@@ -238,13 +238,11 @@ unsigned long invalidate_mapping_pages(s
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
-			if (TestSetPageLocked(page)) {
-				next++;
+			next = page->index+1;
+
+			if (TestSetPageLocked(page))
 				continue;
-			}
-			if (page->index > next)
-				next = page->index;
-			next++;
+
 			if (PageDirty(page) || PageWriteback(page))
 				goto unlock;
 			if (page_mapped(page))

       reply	other threads:[~2006-04-20  6:30 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20060420160549.7637.patches@notabene>
2006-04-20  6:29 ` NeilBrown [this message]
2006-04-20  7:38   ` [PATCH] Remove softlockup from invalidate_mapping_pages Andrew Morton
2006-04-20  9:24     ` Neil Brown
2006-04-20  9:30       ` Jens Axboe
2006-04-20  9:50         ` Neil Brown
2006-04-20  9:59           ` Jens Axboe
2006-04-20 10:26             ` Neil Brown
2006-04-20  9:52       ` Andrew Morton
2006-04-20 10:52         ` Jens Axboe
2006-04-26 20:48     ` Steinar H. Gunderson
2006-04-26 20:58       ` Andrew Morton
2006-04-26 21:26         ` Steinar H. Gunderson
2006-05-01 16:07           ` Steinar H. Gunderson
2006-05-13 13:49         ` [PATCH] Remove softlockup from invalidate_mapping_pages. (might be dm related) Steinar H. Gunderson
2006-05-13 14:33           ` Andrew Morton
2006-05-13 14:43             ` Steinar H. Gunderson
2006-05-13 14:51               ` Andrew Morton
2006-05-13 15:00                 ` Steinar H. Gunderson
2006-05-13 15:24                   ` Andrew Morton
2006-05-13 15:32                     ` Steinar H. Gunderson
2006-05-13 15:43                       ` Andrew Morton
2006-05-13 17:14                         ` Steinar H. Gunderson
2006-05-13 23:28                           ` Steinar H. Gunderson
2006-05-13 22:37                         ` [dm-devel] " Alasdair G Kergon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1060420062955.7727@suse.de \
    --to=neilb@suse.de \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).