From: Guus Sliepen <Guus.Sliepen@astro.su.se>
To: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	Peter Klotz <peter.klotz@aon.at>,
	Roman Kononov <kernel@kononov.ftml.net>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: BUG: soft lockup - is this XFS problem?
Date: Thu, 14 Jul 2011 13:23:24 +0200
Message-ID: <20110714112324.GM30145@sliepen.org>
In-Reply-To: <20090105064838.GA5209@wotan.suse.de>

Hello,

I have a system with an XFS filesystem on RAID that locks up fairly consistently
when writing large amounts of data to it. This happens with several kernels,
including 2.6.38.2 and 2.6.39.3, on both AMD and Intel multi-core processors.
The kernel always logs this several times:

BUG: soft lockup - CPU#2 stuck for 67s! [kswapd0:33]

The CPU number varies, but the stuck task is always kswapd0. Eventually the
system locks up completely, requiring a reset. During soft lockups (when the
file transfer had apparently stalled), merely typing "ps aux" would often end
the lockup immediately. After googling I found this page:

https://patchwork.kernel.org/patch/789/

An unpatched vanilla 2.6.39.3 consistently locked up. After patching it
(adding a barrier() after all 4 instances of
if (!page_cache_get_speculative(page))), the lockups no longer happened, and
file transfer has been steady.

I also tested it with ext4, which doesn't give lockups on unpatched kernels,
but unfortunately mkfs.ext4 cannot create filesystems larger than 16TB yet, so
I have to use XFS instead.

On Mon, Jan 05, 2009 at 06:48:38AM -0000, Nick Piggin wrote:

> I believe this patch should solve it. Please test and confirm before
> I send it upstream.

Further comments on that thread in 2009 indicated the patch was very useful,
but it doesn't seem to have been applied upstream. Is there any reason this
patch should not be applied?

If necessary I can submit a reworked patch for 2.6.39.3 or 3.0 when that comes
out.

> ---
> An XFS workload showed up a bug in the lockless pagecache patch. Basically it
> would go into an "infinite" loop, although it would sometimes be able to break
> out of the loop! The reason is a missing compiler barrier in the "increment
> reference count unless it was zero" case of the lockless pagecache protocol in
> the gang lookup functions.
> 
> This would cause the compiler to use a cached value of struct page pointer to
> retry the operation with, rather than reload it. So the page might have been
> removed from pagecache and freed (refcount==0) but the lookup would not correctly
> notice the page is no longer in pagecache, and keep attempting to increment the
> refcount and failing, until the page gets reallocated for something else. This
> isn't a data corruption because the condition will be detected if the page has
> been reallocated. However it can result in a lockup. 
> 
> Add the required compiler barrier and comment to fix this.
[...]
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c	2009-01-05 17:22:57.000000000 +1100
> +++ linux-2.6/mm/filemap.c	2009-01-05 17:28:40.000000000 +1100
> @@ -794,8 +794,19 @@ repeat:
>  		if (unlikely(page == RADIX_TREE_RETRY))
>  			goto restart;
>  
> -		if (!page_cache_get_speculative(page))
> +		if (!page_cache_get_speculative(page)) {
> +			/*
> +			 * A failed page_cache_get_speculative operation does
> +			 * not imply any barriers (Documentation/atomic_ops.txt),
> +			 * and as such, we must force the compiler to deref the
> +			 * radix-tree slot again rather than using the cached
> +			 * value (because we need to give up if the page has been
> +			 * removed from the radix-tree, rather than looping until
> +			 * it gets reused for something else).
> +			 */
> +			barrier();
>  			goto repeat;
> +		}
>  
>  		/* Has the page moved? */
>  		if (unlikely(page != *((void **)pages[i]))) {
> @@ -850,8 +861,11 @@ repeat:
>  		if (page->mapping == NULL || page->index != index)
>  			break;
>  
> -		if (!page_cache_get_speculative(page))
> +		if (!page_cache_get_speculative(page)) {
> +			/* barrier: see find_get_pages() */
> +			barrier();
>  			goto repeat;
> +		}
>  
>  		/* Has the page moved? */
>  		if (unlikely(page != *((void **)pages[i]))) {
> @@ -904,8 +918,11 @@ repeat:
>  		if (unlikely(page == RADIX_TREE_RETRY))
>  			goto restart;
>  
> -		if (!page_cache_get_speculative(page))
> +		if (!page_cache_get_speculative(page)) {
> +			/* barrier: see find_get_pages() */
> +			barrier();
>  			goto repeat;
> +		}
>  
>  		/* Has the page moved? */
>  		if (unlikely(page != *((void **)pages[i]))) {
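
To illustrate the retry pattern the quoted patch describes, here is a minimal
userspace sketch. It is not the kernel code: fake_page, fake_get_speculative(),
fake_lookup() and fake_remove() are hypothetical stand-ins, and barrier() is
defined here as the usual GCC/Clang compiler-only barrier. The sketch is
single-threaded, so it cannot reproduce the lockup itself; it only shows where
the barrier sits so that the slot is read again on every retry.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier (GCC/Clang): forbids reusing values cached across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for struct page: only a reference count. */
struct fake_page {
	int refcount;
};

/*
 * Hypothetical stand-in for page_cache_get_speculative():
 * take a reference unless the count is already zero.
 */
static int fake_get_speculative(struct fake_page *page)
{
	if (page->refcount == 0)
		return 0;
	page->refcount++;
	return 1;
}

/*
 * Look up the page stored in *slot and take a reference on it.
 * on_fail (may be NULL) runs after each failed attempt and simulates
 * what another CPU might do to the slot in the meantime. With a dead
 * page and no on_fail, this loops forever, like the reported lockup.
 */
static struct fake_page *fake_lookup(struct fake_page **slot,
				     void (*on_fail)(struct fake_page **))
{
	struct fake_page *page;

	assert(slot != NULL);
repeat:
	page = *slot;		/* must be a fresh load on every retry */
	if (page == NULL)
		return NULL;	/* page is gone: give up */
	if (!fake_get_speculative(page)) {
		if (on_fail)
			on_fail(slot);
		barrier();	/* do not reuse the previously loaded value */
		goto repeat;
	}
	return page;
}

/* Simulated remover: clears the slot, as if the page left the pagecache. */
static void fake_remove(struct fake_page **slot)
{
	*slot = NULL;
}
```

With a live page (refcount 1), fake_lookup(&slot, NULL) returns the page and
raises its refcount to 2. With a dead page (refcount 0) and fake_remove as the
on_fail hook, the second pass sees the emptied slot and returns NULL instead of
retrying on the stale pointer.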

-- 
Met vriendelijke groet / with kind regards,
Guus Sliepen <Guus.Sliepen@astro.su.se>
