From: Dave Chinner <david@fromorbit.com>
To: Calvin Owens <calvinowens@fb.com>
Cc: linux-block@vger.kernel.org, kernel-team@fb.com,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	xfs@oss.sgi.com
Subject: Re: [BUG] Slab corruption during XFS writeback under memory pressure
Date: Wed, 20 Jul 2016 08:58:51 +1000
Message-ID: <20160719225851.GF16044@dastard>
In-Reply-To: <53af895c-7ddb-1e50-6c90-d4d59f5c7a2f@fb.com>

On Tue, Jul 19, 2016 at 02:22:47PM -0700, Calvin Owens wrote:
> On 07/18/2016 07:05 PM, Calvin Owens wrote:
> >On 07/17/2016 11:02 PM, Dave Chinner wrote:
> >>On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote:
> >>>On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote:
> >>>>Hello all,
> >>>>
> >>>>I've found a nasty source of slab corruption. Based on seeing similar symptoms
> >>>>on boxes at Facebook, I suspect it's been around since at least 3.10.
> >>>>
> >>>>It only reproduces under memory pressure so far as I can tell: the issue seems
> >>>>to be that XFS reclaims pages from buffers that are still in use by
> >>>>scsi/block. I'm not sure which side the bug lies on, but I've only observed it
> >>>>with XFS.
> >>[....]
> >>>But this indicates that the page is under writeback at this point,
> >>>so that tends to indicate that the above freeing was incorrect.
> >>>
> >>>Hmmm - it's clear we've got direct reclaim involved here, and the
> >>>suspicion of a dirty page that has had its bufferheads cleared.
> >>>Are there any other warnings in the log from XFS prior to kasan
> >>>throwing the error?
> >>
> >>Can you try the patch below?
> >
> >Thanks for getting this out so quickly :)
> >
> >So far so good: I booted Linus' tree as of this morning and reproduced the ASAN
> >splat. After applying your patch I haven't triggered it.
> >
> >I'm a bit wary since it was hard to trigger reliably in the first place... so I
> >lined up a few dozen boxes to run the test case overnight. I'll confirm in the
> >morning (-0700) they look good.
> 
> All right, my testcase ran 2099 times overnight without triggering anything.
> 
> For the overnight tests, I booted the boxes with "mem=" to artificially limit RAM,
> which makes my repro *much* more reliable (I feel silly for not thinking of that
> in the first place). With that setup, I hit the ASAN splat 21 times in 98 runs on
> vanilla 4.7-rc7. So I'm sold.
> 
> Tested-by: Calvin Owens <calvinowens@fb.com>

Thanks for testing, Calvin. I'll update the patch and get it
reviewed and committed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
