From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>,
	"J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	xfs@oss.sgi.com
Subject: Re: panic on 4.20 server exporting xfs filesystem
Date: Wed, 04 Mar 2015 16:49:09 -0600
Message-ID: <54F78BE5.1020608@sandeen.net>
In-Reply-To: <20150304224557.GY4251@dastard>

On 3/4/15 4:45 PM, Dave Chinner wrote:
> On Wed, Mar 04, 2015 at 05:27:09PM -0500, J. Bruce Fields wrote:
>> On Thu, Mar 05, 2015 at 09:09:00AM +1100, Dave Chinner wrote:
>>> On Wed, Mar 04, 2015 at 10:54:21AM -0500, J. Bruce Fields wrote:
>>>> On Tue, Mar 03, 2015 at 09:08:26PM -0500, J. Bruce Fields wrote:
>>>>> On Wed, Mar 04, 2015 at 09:44:56AM +1100, Dave Chinner wrote:
>>>>>> On Tue, Mar 03, 2015 at 05:10:33PM -0500, J. Bruce Fields wrote:
>>>>>>> I'm getting mysterious crashes on a server exporting an xfs filesystem.
>>>>>>>
>>>>>>> Strangely, I've reproduced this on
>>>>>>>
>>>>>>> 	93aaa830fc17 "Merge tag 'xfs-pnfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
>>>>>>>
>>>>>>> but haven't yet managed to reproduce on either of its parents
>>>>>>> (24a52e412ef2 or 781355c6e5ae).  That might just be chance, I'll try
>>>>>>> again.
>>>>>>
>>>>>> I think you'll find that the bug is only triggered after that XFS
>>>>>> merge because it's what enabled block layout support in the server,
>>>>>> i.e.  nfsd4_setup_layout_type() is now setting the export type to
>>>>>> LAYOUT_BLOCK_VOLUME because XFS has added the necessary functions to
>>>>>> its export ops.
>>>>>
>>>>> Doh--after all the discussion I didn't actually pay attention to what
>>>>> happened in the end.  OK, I see, you're right, it's all more-or-less
>>>>> dead code till that merge.
>>>>>
>>>>> Christoph's code was passing all my tests before that, so maybe we
>>>>> broke something in the merge process.
>>>>>
>>>>> Alternatively, it could be because I've added more tests--I'll rerun my
>>>>> current tests on his original branch....
>>>>
>>>> The below is on Christoph's pnfsd-for-3.20-4 (at cd4b02e).  Doesn't look
>>>> very informative.  I'm running xfstests over NFSv4.1 with client and
>>>> server running the same kernel, the filesystem in question is xfs, but
>>>> isn't otherwise available to the client (so the client shouldn't be
>>>> doing pnfs).
>>>>
>>>> --b.
>>>>
>>>> BUG: unable to handle kernel paging request at 00000000757d4900
>>>> IP: [<ffffffff810b59af>] cpuacct_charge+0x5f/0xa0
>>>> PGD 0 
>>>> Thread overran stack, or stack corrupted
>>>
>>> Hmmmm. That is not at all informative, especially as it's only
>>> dumped the interrupt stack and not the stack or the task that it
>>> has detected as overrun or corrupted.
>>>
>>> Can you turn on all the stack overrun debug options? Maybe even
>>> turn on the stack tracer to get an idea of whether we are recursing
>>> deeply somewhere we shouldn't be?
>>
>> Digging around under "Kernel hacking".... I already have
>> DEBUG_STACK_USAGE, DEBUG_STACKOVERFLOW, and STACK_TRACER, and I can try
>> turning on the latter.  (Will I be able to get information out of it
>> before the panic?)
> 
> just keep taking samples of the worst case stack usage as the test
> runs. If there's anything unusual before the failure then it will
> show up, otherwise I'm not sure how to track this down...

I think it should print "maximum stack depth" messages whenever a stack
reaches a new max excursion...
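
For the sampling approach Dave describes, here is a rough sketch of a
poller, in case it is useful. It assumes the stack tracer has been
enabled (e.g. "echo 1 > /proc/sys/kernel/stack_tracer_enabled") and that
debugfs is mounted at /sys/kernel/debug, so that stack_max_size and
stack_trace are readable under tracing/. The function names, the sample
count and the interval are made up for illustration; nobody in this
thread has run this.

```python
import time
from pathlib import Path

# Assumption: debugfs is mounted at the usual location.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def read_max_stack(path):
    """Return the worst-case stack depth (bytes) reported by the tracer."""
    return int(Path(path).read_text().strip())


def sample_new_maxima(max_path, trace_path, samples, interval=1.0,
                      sleep=time.sleep):
    """Poll max_path `samples` times and record each new maximum.

    Returns a list of (depth, trace_text) tuples, one per sample at
    which the reported depth exceeded everything seen so far.
    """
    worst = -1
    records = []
    for _ in range(samples):
        depth = read_max_stack(max_path)
        if depth > worst:
            worst = depth
            # Save the matching call chain while it is still current.
            records.append((depth, Path(trace_path).read_text()))
        sleep(interval)
    return records
```

Something like sample_new_maxima(TRACING_DIR / "stack_max_size",
TRACING_DIR / "stack_trace", samples=600) run as root alongside the
test would keep the last few maxima around; writing each record out to
disk (or the serial console) as it is found is probably wise, since
anything held in memory is lost when the box panics.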

> Cheers,
> 
> Dave.
> 
