* Crash with 3.8.3 and TuxOnIce
@ 2013-03-19 23:31 Pedro Ribeiro
  2013-03-20  0:44 ` Dave Chinner
  2013-03-27 23:39 ` Ben Myers
  0 siblings, 2 replies; 9+ messages in thread
From: Pedro Ribeiro @ 2013-03-19 23:31 UTC (permalink / raw)
  To: xfs


Hi,

I'm using a TuxOnIce enabled kernel, and recently moved from 3.7.1 to
3.8.3. The former used to work nicely, but now I get a hard crash when I
try to hibernate.
This is solely related to TuxOnIce as it does not happen in the default
hibernation.

I know this is an unsupported out of tree patch, but can you please help me
debug it or point in the right direction?

Unfortunately as this was a kernel crash all I have are crappy photos, but
they do show a readable stack trace:
img18[dot]imageshack[dot]us/img18/395/imag0228hp[dot]jpg
img13[dot]imageshack[dot]us/img13/4375/imag0231p[dot]jpg

(I removed the HTTP part and the dots off the links else it would bounce
back from the list)

Thanks in advance for your help.

Regards,
Pedro

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-19 23:31 Crash with 3.8.3 and TuxOnIce Pedro Ribeiro
@ 2013-03-20  0:44 ` Dave Chinner
  2013-03-20 18:01   ` Pedro Ribeiro
  2013-03-27 23:39 ` Ben Myers
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2013-03-20  0:44 UTC (permalink / raw)
  To: Pedro Ribeiro; +Cc: xfs

On Tue, Mar 19, 2013 at 11:31:16PM +0000, Pedro Ribeiro wrote:
> Hi,
> 
> I'm using a TuxOnIce enabled kernel, and recently moved from 3.7.1 to
> 3.8.3. The former used to work nicely, but now I get a hard crash when I
> try to hibernate.
> This is solely related to TuxOnIce as it does not happen in the default
> hibernation.
> 
> I know this is an unsupported out of tree patch, but can you please help me
> debug it or point in the right direction?
> 
> Unfortunately as this was a kernel crash all I have are crappy photos, but
> they do show a readable stack trace:
> img18[dot]imageshack[dot]us/img18/395/imag0228hp[dot]jpg
> img13[dot]imageshack[dot]us/img13/4375/imag0231p[dot]jpg
> 
> (I removed the HTTP part and the dots off the links else it would bounce
> back from the list)

The list doesn't bounce URLs. Please make them clicky in future.

As it is, your kernel has oopsed in submit_bio() while writing out a
log buffer containing a dummy log record.

The xfs_log_worker is trying to cover the log, and it seems like the
IO subsystem below it is doing something wrong. Perhaps TuxOnIce
is killing the IO subsystem without first stopping all the
async filesystem processing?

Looks like a TuxOnIce problem to me...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-20  0:44 ` Dave Chinner
@ 2013-03-20 18:01   ` Pedro Ribeiro
  2013-03-21  1:01     ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Ribeiro @ 2013-03-20 18:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Thanks for the answer Dave.

Yes I would definitely say it's a ToI bug that perhaps has been dormant so
far. Unfortunately the ToI developer is very busy at the moment, so I will
have to debug and fix it myself.
This problem did not occur with 3.7 and the ToI code did not change.

Do you have any idea where I can start looking for the XFS change in 3.8
that triggered this behaviour in ToI? Or maybe it was a VFS change?

PS: the email definitely bounced back, most likely because imageshack is
blocked on the sgi server:

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the server for
the recipient domain oss.sgi.com by cuda-allmx.sgi.com. [192.48.176.16].

The error that the other server returned was:
554 rejecting banned content

Regards,
Pedro

* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-20 18:01   ` Pedro Ribeiro
@ 2013-03-21  1:01     ` Dave Chinner
  2013-03-21 17:45       ` Pedro Ribeiro
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2013-03-21  1:01 UTC (permalink / raw)
  To: Pedro Ribeiro; +Cc: xfs

On Wed, Mar 20, 2013 at 06:01:35PM +0000, Pedro Ribeiro wrote:
> Thanks for the answer Dave.
> 
> Yes I would definitely say it's a ToI bug that perhaps has been dormant so
> far. Unfortunately the ToI developer is very busy at the moment, so I will
> have to debug and fix it myself.
> This problem did not occur with 3.7 and the ToI code did not change.
> 
> Do you have any idea where I can start looking for the XFS change in 3.8
> that triggered this behaviour in ToI? Or maybe it was a VFS change?

It's almost certainly an XFS change that triggered it, but it
indicates (once again) that the hibernate code is simply not
quiescing filesystems properly (i.e. by freezing them). The work
that caused this problem is stopped by the filesystem when it
is frozen, and started again when it is thawed...
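
As a rough illustration of the freeze/thaw behaviour described above, here
is a minimal userspace C model. It is a sketch under stated assumptions, not
XFS or VFS code: `struct model_fs` and every `model_fs_*` function are
hypothetical names invented for this example. It only shows the ordering
that matters: the periodic log work is stopped as part of freezing and
restarted on thaw, so a frozen filesystem has no background log I/O pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these are NOT the kernel's XFS or VFS definitions. */
struct model_fs {
    bool frozen;          /* filesystem has been quiesced */
    bool log_work_queued; /* periodic log work is scheduled to run */
};

/* Mounting starts the periodic log work. */
static void model_fs_mount(struct model_fs *fs)
{
    assert(fs != NULL);
    fs->frozen = false;
    fs->log_work_queued = true;
}

/* Freezing stops the periodic work, then marks the filesystem quiescent. */
static void model_fs_freeze(struct model_fs *fs)
{
    assert(fs != NULL);
    fs->log_work_queued = false;
    fs->frozen = true;
}

/* Thawing reverses the freeze and restarts the periodic work. */
static void model_fs_thaw(struct model_fs *fs)
{
    assert(fs != NULL);
    fs->frozen = false;
    fs->log_work_queued = true;
}

/* True if background log I/O could still be submitted right now. */
static bool model_fs_may_submit_io(const struct model_fs *fs)
{
    assert(fs != NULL);
    return fs->log_work_queued;
}
```

In this model, a hibernation path that tears down the I/O layer without
calling `model_fs_freeze()` first would still see
`model_fs_may_submit_io()` return true, which is the situation the oops
suggests.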

> PS: the email definitely bounced back, most likely because imageshack is
> blocked on the sgi server:
> 
> Technical details of permanent failure:
> Google tried to deliver your message, but it was rejected by the server for
> the recipient domain oss.sgi.com by cuda-allmx.sgi.com. [192.48.176.16].
> 
> The error that the other server returned was:
> 554 rejecting banned content

IOWs, a stupid spam filter.

I'll see if I can get this fixed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-21  1:01     ` Dave Chinner
@ 2013-03-21 17:45       ` Pedro Ribeiro
  2013-03-27 21:58         ` Pedro Ribeiro
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Ribeiro @ 2013-03-21 17:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On 21 March 2013 01:01, Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Mar 20, 2013 at 06:01:35PM +0000, Pedro Ribeiro wrote:
> > Thanks for the answer Dave.
> >
> > Yes I would definitely say it's a ToI bug that perhaps has been dormant
> so
> > far. Unfortunately the ToI developer is very busy at the moment, so I
> will
> > have to debug and fix it myself.
> > This problem did not occur with 3.7 and the ToI code did not change.
> >
> > Do you have any idea where I can start looking for the XFS change in 3.8
> > that triggered this behaviour in ToI? Or maybe it was a VFS change?
>
> It's almost certainly an XFS change that triggered it, but it
> indicates (once again) that the hibernate code is simply not
> quiescing filesystems properly (i.e. by freezing them). The work
> that caused this problem is stopped by the filesystem when it
> is frozen, and started again when it is thawed...
>
> > PS: the email definitely bounced back, most likely because imageshack is
> > blocked on the sgi server:
> >
> > Technical details of permanent failure:
> > Google tried to deliver your message, but it was rejected by the server
> for
> > the recipient domain oss.sgi.com by cuda-allmx.sgi.com. [192.48.176.16].
> >
> > The error that the other server returned was:
> > 554 rejecting banned content
>
> IOWs, a stupid spam filter.
>
> I'll see if I can get this fixed.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

Actually I've nailed it down to a commit between 3.7.1 and 3.7.10. I'll do
some git bisection and come back with the results.

Regarding ToI and filesystem freezing, I guess I need to start delving into
the code to see if I can fix it - long but fun journey ahead I guess.

Regards,
Pedro

* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-21 17:45       ` Pedro Ribeiro
@ 2013-03-27 21:58         ` Pedro Ribeiro
  2013-03-28  0:51           ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Ribeiro @ 2013-03-27 21:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Hi Dave (and others),

I've pretty much established which commit is responsible:
437a255aa23766666aec78af63be4c253faa8d57
(http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/releases/3.7.2/xfs-fix-direct-io-nested-transaction-deadlock.patch?id=HEAD).

Without this patch, the computer does not lock up in hibernate. So I
understand that this is most likely a bug in ToI, not in xfs. Does this
give you a better idea of how to solve the problem? The only xfs-specific
patch in ToI is below:

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 0eda725..55de808 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -511,6 +511,7 @@ xfsaild(
  struct xfs_ail *ailp = data;
  long tout = 0; /* milliseconds */

+ set_freezable();
  current->flags |= PF_MEMALLOC;

  while (!kthread_should_stop()) {

Looking at the code blindly, it appears to be similar to what goes on in
other filesystems...

Regards,
Pedro


* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-19 23:31 Crash with 3.8.3 and TuxOnIce Pedro Ribeiro
  2013-03-20  0:44 ` Dave Chinner
@ 2013-03-27 23:39 ` Ben Myers
  2013-03-28  0:51   ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Ben Myers @ 2013-03-27 23:39 UTC (permalink / raw)
  To: Pedro Ribeiro; +Cc: xfs

On Tue, Mar 19, 2013 at 11:31:16PM +0000, Pedro Ribeiro wrote:
> I'm using a TuxOnIce enabled kernel, and recently moved from 3.7.1 to
> 3.8.3. The former used to work nicely, but now I get a hard crash when I
> try to hibernate.
> This is solely related to TuxOnIce as it does not happen in the default
> hibernation.
> 
> I know this is an unsupported out of tree patch, but can you please help me
> debug it or point in the right direction?
> 
> Unfortunately as this was a kernel crash all I have are crappy photos, but
> they do show a readable stack trace:
> img18[dot]imageshack[dot]us/img18/395/imag0228hp[dot]jpg
> img13[dot]imageshack[dot]us/img13/4375/imag0231p[dot]jpg

The admin made a change in the mailer settings.  Let's see if it works now:
http://img18.imageshack.us/img18/395/imag0228hp.jpg

-Ben

* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-27 21:58         ` Pedro Ribeiro
@ 2013-03-28  0:51           ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2013-03-28  0:51 UTC (permalink / raw)
  To: Pedro Ribeiro; +Cc: xfs

On Wed, Mar 27, 2013 at 09:58:43PM +0000, Pedro Ribeiro wrote:
> Hi Dave (and others),
> 
> I've pretty much established the responsible: commit
> 437a255aa23766666aec78af63be4c253faa8d57
> (
> http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/releases/3.7.2/xfs-fix-direct-io-nested-transaction-deadlock.patch?id=HEAD
> ).

Seems completely unrelated to the problem you saw.

> Without this patch, the computer does not lock up in hibernate. So I
> understand that this is most likely a bug in ToI, not in xfs. Does this
> give you a better idea of how to solve the problem?

No.

> The only xfs-specific
> patch in ToI is below:
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 0eda725..55de808 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -511,6 +511,7 @@ xfsaild(
>   struct xfs_ail *ailp = data;
>   long tout = 0; /* milliseconds */
> 
> + set_freezable();
>   current->flags |= PF_MEMALLOC;

We never set PF_NOFREEZE, so set_freezable() is a no-op.
If ToI has introduced new freeze API dependencies, then I'm not
going to try to understand or fix them.

>   while (!kthread_should_stop()) {
> 
> Looking at the code blindly, it appears to be similar to what goes on in
> other filesystems...

That loop has a call to try_to_freeze() in it, which is how such
kthreads are supposed to handle freezing. i.e. once they enter a
state in which they can freeze, they call try_to_freeze() and then
get moved to the refrigerator.
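
To make the pattern concrete, here is a minimal userspace C model of a
worker loop that checks for a freeze request at a safe point. It is only a
sketch: `struct model_task`, `model_try_to_freeze()`, `model_worker_step()`
and `model_thaw()` are hypothetical stand-ins invented for this example, not
the kernel's freezer API, and the real try_to_freeze() blocks in place until
the task is thawed rather than returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of this is the kernel's real freezer code. */
struct model_task {
    bool freeze_requested; /* the freezer has asked this task to freeze */
    bool frozen;           /* task is parked in the "refrigerator" */
    bool should_stop;      /* stands in for kthread_should_stop() */
    int  work_done;        /* number of work items processed so far */
};

/* Models try_to_freeze(): park the task if a freeze was requested. */
static bool model_try_to_freeze(struct model_task *t)
{
    assert(t != NULL);
    if (!t->freeze_requested)
        return false;
    t->frozen = true;
    return true;
}

/* Models the thaw side: clear the request and let the task run again. */
static void model_thaw(struct model_task *t)
{
    assert(t != NULL);
    t->freeze_requested = false;
    t->frozen = false;
}

/*
 * One iteration of the worker loop. Returns true if a work item was
 * processed, false if the task is stopping or frozen.
 */
static bool model_worker_step(struct model_task *t)
{
    assert(t != NULL);
    if (t->should_stop)
        return false;
    /* Safe point: no work is in flight here, so freezing is allowed. */
    if (t->frozen || model_try_to_freeze(t))
        return false;
    t->work_done++; /* stands in for the thread's real work */
    return true;
}
```

The point of the model is that the freeze check sits at the top of each
iteration, before any work is started, so a parked task never has I/O
half-issued.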

You need to talk to the ToI developers...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Crash with 3.8.3 and TuxOnIce
  2013-03-27 23:39 ` Ben Myers
@ 2013-03-28  0:51   ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2013-03-28  0:51 UTC (permalink / raw)
  To: Ben Myers; +Cc: Pedro Ribeiro, xfs

On Wed, Mar 27, 2013 at 06:39:12PM -0500, Ben Myers wrote:
> On Tue, Mar 19, 2013 at 11:31:16PM +0000, Pedro Ribeiro wrote:
> > I'm using a TuxOnIce enabled kernel, and recently moved from 3.7.1 to
> > 3.8.3. The former used to work nicely, but now I get a hard crash when I
> > try to hibernate.
> > This is solely related to TuxOnIce as it does not happen in the default
> > hibernation.
> > 
> > I know this is an unsupported out of tree patch, but can you please help me
> > debug it or point in the right direction?
> > 
> > Unfortunately as this was a kernel crash all I have are crappy photos, but
> > they do show a readable stack trace:
> > img18[dot]imageshack[dot]us/img18/395/imag0228hp[dot]jpg
> > img13[dot]imageshack[dot]us/img13/4375/imag0231p[dot]jpg
> 
> The admin made a change in the mailer settings.  Let's see if it works now:
> http://img18.imageshack.us/img18/395/imag0228hp.jpg

Thanks, Ben. It works just fine.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
