Hi Dave (and others),

I've pretty much identified the commit responsible: 437a255aa23766666aec78af63be4c253faa8d57 (http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/releases/3.7.2/xfs-fix-direct-io-nested-transaction-deadlock.patch?id=HEAD).

Without this patch, the computer does not lock up when hibernating. So I understand that this is most likely a bug in ToI, not in XFS. Does this give you a better idea of how to solve the problem? The only XFS-specific patch in ToI is below:

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 0eda725..55de808 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -511,6 +511,7 @@ xfsaild(
  struct xfs_ail *ailp = data;
  long tout = 0; /* milliseconds */
 
+ set_freezable();
  current->flags |= PF_MEMALLOC;
 
  while (!kthread_should_stop()) {

Looking at the code blindly, it appears to be similar to what goes on in other filesystems...

Regards,
Pedro


On 21 March 2013 17:45, Pedro Ribeiro <pedrib@gmail.com> wrote:



On 21 March 2013 01:01, Dave Chinner <david@fromorbit.com> wrote:
On Wed, Mar 20, 2013 at 06:01:35PM +0000, Pedro Ribeiro wrote:
> Thanks for the answer Dave.
>
> Yes I would definitely say it's a ToI bug that perhaps has been dormant so
> far. Unfortunately the ToI developer is very busy at the moment, so I will
> have to debug and fix it myself.
> This problem did not occur with 3.7 and the ToI code did not change.
>
> Do you have any idea where I can start looking for the XFS change in 3.8
> that triggered this behaviour in ToI? Or maybe it was a VFS change?

It's almost certainly an XFS change that triggered it, but it
indicates (once again) that the hibernate code is simply not
quiescing filesystems properly (i.e. by freezing them). The work
that caused this problem is stopped by the filesystem when it
is frozen, and started again when it is thawed...

> PS: the email definitely bounced back, most likely because imageshack is
> blocked on the sgi server:
>
> Technical details of permanent failure:
> Google tried to deliver your message, but it was rejected by the server for
> the recipient domain oss.sgi.com by cuda-allmx.sgi.com. [192.48.176.16].
>
> The error that the other server returned was:
> 554 rejecting banned content

IOWs, a stupid spam filter.

I'll see if I can get this fixed.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

Actually, I've narrowed it down to a commit between 3.7.1 and 3.7.10. I'll do a git bisect and come back with the results.

Regarding ToI and filesystem freezing, I guess I need to start delving into the code to see if I can fix it. A long but fun journey ahead.

Regards,
Pedro