All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: "Peter M. Petrakis" <peter.petrakis@canonical.com>
Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>,
	Jan Kara <jack@suse.cz>, Ted Ts'o <tytso@mit.edu>,
	Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	sandeen@redhat.com, Craig Magina <Craig.Magina@canonical.com>
Subject: Re: [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock
Date: Fri, 22 Apr 2011 23:40:46 +0200	[thread overview]
Message-ID: <20110422214046.GE2977@quack.suse.cz> (raw)
In-Reply-To: <4DB1F26F.8040403@canonical.com>

  Hello,

On Fri 22-04-11 17:26:07, Peter M. Petrakis wrote:
> On 04/22/2011 02:58 AM, Toshiyuki Okajima wrote:
> > On Tue, 19 Apr 2011 18:43:16 +0900
> > I have confirmed that the following patch works fine while my or
> > Mizuma-san's reproducer is running. Therefore,
> >  we can block to write the data, which is mmapped to a file, into a disk
> > by a page-fault while fsfreezing. 
> > 
> > I think this patch fixes the following two problems:
> > - A deadlock occurs between ext4_da_writepages() (called from
> > writeback_inodes_wb) and thaw_super(). (reported by Mizuma-san)
> > - We can also write the data, which is mmapped to a file,
> >   into a disk while fsfreezing (ext3/ext4).
> >                                        (reported by me)
> > 
> > Please examine this patch.
> 
> We've recently identified the same root cause in 2.6.32 though the hit rate
> is much much higher. The configuration is a SAN ALUA Active/Standby using
> multipath. The s_wait_unfrozen/s_umount deadlock is regularly encountered
> when a path comes back into service, as a result of a kpartx invocation on
> behalf of this udev rule.
> 
> /lib/udev/rules.d/95-kpartx.rules
> 
> # Create dm tables for partitions
> ENV{DM_STATE}=="ACTIVE", ENV{DM_UUID}=="mpath-*", \
>         RUN+="/sbin/dmsetup ls --target multipath --exec '/sbin/kpartx -a -p -part' -j %M -m %m"
  Hmm, I don't think this is the same problem... See:

> [ 1898.017614] mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880c3c815200)
> [ 1898.025995] mptsas: ioc0: mptsas_free_fw_event: kfree (fw_event=0xffff880c3c814780)
> [ 1898.034625] mptsas: ioc0: mptsas_firmware_event_work: fw_event=(0xffff880c3c814b40), event = (0x12)
> [ 1898.044803] mptsas: ioc0: mptsas_free_fw_event: kfree (fw_event=0xffff880c3c814b40)
> [ 1898.053475] mptsas: ioc0: mptsas_firmware_event_work: fw_event=(0xffff880c3c815c80), event = (0x12)
> [ 1898.063690] mptsas: ioc0: mptsas_free_fw_event: kfree (fw_event=0xffff880c3c815c80)
> [ 1898.072316] mptsas: ioc0: mptsas_firmware_event_work: fw_event=(0xffff880c3c815200), event = (0x0f)
> [ 1898.082544] mptsas: ioc0: mptsas_free_fw_event: kfree (fw_event=0xffff880c3c815200)
> [ 1898.571426] sd 0:0:1:0: alua: port group 01 state S supports toluSnA
> [ 1898.578635] device-mapper: multipath: Failing path 8:32.
> [ 2041.345645] INFO: task kjournald:595 blocked for more than 120 seconds.
> [ 2041.353075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2041.361891] kjournald       D ffff88063acb9a90     0   595      2 0x00000000
> [ 2041.369891]  ffff88063ace1c30 0000000000000046 ffff88063c282140 ffff880600000000
> [ 2041.378416]  0000000000013cc0 ffff88063acb96e0 ffff88063acb9a90 ffff88063ace1fd8
> [ 2041.386954]  ffff88063acb9a98 0000000000013cc0 ffff88063ace0010 0000000000013cc0
> 
> [ 2041.395561] Call Trace:
> [ 2041.398358]  [<ffffffff81192380>] ? sync_buffer+0x0/0x50
> [ 2041.404342]  [<ffffffff815d3120>] io_schedule+0x70/0xc0
> [ 2041.410227]  [<ffffffff811923c5>] sync_buffer+0x45/0x50
> [ 2041.416179]  [<ffffffff815d378f>] __wait_on_bit+0x5f/0x90
> [ 2041.422258]  [<ffffffff81192380>] ? sync_buffer+0x0/0x50
> [ 2041.428275]  [<ffffffff815d3838>] out_of_line_wait_on_bit+0x78/0x90
> [ 2041.435324]  [<ffffffff81086b90>] ? wake_bit_function+0x0/0x40
> [ 2041.441958]  [<ffffffff8119237e>] __wait_on_buffer+0x2e/0x30
> [ 2041.448333]  [<ffffffff8123ab14>] journal_commit_transaction+0x7e4/0xec0
  So kjournald is committing a transaction and waiting for IO to complete.
Which maybe never happens because of multipath being in transition? That
would be a bug...

> [ 2041.670669] multipathd      D ffff88063e3303b0     0  1337      1 0x00000000
> [ 2041.678746]  ffff88063c0fda18 0000000000000082 0000000000000000 ffff880600000000
> [ 2041.687219]  0000000000013cc0 ffff88063e330000 ffff88063e3303b0 ffff88063c0fdfd8
> [ 2041.695818]  ffff88063e3303b8 0000000000013cc0 ffff88063c0fc010 0000000000013cc0
> [ 2041.704369] Call Trace:
> [ 2041.707128]  [<ffffffff815d349d>] schedule_timeout+0x21d/0x300
> [ 2041.713679]  [<ffffffff8104c8ec>] ? resched_task+0x2c/0x90
> [ 2041.719846]  [<ffffffff8105f763>] ? try_to_wake_up+0xc3/0x410
> [ 2041.726301]  [<ffffffff815d2436>] wait_for_common+0xd6/0x180
> [ 2041.732685]  [<ffffffff8105fb05>] ? wake_up_process+0x15/0x20
> [ 2041.739138]  [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
> [ 2041.746079]  [<ffffffff815d25bd>] wait_for_completion+0x1d/0x20
> [ 2041.752716]  [<ffffffff8107de18>] call_usermodehelper_exec+0xd8/0xe0
> [ 2041.759853]  [<ffffffff814a3110>] ? parse_hw_handler+0xb0/0x240
> [ 2041.766503]  [<ffffffff8107e060>] __request_module+0x190/0x210
> [ 2041.773054]  [<ffffffff812e0c28>] ? sscanf+0x38/0x40
> [ 2041.778636]  [<ffffffff814a3110>] parse_hw_handler+0xb0/0x240
> [ 2041.785121]  [<ffffffff814a38c3>] multipath_ctr+0x83/0x1d0
> [ 2041.791312]  [<ffffffff8149abd5>] ? dm_split_args+0x75/0x140
> [ 2041.797671]  [<ffffffff8149b9af>] dm_table_add_target+0xff/0x250
> [ 2041.804413]  [<ffffffff8149de3a>] table_load+0xca/0x2f0
> [ 2041.810317]  [<ffffffff8149dd70>] ? table_load+0x0/0x2f0
> [ 2041.816316]  [<ffffffff8149f0d5>] ctl_ioctl+0x1a5/0x240
> [ 2041.822184]  [<ffffffff8149f183>] dm_ctl_ioctl+0x13/0x20
> [ 2041.828188]  [<ffffffff81175245>] do_vfs_ioctl+0x95/0x3c0
> [ 2041.834250]  [<ffffffff8109ae6b>] ? sys_futex+0x7b/0x170
> [ 2041.840219]  [<ffffffff81175611>] sys_ioctl+0xa1/0xb0
> [ 2041.845898]  [<ffffffff8100c042>] system_call_fastpath+0x16/0x1b
  multipathd is hung waiting for module to be loaded? How come?

> [ 2041.964575] INFO: task kpartx:1897 blocked for more than 120 seconds.
> [ 2041.971801] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2041.980626] kpartx          D ffff88063d05df30     0  1897   1896 0x00000000
> [ 2041.988607]  ffff88063c3a5b58 0000000000000082 0000000e3c3a5ac8 ffff880c00000000
> [ 2041.997056]  0000000000013cc0 ffff88063d05db80 ffff88063d05df30 ffff88063c3a5fd8
> [ 2042.005496]  ffff88063d05df38 0000000000013cc0 ffff88063c3a4010 0000000000013cc0
> [ 2042.013939] Call Trace:
> [ 2042.016702]  [<ffffffff8123dc85>] log_wait_commit+0xc5/0x150
> [ 2042.023089]  [<ffffffff81086b50>] ? autoremove_wake_function+0x0/0x40
> [ 2042.030321]  [<ffffffff815d4eee>] ? _raw_spin_lock+0xe/0x20
> [ 2042.036584]  [<ffffffff811e6256>] ext3_sync_fs+0x66/0x70
> [ 2042.042552]  [<ffffffff811ba7c1>] dquot_quota_sync+0x1c1/0x330
> [ 2042.049133]  [<ffffffff81115391>] ? do_writepages+0x21/0x40
> [ 2042.055423]  [<ffffffff8110ae8b>] ? __filemap_fdatawrite_range+0x5b/0x60
> [ 2042.062944]  [<ffffffff8118f42c>] __sync_filesystem+0x3c/0x90
> [ 2042.069430]  [<ffffffff8118f56b>] sync_filesystem+0x4b/0x70
> [ 2042.075690]  [<ffffffff81166a85>] freeze_super+0x55/0x100
> [ 2042.081754]  [<ffffffff811993b8>] freeze_bdev+0x98/0xe0
> [ 2042.087625]  [<ffffffff81499001>] dm_suspend+0xa1/0x2e0
> [ 2042.093495]  [<ffffffff8149ced9>] ? __get_name_cell+0x99/0xb0
> [ 2042.099948]  [<ffffffff8149e2d0>] ? dev_suspend+0x0/0xb0
> [ 2042.105916]  [<ffffffff8149e29b>] do_resume+0x17b/0x1b0
> [ 2042.111784]  [<ffffffff8149e2d0>] ? dev_suspend+0x0/0xb0
> [ 2042.117753]  [<ffffffff8149e365>] dev_suspend+0x95/0xb0
> [ 2042.123621]  [<ffffffff8149e2d0>] ? dev_suspend+0x0/0xb0
> [ 2042.129591]  [<ffffffff8149f0d5>] ctl_ioctl+0x1a5/0x240
> [ 2042.135493]  [<ffffffff815d4eee>] ? _raw_spin_lock+0xe/0x20
> [ 2042.141770]  [<ffffffff8149f183>] dm_ctl_ioctl+0x13/0x20
> [ 2042.147739]  [<ffffffff81175245>] do_vfs_ioctl+0x95/0x3c0
> [ 2042.153801]  [<ffffffff81175611>] sys_ioctl+0xa1/0xb0
> [ 2042.159478]  [<ffffffff8100c042>] system_call_fastpath+0x16/0x1b
  kpartx is waiting for kjournald to finish transaction commit and it is
holding s_umount but that doesn't really seem to be a problem...

So as I say, find a reason why kjournald is not able to finish committing a
transaction and you should solve this riddle ;).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2011-04-22 21:40 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-07 11:53 [BUG] ext4: cannot unfreeze a filesystem due to a deadlock Masayoshi MIZUMA
2011-02-15 16:06 ` Jan Kara
2011-02-15 17:03   ` Ted Ts'o
2011-02-15 17:29     ` Jan Kara
2011-02-15 18:04       ` Ted Ts'o
2011-02-15 19:11         ` Jan Kara
2011-02-15 23:17       ` Toshiyuki Okajima
2011-02-16 14:56         ` Jan Kara
2011-02-17  3:50           ` Toshiyuki Okajima
2011-02-17  5:13             ` Andreas Dilger
2011-02-17 10:41               ` Jan Kara
2011-02-17 10:45             ` Jan Kara
2011-03-28  8:06               ` [RFC][PATCH] " Toshiyuki Okajima
2011-03-30 14:12                 ` Jan Kara
2011-03-31  8:37                   ` Yongqiang Yang
2011-03-31  8:48                     ` Yongqiang Yang
2011-03-31 14:04                     ` Eric Sandeen
2011-03-31 14:36                       ` Yongqiang Yang
2011-03-31 15:25                         ` Eric Sandeen
2011-03-31 16:28                         ` Jan Kara
2011-03-31 12:03                   ` Toshiyuki Okajima
2011-04-05 10:25                     ` Toshiyuki Okajima
2011-04-05 22:54                       ` Jan Kara
2011-04-06  5:09                         ` Toshiyuki Okajima
2011-04-06  5:57                           ` Jan Kara
2011-04-06  7:40                             ` Toshiyuki Okajima
2011-04-06 17:46                               ` Jan Kara
2011-04-15 13:39                                 ` Toshiyuki Okajima
2011-04-15 17:13                                   ` Jan Kara
2011-04-15 17:17                                     ` Eric Sandeen
2011-04-15 17:37                                       ` Jan Kara
2011-04-18  9:05                                     ` Toshiyuki Okajima
2011-04-18 10:51                                       ` Jan Kara
2011-04-19  9:43                                         ` Toshiyuki Okajima
2011-04-22  6:58                                           ` Toshiyuki Okajima
2011-04-22 21:26                                             ` Peter M. Petrakis
2011-04-22 21:40                                               ` Jan Kara [this message]
2011-04-22 22:57                                                 ` Peter M. Petrakis
2011-04-22 22:10                                             ` Jan Kara
2011-04-25  6:28                                               ` Toshiyuki Okajima
2011-05-03  8:06                                                 ` Surbhi Palande
2011-05-03 11:01                                       ` Surbhi Palande
2011-05-03 13:08                                         ` (unknown), Surbhi Palande
2011-05-03 13:46                                           ` your mail Jan Kara
2011-05-03 13:56                                             ` Surbhi Palande
2011-05-03 15:26                                               ` Surbhi Palande
2011-05-03 15:36                                               ` Jan Kara
2011-05-03 15:43                                                 ` Surbhi Palande
2011-05-04 19:24                                                   ` Jan Kara
2011-05-06 15:20                                                     ` [RFC][PATCH] Do not accept a new handle when the F.S is frozen Surbhi Palande
2011-05-06 15:20                                                     ` [PATCH] Adding support to freeze and unfreeze a journal Surbhi Palande
2011-05-06 20:56                                                       ` Andreas Dilger
2011-05-07 20:04                                                         ` [PATCH v2] " Surbhi Palande
2011-05-08  8:24                                                           ` Marco Stornelli
2011-05-09  9:04                                                             ` Surbhi Palande
2011-05-09  9:24                                                               ` Jan Kara
2011-05-09  9:53                                                           ` Jan Kara
2011-05-09 13:49                                                             ` Surbhi Palande
2011-05-09 14:51                                                               ` [PATCH v3] " Surbhi Palande
2011-05-09 15:08                                                                 ` Jan Kara
2011-05-10 15:07                                                                   ` [PATCH] " Surbhi Palande
2011-05-10 21:07                                                                     ` Andreas Dilger
2011-05-11  7:46                                                                       ` Surbhi Palande
2011-05-09 15:23                                                                 ` [PATCH v3] " Eric Sandeen
2011-05-11  7:06                                                                   ` Surbhi Palande
2011-05-11  7:10                                                                     ` [PATCH] Attempt to sync the fsstress writes to a frozen F.S Surbhi Palande
2011-05-12 14:22                                                                       ` Eric Sandeen
2011-05-12 14:22                                                                         ` Eric Sandeen
2011-05-24 21:42                                                                       ` Ted Ts'o
2011-05-25 12:00                                                                         ` Surbhi Palande
2011-05-25 12:12                                                                           ` Theodore Tso
2011-05-27 16:28                                                                             ` Jan Kara
2011-05-11  9:05                                                                     ` [PATCH v3] Adding support to freeze and unfreeze a journal Andreas Dilger
2011-05-12  9:40                                                                       ` Surbhi Palande
2011-05-03 13:08                                         ` [PATCH] Prevent dirtying a page when ext4 F.S is frozen Surbhi Palande
2011-05-03 15:19                                         ` [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock Jan Kara
2011-05-04 12:09                                           ` Surbhi Palande
2011-05-04 19:19                                             ` Jan Kara
2011-05-04 21:34                                               ` Surbhi Palande
2011-05-04 22:48                                                 ` Jan Kara
2011-05-05  6:06                                                   ` Surbhi Palande
2011-05-05 11:18                                                     ` Jan Kara
2011-05-05 14:01                                                       ` Surbhi Palande
2011-03-31 23:40                 ` Dave Chinner
2011-03-31 23:53                   ` Eric Sandeen
2011-04-01 14:08                   ` Jan Kara
2011-04-06  5:40                     ` Dave Chinner
2011-04-06  6:18                       ` Jan Kara
2011-04-06 11:21                         ` Dave Chinner
2011-04-06 13:44                           ` Christoph Hellwig
2011-04-06 22:59                             ` Dave Chinner
2011-04-06 17:40                           ` Jan Kara
2011-04-06 22:54                             ` Dave Chinner
2011-04-08 21:33                               ` Jan Kara
2011-05-02  9:07                           ` Surbhi Palande
2011-05-02 10:56                             ` Jan Kara
2011-05-02 11:27                               ` Surbhi Palande
2011-05-02 12:06                                 ` Surbhi Palande
2011-05-02 12:20                                 ` Jan Kara
2011-05-02 12:30                                   ` Surbhi Palande
2011-05-02 13:16                                     ` Jan Kara
2011-05-02 13:22                                       ` Christoph Hellwig
2011-05-02 14:20                                         ` Jan Kara
2011-05-02 14:41                                           ` Christoph Hellwig
2011-05-02 16:23                                             ` Jan Kara
2011-05-02 16:38                                               ` Christoph Hellwig
2011-05-02 13:22                                       ` Surbhi Palande
2011-05-02 13:24                                         ` Christoph Hellwig
2011-05-02 13:27                                           ` Surbhi Palande
2011-05-02 14:26                                             ` Jan Kara
2011-05-02 14:04                                         ` Eric Sandeen
2011-05-03  7:27                                           ` Surbhi Palande
2011-05-03 20:14                                             ` Eric Sandeen
2011-05-04  8:26                                               ` Surbhi Palande
2011-05-04 14:30                                                 ` Eric Sandeen
2011-05-02 14:01                                     ` Eric Sandeen
2011-04-05 10:44                   ` Toshiyuki Okajima
2011-12-09  1:56 ` Masayoshi MIZUMA
2011-12-15 12:41   ` Masayoshi MIZUMA
2013-11-29  4:58     ` Yongqiang Yang
2013-11-29  8:00       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110422214046.GE2977@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=Craig.Magina@canonical.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=m.mizuma@jp.fujitsu.com \
    --cc=peter.petrakis@canonical.com \
    --cc=sandeen@redhat.com \
    --cc=toshi.okajima@jp.fujitsu.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.