* BUG in xlog_get_lowest_lsn
@ 2012-03-23 17:43 Ben Myers
  2012-05-14 20:34 ` [PATCH] xfs: use s_umount sema in xfs_sync_worker Ben Myers
  0 siblings, 1 reply; 27+ messages in thread
From: Ben Myers @ 2012-03-23 17:43 UTC (permalink / raw)
  To: xfs

Hi,

I just had an interesting crash:

# git describe
v3.3-rc1-51-gc999a22
# quilt applied
patches/v3-xfs-Fix-oops-on-IO-error-during-xlog_recover_process_iunlinks.patch
patches/1-4-xfs-remove-log-item-from-AIL-in-xfs_qm_dqflush-after-a-shutdown.patch
patches/2-4-xfs-remove-log-item-from-AIL-in-xfs_iflush-after-a-shutdown.patch
patches/3-4-xfs-allow-assigning-the-tail-lsn-with-the-AIL-lock-held.patch
patches/4-4-xfs-implement-freezing-by-emptying-the-AIL.patch
patches/3-8-xfs-initialise-xfssync-work-before-running-quotachecks-2.patch
patches/4-8-xfs-remove-MS_ACTIVE-guard-from-inode-reclaim-work.patch
patches/5-8-xfs-don-t-cache-inodes-read-through-bulkstat.patch
patches/6-8-xfs-Account-log-unmount-transaction-correctly.patch
patches/7-8-xfs-fix-fstrim-offset-calculations.patch
patches/8-8-xfs-add-lots-of-attribute-trace-points.patch

[21607.920067] BUG: unable to handle kernel paging request at ffffc9001c970010
[21607.924031] IP: [<ffffffffa04b06d0>] xlog_get_lowest_lsn+0x30/0x80 [xfs]
[21607.924031] PGD 22f05f067 PUD 22f060067 PMD 1d0cd3067 PTE 0
[21607.924031] Oops: 0000 [#1] SMP
[21607.924031] CPU 1
[21607.924031] Modules linked in: xfs(O) exportfs af_packet microcode fuse loop dm_mod e1000 tpm_tis shpchp iTCO_wdt tpm sg i2c_i801 iTCO_vendor_support i2c_core pci_hotplug sr_mod e752x_edac pcspkr edac_core floppy intel_rng tpm_bios cdrom container serio_raw button uhci_hcd ehci_hcd sd_mod crc_t10dif usbcore usb_common edd ext3 mbcache jbd fan processor ide_pci_generic piix ide_core ata_generic ata_piix libata aic79xx scsi_transport_spi scsi_mod thermal thermal_sys hwmon [last unloaded: xfs]
[21607.924031]
[21607.924031] Pid: 4062, comm: kworker/1:0 Tainted: G           O 3.3.0-rc1-0.7-default+ #13 Supermicro X6DHR-8G/X6DHR-8GS/X6DHR-8G/X6DHR-8GS
[21607.924031] RIP: 0010:[<ffffffffa04b06d0>]  [<ffffffffa04b06d0>] xlog_get_lowest_lsn+0x30/0x80 [xfs]
[21607.924031] RSP: 0018:ffff8801f8917d10  EFLAGS: 00010246
[21607.924031] RAX: ffffc9001c970000 RBX: ffff8802204d84c0 RCX: ffff8802204d84c0
[21607.924031] RDX: ffff8802243a5cc0 RSI: 0000000000000000 RDI: 0000000000000000
[21607.924031] RBP: ffff8801f8917d10 R08: ffff8802204d8dc0 R09: 0000000000000000
[21607.924031] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802243a5c00
[21607.924031] R13: ffff8802204d84c0 R14: 0000000000000000 R15: ffff88022fc963d5
[21607.924031] FS:  0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[21607.924031] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
[21607.924031] CR2: ffffc9001c970010 CR3: 000000014e09e000 CR4: 00000000000006e0
[21607.924031] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21607.924031] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[21607.924031] Process kworker/1:0 (pid: 4062, threadinfo ffff8801f8916000, task ffff880220fb60c0)
[21607.924031] Stack:
[21607.924031]  ffff8801f8917d90 ffffffffa04b0f46 0000000300000000 ffff8802204d84c0
[21607.924031]  ffff8802243a5c00 ffff8802204d8dc0 0000000000000286 0000000000000000
[21607.924031]  0000000100000000 ffff8802243a5ca8 ffff8802243a5cc0 ffff8802204d84c0
[21607.924031] Call Trace:
[21607.924031]  [<ffffffffa04b0f46>] xlog_state_do_callback+0xa6/0x3a0 [xfs]
[21607.924031]  [<ffffffffa04b1328>] xlog_state_done_syncing+0xe8/0x110 [xfs]
[21607.924031]  [<ffffffffa04b1b1e>] xlog_iodone+0x7e/0x100 [xfs]
[21607.924031]  [<ffffffffa0450d31>] xfs_buf_iodone_work+0x21/0x50 [xfs]
[21607.924031]  [<ffffffff8104b78b>] process_one_work+0x12b/0x3f0
[21607.924031]  [<ffffffffa0450d10>] ? xfs_buf_rele+0x180/0x180 [xfs]
[21607.924031]  [<ffffffff8104e333>] worker_thread+0x183/0x430
[21607.924031]  [<ffffffff8104e1b0>] ? manage_workers+0x210/0x210
[21607.924031]  [<ffffffff81052bee>] kthread+0x9e/0xb0
[21607.924031]  [<ffffffff813ab124>] kernel_thread_helper+0x4/0x10
[21607.924031]  [<ffffffff81052b50>] ? kthread_freezable_should_stop+0x70/0x70
[21607.924031]  [<ffffffff813ab120>] ? gs_change+0x13/0x13
[21607.924031] Code: 00 00 00 31 ff 48 89 e5 4c 89 c1 eb 0f 66 0f 1f 44 00 00 48 8b 49 30 49 39 c8 74 40 0f b7 41 5c a8 41 75 ef 48 8b 81 c8 00 00 00 <48> 8b 70 10 48 0f ce 48 85 f6 74 05 48 85 ff 74 14 48 89 f2 48
[21607.924031] RIP  [<ffffffffa04b06d0>] xlog_get_lowest_lsn+0x30/0x80 [xfs]
[21607.924031]  RSP <ffff8801f8917d10>
[21607.924031] CR2: ffffc9001c970010

0xffffffffa04b06a0 <xlog_get_lowest_lsn>:       push   %rbp
0xffffffffa04b06a1 <xlog_get_lowest_lsn+0x1>:   mov    0xa0(%rdi),%r8           * rdi is xlog_t, offset 0xa0 is l_iclog, lsn_log in r8
0xffffffffa04b06a8 <xlog_get_lowest_lsn+0x8>:   xor    %edi,%edi
0xffffffffa04b06aa <xlog_get_lowest_lsn+0xa>:   mov    %rsp,%rbp
0xffffffffa04b06ad <xlog_get_lowest_lsn+0xd>:   mov    %r8,%rcx                 * (xlog_in_core_t)lsn_log in rcx
0xffffffffa04b06b0 <xlog_get_lowest_lsn+0x10>:  jmp    0xffffffffa04b06c1 <xlog_get_lowest_lsn+0x21>
0xffffffffa04b06b2 <xlog_get_lowest_lsn+0x12>:  nopw   0x0(%rax,%rax,1)
0xffffffffa04b06b8 <xlog_get_lowest_lsn+0x18>:  mov    0x30(%rcx),%rcx          * [0x30] struct xlog_in_core *ic_next;  now in rcx
0xffffffffa04b06bc <xlog_get_lowest_lsn+0x1c>:  cmp    %rcx,%r8
0xffffffffa04b06bf <xlog_get_lowest_lsn+0x1f>:  je     0xffffffffa04b0701 <xlog_get_lowest_lsn+0x61>
0xffffffffa04b06c1 <xlog_get_lowest_lsn+0x21>:  movzwl 0x5c(%rcx),%eax          * [0x5c] short unsigned int ic_state;

#define XLOG_STATE_ACTIVE    0x0001 /* Current IC log being written to */
#define XLOG_STATE_DONE_SYNC 0x0008 /* Done syncing to disk */
#define XLOG_STATE_DIRTY     0x0040 /* Dirty IC log, not ready for ACTIVE status*/

0xffffffffa04b06c5 <xlog_get_lowest_lsn+0x25>:  test   $0x41,%al                * XLOG_STATE_ACTIVE|XLOG_STATE_DIRTY (0x01|0x40)
0xffffffffa04b06c7 <xlog_get_lowest_lsn+0x27>:  jne    0xffffffffa04b06b8 <xlog_get_lowest_lsn+0x18>    * loop
0xffffffffa04b06c9 <xlog_get_lowest_lsn+0x29>:  mov    0xc8(%rcx),%rax          * [0xc8] xlog_in_core_2_t *ic_data;
0xffffffffa04b06d0 <xlog_get_lowest_lsn+0x30>:  mov    0x10(%rax),%rsi          * [0x10] __be64 h_lsn;
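In C, the loop the disassembly implements looks roughly like this. This is a userspace model with the annotated offsets baked into simplified structs; the real kernel function also byte-swaps h_lsn from big-endian (the bswap instruction visible in the Code: bytes), which this sketch omits:

```c
#include <assert.h>

/* Simplified userspace model of the iclog ring walk above.  Field names
 * mirror the kernel's; layouts and types are reduced for illustration. */

#define XLOG_STATE_ACTIVE 0x0001
#define XLOG_STATE_DIRTY  0x0040

struct rec_header {
    unsigned long long h_lsn;       /* [0x10] in the real record header */
};

struct iclog {
    struct iclog      *ic_next;     /* circular list ([0x30]) */
    unsigned short     ic_state;    /* ([0x5c]) */
    struct rec_header *ic_data;     /* ([0xc8]) -- the stale pointer in the oops */
};

/* Walk the iclog ring; for every iclog that is neither ACTIVE nor DIRTY,
 * read h_lsn through ic_data and keep the lowest nonzero value.  The
 * crash above is the ic_data dereference landing on freed memory. */
static unsigned long long get_lowest_lsn(struct iclog *head)
{
    unsigned long long lowest = 0;
    struct iclog *ic = head;

    do {
        if (!(ic->ic_state & (XLOG_STATE_ACTIVE | XLOG_STATE_DIRTY))) {
            unsigned long long lsn = ic->ic_data->h_lsn;  /* faulting load */
            if (lsn && (lowest == 0 || lsn < lowest))
                lowest = lsn;
        }
        ic = ic->ic_next;
    } while (ic != head);

    return lowest;
}
```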

crash> xlog_in_core_t ffff8802204d84c0 | grep ic_state
  ic_state = 0x8, 		* XLOG_STATE_DONE_SYNC

It seems like we have an iclog with a bad ic_data pointer:

crash> xlog_rec_header_t ffffc9001c970000
struct xlog_rec_header_t struct: invalid kernel virtual address: ffffc9001c970000  type: "gdb_readmem_callback"
Cannot access memory at address 0xffffc9001c970000

A use after free?

Regards,
Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-03-23 17:43 BUG in xlog_get_lowest_lsn Ben Myers
@ 2012-05-14 20:34 ` Ben Myers
  2012-05-15 18:30   ` Mark Tinguely
  2012-05-16  1:56   ` Dave Chinner
  0 siblings, 2 replies; 27+ messages in thread
From: Ben Myers @ 2012-05-14 20:34 UTC (permalink / raw)
  To: xfs

I'm still hitting this on a regular basis.  Here is some analysis from a recent
crash dump which you may want to skip.  The fix is at the end.

PID: 3862   TASK: ffff88022175c640  CPU: 0   COMMAND: "262"

Hrm.  Looks like test 262 was running this time.

PID: 27544  TASK: ffff88013544e040  CPU: 3   COMMAND: "kworker/3:0"
 #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
 #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
 #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
 #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
 #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
 #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
 #7 [ffff88016fdffc60] page_fault at ffffffff813ac635
    [exception RIP: xlog_get_lowest_lsn+0x30]
    RIP: ffffffffa04a9910  RSP: ffff88016fdffd10  RFLAGS: 00010246
    RAX: ffffc90014e48000  RBX: ffff88014d879980  RCX: ffff88014d879980
    RDX: ffff8802214ee4c0  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88016fdffd10   R8: ffff88014d879a80   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff8802214ee400
    R13: ffff88014d879980  R14: 0000000000000000  R15: ffff88022fd96605
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
 #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]
#10 [ffff88016fdffdc8] xlog_iodone at ffffffffa04aad6e [xfs]
#11 [ffff88016fdffdf8] xfs_buf_iodone_work at ffffffffa044a541 [xfs]
#12 [ffff88016fdffe18] process_one_work at ffffffff8104c53b
#13 [ffff88016fdffe68] worker_thread at ffffffff8104f0e3
#14 [ffff88016fdffee8] kthread at ffffffff8105395e
#15 [ffff88016fdfff48] kernel_thread_helper at ffffffff813b3ae4

Here's the guy who crashed the system.  Let's find the superblock:

OK, I walked from the xlog_in_core_t in rcx to ic_log (struct log), then to
l_mp (xfs_mount 0xffff88013f8f9800), to get m_super (struct super_block),
which is at 0xffff88021d1c2c00.

struct xfs_mount {
	m_super = 0xffff88021d1c2c00,
	m_tid = 0x0,
	m_ail = 0xffff880149ea58c0,
	m_sb = {
	sb_magicnum = 0x58465342,	<--- yep, correct xfs super magic.

Its s_id is /dev/sdb4, which is not currently mounted.  Looks like this is my
scratch filesystem, as /dev/sdb3 is mounted at /mnt/test.

PID: 4025   TASK: ffff880220870300  CPU: 1   COMMAND: "umount"
 #0 [ffff88022167ba58] __schedule at ffffffff813aacac
 #1 [ffff88022167bba0] schedule at ffffffff813ab0c4
 #2 [ffff88022167bbb0] schedule_timeout at ffffffff813a92d5
 #3 [ffff88022167bc50] wait_for_common at ffffffff813aa527
 #4 [ffff88022167bce0] wait_for_completion at ffffffff813aa6a8
 #5 [ffff88022167bcf0] wait_on_work at ffffffff8104d8b5
 #6 [ffff88022167bd90] __cancel_work_timer at ffffffff8104da2a
 #7 [ffff88022167bdd0] cancel_delayed_work_sync at ffffffff8104db2d
 #8 [ffff88022167bde0] xfs_syncd_stop at ffffffffa045aa78 [xfs]
 #9 [ffff88022167be00] xfs_fs_put_super at ffffffffa0459228 [xfs]
#10 [ffff88022167be20] generic_shutdown_super at ffffffff81110bdc
#11 [ffff88022167be50] kill_block_super at ffffffff81110c8c
#12 [ffff88022167be70] deactivate_locked_super at ffffffff811111f5
#13 [ffff88022167be90] deactivate_super at ffffffff81111f41
#14 [ffff88022167beb0] mntput_no_expire at ffffffff8112be1d
#15 [ffff88022167bef0] sys_umount at ffffffff8112c546
#16 [ffff88022167bf80] system_call_fastpath at ffffffff813b27f9
    RIP: 00007f789d7eb8e7  RSP: 00007fff1a744f60  RFLAGS: 00010246
    RAX: 00000000000000a6  RBX: ffffffff813b27f9  RCX: 00007fff1a7456c8
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000610940
    RBP: 0000000000610940   R8: 0000000000610960   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fff1a7461ce
    R13: 0000000000610920  R14: 0000000000610060  R15: 00000000006109a0
    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b

We have an unmount running.  Is it the same filesystem as the log crash?

crash> bt -f ffff880220870300
...
 #8 [ffff88022167bde0] xfs_syncd_stop at ffffffffa045aa78 [xfs]
    ffff88022167bde8: 0000000000000000 ffff88013f8f9800 <-------- xfs_mount
    ffff88022167bdf8: ffff88022167be18 ffffffffa0459228 

0xffffffffa045aa60 <xfs_syncd_stop>:	push   %rbp			*
0xffffffffa045aa61 <xfs_syncd_stop+0x1>:	mov    %rsp,%rbp
0xffffffffa045aa64 <xfs_syncd_stop+0x4>:	push   %rbx		*
0xffffffffa045aa65 <xfs_syncd_stop+0x5>:	mov    %rdi,%rbx
0xffffffffa045aa68 <xfs_syncd_stop+0x8>:	lea    0x370(%rdi),%rdi
0xffffffffa045aa6f <xfs_syncd_stop+0xf>:	sub    $0x8,%rsp

xfs_syncd_stop pushes rbp and rbx on to the stack...

 #9 [ffff88022167be00] xfs_fs_put_super at ffffffffa0459228 [xfs]
    ffff88022167be08: ffff880221a6e0a0 ffff88021d1c2c00 
    ffff88022167be18: ffff88022167be48 ffffffff81110bdc 

crash> dis xfs_fs_put_super
0xffffffffa0459200 <xfs_fs_put_super>:  push   %rbp
0xffffffffa0459201 <xfs_fs_put_super+0x1>:      mov    %rsp,%rbp
0xffffffffa0459204 <xfs_fs_put_super+0x4>:      push   %rbx
0xffffffffa0459205 <xfs_fs_put_super+0x5>:      sub    $0x8,%rsp
0xffffffffa0459209 <xfs_fs_put_super+0x9>:      mov    0x2f0(%rdi),%rbx		*

The superblock is in rdi,
      [0x2f0] void *s_fs_info;  

We stuff the xfs_mount in s_fs_info, so the xfs_mount should be in rbx
when xfs_syncd_stop pushes rbx onto the stack.

So the xfs_mount should be at ffff88013f8f9800, the same filesystem as the task
that crashed the system.

PID: 28181  TASK: ffff880001764200  CPU: 3   COMMAND: "kworker/3:10"
 #0 [ffff880001767b40] __schedule at ffffffff813aacac
 #1 [ffff880001767c88] schedule at ffffffff813ab0c4
 #2 [ffff880001767c98] xlog_wait at ffffffffa04a9f0a [xfs]
 #3 [ffff880001767cf8] _xfs_log_force_lsn at ffffffffa04ac95e [xfs]
 #4 [ffff880001767d88] xfs_trans_commit at ffffffffa04a9083 [xfs]
 #5 [ffff880001767dd8] xfs_fs_log_dummy at ffffffffa044f15b [xfs]
 #6 [ffff880001767df8] xfs_sync_worker at ffffffffa045ac4c [xfs]
 #7 [ffff880001767e18] process_one_work at ffffffff8104c53b
 #8 [ffff880001767e68] worker_thread at ffffffff8104f0e3
 #9 [ffff880001767ee8] kthread at ffffffff8105395e
#10 [ffff880001767f48] kernel_thread_helper at ffffffff813b3ae4

Is this xlog_wait on the same filesystem?

 #2 [ffff880001767c98] xlog_wait at ffffffffa04a9f0a [xfs]
    ffff880001767ca0: 0000000000000001 ffff880001764200
    ffff880001767cb0: ffffffff81061940 ffff88014d879988
    ffff880001767cc0: ffff88014d879988 ffffffffa04ac2db
    ffff880001767cd0: ffff88014d879980 000000010000003f
    ffff880001767ce0: ffff8802214ee400 ffff8802214ee4a8
    ffff880001767cf0: ffff880001767d80 ffffffffa04ac95e
 #3 [ffff880001767cf8] _xfs_log_force_lsn at ffffffffa04ac95e [xfs]
    ffff880001767d00: ffff880001764200 0000000000011da0
    ffff880001767d10: 0000000000000000 0000000100000001
    ffff880001767d20: 0000000000000001 ffff880001764200
    ffff880001767d30: ffffffff81061940 dead000000100100
    ffff880001767d40: dead000000200200 0000000000000000
    ffff880001767d50: ffff88013f8f9800 ffff880206a80ea8
    ffff880001767d60: 0000000000000000 0000000000000000
    ffff880001767d70: ffff88013f8f9800 0000000000000008
    ffff880001767d80: ffff880001767dd0 ffffffffa04a9083
 #4 [ffff880001767d88] xfs_trans_commit at ffffffffa04a9083 [xfs]
    ffff880001767d90: ffff880149d2d800 ffff88022fd96305
    ffff880001767da0: 0000000000000004 0000000000000000
    ffff880001767db0: ffff880206a80ea8 0000000000000000
			^^^^^ xfs_trans_t
    ffff880001767dc0: ffff88022fd8d400 ffff88022fd96305
    ffff880001767dd0: ffff880001767df0 ffffffffa044f15b
 #5 [ffff880001767dd8] xfs_fs_log_dummy at ffffffffa044f15b [xfs]
    ffff880001767de0: ffff88013f8f9b70 ffff88013f8f9800
    ffff880001767df0: ffff880001767e10 ffffffffa045ac4c

crash> dis xfs_fs_log_dummy
0xffffffffa044f100 <xfs_fs_log_dummy>:  push   %rbp
0xffffffffa044f101 <xfs_fs_log_dummy+0x1>:      mov    $0x1,%edx
0xffffffffa044f106 <xfs_fs_log_dummy+0x6>:      mov    $0x1b,%esi
0xffffffffa044f10b <xfs_fs_log_dummy+0xb>:      mov    %rsp,%rbp
0xffffffffa044f10e <xfs_fs_log_dummy+0xe>:      push   %r12
0xffffffffa044f110 <xfs_fs_log_dummy+0x10>:     push   %rbx
0xffffffffa044f111 <xfs_fs_log_dummy+0x11>:     mov    %rdi,%rbx
0xffffffffa044f114 <xfs_fs_log_dummy+0x14>:     callq  0xffffffffa04a7ba0 <_xfs_trans_alloc>	* (trans will be in rax)
0xffffffffa044f119 <xfs_fs_log_dummy+0x19>:     movzwl 0x7e(%rbx),%edx
0xffffffffa044f11d <xfs_fs_log_dummy+0x1d>:     xor    %r8d,%r8d
0xffffffffa044f120 <xfs_fs_log_dummy+0x20>:     xor    %ecx,%ecx
0xffffffffa044f122 <xfs_fs_log_dummy+0x22>:     xor    %esi,%esi
0xffffffffa044f124 <xfs_fs_log_dummy+0x24>:     mov    $0x1,%r9d
0xffffffffa044f12a <xfs_fs_log_dummy+0x2a>:     mov    %rax,%rdi
0xffffffffa044f12d <xfs_fs_log_dummy+0x2d>:     mov    %rax,%r12		*
0xffffffffa044f130 <xfs_fs_log_dummy+0x30>:     sub    $0xffffff80,%edx
0xffffffffa044f133 <xfs_fs_log_dummy+0x33>:     callq  0xffffffffa04a8050 <xfs_trans_reserve>
0xffffffffa044f138 <xfs_fs_log_dummy+0x38>:     test   %eax,%eax
0xffffffffa044f13a <xfs_fs_log_dummy+0x3a>:     mov    %eax,%ebx
0xffffffffa044f13c <xfs_fs_log_dummy+0x3c>:     jne    0xffffffffa044f168 <xfs_fs_log_dummy+0x68>
0xffffffffa044f13e <xfs_fs_log_dummy+0x3e>:     mov    %r12,%rdi		*
0xffffffffa044f141 <xfs_fs_log_dummy+0x41>:     mov    $0x20,%esi
0xffffffffa044f146 <xfs_fs_log_dummy+0x46>:     callq  0xffffffffa04a45b0 <xfs_mod_sb>
0xffffffffa044f14b <xfs_fs_log_dummy+0x4b>:     orl    $0x8,0x68(%r12)
0xffffffffa044f151 <xfs_fs_log_dummy+0x51>:     mov    %r12,%rdi
0xffffffffa044f154 <xfs_fs_log_dummy+0x54>:     xor    %esi,%esi
0xffffffffa044f156 <xfs_fs_log_dummy+0x56>:     callq  0xffffffffa04a8df0 <xfs_trans_commit>

Looks like the xfs_trans_t should be in r12.

crash> dis xfs_trans_commit
0xffffffffa04a8df0 <xfs_trans_commit>:  push   %rbp
0xffffffffa04a8df1 <xfs_trans_commit+0x1>:      mov    %rsp,%rbp		*
0xffffffffa04a8df4 <xfs_trans_commit+0x4>:      sub    $0x40,%rsp
0xffffffffa04a8df8 <xfs_trans_commit+0x8>:      mov    %rbx,-0x28(%rbp)
0xffffffffa04a8dfc <xfs_trans_commit+0xc>:      mov    %r12,-0x20(%rbp)		*

Which is put on the stack at -0x20 by xfs_trans_commit.
So that's ffff880206a80ea8.

crash> xfs_trans_t ffff880206a80ea8
struct xfs_trans_t {
  t_magic = 0x5452414e,
  t_logcb = {
    cb_next = 0x0,
    cb_func = 0,
    cb_arg = 0x0
  },
  t_type = 0x1b,
  t_log_res = 0x280,
  t_log_count = 0x1,
  t_blk_res = 0x0,
  t_blk_res_used = 0x0,
  t_rtx_res = 0x0,
  t_rtx_res_used = 0x0,
  t_ticket = 0xffff88020f721b50,
  t_lsn = 0x0,
  t_commit_lsn = 0x4,
  t_mountp = 0xffff88013f8f9800,		* xfs_mount_t

 #6 [ffff880001767df8] xfs_sync_worker at ffffffffa045ac4c [xfs]
    ffff880001767e00: ffff88013f8f9b70 ffff880221498ec0
    ffff880001767e10: ffff880001767e60 ffffffff8104c53b
 #7 [ffff880001767e18] process_one_work at ffffffff8104c53b

So that's the same xfs_mount as the one involved in the crash.  Looks like the
problem is that the sync worker is still running after the log has been torn
down, and it calls xfs_fs_log_dummy which generates log traffic.

/*
 * Every sync period we need to unpin all items, reclaim inodes and sync
 * disk quotas.  We might need to cover the log to indicate that the
 * filesystem is idle and not frozen.
 */
STATIC void
xfs_sync_worker(
        struct work_struct *work)
{
        struct xfs_mount *mp = container_of(to_delayed_work(work),
                                        struct xfs_mount, m_sync_work);
        int             error;

        /*
         * We shouldn't write/force the log if we are in the mount/unmount
         * process or on a read only filesystem. The workqueue still needs to be
         * active in both cases, however, because it is used for inode reclaim
         * during these times. hence use the MS_ACTIVE flag to avoid doing
         * anything in these periods.
         */
        if (!(mp->m_super->s_flags & MS_ACTIVE) &&		* 
            !(mp->m_flags & XFS_MOUNT_RDONLY)) {
                /* dgc: errors ignored here */
                if (mp->m_super->s_frozen == SB_UNFROZEN &&
                    xfs_log_need_covered(mp))
                        error = xfs_fs_log_dummy(mp);
                else
                        xfs_log_force(mp, 0);

                /* start pushing all the metadata that is currently dirty */
                xfs_ail_push_all(mp->m_ail);
        }

        /* queue us up again */
        xfs_syncd_queue_sync(mp);
}

Looks like checking for MS_ACTIVE is not adequate for preventing a race with
unmount.

Here is the fix.  I'll sign off on it once my testing is completed. -Ben

===================================================================

xfs: use s_umount sema in xfs_sync_worker

xfs_sync_worker checks the MS_ACTIVE flag in sb->s_flags to avoid doing work
during mount and unmount.  This flag can be cleared by unmount after the
xfs_sync_worker checks it but before the work is completed.  This has caused
crashes in the completion handler for the dummy transaction committed by
xfs_sync_worker:

PID: 27544  TASK: ffff88013544e040  CPU: 3   COMMAND: "kworker/3:0"
 #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
 #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
 #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
 #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
 #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
 #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
 #7 [ffff88016fdffc60] page_fault at ffffffff813ac635
    [exception RIP: xlog_get_lowest_lsn+0x30]
    RIP: ffffffffa04a9910  RSP: ffff88016fdffd10  RFLAGS: 00010246
    RAX: ffffc90014e48000  RBX: ffff88014d879980  RCX: ffff88014d879980
    RDX: ffff8802214ee4c0  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88016fdffd10   R8: ffff88014d879a80   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff8802214ee400
    R13: ffff88014d879980  R14: 0000000000000000  R15: ffff88022fd96605
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
 #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]
#10 [ffff88016fdffdc8] xlog_iodone at ffffffffa04aad6e [xfs]
#11 [ffff88016fdffdf8] xfs_buf_iodone_work at ffffffffa044a541 [xfs]
#12 [ffff88016fdffe18] process_one_work at ffffffff8104c53b
#13 [ffff88016fdffe68] worker_thread at ffffffff8104f0e3
#14 [ffff88016fdffee8] kthread at ffffffff8105395e
#15 [ffff88016fdfff48] kernel_thread_helper at ffffffff813b3ae4

Protect xfs_sync_worker by using the s_umount semaphore at the read level to
provide exclusion with unmount while work is progressing.

Index: xfs/fs/xfs/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.c
+++ xfs/fs/xfs/xfs_sync.c
@@ -386,20 +386,23 @@ xfs_sync_worker(
 	 * We shouldn't write/force the log if we are in the mount/unmount
 	 * process or on a read only filesystem. The workqueue still needs to be
 	 * active in both cases, however, because it is used for inode reclaim
-	 * during these times. hence use the MS_ACTIVE flag to avoid doing
-	 * anything in these periods.
+	 * during these times.  Use the s_umount semaphore to provide exclusion
+	 * with unmount.
 	 */
-	if (!(mp->m_super->s_flags & MS_ACTIVE) &&
-	    !(mp->m_flags & XFS_MOUNT_RDONLY)) {
-		/* dgc: errors ignored here */
-		if (mp->m_super->s_frozen == SB_UNFROZEN &&
-		    xfs_log_need_covered(mp))
-			error = xfs_fs_log_dummy(mp);
-		else
-			xfs_log_force(mp, 0);
+	if (down_read_trylock(&mp->m_super->s_umount)) {
+		if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+			/* dgc: errors ignored here */
+			if (mp->m_super->s_frozen == SB_UNFROZEN &&
+			    xfs_log_need_covered(mp))
+				error = xfs_fs_log_dummy(mp);
+			else
+				xfs_log_force(mp, 0);
 
-		/* start pushing all the metadata that is currently dirty */
-		xfs_ail_push_all(mp->m_ail);
+			/* start pushing all the metadata that is currently
+			 * dirty */
+			xfs_ail_push_all(mp->m_ail);
+		}
+		up_read(&mp->m_super->s_umount);
 	}
 
 	/* queue us up again */


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-14 20:34 ` [PATCH] xfs: use s_umount sema in xfs_sync_worker Ben Myers
@ 2012-05-15 18:30   ` Mark Tinguely
  2012-05-15 19:06     ` Ben Myers
  2012-05-16  1:56   ` Dave Chinner
  1 sibling, 1 reply; 27+ messages in thread
From: Mark Tinguely @ 2012-05-15 18:30 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 05/14/12 15:34, Ben Myers wrote:

> Protect xfs_sync_worker by using the s_umount semaphore at the read level to
> provide exclusion with unmount while work is progressing.
>

> -			xfs_log_force(mp, 0);
> +	if (down_read_trylock(&mp->m_super->s_umount)) {

Okay, the s_umount has down_write() performed upon it when it is
created and does not do a up_write until the filesystem mount is
complete.

This test is just like the one in grab_super_passive(). The if
statements have to be separated to know to do the corresponding
up_read().
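The shape Mark describes can be modelled in userspace like this (a sketch with a toy rwsem standing in for s_umount; names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy rwsem standing in for s_umount: write-locked means mount or
 * unmount is in progress. */
struct rwsem { int readers; bool writer; };

static bool down_read_trylock(struct rwsem *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

static void up_read(struct rwsem *s) { s->readers--; }

static struct rwsem s_umount;
static bool mount_rdonly;
static int log_work_done;

/* Shape of the patched xfs_sync_worker: the trylock and the RDONLY test
 * are separate ifs so that up_read() runs exactly when the trylock
 * succeeded, regardless of the inner test. */
static void sync_worker(void)
{
    if (down_read_trylock(&s_umount)) {
        if (!mount_rdonly)
            log_work_done++;    /* stands in for the log/AIL work */
        up_read(&s_umount);     /* pairs with the successful trylock */
    }
    /* requeueing the work happens here in either case */
}
```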

> +		if (!(mp->m_flags&  XFS_MOUNT_RDONLY)) {
...
> +		up_read(&mp->m_super->s_umount);

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-15 18:30   ` Mark Tinguely
@ 2012-05-15 19:06     ` Ben Myers
  0 siblings, 0 replies; 27+ messages in thread
From: Ben Myers @ 2012-05-15 19:06 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Tue, May 15, 2012 at 01:30:46PM -0500, Mark Tinguely wrote:
> On 05/14/12 15:34, Ben Myers wrote:
> 
> >Protect xfs_sync_worker by using the s_umount semaphore at the read level to
> >provide exclusion with unmount while work is progressing.
> >
> 
> >-			xfs_log_force(mp, 0);
> >+	if (down_read_trylock(&mp->m_super->s_umount)) {
> 
> Okay, the s_umount has down_write() performed upon it when it is
> created and does not do a up_write until the filesystem mount is
> complete.
> 
> This test is just like the one in grab_super_passive(). The if
> statements have to be separated to know to do the corresponding
> up_read().
> 
> >+		if (!(mp->m_flags&  XFS_MOUNT_RDONLY)) {
> ...
> >+		up_read(&mp->m_super->s_umount);
> 
> Looks good.
> 
> Reviewed-by: Mark Tinguely <tinguely@sgi.com>

Thanks Mark.  xfstests ran in a loop overnight and is still going, so I'm
satisfied.

Signed-off-by: Ben Myers <bpm@sgi.com>


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-14 20:34 ` [PATCH] xfs: use s_umount sema in xfs_sync_worker Ben Myers
  2012-05-15 18:30   ` Mark Tinguely
@ 2012-05-16  1:56   ` Dave Chinner
  2012-05-16 17:04     ` Ben Myers
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2012-05-16  1:56 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Mon, May 14, 2012 at 03:34:49PM -0500, Ben Myers wrote:
> I'm still hitting this on a regular basis.  Here is some analysis from a recent
> crash dump which you may want to skip.  The fix is at the end.
....
> ===================================================================
> 
> xfs: use s_umount sema in xfs_sync_worker
> 
> xfs_sync_worker checks the MS_ACTIVE flag in sb->s_flags to avoid doing work
> during mount and unmount.  This flag can be cleared by unmount after the
> xfs_sync_worker checks it but before the work is completed.

Then there are problems all over the place in different filesystems
if the straight MS_ACTIVE check is not sufficient.

> Protect xfs_sync_worker by using the s_umount semaphore at the read level to
> provide exclusion with unmount while work is progressing.

I don't think that is the right fix for the given problem.

The problem is, as you've stated:

"Looks like the problem is that the sync worker is still running
after the log has been torn down, and it calls xfs_fs_log_dummy
which generates log traffic."

Why did we allow a new transaction to start while/after the log was
torn down? Isn't that the problem we need to fix because it leads to
invalid entries in the physical log that might cause recovery
failures? Further, any asynchronous worker thread that does
transactions could have this same problem regardless of whether we
are umounting or cleaning up after a failed mount, so it is not
unique to the xfs_sync_worker....

That is, if we've already started to tear down the log, or have torn
it down, we must not allow new transactions to start. Likewise, we
can't finalise tearing down the log until in-progress transactions
have completed. Using the s_umount lock here avoids the race, but it
really is a VFS level lock (not a filesystem level lock) and is,
IMO, just papering over the real problem....
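The invariant Dave is arguing for can be sketched as state the log itself carries. This is a toy model only; the struct, field, and function names below are invented for illustration and are not what XFS eventually merged:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy encoding of the invariant: once teardown begins, no new
 * transactions may start, and teardown may only finalise after the
 * in-flight count drains.  In the kernel the drain would be a
 * sleep/wakeup; here it is just a predicate. */
struct log_model {
    bool tearing_down;
    int  active_trans;
};

static bool trans_start(struct log_model *log)
{
    if (log->tearing_down)
        return false;           /* refuse: log is (being) torn down */
    log->active_trans++;
    return true;
}

static void trans_end(struct log_model *log)
{
    log->active_trans--;
}

static bool teardown_may_finalise(struct log_model *log)
{
    log->tearing_down = true;   /* from here on, trans_start() fails */
    return log->active_trans == 0;
}
```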

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-16  1:56   ` Dave Chinner
@ 2012-05-16 17:04     ` Ben Myers
  2012-05-17  7:16       ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Ben Myers @ 2012-05-16 17:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hey Dave,

On Wed, May 16, 2012 at 11:56:26AM +1000, Dave Chinner wrote:
> On Mon, May 14, 2012 at 03:34:49PM -0500, Ben Myers wrote:
> > I'm still hitting this on a regular basis.  Here is some analysis from a recent
> > crash dump which you may want to skip.  The fix is at the end.
> ....
> > ===================================================================
> > 
> > xfs: use s_umount sema in xfs_sync_worker
> > 
> > xfs_sync_worker checks the MS_ACTIVE flag in sb->s_flags to avoid doing work
> > during mount and unmount.  This flag can be cleared by unmount after the
> > xfs_sync_worker checks it but before the work is completed.
> 
> Then there are problems all over the place in different filesystems
> if the straight MS_ACTIVE check is not sufficient.

Eh, I won't speak to the problems in other filesystems.  ;)

MS_ACTIVE certainly isn't adequate in the case before us.

> > Protect xfs_sync_worker by using the s_umount semaphore at the read level to
> > provide exclusion with unmount while work is progressing.
> 
> I don't think that is the right fix for the given problem.
> 
> The problem is, as you've stated:
> 
> "Looks like the problem is that the sync worker is still running
> after the log has been torn down, and it calls xfs_fs_log_dummy
> which generates log traffic."

That's one problem, but we also want to protect against running this code at
mount time.  s_umount sema is the tool that can do both.  Maybe there are some
other options.

> Why did we allow a new transaction to start while/after the log was
> torn down?

> Isn't that the problem we need to fix because it leads to
> invalid entries in the physical log that might cause recovery
> failures?

> Further, any asynchronous worker thread that does
> transactions could have this same problem regardless of whether we
> are umounting or cleaning up after a failed mount, so it is not
> unique to the xfs_sync_worker....

> That is, if we've already started to tear down or torn down the log,
> we must not allow new transactions to start. Likewise, we can't
> finalise tear down the log until transactions in progress have
> completed. Using the s_umount lock here avoids then race, but it
> really is a VFS level lock not an filesystem level lock) and is,
> IMO, just papering over the real problem....

I think you have a good point here, but this isn't limited to transactions.  We
shouldn't even call xfs_log_need_covered without some protection from
xfs_fs_put_super; xfs_fs_writable doesn't cut the mustard.   I'd better have a
look at the other workqueues too.  Thanks for pointing this out.

Regards,
	Ben


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-16 17:04     ` Ben Myers
@ 2012-05-17  7:16       ` Dave Chinner
  2012-05-23  9:02         ` Dave Chinner
  2012-05-24 22:39         ` Ben Myers
  0 siblings, 2 replies; 27+ messages in thread
From: Dave Chinner @ 2012-05-17  7:16 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Wed, May 16, 2012 at 12:04:03PM -0500, Ben Myers wrote:
> On Wed, May 16, 2012 at 11:56:26AM +1000, Dave Chinner wrote:
> > On Mon, May 14, 2012 at 03:34:49PM -0500, Ben Myers wrote:
> > > I'm still hitting this on a regular basis.  Here is some analysis from a recent
> > > crash dump which you may want to skip.  The fix is at the end.
> > ....
> > > ===================================================================
> > > 
> > > xfs: use s_umount sema in xfs_sync_worker
> > > 
> > > xfs_sync_worker checks the MS_ACTIVE flag in sb->s_flags to avoid doing work
> > > during mount and unmount.  This flag can be cleared by unmount after the
> > > xfs_sync_worker checks it but before the work is completed.
> > 
> > Then there are problems all over the place in different filesystems
> > if the straight MS_ACTIVE check is not sufficient.
> 
> Eh, I won't speak to the problems in other filesystems.  ;)
> 
> MS_ACTIVE certainly isn't adequate in the case before us.

Only because the XFS code isn't internally safe...

> > > Protect xfs_sync_worker by using the s_umount semaphore at the read level to
> > > provide exclusion with unmount while work is progressing.
> > 
> > I don't think that is the right fix for the given problem.
> > 
> > The problem is, as you've stated:
> > 
> > "Looks like the problem is that the sync worker is still running
> > after the log has been torn down, and it calls xfs_fs_log_dummy
> > which generates log traffic."
> 
> That's one problem, but we also want to protect against running this code at
> mount time.  s_umount sema is the tool that can do both.  Maybe there are some
> other options.

The mount case isn't a problem - MS_ACTIVE is only set once the
mount process is complete and so the check is just fine for that
case.

> > Why did we allow a new transaction to start while/after the log was
> > torn down?
> 
> > Isn't that the problem we need to fix because it leads to
> > invalid entries in the physical log that might cause recovery
> > failures?
> 
> > Further, any asynchronous worker thread that does
> > transactions could have this same problem regardless of whether we
> > are umounting or cleaning up after a failed mount, so it is not
> > unique to the xfs_sync_worker....
> 
> > That is, if we've already started to tear down or torn down the log,
> > we must not allow new transactions to start. Likewise, we can't
> > finalise tear down the log until transactions in progress have
> > completed. Using the s_umount lock here avoids the race, but it
> > really is a VFS level lock (not a filesystem level lock) and is,
> > IMO, just papering over the real problem....
> 
> I think you have a good point here, but this isn't limited to transactions.

I'm pretty sure that the other async threads/wqs are safe w.r.t.
startup and shutdown. Indeed, some of them have to run while
mount/unmount are running so must have their own startup/shutdown
synchronisation. Hence we need to treat each different async work on
a case by case basis...

> We shouldn't even call xfs_log_need_covered without some
> protection from xfs_fs_put_super; xfs_fs_writable doesn't cut the
> mustard.

Sure, but that check is for a different purpose so it's no surprise
at all that it doesn't help this case.

AFAICT, we simply don't care if xfs_fs_put_super() has been called -
what matters is the state of the log at the time
xfs_log_need_covered() is called.  i.e. xfs_log_need_covered()
should error out if the log is being recovered or being torn down.
Same for xfs_log_force(), xfs_log_reserve(), etc. Right now we don't
even check if we can do the operation safely or not.

Right now we assume that new transactions or log operations cannot
occur (i.e they are stopped) before we tear down the log.  That
means we either need to prevent transactions from starting once we
start tearing down the log, or we need to stop the xfs_sync_worker
before we tear down the log. The second is how we currently avoid
this problem, so adding a cancel_delayed_work_sync(&mp->m_sync_work)
to xfs_log_unmount() would probably solve the problem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-17  7:16       ` Dave Chinner
@ 2012-05-23  9:02         ` Dave Chinner
  2012-05-23 16:45           ` Ben Myers
  2012-05-24 22:39         ` Ben Myers
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2012-05-23  9:02 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Thu, May 17, 2012 at 05:16:58PM +1000, Dave Chinner wrote:
> On Wed, May 16, 2012 at 12:04:03PM -0500, Ben Myers wrote:
> > On Wed, May 16, 2012 at 11:56:26AM +1000, Dave Chinner wrote:
> > > On Mon, May 14, 2012 at 03:34:49PM -0500, Ben Myers wrote:
> > > > I'm still hitting this on a regular basis.  Here is some analysis from a recent
> > > > crash dump which you may want to skip.  The fix is at the end.
> > > ....
> > > > ===================================================================
> > > > 
> > > > xfs: use s_umount sema in xfs_sync_worker
> > > > 
> > > > xfs_sync_worker checks the MS_ACTIVE flag in sb->s_flags to avoid doing work
> > > > during mount and unmount.  This flag can be cleared by unmount after the
> > > > xfs_sync_worker checks it but before the work is completed.
> > > 
> > > Then there are problems all over the place in different filesystems
> > > if the straight MS_ACTIVE check is not sufficient.
> > 
> > Eh, I won't speak to the problems in other filesystems.  ;)
> > 
> > MS_ACTIVE certainly isn't adequate in the case before us.
> 
> Only because the XFS code isn't internally safe...
> 
> > > > Protect xfs_sync_worker by using the s_umount semaphore at the read level to
> > > > provide exclusion with unmount while work is progressing.
> > > 
> > > I don't think that is the right fix for the given problem.
> > > 
> > > The problem is, as you've stated:
> > > 
> > > "Looks like the problem is that the sync worker is still running
> > > after the log has been torn down, and it calls xfs_fs_log_dummy
> > > which generates log traffic."
> > 
> > That's one problem, but we also want to protect against running this code at
> > mount time.  s_umount sema is the tool that can do both.  Maybe there are some
> > other options.
> 
> The mount case isn't a problem - MS_ACTIVE is only set once the
> mount process is complete and so the check is just fine for that
> case.
> 
> > > Why did we allow a new transaction to start while/after the log was
> > > torn down?
> > 
> > > Isn't that the problem we need to fix because it leads to
> > > invalid entries in the physical log that might cause recovery
> > > failures?
> > 
> > > Further, any asynchronous worker thread that does
> > > transactions could have this same problem regardless of whether we
> > > are umounting or cleaning up after a failed mount, so it is not
> > > unique to the xfs_sync_worker....
> > 
> > > That is, if we've already started to tear down or torn down the log,
> > > we must not allow new transactions to start. Likewise, we can't
> > > finalise tear down the log until transactions in progress have
> > > completed. Using the s_umount lock here avoids the race, but it
> > > really is a VFS level lock (not a filesystem level lock) and is,
> > > IMO, just papering over the real problem....
> > 
> > I think you have a good point here, but this isn't limited to transactions.
> 
> I'm pretty sure that the other async threads/wqs are safe w.r.t.
> startup and shutdown. Indeed, some of them have to run while
> mount/unmount are running so must have their own startup/shutdown
> synchronisation. Hence we need to treat each different async work on
> a case by case basis...
> 
> > We shouldn't even call xfs_log_need_covered without some
> > protection from xfs_fs_put_super; xfs_fs_writable doesn't cut the
> > mustard.
> 
> Sure, but that check is for a different purpose so it's no surprise
> at all that it doesn't help this case.
> 
> AFAICT, we simply don't care if xfs_fs_put_super() has been called -
> what matters is the state of the log at the time
> xfs_log_need_covered() is called.  i.e. xfs_log_need_covered()
> should error out if the log is being recovered or being torn down.
> Same for xfs_log_force(), xfs_log_reserve(), etc. Right now we don't
> even check if we can do the operation safely or not.
> 
> Right now we assume that new transactions or log operations cannot
> occur (i.e they are stopped) before we tear down the log.  That
> means we either need to prevent transactions from starting once we
> start tearing down the log, or we need to stop the xfs_sync_worker
> before we tear down the log. The second is how we currently avoid
> this problem, so adding a cancel_delayed_work_sync(&mp->m_sync_work)
> to xfs_log_unmount() would probably solve the problem....

<sigh>

So, I just found that this fix was checked in. The date stamps are:

	March 23 - Ben reports problem

	(silence for almost 2 months)

	May 14, 15:30 CT - Ben presents RFC fix without sign-off
	May 15, 13:30 CT - Mark reviews
	May 15, 14:27 CT - Commit to oss-xfs master branch
	May 15, 14:30 CT - Ben adds sign-off

Effectively that gave everyone on the list less than 24 hours to
review and comment on a bug analysis and RFC patch before the fix
hit the master branch.

When I commented on it:

	May 16, 11:15 AEST - Dave says "doesn't look like the right
			     way to fix the problem"

I didn't realise that it had already been checked into the master
branch. I thought it was still an RFC waiting for review.

I think I've got good reason for not catching that - I'd only just
arrived back home with my dog after she had surgery to repair the
thigh bone she shattered into 15 pieces the previous week - and so
merge emails were way below my threshold of interest.  Hell, I think
that commenting on it in under 36 hours is pretty good considering
all the other shit I had to deal with when you posted the RFC.

So, fast forward to today when I'm trying to get my trees up to
date, I find that a patch in my stack doesn't apply because of a
conflict in xfs_sync_worker. That's when I discover this has already
been committed, and committed less than 24 hours after the RFC was
posted, and more than 12 hours before I actually commented on it.

Ben, you just can't do that. Well, you can, but it's certainly going
to annoy people. I'm grumpy enough as it is right now without having
to deal with additional crap like this.

Especially as this is about code that was posted as "I'm not sure
this is correct, so I haven't added a sign-off". Having it pass
xfstests for a day does not make the code more correct, and such a
statement means the code needs discussion before progressing.

You have to give everyone time to review RFC patches before
committing them. Just because you think they are OK or they survive
xfstests doesn't mean that they really are OK. Indeed, my biggest
pet peeve is not being given the opportunity to review code before
it is committed, and this falls right into that territory...

As it is, I'm going to ask that you revert commit 1307bbd2a and fix
the problem without using the s_umount lock. I have mentioned a way
that it should be possible to do, so I'd like you to explore that
rather than ignoring/not responding to my comments because you've
already committed your fix.

Basically what I am saying is this: Ben, please don't abuse your
position as XFS maintainer to push fixes you think are OK without
giving everyone else a chance to review them first.  We have a
review process that works only as long as everyone plays by the same
rules....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-23  9:02         ` Dave Chinner
@ 2012-05-23 16:45           ` Ben Myers
  0 siblings, 0 replies; 27+ messages in thread
From: Ben Myers @ 2012-05-23 16:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hey Dave,

On Wed, May 23, 2012 at 07:02:25PM +1000, Dave Chinner wrote:
> On Thu, May 17, 2012 at 05:16:58PM +1000, Dave Chinner wrote:
> > On Wed, May 16, 2012 at 12:04:03PM -0500, Ben Myers wrote:
> > > On Wed, May 16, 2012 at 11:56:26AM +1000, Dave Chinner wrote:
> > > > On Mon, May 14, 2012 at 03:34:49PM -0500, Ben Myers wrote:
> > > > > I'm still hitting this on a regular basis.  Here is some analysis from a recent
> > > > > crash dump which you may want to skip.  The fix is at the end.
> > > > ....
> > > > > ===================================================================
> > > > > 
> > > > > xfs: use s_umount sema in xfs_sync_worker
> > > > > 
> > > > > xfs_sync_worker checks the MS_ACTIVE flag in sb->s_flags to avoid doing work
> > > > > during mount and unmount.  This flag can be cleared by unmount after the
> > > > > xfs_sync_worker checks it but before the work is completed.
> > > > 
> > > > Then there are problems all over the place in different filesystems
> > > > if the straight MS_ACTIVE check is not sufficient.
> > > 
> > > Eh, I won't speak to the problems in other filesystems.  ;)
> > > 
> > > MS_ACTIVE certainly isn't adequate in the case before us.
> > 
> > Only because the XFS code isn't internally safe...
> > 
> > > > > Protect xfs_sync_worker by using the s_umount semaphore at the read level to
> > > > > provide exclusion with unmount while work is progressing.
> > > > 
> > > > I don't think that is the right fix for the given problem.
> > > > 
> > > > The problem is, as you've stated:
> > > > 
> > > > "Looks like the problem is that the sync worker is still running
> > > > after the log has been torn down, and it calls xfs_fs_log_dummy
> > > > which generates log traffic."
> > > 
> > > That's one problem, but we also want to protect against running this code at
> > > mount time.  s_umount sema is the tool that can do both.  Maybe there are some
> > > other options.
> > 
> > The mount case isn't a problem - MS_ACTIVE is only set once the
> > mount process is complete and so the check is just fine for that
> > case.
> > 
> > > > Why did we allow a new transaction to start while/after the log was
> > > > torn down?
> > > 
> > > > Isn't that the problem we need to fix because it leads to
> > > > invalid entries in the physical log that might cause recovery
> > > > failures?
> > > 
> > > > Further, any asynchronous worker thread that does
> > > > transactions could have this same problem regardless of whether we
> > > > are umounting or cleaning up after a failed mount, so it is not
> > > > unique to the xfs_sync_worker....
> > > 
> > > > That is, if we've already started to tear down or torn down the log,
> > > > we must not allow new transactions to start. Likewise, we can't
> > > > finalise tear down the log until transactions in progress have
> > > > completed. Using the s_umount lock here avoids the race, but it
> > > > really is a VFS level lock (not a filesystem level lock) and is,
> > > > IMO, just papering over the real problem....
> > > 
> > > I think you have a good point here, but this isn't limited to transactions.
> > 
> > I'm pretty sure that the other async threads/wqs are safe w.r.t.
> > startup and shutdown. Indeed, some of them have to run while
> > mount/unmount are running so must have their own startup/shutdown
> > synchronisation. Hence we need to treat each different async work on
> > a case by case basis...
> > 
> > > We shouldn't even call xfs_log_need_covered without some
> > > protection from xfs_fs_put_super; xfs_fs_writable doesn't cut the
> > > mustard.
> > 
> > Sure, but that check is for a different purpose so it's no surprise
> > at all that it doesn't help this case.
> > 
> > AFAICT, we simply don't care if xfs_fs_put_super() has been called -
> > what matters is the state of the log at the time
> > xfs_log_need_covered() is called.  i.e. xfs_log_need_covered()
> > should error out if the log is being recovered or being torn down.
> > Same for xfs_log_force(), xfs_log_reserve(), etc. Right now we don't
> > even check if we can do the operation safely or not.
> > 
> > Right now we assume that new transactions or log operations cannot
> > occur (i.e they are stopped) before we tear down the log.  That
> > means we either need to prevent transactions from starting once we
> > start tearing down the log, or we need to stop the xfs_sync_worker
> > before we tear down the log. The second is how we currently avoid
> > this problem, so adding a cancel_delayed_work_sync(&mp->m_sync_work)
> > to xfs_log_unmount() would probably solve the problem....
> 
> <sigh>
> 
> So, I just found that this fix was checked in. The date stamps are:
> 
> 	March 23 - Ben reports problem
> 
> 	(silence for almost 2 months)
> 
> 	May 14, 15:30 CT - Ben presents RFC fix without sign-off
> 	May 15, 13:30 CT - Mark reviews
> 	May 15, 14:27 CT - Commit to oss-xfs master branch
> 	May 15, 14:30 CT - Ben adds sign-off
> 
> Effectively that gave everyone on the list less than 24 hours to
> review and comment on a bug analysis and RFC patch before the fix
> hit the master branch.
> 
> When I commented on it:
> 
> 	May 16, 11:15 AEST - Dave says "doesn't look like the right
> 			     way to fix the problem"
> 
> I didn't realise that it had already been checked into the master
> branch. I thought it was still an RFC waiting for review.
> 
> I think I've got good reason for not catching that - I'd only just
> arrived back home with my dog after she had surgery to repair the
> thigh bone she shattered into 15 pieces the previous week - and so
> merge emails were way below my threshold of interest.  Hell, I think
> that commenting on it in under 36 hours is pretty good considering
> all the other shit I had to deal with when you posted the RFC.
> 
> So, fast forward to today when I'm trying to get my trees up to
> date, I find that a patch in my stack doesn't apply because of a
> conflict in xfs_sync_worker. That's when I discover this has already
> been committed, and committed less than 24 hours after the RFC was
> posted, and more than 12 hours before I actually commented on it.
> 
> Ben, you just can't do that. Well, you can, but it's certainly going
> to annoy people. I'm grumpy enough as it is right now without having
> to deal with additional crap like this.
> 
> Especially as this is about code that was posted as "I'm not sure
> this is correct, so I haven't added a sign-off".

It was posted as 'I will add my sign-off once my testing is complete.', without
RFC in the subject line.  It was not an RFC.

> Having it pass
> xfstests for a day does not make the code more correct, and such a
> statement means the code needs discussion before progressing.
> 
> You have to give everyone time to review RFC patches before
> committing them. Just because you think they are OK or they survive
> xfstests doesn't mean that they really are OK. Indeed, my biggest
> pet peeve is not being given the opportunity to review code before
> it is committed, and this falls right into that territory...
> 
> As it is, I'm going to ask that you revert commit 1307bbd2a and fix
> the problem without using the s_umount lock. I have mentioned a way
> that it should be possible to do, so I'd like you to explore that
> rather than ignoring/not responding to my comments because you've
> already committed your fix.
>
> Basically what I am saying is this: Ben, please don't abuse your
> position as XFS maintainer to push fixes you think are OK without
> giving everyone else a chance to review them first.  We have a
> review process that works only as long as everyone plays by the same
> rules....

This was not an RFC, and no abuse has occurred.  I posted it (withholding my
signoff until testing was completed) with a very clear explanation and full
analysis, then Mark reviewed it, my testing completed to my satisfaction, I
signed off, and pulled it in.  Only later did you comment on it.  I'm sorry
that you feel you didn't have enough time for comment, but this clearly wasn't
marked RFC, I had added my signoff, and it had been reviewed prior to commit.

This regression and crash was caused by your patch, 8a00ebe4 "xfs: Ensure inode
reclaim can run during quotacheck".  As you pointed out, this regression was in
our master branch for over a month, and had been reported for two.  You didn't
seem to have any interest in working on it.  This crash adversely affected
SGI's ability to test this release.  Fixing this made it up to #1 on my list
for 3.5 for this reason, and with less than a week before the merge window this
needed to be fixed so that we can have some quality test exposure for the rest
of the code prior to the merge window.  I'm happy to say that now testing is
going much better.

The quality of our master branch prior to the 3.5 merge window is a top
priority.  If I were to revert this commit, v3.5 testers running xfs would hit
this crash needlessly.  I won't do that to them just because you have some
suggestions for an alternate fix.  I do value your comments and will be happy
to address them when I'm able.  In the meantime, if you would prefer to fix the
crash in a different way, please post the code.  No problem!  ;)

Anyway... sorry to hear about your dog.  I hope she is mending quickly.

Regards,
Ben


* Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker
  2012-05-17  7:16       ` Dave Chinner
  2012-05-23  9:02         ` Dave Chinner
@ 2012-05-24 22:39         ` Ben Myers
  2012-05-25 20:45           ` [PATCH] xfs: shutdown xfs_sync_worker before the log Ben Myers
  1 sibling, 1 reply; 27+ messages in thread
From: Ben Myers @ 2012-05-24 22:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hey Dave,

On Thu, May 17, 2012 at 05:16:58PM +1000, Dave Chinner wrote:
> On Wed, May 16, 2012 at 12:04:03PM -0500, Ben Myers wrote:
> > I think you have a good point here, but this isn't limited to transactions.
> 
> I'm pretty sure that the other async threads/wqs are safe w.r.t.
> startup and shutdown. Indeed, some of them have to run while
> mount/unmount are running so must have their own startup/shutdown
> synchronisation. Hence we need to treat each different async work on
> a case by case basis...

I'd still like to understand them all... and there are plenty of them.

> > We shouldn't even call xfs_log_need_covered without some
> > protection from xfs_fs_put_super; xfs_fs_writable doesn't cut the
> > mustard.
> 
> Sure, but that check is for a different purpose so it's no surprise
> at all that it doesn't help this case.

Yep.

> AFAICT, we simply don't care if xfs_fs_put_super() has been called -
> what matters is the state of the log at the time
> xfs_log_need_covered() is called.  i.e. xfs_log_need_covered()
> should error out if the log is being recovered or being torn down.
> Same for xfs_log_force(), xfs_log_reserve(), etc. Right now we don't
> even check if we can do the operation safely or not.

Erroring out is one option, but I suggest that we shouldn't necessarily
characterize a torn down log as an error, unless we plan to have some other
means of synchronization.  If we have some other means of synch, then it's fine
to warn or assert or error on these when the log is already torn down.

> Right now we assume that new transactions or log operations cannot
> occur (i.e they are stopped) before we tear down the log.  That
> means we either need to prevent transactions from starting once we
> start tearing down the log, or we need to stop the xfs_sync_worker
> before we tear down the log. The second is how we currently avoid
> this problem, so adding a cancel_delayed_work_sync(&mp->m_sync_work)
> to xfs_log_unmount() would probably solve the problem....

cancel_delayed_work_sync waits for all workers to finish before returning, so
that should work fine.  Is there any point in covering the log after the
unmount record has been written?  We should cancel the work before writing that
record.  Hrm.  Maybe xfs_log_need_covered handles that already..  Anyway, I'll
make some time to work on this tomorrow so I can test it over the weekend.

Regards,
	Ben


* [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-24 22:39         ` Ben Myers
@ 2012-05-25 20:45           ` Ben Myers
  2012-05-29 15:07             ` Ben Myers
                               ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Ben Myers @ 2012-05-25 20:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hey Dave,

On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> Anyway, I'll make some time to work on this tomorrow so I can test it over
> the weekend.

This is going to spin over the weekend.  See what you think.

-----------

xfs:  shutdown xfs_sync_worker before the log

Revert commit 1307bbd, which uses the s_umount semaphore to provide
exclusion between xfs_sync_worker and unmount, in favor of shutting down
the sync worker before freeing the log in xfs_log_unmount.  This is a
cleaner way of resolving the race between xfs_sync_worker and unmount
than using s_umount.

Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c
+++ xfs/fs/xfs/xfs_log.c
@@ -810,6 +810,7 @@ xfs_log_unmount_write(xfs_mount_t *mp)
 void
 xfs_log_unmount(xfs_mount_t *mp)
 {
+	cancel_delayed_work_sync(&mp->m_sync_work);
 	xfs_trans_ail_destroy(mp);
 	xlog_dealloc_log(mp->m_log);
 }
Index: xfs/fs/xfs/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.c
+++ xfs/fs/xfs/xfs_sync.c
@@ -386,23 +386,23 @@ xfs_sync_worker(
 	 * We shouldn't write/force the log if we are in the mount/unmount
 	 * process or on a read only filesystem. The workqueue still needs to be
 	 * active in both cases, however, because it is used for inode reclaim
-	 * during these times.  Use the s_umount semaphore to provide exclusion
-	 * with unmount.
+	 * during these times.  Use the MS_ACTIVE flag to avoid doing anything
+	 * during mount.  Doing work during unmount is avoided by calling
+	 * cancel_delayed_work_sync on this work queue before tearing down
+	 * the ail and the log in xfs_log_unmount.
 	 */
-	if (down_read_trylock(&mp->m_super->s_umount)) {
-		if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
-			/* dgc: errors ignored here */
-			if (mp->m_super->s_frozen == SB_UNFROZEN &&
-			    xfs_log_need_covered(mp))
-				error = xfs_fs_log_dummy(mp);
-			else
-				xfs_log_force(mp, 0);
+	if (!(mp->m_super->s_flags & MS_ACTIVE) &&
+	    !(mp->m_flags & XFS_MOUNT_RDONLY)) {
+		/* dgc: errors ignored here */
+		if (mp->m_super->s_frozen == SB_UNFROZEN &&
+		    xfs_log_need_covered(mp))
+			error = xfs_fs_log_dummy(mp);
+		else
+			xfs_log_force(mp, 0);
 
-			/* start pushing all the metadata that is currently
-			 * dirty */
-			xfs_ail_push_all(mp->m_ail);
-		}
-		up_read(&mp->m_super->s_umount);
+		/* start pushing all the metadata that is currently
+		 * dirty */
+		xfs_ail_push_all(mp->m_ail);
 	}
 
 	/* queue us up again */


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-25 20:45           ` [PATCH] xfs: shutdown xfs_sync_worker before the log Ben Myers
@ 2012-05-29 15:07             ` Ben Myers
  2012-05-29 15:36               ` Brian Foster
  2012-05-31 16:23             ` Mark Tinguely
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Ben Myers @ 2012-05-29 15:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> > Anyway, I'll make some time to work on this tomorrow so I can test it over
> > the weekend.
> 
> This is going to spin over the weekend.  See what you think.

I'm reasonably satisfied with the test results over the weekend.  I did end up
hitting an unrelated assert:

[137111.090394] XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: /root/xfs/fs/xfs/xfs_bmap.c, line: 5635

Signed-off-by: Ben Myers <bpm@sgi.com>


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-29 15:07             ` Ben Myers
@ 2012-05-29 15:36               ` Brian Foster
  2012-05-29 17:04                 ` Ben Myers
  0 siblings, 1 reply; 27+ messages in thread
From: Brian Foster @ 2012-05-29 15:36 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 05/29/2012 11:07 AM, Ben Myers wrote:
> On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
>> On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
>>> Anyway, I'll make some time to work on this tomorrow so I can test it over
>>> the weekend.
>>
>> This is going to spin over the weekend.  See what you think.
> 
> I'm reasonably satisfied with the test results over the weekend.  I did end up
> hitting an unrelated assert:
> 

I started testing the xfsaild idle patch based against the xfs tree over
the weekend (after testing successfully against Linus' tree for several
days) and reproduced the xfs_sync_worker() hang that Mark alerted me to
last week. I was considering doing a bisect in that tree since it
doesn't occur in Linus' tree, but it sounds like I can pull this patch
now and shouldn't expect to reproduce the sync_worker() hang either,
correct? Thanks.

Brian

> [137111.090394] XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: /root/xfs/fs/xfs/xfs_bmap.c, line: 5635
> 
> Signed-off-by: Ben Myers <bpm@sgi.com>

* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-29 15:36               ` Brian Foster
@ 2012-05-29 17:04                 ` Ben Myers
  2012-05-29 17:54                   ` Brian Foster
  0 siblings, 1 reply; 27+ messages in thread
From: Ben Myers @ 2012-05-29 17:04 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

Hey Brian,

On Tue, May 29, 2012 at 11:36:51AM -0400, Brian Foster wrote:
> On 05/29/2012 11:07 AM, Ben Myers wrote:
> > On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> >> On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> >>> Anyway, I'll make some time to work on this tomorrow so I can test it
> >>> over the weekend.
> >>
> >> This is going to spin over the weekend.  See what you think.
> > 
> > I'm reasonably satisfied with the test results over the weekend.  I did end
> > up hitting an unrelated assert:
> 
> I started testing the xfsaild idle patch based against the xfs tree over the
> weekend (after testing successfully against Linus' tree for several days) and
> reproduced the xfs_sync_worker() hang that Mark alerted me to last week.  I
> was considering doing a bisect in that tree since it doesn't occur in Linus'
> tree, but it sounds like I can pull this patch now and shouldn't expect to
> reproduce the sync_worker() hang either, correct? Thanks.

D'oh!  The xfs_sync_worker hang that Mark mentioned last week is when the sync
worker blocks on log reservation for the dummy transaction used to cover the
log, which means that it will not be calling xfs_ail_push_all, which might have
the effect of loosening things up a bit.

This thread is about a crash due to the xfs_sync_worker racing with unmount.  A
fix for this crash is in Linus' tree as of late last week.  Here we're looking
into replacing the existing fix with something that is a bit cleaner.  s_umount
is overkill for this situation, so now we're calling cancel_delayed_work_sync
to shut down the sync_worker before shutting down the log in order to prevent
the crash.

Unfortunately this fix won't help you with the hang.  If you're considering
bisecting this, I think that Juerg Haefliger has reproduced a/the log hang all
the way back to 2.6.38.  Also Chris J Arges has reproduced one on 2.6.32.52.

See thread 'Still seeing hangs in xlog_grant_log_space'.  The log hang is a
wily coyote.  ;)

Regards,
Ben


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-29 17:04                 ` Ben Myers
@ 2012-05-29 17:54                   ` Brian Foster
  0 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2012-05-29 17:54 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 05/29/2012 01:04 PM, Ben Myers wrote:
> Hey Brian,
> 
> On Tue, May 29, 2012 at 11:36:51AM -0400, Brian Foster wrote:
>> On 05/29/2012 11:07 AM, Ben Myers wrote:
>>> On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
>>>> On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
>>>>> Anyway, I'll make some time to work on this tomorrow so I can test it
>>>>> over the weekend.
>>>>
>>>> This is going to spin over the weekend.  See what you think.
>>>
>>> I'm reasonably satisfied with the test results over the weekend.  I did end
>>> up hitting an unrelated assert:
>>
>> I started testing the xfsaild idle patch based against the xfs tree over the
>> weekend (after testing successfully against Linus' tree for several days) and
>> reproduced the xfs_sync_worker() hang that Mark alerted me to last week.  I
>> was considering doing a bisect in that tree since it doesn't occur in Linus'
>> tree, but it sounds like I can pull this patch now and shouldn't expect to
>> reproduce the sync_worker() hang either, correct? Thanks.
> 
> D'oh!  The xfs_sync_worker hang that Mark mentioned last week is when the sync
> worker blocks on log reservation for the dummy transaction used to cover the
> log, which means that it will not be calling xfs_ail_push_all, which might have
> the effect of loosening things up a bit.
> 
> This thread is about a crash due to the xfs_sync_worker racing with unmount.  A
> fix for this crash is in Linus' tree as of late last week.  Here we're looking
> into replacing the existing fix with something that is a bit cleaner.  s_umount
> is overkill for this situation, so now we're calling cancel_delayed_work_sync
> to shutdown the sync_worker before shutting down the log in order to prevent
> the crash.
> 
> Unfortunately this fix won't help you with the hang.  If you're considering
> bisecting this, I think that Juerg Haefliger has reproduced a/the log hang all
> the way back to 2.6.38.  Also Chris J Arges has reproduced one on 2.6.32.52.
> 
> See thread 'Still seeing hangs in xlog_grant_log_space'.  The log hang is a
> wily coyote.  ;)
> 

Ah, ok. Thanks for the context and sorry for the noise in this thread. I
do find it interesting that I hit this rather quickly after so many
hours of testing on Linus' tree without seeing it once.

I didn't reproduce it at the -rc2 tag in the xfs tree. That isn't too many
bisections, so perhaps I'll just carry on with the bisect, since I need to
gauge how often this occurs anyway. It will either show my test to be a
sporadic reproducer and not provide anything useful, or I'll get lucky and
maybe find a useful data point. If the latter, I'll carry it over to the
right thread... ;)

Brian

> Regards,
> Ben
> 


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-25 20:45           ` [PATCH] xfs: shutdown xfs_sync_worker before the log Ben Myers
  2012-05-29 15:07             ` Ben Myers
@ 2012-05-31 16:23             ` Mark Tinguely
  2012-06-06  4:26             ` Dave Chinner
  2012-06-20  7:36             ` Christoph Hellwig
  3 siblings, 0 replies; 27+ messages in thread
From: Mark Tinguely @ 2012-05-31 16:23 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 05/25/12 15:45, Ben Myers wrote:

> xfs:  shutdown xfs_sync_worker before the log
>
> Revert commit 1307bbd, which uses the s_umount semaphore to provide
> exclusion between xfs_sync_worker and unmount, in favor of shutting down
> the sync worker before freeing the log in xfs_log_unmount.  This is a
> cleaner way of resolving the race between xfs_sync_worker and unmount
> than using s_umount.
>
> Index: xfs/fs/xfs/xfs_log.c

Waiting for the worker to end before destroying the AIL looks like a
better solution.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-25 20:45           ` [PATCH] xfs: shutdown xfs_sync_worker before the log Ben Myers
  2012-05-29 15:07             ` Ben Myers
  2012-05-31 16:23             ` Mark Tinguely
@ 2012-06-06  4:26             ` Dave Chinner
  2012-06-11 20:45               ` Ben Myers
  2012-06-20  7:44               ` Christoph Hellwig
  2012-06-20  7:36             ` Christoph Hellwig
  3 siblings, 2 replies; 27+ messages in thread
From: Dave Chinner @ 2012-06-06  4:26 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> Hey Dave,
> 
> On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> > Anyway, I'll make some time to work on this tomorrow so I can test it over
> > the weekend.
> 
> This is going to spin over the weekend.  See what you think.
> 
> -----------
> 
> xfs:  shutdown xfs_sync_worker before the log
> 
> Revert commit 1307bbd, which uses the s_umount semaphore to provide
> exclusion between xfs_sync_worker and unmount, in favor of shutting down
> the sync worker before freeing the log in xfs_log_unmount.  This is a
> cleaner way of resolving the race between xfs_sync_worker and unmount
> than using s_umount.

Looks fine to me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Once you commit this, I think I'll do a followup set of patches that
remove all the problems caused by trying to start and stop unrelated
functionality in the same place.

I think I'll start by renaming the xfs-syncd workqueue to
xfs_mount_wq, because there's nothing "sync" related about its
functionality any more.

I'll then kill xfs_syncd_init/stop functions and open code the
initialisation of the work structures and start them in the
appropriate places for their functionality. e.g. reclaim work is
demand started and stops when there's nothing more to do or at
unmount, the flush work is demand started and we need to complete
them all at unmount, and the xfssync work is really now "periodic
log work" so should be started once we complete log recovery
successfully and stopped before we tear down the log....

Then I can move the xfs_sync_worker to xfs_log.c and rename it.
If I then convert xfs_flush_worker to use the generic writeback code
(writeback_inodes_sb_if_idle), xfs_sync_data() can go away. That
means the only user of xfs_inode_ag_iterator is the quotaoff code
(xfs_qm_dqrele_all_inodes), so it could be moved elsewhere (like
xfs_inode.c).

Then xfs_quiesce_data() can be moved to xfs_super.c where it can sit
alongside the two functions that call it, and the same can be done
for xfs_quiesce_attr().

That will leave only inode cache reclaim functions in xfs_sync.c.
These are closely aligned to the inode allocation, freeing and cache
lookup functions in xfs_iget.c, so I'm thinking of merging the two
into a single file named xfs_inode_cache.c so both xfs_sync.c and
xfs_iget.c go away.

Thoughts?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-06  4:26             ` Dave Chinner
@ 2012-06-11 20:45               ` Ben Myers
  2012-06-11 21:11                 ` Mark Tinguely
  2012-06-20  7:44               ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Ben Myers @ 2012-06-11 20:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hey Dave,

On Wed, Jun 06, 2012 at 02:26:47PM +1000, Dave Chinner wrote:
> On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> > xfs:  shutdown xfs_sync_worker before the log
> > 
> > Revert commit 1307bbd, which uses the s_umount semaphore to provide
> > exclusion between xfs_sync_worker and unmount, in favor of shutting down
> > the sync worker before freeing the log in xfs_log_unmount.  This is a
> > cleaner way of resolving the race between xfs_sync_worker and unmount
> > than using s_umount.
> 
> Looks fine to me.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Thanks!

> Once you commit this, I think I'll do a followup set of patches that
> remove all the problems caused by trying to start and stop unrelated
> functionality in the same place.

I think that's a good idea.

> I think starting by renaming the xfs-syncd workqueue to the
> xfs_mount_wq because there's nothing "sync" related about it's
> functionality any more.
>
> I'll then kill xfs_syncd_init/stop functions and open code the
> intialisation of the work structures and start them in the
> appropriate places for their functionality. e.g. reclaim work is
> demand started and stops when there's nothing more to do or at
> unmount, the flush work is demand started and we need to complete
> them all at unmount, and the xfssync work is really now "periodic
> log work" so should be started once we complete log recovery
> successfullly and stopped before we tear down the log....
> 
> Then I can move the xfs_sync_worker to xfs_log.c and rename it.
> If I then convert xfs_flush_worker to use the generic writeback code
> (writeback_inodes_sb_if_idle) the xfs_sync_data() can go away. That
> means the only user of xfs_inode_ag_iterator is the quotaoff code
> (xfs_qm_dqrele_all_inodes), so it could be moved elsewhere (like
> xfs_inode.c).
> 
> Then xfs_quiesce_data() can be moved to xfs-super.c where it can sit
> alongside the two functions that call it, and the same can be done
> for xfs_quiesce_attr().
> 
> That will leave only inode cache reclaim functions in xfs_sync.c.
> These are closely aligned to the inode allocation, freeing and cache
> lookup functions in xfs_iget.c, so I'm thinking of merging the two
> into a single file named xfs_inode_cache.c so both xfs_sync.c and
> xfs_iget.c go away.
> 
> Thoughts?

That sounds pretty good.  In particular, I think that making the start
and stop of the workqueues correct should be the high priority.  I'm not
as concerned about the accuracy of the names, or cleaning up xfs_sync.c
and xfs_iget.c, but cleanups are worth doing too.

I hit a crash related to the xfslogd workqueue a while back.  Mark has
taken it up, so there might be a little coordination to do with him.

Regards,
	Ben


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-11 20:45               ` Ben Myers
@ 2012-06-11 21:11                 ` Mark Tinguely
  2012-06-11 23:36                   ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Mark Tinguely @ 2012-06-11 21:11 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 06/11/12 15:45, Ben Myers wrote:
...

> That sounds pretty good.  In particular, I think that making the start
> and stop of the workqueues correct should be the high priority.  I'm not
> as concerned about the accuracy of the names, or cleaning up xfs_sync.c
> and xfs_iget.c, but cleanups are worth doing too.
>
> I hit a crash related to the xfslogd workqueue awhile back.  Mark has
> taken it up, so there might be a little coordination to do with him.
>
> Regards,
> 	Ben

To not leave a teaser out there:

PID: 25879  TASK: ffff88012ac20340  CPU: 3   COMMAND: "kworker/3:3"
  #0 [ffff8801a72af920] machine_kexec at ffffffff810244e9
  #1 [ffff8801a72af990] crash_kexec at ffffffff8108d053
  #2 [ffff8801a72afa60] oops_end at ffffffff813ad1b8
  #3 [ffff8801a72afa90] no_context at ffffffff8102bd48
  #4 [ffff8801a72afae0] __bad_area_nosemaphore at ffffffff8102c04d
  #5 [ffff8801a72afb30] bad_area_nosemaphore at ffffffff8102c12e
  #6 [ffff8801a72afb40] do_page_fault at ffffffff813afaee
  #7 [ffff8801a72afc50] page_fault at ffffffff813ac635
     [exception RIP: xlog_assign_tail_lsn_locked+72]
     RIP: ffffffffa040da68  RSP: ffff8801a72afd00  RFLAGS: 00010246
     RAX: 0000000000000000  RBX: 0000000000000000  RCX: dead000000200200
     RDX: ffff88013b32d550  RSI: dead000000100100  RDI: ffff88013b32d550
     RBP: ffff8801a72afd10   R8: ffff8801a72ae000   R9: 0000000000000000
     R10: 0000000000000000  R11: 0000000000000000  R12: ffff88013b32d568
     R13: 0000000000000001  R14: ffff8801a72afd90  R15: ffff88013b32d540
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #8 [ffff8801a72afd18] xfs_trans_ail_delete_bulk at ffffffffa0414b2a [xfs]
  #9 [ffff8801a72afd78] xfs_buf_iodone at ffffffffa04119c7 [xfs]
#10 [ffff8801a72afdb8] xfs_buf_do_callbacks at ffffffffa041166c [xfs]
#11 [ffff8801a72afdd8] xfs_buf_iodone_callbacks at ffffffffa04117de [xfs]
#12 [ffff8801a72afdf8] xfs_buf_iodone_work at ffffffffa03ad7e1 [xfs]
#13 [ffff8801a72afe18] process_one_work at ffffffff8104c53b
#14 [ffff8801a72afe68] worker_thread at ffffffff8104f0e3
#15 [ffff8801a72afee8] kthread at ffffffff8105395e
#16 [ffff8801a72aff48] kernel_thread_helper at ffffffff813b3ae4

I am just digging through that crash. It appears that xfs_unmountfs() did 
a good job in cleaning the AIL and the m_ddev_targp, but it needs to 
wait for the xfslogd to be finished before deallocating the log.

Since workqueues are cheap, maybe it would be smart to have a 
per-filesystem xfslogd too.

--Mark.


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-11 21:11                 ` Mark Tinguely
@ 2012-06-11 23:36                   ` Dave Chinner
  2012-06-14 17:13                     ` Mark Tinguely
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2012-06-11 23:36 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Ben Myers, xfs

On Mon, Jun 11, 2012 at 04:11:28PM -0500, Mark Tinguely wrote:
> On 06/11/12 15:45, Ben Myers wrote:
> ...
> 
> >That sounds pretty good.  In particular, I think that making the start
> >and stop of the workqueues correct should be the high priority.  I'm not
> >as concerned about the accuracy of the names, or cleaning up xfs_sync.c
> >and xfs_iget.c, but cleanups are worth doing too.
> >
> >I hit a crash related to the xfslogd workqueue awhile back.  Mark has
> >taken it up, so there might be a little coordination to do with him.
> >
> >Regards,
> >	Ben
> 
> To not leave a teaser out there:
> 
> PID: 25879  TASK: ffff88012ac20340  CPU: 3   COMMAND: "kworker/3:3"
>  #0 [ffff8801a72af920] machine_kexec at ffffffff810244e9
>  #1 [ffff8801a72af990] crash_kexec at ffffffff8108d053
>  #2 [ffff8801a72afa60] oops_end at ffffffff813ad1b8
>  #3 [ffff8801a72afa90] no_context at ffffffff8102bd48
>  #4 [ffff8801a72afae0] __bad_area_nosemaphore at ffffffff8102c04d
>  #5 [ffff8801a72afb30] bad_area_nosemaphore at ffffffff8102c12e
>  #6 [ffff8801a72afb40] do_page_fault at ffffffff813afaee
>  #7 [ffff8801a72afc50] page_fault at ffffffff813ac635
>     [exception RIP: xlog_assign_tail_lsn_locked+72]
>     RIP: ffffffffa040da68  RSP: ffff8801a72afd00  RFLAGS: 00010246
>     RAX: 0000000000000000  RBX: 0000000000000000  RCX: dead000000200200
>     RDX: ffff88013b32d550  RSI: dead000000100100  RDI: ffff88013b32d550
>     RBP: ffff8801a72afd10   R8: ffff8801a72ae000   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff88013b32d568
>     R13: 0000000000000001  R14: ffff8801a72afd90  R15: ffff88013b32d540
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #8 [ffff8801a72afd18] xfs_trans_ail_delete_bulk at ffffffffa0414b2a [xfs]
>  #9 [ffff8801a72afd78] xfs_buf_iodone at ffffffffa04119c7 [xfs]
> #10 [ffff8801a72afdb8] xfs_buf_do_callbacks at ffffffffa041166c [xfs]
> #11 [ffff8801a72afdd8] xfs_buf_iodone_callbacks at ffffffffa04117de [xfs]
> #12 [ffff8801a72afdf8] xfs_buf_iodone_work at ffffffffa03ad7e1 [xfs]
> #13 [ffff8801a72afe18] process_one_work at ffffffff8104c53b
> #14 [ffff8801a72afe68] worker_thread at ffffffff8104f0e3
> #15 [ffff8801a72afee8] kthread at ffffffff8105395e
> #16 [ffff8801a72aff48] kernel_thread_helper at ffffffff813b3ae4
> 
> I am just digging through that crash. It appears that xfs_umountfs()
> did a good job in cleaning the AIL and the m_ddev_targp, but it
> needs to wait for the xfslogd to be finished before deallocating the
> log.

It is supposed to be already idle before the log is torn down.  The
log is forced synchronously, which flushes the remaining CIL items, then
the AIL is emptied synchronously. That should result in no
outstanding log operations to be run. Then xfs_log_unmount_write()
is called, which is supposed to wait for the log write to complete
before allowing the log to be torn down in xfs_log_unmount(). i.e.
it is also synchronous.

So the question that needs to be answered is this: what is the
transaction/checkpoint that is being completed here?

> Since workqueues are cheap, maybe it would be smart to have a
> per-filesystem xfslogd too.

That's overkill. If all we need is to ensure that we have emptied
the logd wq, then a synchronous flush is all that is necessary. But
first we need to find the cause of the above problem, and I'd
suggest a new thread for that...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-11 23:36                   ` Dave Chinner
@ 2012-06-14 17:13                     ` Mark Tinguely
  2012-06-14 23:56                       ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Mark Tinguely @ 2012-06-14 17:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Ben Myers, xfs

On 06/11/12 18:36, Dave Chinner wrote:
> On Mon, Jun 11, 2012 at 04:11:28PM -0500, Mark Tinguely wrote:
>> On 06/11/12 15:45, Ben Myers wrote:
>> ...
>>
>>> That sounds pretty good.  In particular, I think that making the start
>>> and stop of the workqueues correct should be the high priority.  I'm not
>>> as concerned about the accuracy of the names, or cleaning up xfs_sync.c
>>> and xfs_iget.c, but cleanups are worth doing too.
>>>
>>> I hit a crash related to the xfslogd workqueue awhile back.  Mark has
>>> taken it up, so there might be a little coordination to do with him.
>>>
>>> Regards,
>>> 	Ben
>>
>> To not leave a teaser out there:
>>
>> PID: 25879  TASK: ffff88012ac20340  CPU: 3   COMMAND: "kworker/3:3"
>>   #0 [ffff8801a72af920] machine_kexec at ffffffff810244e9
>>   #1 [ffff8801a72af990] crash_kexec at ffffffff8108d053
>>   #2 [ffff8801a72afa60] oops_end at ffffffff813ad1b8
>>   #3 [ffff8801a72afa90] no_context at ffffffff8102bd48
>>   #4 [ffff8801a72afae0] __bad_area_nosemaphore at ffffffff8102c04d
>>   #5 [ffff8801a72afb30] bad_area_nosemaphore at ffffffff8102c12e
>>   #6 [ffff8801a72afb40] do_page_fault at ffffffff813afaee
>>   #7 [ffff8801a72afc50] page_fault at ffffffff813ac635
>>      [exception RIP: xlog_assign_tail_lsn_locked+72]
>>      RIP: ffffffffa040da68  RSP: ffff8801a72afd00  RFLAGS: 00010246
>>      RAX: 0000000000000000  RBX: 0000000000000000  RCX: dead000000200200
>>      RDX: ffff88013b32d550  RSI: dead000000100100  RDI: ffff88013b32d550
>>      RBP: ffff8801a72afd10   R8: ffff8801a72ae000   R9: 0000000000000000
>>      R10: 0000000000000000  R11: 0000000000000000  R12: ffff88013b32d568
>>      R13: 0000000000000001  R14: ffff8801a72afd90  R15: ffff88013b32d540
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #8 [ffff8801a72afd18] xfs_trans_ail_delete_bulk at ffffffffa0414b2a [xfs]
>>   #9 [ffff8801a72afd78] xfs_buf_iodone at ffffffffa04119c7 [xfs]
>> #10 [ffff8801a72afdb8] xfs_buf_do_callbacks at ffffffffa041166c [xfs]
>> #11 [ffff8801a72afdd8] xfs_buf_iodone_callbacks at ffffffffa04117de [xfs]
>> #12 [ffff8801a72afdf8] xfs_buf_iodone_work at ffffffffa03ad7e1 [xfs]
>> #13 [ffff8801a72afe18] process_one_work at ffffffff8104c53b
>> #14 [ffff8801a72afe68] worker_thread at ffffffff8104f0e3
>> #15 [ffff8801a72afee8] kthread at ffffffff8105395e
>> #16 [ffff8801a72aff48] kernel_thread_helper at ffffffff813b3ae4
>>
>> I am just digging through that crash. It appears that xfs_umountfs()
>> did a good job in cleaning the AIL and the m_ddev_targp, but it
>> needs to wait for the xfslogd to be finished before deallocating the
>> log.
>
> It is supposed to be already idle before the log is torn down.  The
> log is forced synchronously while flushes remaining CIL items, then
> the AIL is emptied synchronously. That should result in no
> outstanding log operations to be run. Then xfs_log_unmount_write()
> is called, which is supposed to wait for the log write to complete
> before allowing the log to be torn down in xfs_log_unmount(). i.e.
> it is also synchronous.
>
> So the question that needs to be answered is this: what is the
> transaction/checkpoint that is being completed here?
>
>> Since workqueues are cheap, maybe it would be smart to have a
>> per-filesystem xfslogd too.
>
> That's overkill. If all we need is to ensure that we have emptied
> the logd wq, then a synchronous flush is all that is necessary. But
> first we need to find the cause of the above problem, and I'd
> suggest a new thread for that...
>
> Cheers,
>
> Dave.

I know this should be a separate thread, but a quick FYI that the buffer
that is being flushed has the XFS_SB_MAGIC in it. A late dirtying of the
superblock from the shutdown process?


--Mark Tinguely.


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-14 17:13                     ` Mark Tinguely
@ 2012-06-14 23:56                       ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2012-06-14 23:56 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: Ben Myers, xfs

On Thu, Jun 14, 2012 at 12:13:12PM -0500, Mark Tinguely wrote:
> On 06/11/12 18:36, Dave Chinner wrote:
> >On Mon, Jun 11, 2012 at 04:11:28PM -0500, Mark Tinguely wrote:
> >>On 06/11/12 15:45, Ben Myers wrote:
> >>...
> >>
> >>>That sounds pretty good.  In particular, I think that making the start
> >>>and stop of the workqueues correct should be the high priority.  I'm not
> >>>as concerned about the accuracy of the names, or cleaning up xfs_sync.c
> >>>and xfs_iget.c, but cleanups are worth doing too.
> >>>
> >>>I hit a crash related to the xfslogd workqueue awhile back.  Mark has
> >>>taken it up, so there might be a little coordination to do with him.
> >>>
> >>>Regards,
> >>>	Ben
> >>
> >>To not leave a teaser out there:
> >>
> >>PID: 25879  TASK: ffff88012ac20340  CPU: 3   COMMAND: "kworker/3:3"
> >>  #0 [ffff8801a72af920] machine_kexec at ffffffff810244e9
> >>  #1 [ffff8801a72af990] crash_kexec at ffffffff8108d053
> >>  #2 [ffff8801a72afa60] oops_end at ffffffff813ad1b8
> >>  #3 [ffff8801a72afa90] no_context at ffffffff8102bd48
> >>  #4 [ffff8801a72afae0] __bad_area_nosemaphore at ffffffff8102c04d
> >>  #5 [ffff8801a72afb30] bad_area_nosemaphore at ffffffff8102c12e
> >>  #6 [ffff8801a72afb40] do_page_fault at ffffffff813afaee
> >>  #7 [ffff8801a72afc50] page_fault at ffffffff813ac635
> >>     [exception RIP: xlog_assign_tail_lsn_locked+72]
> >>     RIP: ffffffffa040da68  RSP: ffff8801a72afd00  RFLAGS: 00010246
> >>     RAX: 0000000000000000  RBX: 0000000000000000  RCX: dead000000200200
> >>     RDX: ffff88013b32d550  RSI: dead000000100100  RDI: ffff88013b32d550
> >>     RBP: ffff8801a72afd10   R8: ffff8801a72ae000   R9: 0000000000000000
> >>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff88013b32d568
> >>     R13: 0000000000000001  R14: ffff8801a72afd90  R15: ffff88013b32d540
> >>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>  #8 [ffff8801a72afd18] xfs_trans_ail_delete_bulk at ffffffffa0414b2a [xfs]
> >>  #9 [ffff8801a72afd78] xfs_buf_iodone at ffffffffa04119c7 [xfs]
> >>#10 [ffff8801a72afdb8] xfs_buf_do_callbacks at ffffffffa041166c [xfs]
> >>#11 [ffff8801a72afdd8] xfs_buf_iodone_callbacks at ffffffffa04117de [xfs]
> >>#12 [ffff8801a72afdf8] xfs_buf_iodone_work at ffffffffa03ad7e1 [xfs]
> >>#13 [ffff8801a72afe18] process_one_work at ffffffff8104c53b
> >>#14 [ffff8801a72afe68] worker_thread at ffffffff8104f0e3
> >>#15 [ffff8801a72afee8] kthread at ffffffff8105395e
> >>#16 [ffff8801a72aff48] kernel_thread_helper at ffffffff813b3ae4
> >>
> >>I am just digging through that crash. It appears that xfs_umountfs()
> >>did a good job in cleaning the AIL and the m_ddev_targp, but it
> >>needs to wait for the xfslogd to be finished before deallocating the
> >>log.
> >
> >It is supposed to be already idle before the log is torn down.  The
> >log is forced synchronously while flushes remaining CIL items, then
> >the AIL is emptied synchronously. That should result in no
> >outstanding log operations to be run. Then xfs_log_unmount_write()
> >is called, which is supposed to wait for the log write to complete
> >before allowing the log to be torn down in xfs_log_unmount(). i.e.
> >it is also synchronous.
> >
> >So the question that needs to be answered is this: what is the
> >transaction/checkpoint that is being completed here?
> >
> >>Since workqueues are cheap, maybe it would be smart to have a
> >>per-filesystem xfslogd too.
> >
> >That's overkill. If all we need is to ensure that we have emptied
> >the logd wq, then a synchronous flush is all that is necessary. But
> >first we need to find the cause of the above problem, and I'd
> >suggest a new thread for that...
> 
> I know this should be a separate thread, but a quick FYI that the buffer
> that is being flushed has the XFS_SB_MAGIC in it. A late dirty of the
> superblock from the shutdown process?

Oh, well that makes it easy to find the bug. Took me about 10s to
work it out with that information, without even looking at the
code. ;)

So, have a look at xfs_unmountfs(), what it does with the superblock,
and then tell me what the bug is. Hint: the superblock is an uncached
buffer....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-05-25 20:45           ` [PATCH] xfs: shutdown xfs_sync_worker before the log Ben Myers
                               ` (2 preceding siblings ...)
  2012-06-06  4:26             ` Dave Chinner
@ 2012-06-20  7:36             ` Christoph Hellwig
  2012-06-20 17:18               ` Ben Myers
  2012-06-20 22:59               ` Dave Chinner
  3 siblings, 2 replies; 27+ messages in thread
From: Christoph Hellwig @ 2012-06-20  7:36 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> Hey Dave,
> 
> On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> > Anyway, I'll make some time to work on this tomorrow so I can test it over
> > the weekend.
> 
> This is going to spin over the weekend.  See what you think.
> 
> -----------
> 
> xfs:  shutdown xfs_sync_worker before the log
> 
> Revert commit 1307bbd, which uses the s_umount semaphore to provide
> exclusion between xfs_sync_worker and unmount, in favor of shutting down
> the sync worker before freeing the log in xfs_log_unmount.  This is a
> cleaner way of resolving the race between xfs_sync_worker and unmount
> than using s_umount.

I like the cancel_delayed_work_sync on the unmount side of this, but I don't
really like the MS_ACTIVE check - why can't we only do the initial
xfs_syncd_queue_sync as the last thing in the mount process?

On a slightly related issue with m_sync_work: what does the force
flush of it from xfs_fs_sync_fs in laptop mode buy us?  The sync
just did the log force and log coverage, so the only thing it will do
is force out the AIL after we did the log force and thus guarantee to
keep the fs busy for a while.  I can't really see how that's actually
useful for battery life.  Note that ->sync_fs is only called for an
explicit sync (in addition to umount, remount ro and freeze), so for
normal desktop operation it's entirely irrelevant anyway.


* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-06  4:26             ` Dave Chinner
  2012-06-11 20:45               ` Ben Myers
@ 2012-06-20  7:44               ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2012-06-20  7:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Ben Myers, xfs

On Wed, Jun 06, 2012 at 02:26:47PM +1000, Dave Chinner wrote:
> I think starting by renaming the xfs-syncd workqueue to the
> xfs_mount_wq because there's nothing "sync" related about it's
> functionality any more.

Is there any good reason to keep queueing different work items on the
same queue now that workqueues are incredibly cheap? 

> I'll then kill xfs_syncd_init/stop functions and open code the
> intialisation of the work structures and start them in the
> appropriate places for their functionality. e.g. reclaim work is
> demand started and stops when there's nothing more to do or at
> unmount, the flush work is demand started and we need to complete
> them all at unmount, and the xfssync work is really now "periodic
> log work" so should be started once we complete log recovery
> successfullly and stopped before we tear down the log....

Sounds good.  You probably also want to kill off xfs_sync_fsdata
before that, as it doesn't make any sense with our current sync /
AIL pushing code.

> Then I can move the xfs_sync_worker to xfs_log.c and rename it.

Before that you probably want to kill the xfs_ail_push_all in it in
favour of having xfsaild wake up periodically by itself if there is
anything in the AIL.

> If I then convert xfs_flush_worker to use the generic writeback code
> (writeback_inodes_sb_if_idle) the xfs_sync_data() can go away.

Good plan.  It'll still need the trylock changes for it, which don't
really seem to be making forward progress on fsdevel.

> That
> means the only user of xfs_inode_ag_iterator is the quotaoff code
> (xfs_qm_dqrele_all_inodes), so it could be moved elsewhere (like
> xfs_inode.c).

Fair enough.

> Then xfs_quiesce_data() can be moved to xfs-super.c where it can sit
> alongside the two functions that call it, and the same can be done
> for xfs_quiesce_attr().

xfs_fs_remount shouldn't really need to call it, as do_remount_sb
calls it just before entering ->remount_fs.  Although looking at it
closer, do_remount_sb probably needs to move the sync_filesystem
call to after the check for r/o files and the prevention of new writes.

Independent of that I think xfs_quiesce_data should be merged into
xfs_fs_sync_fs - until do_remount_sb is fixed xfs_fs_remount should
simply call xfs_fs_sync_fs.

This also is a good opportunity to redo the maze of comments describing
the freeze process, which is rather outdated and a bit confusing now.

> That will leave only inode cache reclaim functions in xfs_sync.c.
> These are closely aligned to the inode allocation, freeing and cache
> lookup functions in xfs_iget.c, so I'm thinking of merging the two
> into a single file named xfs_inode_cache.c so both xfs_sync.c and
> xfs_iget.c go away.

Sounds good, although I'd call the file xfs_icache.c - that seems to
fit better with the general theme of naming schemes in XFS.

* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-20  7:36             ` Christoph Hellwig
@ 2012-06-20 17:18               ` Ben Myers
  2012-06-20 22:59               ` Dave Chinner
  1 sibling, 0 replies; 27+ messages in thread
From: Ben Myers @ 2012-06-20 17:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Hey Christoph,

On Wed, Jun 20, 2012 at 03:36:00AM -0400, Christoph Hellwig wrote:
> On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> > Hey Dave,
> > 
> > On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> > > Anyway, I'll make some time to work on this tomorrow so I can test it over
> > > the weekend.
> > 
> > This is going to spin over the weekend.  See what you think.
> > 
> > -----------
> > 
> > xfs:  shutdown xfs_sync_worker before the log
> > 
> > Revert commit 1307bbd, which uses the s_umount semaphore to provide
> > exclusion between xfs_sync_worker and unmount, in favor of shutting down
> > the sync worker before freeing the log in xfs_log_unmount.  This is a
> > cleaner way of resolving the race between xfs_sync_worker and unmount
> > than using s_umount.
> 
> I like the cancel_delayed_work_sync on the unmount side of this, but I
> don't really like the MS_ACTIVE check - why can't we just do the initial
> xfs_syncd_queue_sync as the last thing in the mount process?

There isn't a race on the mount side related to the MS_ACTIVE flag.  It's only
on the unmount side.  I agree that we could get rid of the MS_ACTIVE check if
we move xfs_syncd_queue_sync to the last thing in the mount process.

The patch as-is has been well tested and I feel it is ready for a 3.5 rc merge.
I'd like to address the MS_ACTIVE check for the 3.6 merge window.  I do think
it is a good idea.

> On a slightly related issue with m_sync_work: what does the forced
> flush of it from xfs_fs_sync_fs in laptop mode buy us?  The sync
> just did the log_force and log coverage, so the only thing it will do
> is force out the AIL after we did the log force and thus guarantee to
> keep the fs busy for a while.  I can't really see how that's actually
> useful for battery life.  Note that ->sync_fs is only called for an
> explicit sync (in addition to umount, remount ro and freeze), so for
> normal desktop operation it's entirely irrelevant anyway.

STATIC int
xfs_fs_sync_fs(
	struct super_block	*sb,
	int			wait)
{
	struct xfs_mount	*mp = XFS_M(sb);
	int			error;

	/*
	 * Doing anything during the async pass would be counterproductive.
	 */
	if (!wait)
		return 0;

	error = xfs_quiesce_data(mp);
	if (error)
		return -error;

	if (laptop_mode) {
		/*
		 * The disk must be active because we're syncing.
		 * We schedule xfssyncd now (now that the disk is
		 * active) instead of later (when it might not be).
		 */
		flush_delayed_work_sync(&mp->m_sync_work);
	}

	return 0;
}

Is that comment misleading?  I think it could be interpreted like this:

"We're currently syncing, so the disk must already be active.  Since it is
already active, let's schedule xfssyncd now so that the work gets done
immediately.  Otherwise the disk could be spun down between now and when
xfssyncd would normally have been scheduled, only to be spun up once again
by xfssyncd."

Before removing this we should take a look at the commit where this was added
and see what was up.  A bit of research to do.

Regards,
Ben

* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-20  7:36             ` Christoph Hellwig
  2012-06-20 17:18               ` Ben Myers
@ 2012-06-20 22:59               ` Dave Chinner
  2012-06-21  7:12                 ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2012-06-20 22:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Ben Myers, xfs

On Wed, Jun 20, 2012 at 03:36:00AM -0400, Christoph Hellwig wrote:
> On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> > Hey Dave,
> > 
> > On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> > > Anyway, I'll make some time to work on this tomorrow so I can test it over
> > > the weekend.
> > 
> > This is going to spin over the weekend.  See what you think.
> > 
> > -----------
> > 
> > xfs:  shutdown xfs_sync_worker before the log
> > 
> > Revert commit 1307bbd, which uses the s_umount semaphore to provide
> > exclusion between xfs_sync_worker and unmount, in favor of shutting down
> > the sync worker before freeing the log in xfs_log_unmount.  This is a
> > cleaner way of resolving the race between xfs_sync_worker and unmount
> > than using s_umount.
> 
> I like the cancel_delayed_work_sync on the unmount side of this, but I
> don't really like the MS_ACTIVE check - why can't we just do the initial
> xfs_syncd_queue_sync as the last thing in the mount process?

That's what I plan to fix as a followup - this is really just the
minimal fix to solve the problem at hand for 3.5....

> On a slightly related issue with m_sync_work: what does the forced
> flush of it from xfs_fs_sync_fs in laptop mode buy us?  The sync
> just did the log_force and log coverage, so the only thing it will do
> is force out the AIL after we did the log force and thus guarantee to
> keep the fs busy for a while.  I can't really see how that's actually
> useful for battery life.  Note that ->sync_fs is only called for an
> explicit sync (in addition to umount, remount ro and freeze), so for
> normal desktop operation it's entirely irrelevant anyway.

The laptop mode stuff has bit rotted for a long time - I've never
really cared about it, and realistically, we should either make it
work properly or rip it out. I'm in favor of ripping it out at the
moment, as SSDs are really making "keep the fs idle to avoid
spinning up the disk" optimisations redundant.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] xfs:  shutdown xfs_sync_worker before the log
  2012-06-20 22:59               ` Dave Chinner
@ 2012-06-21  7:12                 ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2012-06-21  7:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Ben Myers, xfs

On Thu, Jun 21, 2012 at 08:59:48AM +1000, Dave Chinner wrote:
> That's what I plan to fix as a followup - this is really just the
> minimal fix to solve the problem at hand for 3.5....

Oh well, let's get it in as-is for now and sort out the mess later.

end of thread, other threads:[~2012-06-21  7:12 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-23 17:43 BUG in xlog_get_lowest_lsn Ben Myers
2012-05-14 20:34 ` [PATCH] xfs: use s_umount sema in xfs_sync_worker Ben Myers
2012-05-15 18:30   ` Mark Tinguely
2012-05-15 19:06     ` Ben Myers
2012-05-16  1:56   ` Dave Chinner
2012-05-16 17:04     ` Ben Myers
2012-05-17  7:16       ` Dave Chinner
2012-05-23  9:02         ` Dave Chinner
2012-05-23 16:45           ` Ben Myers
2012-05-24 22:39         ` Ben Myers
2012-05-25 20:45           ` [PATCH] xfs: shutdown xfs_sync_worker before the log Ben Myers
2012-05-29 15:07             ` Ben Myers
2012-05-29 15:36               ` Brian Foster
2012-05-29 17:04                 ` Ben Myers
2012-05-29 17:54                   ` Brian Foster
2012-05-31 16:23             ` Mark Tinguely
2012-06-06  4:26             ` Dave Chinner
2012-06-11 20:45               ` Ben Myers
2012-06-11 21:11                 ` Mark Tinguely
2012-06-11 23:36                   ` Dave Chinner
2012-06-14 17:13                     ` Mark Tinguely
2012-06-14 23:56                       ` Dave Chinner
2012-06-20  7:44               ` Christoph Hellwig
2012-06-20  7:36             ` Christoph Hellwig
2012-06-20 17:18               ` Ben Myers
2012-06-20 22:59               ` Dave Chinner
2012-06-21  7:12                 ` Christoph Hellwig
