* BUG_ON(sh->batch_head) in init_stripe()
@ 2017-05-08 17:15 Stephane Thiell
  2017-08-13 17:29 ` Stephane Thiell
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Thiell @ 2017-05-08 17:15 UTC (permalink / raw)
  To: linux-raid

Hi list,

I have a recurring issue with raid6 that results in a panic due to BUG_ON(sh->batch_head). Last weekend, the issue occurred during a weekly raid-check. The raid volumes (12 total) are pretty new; no mismatches or hardware errors have been detected.

[535089.369357] kernel BUG at drivers/md/raid5.c:527!
[535089.374700] invalid opcode: 0000 [#1] SMP 
[535089.379384] Modules linked in: ...
[535089.503509] CPU: 34 PID: 280061 Comm: md0_resync Tainted: G           OE  ------------   3.10.0-514.10.2.el7_lustre.x86_64 #1

This is the backtrace:

crash> bt 280061
PID: 280061  TASK: ffff8800757cde20  CPU: 34  COMMAND: "md0_resync"
 #0 [ffff88024e217830] machine_kexec at ffffffff81059bdb
 #1 [ffff88024e217890] __crash_kexec at ffffffff81105382
 #2 [ffff88024e217960] crash_kexec at ffffffff81105470
 #3 [ffff88024e217978] oops_end at ffffffff8168f508
 #4 [ffff88024e2179a0] die at ffffffff8102e93b
 #5 [ffff88024e2179d0] do_trap at ffffffff8168ebc0
 #6 [ffff88024e217a20] do_invalid_op at ffffffff8102b144
 #7 [ffff88024e217ad0] invalid_op at ffffffff816984de
    [exception RIP: raid5_get_active_stripe+1809]
    RIP: ffffffffa0e4ed71  RSP: ffff88024e217b88  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff883fe5e40408  RCX: dead000000000200
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff8823ebf45ca0
    RBP: ffff88024e217c30   R8: ffff8823ebf45cb0   R9: 0000000000000080
    R10: 0000000000000006  R11: 0000000000000000  R12: ffff883fe5e40400
    R13: ffff8823ebf45ca0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88024e217b80] raid5_get_active_stripe at ffffffffa0e4e996 [raid456]
 #9 [ffff88024e217be0] raid5_release_stripe at ffffffffa0e48f24 [raid456]
#10 [ffff88024e217c38] raid5_sync_request at ffffffffa0e53c4b [raid456]
#11 [ffff88024e217ca8] md_do_sync at ffffffff814fef9b
#12 [ffff88024e217e50] md_thread at ffffffff814fb1b5
#13 [ffff88024e217ec8] kthread at ffffffff810b06ff
#14 [ffff88024e217f50] ret_from_fork at ffffffff81696b98

It appears to be triggered by BUG_ON(sh->batch_head) in init_stripe():

crash> l drivers/md/raid5.c:524
519     static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
520     {
521             struct r5conf *conf = sh->raid_conf;
522             int i, seq;
523     
524             BUG_ON(atomic_read(&sh->count) != 0);
525             BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
526             BUG_ON(stripe_operations_active(sh));
527             BUG_ON(sh->batch_head);    <<<
528 

Other I/Os were being processed at this time, but I am not sure how to check whether they were actually on the same md:

crash> ps | grep ">"
<snip>
> 59684      2  28  ffff88407fdc0fb0  RU   0.0       0      0  [md0_raid6]
> 61479      2  17  ffff883e46e80000  UN   0.0       0      0  [ll_ost_io01_001]
> 220748      2  23  ffff881fb7ab8fb0  UN   0.0       0      0  [ll_ost_io01_011]
> 220750      2  19  ffff881fb7abce70  UN   0.0       0      0  [ll_ost_io01_013]
> 279158      2  14  ffff883ab46b4e70  RU   0.0       0      0  [md22_resync]
> 280061      2  34  ffff8800757cde20  RU   0.0       0      0  [md0_resync]
> 280829      2   6  ffff881c72296dd0  RU   0.0       0      0  [md6_resync]

Example of a possible concurrent writing thread:

crash> bt 61479
PID: 61479  TASK: ffff883e46e80000  CPU: 17  COMMAND: "ll_ost_io01_001"
 #0 [ffff883ffc805e58] crash_nmi_callback at ffffffff8104d2e2
 #1 [ffff883ffc805e68] nmi_handle at ffffffff8168f699
 #2 [ffff883ffc805eb0] do_nmi at ffffffff8168f813
 #3 [ffff883ffc805ef0] end_repeat_nmi at ffffffff8168ead3
    [exception RIP: _raw_spin_lock_irq+63]
    RIP: ffffffff8168e09f  RSP: ffff883e46e3f588  RFLAGS: 00000002
    RAX: 00000000000044c2  RBX: ffff883fe5e40408  RCX: 000000000000c464
    RDX: 000000000000c468  RSI: 000000000000c468  RDI: ffff883fe5e40408
    RBP: ffff883e46e3f588   R8: 0000000000000000   R9: 0000000000000080
    R10: 0000000000000002  R11: 0000000000000000  R12: ffff883fe5e40400
    R13: 0000000000000000  R14: ffff883fe0e61900  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff883e46e3f588] _raw_spin_lock_irq at ffffffff8168e09f
 #5 [ffff883e46e3f590] raid5_get_active_stripe at ffffffffa0e4e6cb [raid456]
 #6 [ffff883e46e3f648] raid5_make_request at ffffffffa0e4ef55 [raid456]
 #7 [ffff883e46e3f738] md_make_request at ffffffff814f7dfc
 #8 [ffff883e46e3f798] generic_make_request at ffffffff812ee939
 #9 [ffff883e46e3f7e0] submit_bio at ffffffff812eea81
#10 [ffff883e46e3f838] osd_submit_bio at ffffffffa10a0bcc [osd_ldiskfs]
#11 [ffff883e46e3f848] osd_do_bio at ffffffffa10a3007 [osd_ldiskfs]
#12 [ffff883e46e3f968] osd_write_commit at ffffffffa10a3b94 [osd_ldiskfs]
#13 [ffff883e46e3fa08] ofd_commitrw_write at ffffffffa1113774 [ofd]
#14 [ffff883e46e3fa80] ofd_commitrw at ffffffffa1116f2d [ofd]
#15 [ffff883e46e3fb08] obd_commitrw at ffffffffa0c43c22 [ptlrpc]
#16 [ffff883e46e3fb70] tgt_brw_write at ffffffffa0c1bfc1 [ptlrpc]
#17 [ffff883e46e3fcd8] tgt_request_handle at ffffffffa0c18275 [ptlrpc]
#18 [ffff883e46e3fd20] ptlrpc_server_handle_request at ffffffffa0bc41fb [ptlrpc]
#19 [ffff883e46e3fde8] ptlrpc_main at ffffffffa0bc82b0 [ptlrpc]
#20 [ffff883e46e3fec8] kthread at ffffffff810b06ff
#21 [ffff883e46e3ff50] ret_from_fork at ffffffff81696b98


The same issue happened during a heavy IOR benchmark a few months ago and was described in https://jira.hpdd.intel.com/browse/LU-8917 (on a slightly older el7 kernel).

I also found that other users have described similar issues, for example in this thread: https://lkml.org/lkml/2016/12/23/205

Now, I have a crash dump and I’m trying to understand why sh->batch_head could be set in init_stripe(), which is called by raid5_get_active_stripe() when __find_stripe() fails BUT get_free_stripe() succeeds. If sh->batch_head is set in that case, it means the idle stripe taken from the free list already had it set…

Does anyone have an idea how to troubleshoot or solve this?

Thanks!

Stephane

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-05-08 17:15 BUG_ON(sh->batch_head) in init_stripe() Stephane Thiell
@ 2017-08-13 17:29 ` Stephane Thiell
  2017-08-15  9:24   ` Guoqing Jiang
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Thiell @ 2017-08-13 17:29 UTC (permalink / raw)
  To: linux-raid

Also known as "kernel BUG at drivers/md/raid5.c:527!"

I’m still struggling with this issue in raid5.c. I currently see one crash per week on average on a system using Lustre with 24 x 10-disk raid6 arrays per server. It has happened both with and without a check running. I now think there is a race at some point where a released (batched) stripe head is added to conf->inactive_list without sh->batch_head being set back to NULL.

Because sh->batch_head is only set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed somewhere else. I haven’t found where this could come from yet. So perhaps sh->batch_head should simply be set to NULL in do_release_stripe(), before the stripe is added to the inactive list?

Alternatively, I wonder if it would make more sense to set batch_head back to NULL where sh->overwrite_disks is also set back to 0 (as ops_run_biodrain() does today).

Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try the following change, at least as a workaround: remove BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL instead (I don’t think I need to take sh->stripe_lock there?). I will get new hardware soon, so I will give it a try on a test system. Any suggestions welcome…
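
For reference, the workaround described above would boil down to something like this (an untested sketch, not a reviewed patch; line context taken from the el7 source listing earlier in the thread):

```
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
 	BUG_ON(atomic_read(&sh->count) != 0);
 	BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
 	BUG_ON(stripe_operations_active(sh));
-	BUG_ON(sh->batch_head);
+	sh->batch_head = NULL;
```

Note this only papers over the symptom: the stripe would no longer trip the BUG_ON, but whatever path put a still-batched stripe on the inactive list would remain unfixed.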

Quick analysis of the last vmcore:

crash> ps | grep ">"
...
> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
...

crash>  bt 237033
PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
 #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
 #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
 #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
 #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
 #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
 #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
 #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
 #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
    [exception RIP: raid5_get_active_stripe+1809]
    RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
    RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
    RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
    R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
 #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
#10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
#11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
#12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
#13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
#14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
#15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
#16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
#17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
#18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
#19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
#20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
#21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
#22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
#23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218


After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):

md32’s (struct r5conf *) is 0xffff883df0516400

Our crashed task:

crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
raid6-md32

Example for md32_raid6 task in release_inactive_stripe_list:

crash> bt 154637
PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
 #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
 #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
 #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
 #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
    [exception RIP: _raw_spin_lock_irqsave+64]
    RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
    RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
    RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
    RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
    R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
 #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
 #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
 #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
 #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
 #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
#10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218


crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
    [exception RIP: _raw_spin_lock_irq+63]
    [exception RIP: _raw_spin_lock_irq+63]
    [exception RIP: _raw_spin_lock_irq+58]
    [exception RIP: _raw_spin_lock_irq+63]
    [exception RIP: _raw_spin_lock_irq+58]
    [exception RIP: _raw_spin_lock_irq+63]
    [exception RIP: raid5_get_active_stripe+1809]

crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
 #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
 #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
 #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
 #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
 #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
 #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
    [exception RIP: raid5_get_active_stripe+1809]

crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
    R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
    R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
    R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
    R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
    R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400


The bad stripe_head (sh) is at the following address (usually found in %r13):

crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
$98 = (struct r5conf *) 0xffff883df0516400

and batch_head is not NULL (it actually points back to the stripe itself)... that's the issue:

crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
$99 = (struct stripe_head *) 0xffff881e08d54d30

Full struct dump below. We can see that overwrite_disks is 0 and state is 0… but batch_head not being NULL is what makes it crash in init_stripe():

crash> p ((struct stripe_head)*0xffff881e08d54d30)
$6 = {
  hash = {
    next = 0x0, 
    pprev = 0x0
  }, 
  lru = {
    next = 0xffff881e08d54d40, 
    prev = 0xffff881e08d54d40
  }, 
  release_list = {
    next = 0xffff881dfe405cc0
  }, 
  raid_conf = 0xffff883df0516400, 
  generation = 0, 
  sector = 3701224808, 
  pd_idx = 1, 
  qd_idx = 2, 
  ddf_layout = 0, 
  hash_lock_index = 5, 
  state = 0, 
  count = {
    counter = 0
  }, 
  bm_seq = 22708787, 
  disks = 10, 
  overwrite_disks = 0, 
  check_state = check_state_idle, 
  reconstruct_state = reconstruct_state_idle, 
  stripe_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 3709263126, 
            tickets = {
              head = 56598, 
              tail = 56598
            }
          }
        }
      }
    }
  }, 
  cpu = 15, 
  group = 0x0, 
  batch_head = 0xffff881e08d54d30, 
  batch_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 9961624, 
            tickets = {
              head = 152, 
              tail = 152
            }
          }
        }
      }
    }
  }, 
  batch_list = {
    next = 0xffff881dfe405d28, 
    prev = 0xffff881e0e2dec98
  }, 
  log_io = 0x0, 
  log_list = {
    next = 0x0, 
    prev = 0x0
  }, 
  ops = {
    target = 0, 
    target2 = 0, 
    zero_sum_result = (unknown: 0)
  }, 
  dev = {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d54f00, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d54e78
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d54f10, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d54f00
      }, 
      vec = {
        bv_page = 0xffffea0078236080, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236080, 
      orig_page = 0xffffea0078236080, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609798632, 
      flags = 24, 
      log_checksum = 0
    }}
}

Thanks,

Stephane



> On May 8, 2017, at 10:15 AM, Stephane Thiell <sthiell@stanford.edu> wrote:
> 
> <snip>



* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-13 17:29 ` Stephane Thiell
@ 2017-08-15  9:24   ` Guoqing Jiang
  2017-08-15 17:59     ` Stephane Thiell
  2017-08-15 18:36     ` Shaohua Li
  0 siblings, 2 replies; 15+ messages in thread
From: Guoqing Jiang @ 2017-08-15  9:24 UTC (permalink / raw)
  To: Stephane Thiell, linux-raid



On 08/14/2017 01:29 AM, Stephane Thiell wrote:
> Also known as "kernel BUG at drivers/md/raid5.c:527!"
>
> I’m still struggling with this issue in raid5.c. I currently have one crash per week in average on a system using Lustre and 24 x 10-disk raid6 per server. It did happen with or without check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
>
> Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
>
> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
>
> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcomed…
>
> Quick analysis of the last vmcore:
>
> crash> ps | grep ">"
> ...
>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
> ...
>
> crash>  bt 237033
> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
>   #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
>   #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
>   #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
>   #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
>   #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
>   #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
>   #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
>   #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
>      [exception RIP: raid5_get_active_stripe+1809]
>      RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
>      RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
>      RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
>      RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
>      R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>      R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
>   #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
>
>
> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
>
> md32’s (struct r5conf *) is 0xffff883df0516400
>
> Our crashed task:
>
> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
> raid6-md32
>
> Example for md32_raid6 task in release_inactive_stripe_list:
>
> crash> bt 154637
> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
>   #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
>   #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
>   #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
>   #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
>      [exception RIP: _raw_spin_lock_irqsave+64]
>      RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
>      RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
>      RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
>      RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
>      R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
>      R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <NMI exception stack> ---
>   #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
>   #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
>   #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
>   #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
>   #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
>   #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
>
>
> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
>      [exception RIP: _raw_spin_lock_irq+63]
>      [exception RIP: _raw_spin_lock_irq+63]
>      [exception RIP: _raw_spin_lock_irq+58]
>      [exception RIP: _raw_spin_lock_irq+63]
>      [exception RIP: _raw_spin_lock_irq+58]
>      [exception RIP: _raw_spin_lock_irq+63]
>      [exception RIP: raid5_get_active_stripe+1809]
>
> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
>   #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>   #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>   #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>   #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>   #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>   #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>      [exception RIP: raid5_get_active_stripe+1809]
>
> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
>      R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
>      R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
>      R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
>      R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
>      R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>      R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>      R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>
>
> The bad stripe_head (sh) is available at (usually in %r13):
>
> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
> $98 = (struct r5conf *) 0xffff883df0516400
>
> and batch_head is not NULL... that's the issue:
>
> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
> $99 = (struct stripe_head *) 0xffff881e08d54d30
>
> Full struct dump below. We can see that overwrite_disks is 0 and state is 0… but batch_head not having been reset to NULL is what makes it crash in init_stripe():
>
> crash> p ((struct stripe_head)*0xffff881e08d54d30)
> $6 = {
>    hash = {
>      next = 0x0,
>      pprev = 0x0
>    },
>    lru = {
>      next = 0xffff881e08d54d40,
>      prev = 0xffff881e08d54d40
>    },
>    release_list = {
>      next = 0xffff881dfe405cc0
>    },
>    raid_conf = 0xffff883df0516400,
>    generation = 0,
>    sector = 3701224808,
>    pd_idx = 1,
>    qd_idx = 2,
>    ddf_layout = 0,
>    hash_lock_index = 5,
>    state = 0,
>    count = {
>      counter = 0
>    },
>    bm_seq = 22708787,
>    disks = 10,
>    overwrite_disks = 0,
>    check_state = check_state_idle,
>    reconstruct_state = reconstruct_state_idle,
>    stripe_lock = {
>      {
>        rlock = {
>          raw_lock = {
>            {
>              head_tail = 3709263126,
>              tickets = {
>                head = 56598,
>                tail = 56598
>              }
>            }
>          }
>        }
>      }
>    },
>    cpu = 15,
>    group = 0x0,
>    batch_head = 0xffff881e08d54d30,
>    batch_lock = {
>      {
>        rlock = {
>          raw_lock = {
>            {
>              head_tail = 9961624,
>              tickets = {
>                head = 152,
>                tail = 152
>              }
>            }
>          }
>        }
>      }
>    },
>    batch_list = {
>      next = 0xffff881dfe405d28,
>      prev = 0xffff881e0e2dec98
>    },
>    log_io = 0x0,
>    log_list = {
>      next = 0x0,
>      prev = 0x0
>    },
>    ops = {
>      target = 0,
>      target2 = 0,
>      zero_sum_result = (unknown: 0)
>    },
>    dev = {{
>        req = {
>          bi_sector = 0,
>          bi_next = 0x0,
>          bi_bdev = 0x0,
>          bi_flags = 1,
>          bi_rw = 0,
>          bi_vcnt = 0,
>          bi_idx = 0,
>          bi_phys_segments = 0,
>          bi_size = 0,
>          bi_seg_front_size = 0,
>          bi_seg_back_size = 0,
>          bi_end_io = 0x0,
>          bi_private = 0xffff881e08d54d30,
>          bi_ioc = 0x0,
>          bi_css = 0x0,
>          bi_integrity = 0x0,
>          bi_max_vecs = 1,
>          bi_cnt = {
>            counter = 1
>          },
>          bi_io_vec = 0xffff881e08d54f00,
>          bi_pool = 0x0,
>          {
>            bio_aux = 0x0,
>            __UNIQUE_ID_rh_kabi_hide15 = {
>              rh_reserved1 = 0x0
>            },
>            {<No data fields>}
>          },
>          bi_inline_vecs = 0xffff881e08d54e78
>        },
>        rreq = {
>          bi_sector = 0,
>          bi_next = 0x0,
>          bi_bdev = 0x0,
>          bi_flags = 1,
>          bi_rw = 0,
>          bi_vcnt = 0,
>          bi_idx = 0,
>          bi_phys_segments = 0,
>          bi_size = 0,
>          bi_seg_front_size = 0,
>          bi_seg_back_size = 0,
>          bi_end_io = 0x0,
>          bi_private = 0xffff881e08d54d30,
>          bi_ioc = 0x0,
>          bi_css = 0x0,
>          bi_integrity = 0x0,
>          bi_max_vecs = 1,
>          bi_cnt = {
>            counter = 1
>          },
>          bi_io_vec = 0xffff881e08d54f10,
>          bi_pool = 0x0,
>          {
>            bio_aux = 0x0,
>            __UNIQUE_ID_rh_kabi_hide15 = {
>              rh_reserved1 = 0x0
>            },
>            {<No data fields>}
>          },
>          bi_inline_vecs = 0xffff881e08d54f00
>        },
>        vec = {
>          bv_page = 0xffffea0078236080,
>          bv_len = 4096,
>          bv_offset = 0
>        },
>        rvec = {
>          bv_page = 0x0,
>          bv_len = 0,
>          bv_offset = 0
>        },
>        page = 0xffffea0078236080,
>        orig_page = 0xffffea0078236080,
>        toread = 0x0,
>        read = 0x0,
>        towrite = 0x0,
>        written = 0xffff8824832a7c00,
>        sector = 29609798632,
>        flags = 24,

It seems flags is R5_OVERWRITE | R5_Insync. handle_stripe_clean_event
only calls break_stripe_batch_list when do_endio is set to true:

if (head_sh->batch_head && do_endio)
                 break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);

And do_endio is only set to true when the condition below is met:

if (!test_bit(R5_LOCKED, &dev->flags) &&
                             (test_bit(R5_UPTODATE, &dev->flags) ||
                              test_bit(R5_Discard, &dev->flags) ||
                              test_bit(R5_SkipCopy, &dev->flags))) {

I guess that is why batch_head is not NULL; just my 2 cents.

>        log_checksum = 0
>      }}
> }

Thanks,
Guoqing


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-15  9:24   ` Guoqing Jiang
@ 2017-08-15 17:59     ` Stephane Thiell
  2017-08-15 18:39       ` Shaohua Li
  2017-08-15 18:36     ` Shaohua Li
  1 sibling, 1 reply; 15+ messages in thread
From: Stephane Thiell @ 2017-08-15 17:59 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid


> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
> 
> 
> 
> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
>> 
>> I’m still struggling with this issue in raid5.c. I currently have one crash per week on average on a system using Lustre and 24 x 10-disk raid6 per server. It has happened both with and without a check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
>> 
>> Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
>> 
>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
>> 
>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcomed…
>> 
>> Quick analysis of the last vmcore:
>> 
>> crash> ps | grep ">"
>> ...
>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
>> ...
>> 
>> crash>  bt 237033
>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
>>  #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
>>  #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
>>  #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
>>  #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
>>  #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
>>  #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
>>  #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
>>  #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
>>     [exception RIP: raid5_get_active_stripe+1809]
>>     RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
>>     RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
>>     RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
>>     RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
>>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>     R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>  #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
>>  #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
>> 
>> 
>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
>> 
>> md32’s (struct r5conf *) is 0xffff883df0516400
>> 
>> Our crashed task:
>> 
>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
>> raid6-md32
>> 
>> Example for md32_raid6 task in release_inactive_stripe_list:
>> 
>> crash> bt 154637
>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
>>  #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
>>  #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
>>  #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
>>  #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
>>     [exception RIP: _raw_spin_lock_irqsave+64]
>>     RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
>>     RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
>>     RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
>>     RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
>>     R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
>>     R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> --- <NMI exception stack> ---
>>  #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
>>  #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
>>  #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
>>  #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
>>  #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
>>  #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
>> 
>> 
>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
>>     [exception RIP: _raw_spin_lock_irq+63]
>>     [exception RIP: _raw_spin_lock_irq+63]
>>     [exception RIP: _raw_spin_lock_irq+58]
>>     [exception RIP: _raw_spin_lock_irq+63]
>>     [exception RIP: _raw_spin_lock_irq+58]
>>     [exception RIP: _raw_spin_lock_irq+63]
>>     [exception RIP: raid5_get_active_stripe+1809]
>> 
>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
>>  #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>  #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>  #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>  #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>  #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>  #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>     [exception RIP: raid5_get_active_stripe+1809]
>> 
>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
>>     R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
>>     R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
>>     R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
>>     R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
>>     R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>     R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>> 
>> 
>> The bad stripe_head (sh) is available at (usually in %r13):
>> 
>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
>> $98 = (struct r5conf *) 0xffff883df0516400
>> 
>> and batch_head is not NULL... that's the issue:
>> 
>> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
>> $99 = (struct stripe_head *) 0xffff881e08d54d30
>> 
>> Full struct dump below. We can see that overwrite_disks is 0 and state is 0… but batch_head not having been reset to NULL is what makes it crash in init_stripe():
>> 
>> crash> p ((struct stripe_head)*0xffff881e08d54d30)
>> $6 = {
>>   hash = {
>>     next = 0x0,
>>     pprev = 0x0
>>   },
>>   lru = {
>>     next = 0xffff881e08d54d40,
>>     prev = 0xffff881e08d54d40
>>   },
>>   release_list = {
>>     next = 0xffff881dfe405cc0
>>   },
>>   raid_conf = 0xffff883df0516400,
>>   generation = 0,
>>   sector = 3701224808,
>>   pd_idx = 1,
>>   qd_idx = 2,
>>   ddf_layout = 0,
>>   hash_lock_index = 5,
>>   state = 0,
>>   count = {
>>     counter = 0
>>   },
>>   bm_seq = 22708787,
>>   disks = 10,
>>   overwrite_disks = 0,
>>   check_state = check_state_idle,
>>   reconstruct_state = reconstruct_state_idle,
>>   stripe_lock = {
>>     {
>>       rlock = {
>>         raw_lock = {
>>           {
>>             head_tail = 3709263126,
>>             tickets = {
>>               head = 56598,
>>               tail = 56598
>>             }
>>           }
>>         }
>>       }
>>     }
>>   },
>>   cpu = 15,
>>   group = 0x0,
>>   batch_head = 0xffff881e08d54d30,
>>   batch_lock = {
>>     {
>>       rlock = {
>>         raw_lock = {
>>           {
>>             head_tail = 9961624,
>>             tickets = {
>>               head = 152,
>>               tail = 152
>>             }
>>           }
>>         }
>>       }
>>     }
>>   },
>>   batch_list = {
>>     next = 0xffff881dfe405d28,
>>     prev = 0xffff881e0e2dec98
>>   },
>>   log_io = 0x0,
>>   log_list = {
>>     next = 0x0,
>>     prev = 0x0
>>   },
>>   ops = {
>>     target = 0,
>>     target2 = 0,
>>     zero_sum_result = (unknown: 0)
>>   },
>>   dev = {{
>>       req = {
>>         bi_sector = 0,
>>         bi_next = 0x0,
>>         bi_bdev = 0x0,
>>         bi_flags = 1,
>>         bi_rw = 0,
>>         bi_vcnt = 0,
>>         bi_idx = 0,
>>         bi_phys_segments = 0,
>>         bi_size = 0,
>>         bi_seg_front_size = 0,
>>         bi_seg_back_size = 0,
>>         bi_end_io = 0x0,
>>         bi_private = 0xffff881e08d54d30,
>>         bi_ioc = 0x0,
>>         bi_css = 0x0,
>>         bi_integrity = 0x0,
>>         bi_max_vecs = 1,
>>         bi_cnt = {
>>           counter = 1
>>         },
>>         bi_io_vec = 0xffff881e08d54f00,
>>         bi_pool = 0x0,
>>         {
>>           bio_aux = 0x0,
>>           __UNIQUE_ID_rh_kabi_hide15 = {
>>             rh_reserved1 = 0x0
>>           },
>>           {<No data fields>}
>>         },
>>         bi_inline_vecs = 0xffff881e08d54e78
>>       },
>>       rreq = {
>>         bi_sector = 0,
>>         bi_next = 0x0,
>>         bi_bdev = 0x0,
>>         bi_flags = 1,
>>         bi_rw = 0,
>>         bi_vcnt = 0,
>>         bi_idx = 0,
>>         bi_phys_segments = 0,
>>         bi_size = 0,
>>         bi_seg_front_size = 0,
>>         bi_seg_back_size = 0,
>>         bi_end_io = 0x0,
>>         bi_private = 0xffff881e08d54d30,
>>         bi_ioc = 0x0,
>>         bi_css = 0x0,
>>         bi_integrity = 0x0,
>>         bi_max_vecs = 1,
>>         bi_cnt = {
>>           counter = 1
>>         },
>>         bi_io_vec = 0xffff881e08d54f10,
>>         bi_pool = 0x0,
>>         {
>>           bio_aux = 0x0,
>>           __UNIQUE_ID_rh_kabi_hide15 = {
>>             rh_reserved1 = 0x0
>>           },
>>           {<No data fields>}
>>         },
>>         bi_inline_vecs = 0xffff881e08d54f00
>>       },
>>       vec = {
>>         bv_page = 0xffffea0078236080,
>>         bv_len = 4096,
>>         bv_offset = 0
>>       },
>>       rvec = {
>>         bv_page = 0x0,
>>         bv_len = 0,
>>         bv_offset = 0
>>       },
>>       page = 0xffffea0078236080,
>>       orig_page = 0xffffea0078236080,
>>       toread = 0x0,
>>       read = 0x0,
>>       towrite = 0x0,
>>       written = 0xffff8824832a7c00,
>>       sector = 29609798632,
>>       flags = 24,
> 
> It seems flags is R5_OVERWRITE | R5_Insync. handle_stripe_clean_event
> only calls break_stripe_batch_list when do_endio is set to true:
> 
> if (head_sh->batch_head && do_endio)
>                break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
> 
> And do_endio is only set to true when the condition below is met:
> 
> if (!test_bit(R5_LOCKED, &dev->flags) &&
>                            (test_bit(R5_UPTODATE, &dev->flags) ||
>                             test_bit(R5_Discard, &dev->flags) ||
>                             test_bit(R5_SkipCopy, &dev->flags))) {
> 
> I guess that is why batch_head is not NULL; just my 2 cents.


That’s a very good point, thanks Guoqing!

I checked another similar crash dump and found that this same flags field was 16 (R5_Insync alone), which doesn’t meet the condition for do_endio = true either.

The code below in raid5_end_write_request directly releases sh->batch_head; couldn’t that be an issue?

static void raid5_end_write_request(struct bio *bi)
{

...

        raid5_release_stripe(sh);

        if (sh->batch_head && sh != sh->batch_head)
                raid5_release_stripe(sh->batch_head);
}

Best,

Stephane



* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-15  9:24   ` Guoqing Jiang
  2017-08-15 17:59     ` Stephane Thiell
@ 2017-08-15 18:36     ` Shaohua Li
  2017-08-15 19:07       ` Stephane Thiell
  1 sibling, 1 reply; 15+ messages in thread
From: Shaohua Li @ 2017-08-15 18:36 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Stephane Thiell, linux-raid

On Tue, Aug 15, 2017 at 05:24:53PM +0800, Guoqing Jiang wrote:
> 
> 
> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
> > Also known as "kernel BUG at drivers/md/raid5.c:527!"
> > 
> > I’m still struggling with this issue in raid5.c. I currently have one crash per week on average on a system using Lustre and 24 x 10-disk raid6 per server. It has happened both with and without a check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
> > 
> > Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
> > 
> > Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
> > 
> > Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcomed…
> > 
> > Quick analysis of the last vmcore:
> > 
> > crash> ps | grep ">"
> > ...
> > > 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
> > > 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
> > > 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
> > > 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
> > > 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
> > > 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
> > > 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
> > > 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
> > > 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
> > > 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
> > > 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
> > ...
> > 
> > crash>  bt 237033
> > PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
> >   #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
> >   #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
> >   #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
> >   #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
> >   #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
> >   #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
> >   #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
> >   #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
> >      [exception RIP: raid5_get_active_stripe+1809]
> >      RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
> >      RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
> >      RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
> >      RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
> >      R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >      R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
> >      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >   #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
> >   #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
> > #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
> > #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
> > #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
> > #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
> > #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
> > #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
> > #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
> > #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
> > #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
> > #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
> > #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
> > #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
> > #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
> > #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
> > 
> > 
> > After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
> > 
> > md32’s (struct r5conf *) is 0xffff883df0516400
> > 
> > Our crashed task:
> > 
> > crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
> > raid6-md32
> > 
> > Example for md32_raid6 task in release_inactive_stripe_list:
> > 
> > crash> bt 154637
> > PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
> >   #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
> >   #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
> >   #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
> >   #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
> >      [exception RIP: _raw_spin_lock_irqsave+64]
> >      RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
> >      RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
> >      RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
> >      RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
> >      R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
> >      R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
> >      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > --- <NMI exception stack> ---
> >   #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
> >   #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
> >   #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
> >   #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
> >   #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
> >   #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
> > #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
> > 
> > 
> > crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
> >      [exception RIP: _raw_spin_lock_irq+63]
> >      [exception RIP: _raw_spin_lock_irq+63]
> >      [exception RIP: _raw_spin_lock_irq+58]
> >      [exception RIP: _raw_spin_lock_irq+63]
> >      [exception RIP: _raw_spin_lock_irq+58]
> >      [exception RIP: _raw_spin_lock_irq+63]
> >      [exception RIP: raid5_get_active_stripe+1809]
> > 
> > crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
> >   #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >   #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >   #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >   #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >   #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >   #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >      [exception RIP: raid5_get_active_stripe+1809]
> > 
> > crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
> >      R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
> >      R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
> >      R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
> >      R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
> >      R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >      R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >      R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> > 
> > 
> > The bad stripe_head (sh) is available at (usually in %r13):
> > 
> > crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
> > $98 = (struct r5conf *) 0xffff883df0516400
> > 
> > and batch_head is not NULL... that's the issue:
> > 
> > crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
> > $99 = (struct stripe_head *) 0xffff881e08d54d30
> > 
> > Full struct dump below. We can see that overwrite_disks is 0 and state is 0… but batch_head not having been reset to NULL is what makes it crash in init_stripe():
> > 
> > crash> p ((struct stripe_head)*0xffff881e08d54d30)
> > $6 = {
> >    hash = {
> >      next = 0x0,
> >      pprev = 0x0
> >    },
> >    lru = {
> >      next = 0xffff881e08d54d40,
> >      prev = 0xffff881e08d54d40
> >    },
> >    release_list = {
> >      next = 0xffff881dfe405cc0
> >    },
> >    raid_conf = 0xffff883df0516400,
> >    generation = 0,
> >    sector = 3701224808,
> >    pd_idx = 1,
> >    qd_idx = 2,
> >    ddf_layout = 0,
> >    hash_lock_index = 5,
> >    state = 0,
> >    count = {
> >      counter = 0
> >    },
> >    bm_seq = 22708787,
> >    disks = 10,
> >    overwrite_disks = 0,
> >    check_state = check_state_idle,
> >    reconstruct_state = reconstruct_state_idle,
> >    stripe_lock = {
> >      {
> >        rlock = {
> >          raw_lock = {
> >            {
> >              head_tail = 3709263126,
> >              tickets = {
> >                head = 56598,
> >                tail = 56598
> >              }
> >            }
> >          }
> >        }
> >      }
> >    },
> >    cpu = 15,
> >    group = 0x0,
> >    batch_head = 0xffff881e08d54d30,
> >    batch_lock = {
> >      {
> >        rlock = {
> >          raw_lock = {
> >            {
> >              head_tail = 9961624,
> >              tickets = {
> >                head = 152,
> >                tail = 152
> >              }
> >            }
> >          }
> >        }
> >      }
> >    },
> >    batch_list = {
> >      next = 0xffff881dfe405d28,
> >      prev = 0xffff881e0e2dec98
> >    },
> >    log_io = 0x0,
> >    log_list = {
> >      next = 0x0,
> >      prev = 0x0
> >    },
> >    ops = {
> >      target = 0,
> >      target2 = 0,
> >      zero_sum_result = (unknown: 0)
> >    },
> >    dev = {{
> >        req = {
> >          bi_sector = 0,
> >          bi_next = 0x0,
> >          bi_bdev = 0x0,
> >          bi_flags = 1,
> >          bi_rw = 0,
> >          bi_vcnt = 0,
> >          bi_idx = 0,
> >          bi_phys_segments = 0,
> >          bi_size = 0,
> >          bi_seg_front_size = 0,
> >          bi_seg_back_size = 0,
> >          bi_end_io = 0x0,
> >          bi_private = 0xffff881e08d54d30,
> >          bi_ioc = 0x0,
> >          bi_css = 0x0,
> >          bi_integrity = 0x0,
> >          bi_max_vecs = 1,
> >          bi_cnt = {
> >            counter = 1
> >          },
> >          bi_io_vec = 0xffff881e08d54f00,
> >          bi_pool = 0x0,
> >          {
> >            bio_aux = 0x0,
> >            __UNIQUE_ID_rh_kabi_hide15 = {
> >              rh_reserved1 = 0x0
> >            },
> >            {<No data fields>}
> >          },
> >          bi_inline_vecs = 0xffff881e08d54e78
> >        },
> >        rreq = {
> >          bi_sector = 0,
> >          bi_next = 0x0,
> >          bi_bdev = 0x0,
> >          bi_flags = 1,
> >          bi_rw = 0,
> >          bi_vcnt = 0,
> >          bi_idx = 0,
> >          bi_phys_segments = 0,
> >          bi_size = 0,
> >          bi_seg_front_size = 0,
> >          bi_seg_back_size = 0,
> >          bi_end_io = 0x0,
> >          bi_private = 0xffff881e08d54d30,
> >          bi_ioc = 0x0,
> >          bi_css = 0x0,
> >          bi_integrity = 0x0,
> >          bi_max_vecs = 1,
> >          bi_cnt = {
> >            counter = 1
> >          },
> >          bi_io_vec = 0xffff881e08d54f10,
> >          bi_pool = 0x0,
> >          {
> >            bio_aux = 0x0,
> >            __UNIQUE_ID_rh_kabi_hide15 = {
> >              rh_reserved1 = 0x0
> >            },
> >            {<No data fields>}
> >          },
> >          bi_inline_vecs = 0xffff881e08d54f00
> >        },
> >        vec = {
> >          bv_page = 0xffffea0078236080,
> >          bv_len = 4096,
> >          bv_offset = 0
> >        },
> >        rvec = {
> >          bv_page = 0x0,
> >          bv_len = 0,
> >          bv_offset = 0
> >        },
> >        page = 0xffffea0078236080,
> >        orig_page = 0xffffea0078236080,
> >        toread = 0x0,
> >        read = 0x0,
> >        towrite = 0x0,
> >        written = 0xffff8824832a7c00,
> >        sector = 29609798632,
> >        flags = 24,
> 
> Seems the flags is R5_OVERWRITE | R5_Insync, handle_stripe_clean_event
> only calls break_stripe_batch_list if do_endio is set to true.
> 
> if (head_sh->batch_head && do_endio)
>                 break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
> 
> And do_endio is set to true if we meet the below condition.
> 
> if (!test_bit(R5_LOCKED, &dev->flags) &&
>                             (test_bit(R5_UPTODATE, &dev->flags) ||
>                              test_bit(R5_Discard, &dev->flags) ||
>                              test_bit(R5_SkipCopy, &dev->flags))) {
> 
> I guess it is why the batch_head is not null, just my 2 cents.

this is strange. The bi_end_io == NULL, written != NULL and !R5_UPTODATE means
we are doing reconstruction and the reconstruction isn't finished yet. But at that
time, we should have R5_LOCKED set, which isn't true for this case.

Did you enable skip_copy or discard?


* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-15 17:59     ` Stephane Thiell
@ 2017-08-15 18:39       ` Shaohua Li
  2017-08-15 19:27         ` Stephane Thiell
  0 siblings, 1 reply; 15+ messages in thread
From: Shaohua Li @ 2017-08-15 18:39 UTC (permalink / raw)
  To: Stephane Thiell; +Cc: Guoqing Jiang, linux-raid

On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
> 
> > On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
> > 
> > 
> > 
> > On 08/14/2017 01:29 AM, Stephane Thiell wrote:
> >> Also known as "kernel BUG at drivers/md/raid5.c:527!"
> >> 
> >> I’m still struggling with this issue in raid5.c. I currently have one crash per week in average on a system using Lustre and 24 x 10-disk raid6 per server. It did happen with or without check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
> >> 
> >> Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
> >> 
> >> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
> >> 
> >> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcomed…
> >> 
> >> Quick analysis of the last vmcore:
> >> 
> >> crash> ps | grep ">"
> >> ...
> >>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
> >>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
> >>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
> >>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
> >>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
> >>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
> >>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
> >>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
> >>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
> >>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
> >>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
> >> ...
> >> 
> >> crash>  bt 237033
> >> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
> >>  #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
> >>  #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
> >>  #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
> >>  #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
> >>  #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
> >>  #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
> >>  #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
> >>  #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
> >>     [exception RIP: raid5_get_active_stripe+1809]
> >>     RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
> >>     RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
> >>     RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
> >>     RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
> >>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>     R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
> >>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>  #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
> >>  #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
> >> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
> >> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
> >> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
> >> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
> >> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
> >> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
> >> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
> >> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
> >> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
> >> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
> >> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
> >> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
> >> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
> >> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
> >> 
> >> 
> >> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
> >> 
> >> md32’s (struct r5conf *) is 0xffff883df0516400
> >> 
> >> Our crashed task:
> >> 
> >> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
> >> raid6-md32
> >> 
> >> Example for md32_raid6 task in release_inactive_stripe_list:
> >> 
> >> crash> bt 154637
> >> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
> >>  #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
> >>  #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
> >>  #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
> >>  #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
> >>     [exception RIP: _raw_spin_lock_irqsave+64]
> >>     RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
> >>     RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
> >>     RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
> >>     RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
> >>     R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
> >>     R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
> >>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >> --- <NMI exception stack> ---
> >>  #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
> >>  #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
> >>  #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
> >>  #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
> >>  #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
> >>  #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
> >> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
> >> 
> >> 
> >> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
> >>     [exception RIP: _raw_spin_lock_irq+63]
> >>     [exception RIP: _raw_spin_lock_irq+63]
> >>     [exception RIP: _raw_spin_lock_irq+58]
> >>     [exception RIP: _raw_spin_lock_irq+63]
> >>     [exception RIP: _raw_spin_lock_irq+58]
> >>     [exception RIP: _raw_spin_lock_irq+63]
> >>     [exception RIP: raid5_get_active_stripe+1809]
> >> 
> >> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
> >>  #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>  #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>  #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>  #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>  #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>  #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>     [exception RIP: raid5_get_active_stripe+1809]
> >> 
> >> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
> >>     R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
> >>     R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
> >>     R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
> >>     R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
> >>     R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>     R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >> 
> >> 
> >> The bad stripe_head (sh) is available at (usually in %r13):
> >> 
> >> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
> >> $98 = (struct r5conf *) 0xffff883df0516400
> >> 
> >> and batch_head is not NULL... that's the issue:
> >> 
> >> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
> >> $99 = (struct stripe_head *) 0xffff881e08d54d30
> >> 
> >> Full struct dump below. We can see than overwrite_disks is set to 0, state is 0… but this batch_head not set to NULL makes it crash in init_stripe():
> >> 
> >> crash> p ((struct stripe_head)*0xffff881e08d54d30)
> >> $6 = {
> >>   hash = {
> >>     next = 0x0,
> >>     pprev = 0x0
> >>   },
> >>   lru = {
> >>     next = 0xffff881e08d54d40,
> >>     prev = 0xffff881e08d54d40
> >>   },
> >>   release_list = {
> >>     next = 0xffff881dfe405cc0
> >>   },
> >>   raid_conf = 0xffff883df0516400,
> >>   generation = 0,
> >>   sector = 3701224808,
> >>   pd_idx = 1,
> >>   qd_idx = 2,
> >>   ddf_layout = 0,
> >>   hash_lock_index = 5,
> >>   state = 0,
> >>   count = {
> >>     counter = 0
> >>   },
> >>   bm_seq = 22708787,
> >>   disks = 10,
> >>   overwrite_disks = 0,
> >>   check_state = check_state_idle,
> >>   reconstruct_state = reconstruct_state_idle,
> >>   stripe_lock = {
> >>     {
> >>       rlock = {
> >>         raw_lock = {
> >>           {
> >>             head_tail = 3709263126,
> >>             tickets = {
> >>               head = 56598,
> >>               tail = 56598
> >>             }
> >>           }
> >>         }
> >>       }
> >>     }
> >>   },
> >>   cpu = 15,
> >>   group = 0x0,
> >>   batch_head = 0xffff881e08d54d30,
> >>   batch_lock = {
> >>     {
> >>       rlock = {
> >>         raw_lock = {
> >>           {
> >>             head_tail = 9961624,
> >>             tickets = {
> >>               head = 152,
> >>               tail = 152
> >>             }
> >>           }
> >>         }
> >>       }
> >>     }
> >>   },
> >>   batch_list = {
> >>     next = 0xffff881dfe405d28,
> >>     prev = 0xffff881e0e2dec98
> >>   },
> >>   log_io = 0x0,
> >>   log_list = {
> >>     next = 0x0,
> >>     prev = 0x0
> >>   },
> >>   ops = {
> >>     target = 0,
> >>     target2 = 0,
> >>     zero_sum_result = (unknown: 0)
> >>   },
> >>   dev = {{
> >>       req = {
> >>         bi_sector = 0,
> >>         bi_next = 0x0,
> >>         bi_bdev = 0x0,
> >>         bi_flags = 1,
> >>         bi_rw = 0,
> >>         bi_vcnt = 0,
> >>         bi_idx = 0,
> >>         bi_phys_segments = 0,
> >>         bi_size = 0,
> >>         bi_seg_front_size = 0,
> >>         bi_seg_back_size = 0,
> >>         bi_end_io = 0x0,
> >>         bi_private = 0xffff881e08d54d30,
> >>         bi_ioc = 0x0,
> >>         bi_css = 0x0,
> >>         bi_integrity = 0x0,
> >>         bi_max_vecs = 1,
> >>         bi_cnt = {
> >>           counter = 1
> >>         },
> >>         bi_io_vec = 0xffff881e08d54f00,
> >>         bi_pool = 0x0,
> >>         {
> >>           bio_aux = 0x0,
> >>           __UNIQUE_ID_rh_kabi_hide15 = {
> >>             rh_reserved1 = 0x0
> >>           },
> >>           {<No data fields>}
> >>         },
> >>         bi_inline_vecs = 0xffff881e08d54e78
> >>       },
> >>       rreq = {
> >>         bi_sector = 0,
> >>         bi_next = 0x0,
> >>         bi_bdev = 0x0,
> >>         bi_flags = 1,
> >>         bi_rw = 0,
> >>         bi_vcnt = 0,
> >>         bi_idx = 0,
> >>         bi_phys_segments = 0,
> >>         bi_size = 0,
> >>         bi_seg_front_size = 0,
> >>         bi_seg_back_size = 0,
> >>         bi_end_io = 0x0,
> >>         bi_private = 0xffff881e08d54d30,
> >>         bi_ioc = 0x0,
> >>         bi_css = 0x0,
> >>         bi_integrity = 0x0,
> >>         bi_max_vecs = 1,
> >>         bi_cnt = {
> >>           counter = 1
> >>         },
> >>         bi_io_vec = 0xffff881e08d54f10,
> >>         bi_pool = 0x0,
> >>         {
> >>           bio_aux = 0x0,
> >>           __UNIQUE_ID_rh_kabi_hide15 = {
> >>             rh_reserved1 = 0x0
> >>           },
> >>           {<No data fields>}
> >>         },
> >>         bi_inline_vecs = 0xffff881e08d54f00
> >>       },
> >>       vec = {
> >>         bv_page = 0xffffea0078236080,
> >>         bv_len = 4096,
> >>         bv_offset = 0
> >>       },
> >>       rvec = {
> >>         bv_page = 0x0,
> >>         bv_len = 0,
> >>         bv_offset = 0
> >>       },
> >>       page = 0xffffea0078236080,
> >>       orig_page = 0xffffea0078236080,
> >>       toread = 0x0,
> >>       read = 0x0,
> >>       towrite = 0x0,
> >>       written = 0xffff8824832a7c00,
> >>       sector = 29609798632,
> >>       flags = 24,
> > 
> > Seems the flags is R5_OVERWRITE | R5_Insync, handle_stripe_clean_event
> > only calls break_stripe_batch_list if do_endio is set to true.
> > 
> > if (head_sh->batch_head && do_endio)
> >                break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
> > 
> > And do_endio is set to true if we meet the below condition.
> > 
> > if (!test_bit(R5_LOCKED, &dev->flags) &&
> >                            (test_bit(R5_UPTODATE, &dev->flags) ||
> >                             test_bit(R5_Discard, &dev->flags) ||
> >                             test_bit(R5_SkipCopy, &dev->flags))) {
> > 
> > I guess it is why the batch_head is not null, just my 2 cents.
> 
> 
> That’s a very good point, thanks Guoqing!
> 
> I checked another similar crash dump, and found out that the same flags field was 16 (R5_Insync alone), which doesn’t meet the condition for having do_endio = true either.
> 
> The code below in raid5_end_write_request directly releases sh->batch_head, couldn’t that be an issue?
> 
> static void raid5_end_write_request(struct bio *bi)
> {
> 
> ...
> 
>         raid5_release_stripe(sh);
> 
>         if (sh->batch_head && sh != sh->batch_head)
>                 raid5_release_stripe(sh->batch_head);
> }

Can you dump the full stripe_head info for this crash dump? Also, the stripe_head
should include 10 devs for your 10-disk raid, but the crash dump only shows one;
is it possible for you to dump the info for all the devs in the stripe?


* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-15 18:36     ` Shaohua Li
@ 2017-08-15 19:07       ` Stephane Thiell
  0 siblings, 0 replies; 15+ messages in thread
From: Stephane Thiell @ 2017-08-15 19:07 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Guoqing Jiang, linux-raid


> On Aug 15, 2017, at 11:36 AM, Shaohua Li <shli@kernel.org> wrote:
> 
> On Tue, Aug 15, 2017 at 05:24:53PM +0800, Guoqing Jiang wrote:
>> 
>> 
>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
>>> 
>>> I’m still struggling with this issue in raid5.c. I currently have one crash per week in average on a system using Lustre and 24 x 10-disk raid6 per server. It did happen with or without check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
>>> 
>>> Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
>>> 
>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
>>> 
>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcomed…
>>> 
>>> Quick analysis of the last vmcore:
>>> 
>>> crash> ps | grep ">"
>>> ...
>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
>>> ...
>>> 
>>> crash>  bt 237033
>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
>>>  #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
>>>  #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
>>>  #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
>>>  #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
>>>  #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
>>>  #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
>>>  #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
>>>  #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
>>>     [exception RIP: raid5_get_active_stripe+1809]
>>>     RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
>>>     RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
>>>     RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
>>>     RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
>>>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>>     R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
>>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>  #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
>>>  #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
>>> 
>>> 
>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
>>> 
>>> md32’s (struct r5conf *) is 0xffff883df0516400
>>> 
>>> Our crashed task:
>>> 
>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
>>> raid6-md32
>>> 
>>> Example for md32_raid6 task in release_inactive_stripe_list:
>>> 
>>> crash> bt 154637
>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
>>>  #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
>>>  #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
>>>  #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
>>>  #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
>>>     [exception RIP: _raw_spin_lock_irqsave+64]
>>>     RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
>>>     RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
>>>     RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
>>>     RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
>>>     R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
>>>     R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
>>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>> --- <NMI exception stack> ---
>>>  #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
>>>  #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
>>>  #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
>>>  #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
>>>  #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
>>>  #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
>>> 
>>> 
>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
>>>     [exception RIP: _raw_spin_lock_irq+63]
>>>     [exception RIP: _raw_spin_lock_irq+63]
>>>     [exception RIP: _raw_spin_lock_irq+58]
>>>     [exception RIP: _raw_spin_lock_irq+63]
>>>     [exception RIP: _raw_spin_lock_irq+58]
>>>     [exception RIP: _raw_spin_lock_irq+63]
>>>     [exception RIP: raid5_get_active_stripe+1809]
>>> 
>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
>>>  #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>  #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>  #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>  #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>  #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>  #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>     [exception RIP: raid5_get_active_stripe+1809]
>>> 
>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
>>>     R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
>>>     R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
>>>     R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
>>>     R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
>>>     R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>     R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>> 
>>> 
>>> The bad stripe_head (sh) is available at (usually in %r13):
>>> 
>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
>>> $98 = (struct r5conf *) 0xffff883df0516400
>>> 
>>> and batch_head is not NULL... that's the issue:
>>> 
>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
>>> $99 = (struct stripe_head *) 0xffff881e08d54d30
>>> 
>>> Full struct dump below. We can see than overwrite_disks is set to 0, state is 0… but this batch_head not set to NULL makes it crash in init_stripe():
>>> 
>>> crash> p ((struct stripe_head)*0xffff881e08d54d30)
>>> $6 = {
>>>   hash = {
>>>     next = 0x0,
>>>     pprev = 0x0
>>>   },
>>>   lru = {
>>>     next = 0xffff881e08d54d40,
>>>     prev = 0xffff881e08d54d40
>>>   },
>>>   release_list = {
>>>     next = 0xffff881dfe405cc0
>>>   },
>>>   raid_conf = 0xffff883df0516400,
>>>   generation = 0,
>>>   sector = 3701224808,
>>>   pd_idx = 1,
>>>   qd_idx = 2,
>>>   ddf_layout = 0,
>>>   hash_lock_index = 5,
>>>   state = 0,
>>>   count = {
>>>     counter = 0
>>>   },
>>>   bm_seq = 22708787,
>>>   disks = 10,
>>>   overwrite_disks = 0,
>>>   check_state = check_state_idle,
>>>   reconstruct_state = reconstruct_state_idle,
>>>   stripe_lock = {
>>>     {
>>>       rlock = {
>>>         raw_lock = {
>>>           {
>>>             head_tail = 3709263126,
>>>             tickets = {
>>>               head = 56598,
>>>               tail = 56598
>>>             }
>>>           }
>>>         }
>>>       }
>>>     }
>>>   },
>>>   cpu = 15,
>>>   group = 0x0,
>>>   batch_head = 0xffff881e08d54d30,
>>>   batch_lock = {
>>>     {
>>>       rlock = {
>>>         raw_lock = {
>>>           {
>>>             head_tail = 9961624,
>>>             tickets = {
>>>               head = 152,
>>>               tail = 152
>>>             }
>>>           }
>>>         }
>>>       }
>>>     }
>>>   },
>>>   batch_list = {
>>>     next = 0xffff881dfe405d28,
>>>     prev = 0xffff881e0e2dec98
>>>   },
>>>   log_io = 0x0,
>>>   log_list = {
>>>     next = 0x0,
>>>     prev = 0x0
>>>   },
>>>   ops = {
>>>     target = 0,
>>>     target2 = 0,
>>>     zero_sum_result = (unknown: 0)
>>>   },
>>>   dev = {{
>>>       req = {
>>>         bi_sector = 0,
>>>         bi_next = 0x0,
>>>         bi_bdev = 0x0,
>>>         bi_flags = 1,
>>>         bi_rw = 0,
>>>         bi_vcnt = 0,
>>>         bi_idx = 0,
>>>         bi_phys_segments = 0,
>>>         bi_size = 0,
>>>         bi_seg_front_size = 0,
>>>         bi_seg_back_size = 0,
>>>         bi_end_io = 0x0,
>>>         bi_private = 0xffff881e08d54d30,
>>>         bi_ioc = 0x0,
>>>         bi_css = 0x0,
>>>         bi_integrity = 0x0,
>>>         bi_max_vecs = 1,
>>>         bi_cnt = {
>>>           counter = 1
>>>         },
>>>         bi_io_vec = 0xffff881e08d54f00,
>>>         bi_pool = 0x0,
>>>         {
>>>           bio_aux = 0x0,
>>>           __UNIQUE_ID_rh_kabi_hide15 = {
>>>             rh_reserved1 = 0x0
>>>           },
>>>           {<No data fields>}
>>>         },
>>>         bi_inline_vecs = 0xffff881e08d54e78
>>>       },
>>>       rreq = {
>>>         bi_sector = 0,
>>>         bi_next = 0x0,
>>>         bi_bdev = 0x0,
>>>         bi_flags = 1,
>>>         bi_rw = 0,
>>>         bi_vcnt = 0,
>>>         bi_idx = 0,
>>>         bi_phys_segments = 0,
>>>         bi_size = 0,
>>>         bi_seg_front_size = 0,
>>>         bi_seg_back_size = 0,
>>>         bi_end_io = 0x0,
>>>         bi_private = 0xffff881e08d54d30,
>>>         bi_ioc = 0x0,
>>>         bi_css = 0x0,
>>>         bi_integrity = 0x0,
>>>         bi_max_vecs = 1,
>>>         bi_cnt = {
>>>           counter = 1
>>>         },
>>>         bi_io_vec = 0xffff881e08d54f10,
>>>         bi_pool = 0x0,
>>>         {
>>>           bio_aux = 0x0,
>>>           __UNIQUE_ID_rh_kabi_hide15 = {
>>>             rh_reserved1 = 0x0
>>>           },
>>>           {<No data fields>}
>>>         },
>>>         bi_inline_vecs = 0xffff881e08d54f00
>>>       },
>>>       vec = {
>>>         bv_page = 0xffffea0078236080,
>>>         bv_len = 4096,
>>>         bv_offset = 0
>>>       },
>>>       rvec = {
>>>         bv_page = 0x0,
>>>         bv_len = 0,
>>>         bv_offset = 0
>>>       },
>>>       page = 0xffffea0078236080,
>>>       orig_page = 0xffffea0078236080,
>>>       toread = 0x0,
>>>       read = 0x0,
>>>       towrite = 0x0,
>>>       written = 0xffff8824832a7c00,
>>>       sector = 29609798632,
>>>       flags = 24,
>> 
>> Seems the flags is R5_OVERWRITE | R5_Insync, handle_stripe_clean_event
>> only calls break_stripe_batch_list if do_endio is set to true.
>> 
>> if (head_sh->batch_head && do_endio)
>>                break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
>> 
>> And do_endio is set to true if we meet the below condition.
>> 
>> if (!test_bit(R5_LOCKED, &dev->flags) &&
>>                            (test_bit(R5_UPTODATE, &dev->flags) ||
>>                             test_bit(R5_Discard, &dev->flags) ||
>>                             test_bit(R5_SkipCopy, &dev->flags))) {
>> 
>> I guess it is why the batch_head is not null, just my 2 cents.
> 
> this is strange. The bi_end_io == NULL, written != NULL and !R5_UPTODATE means
> we are doing reconstruction and the reconstruction isn't finished yet. But at that
> time, we should have R5_LOCKED set, which isn't true for this case.
> 
> Did you enable skip_copy or discard?

Nope!

# cat /sys/block/md32/md/skip_copy 
0

# cat /sys/block/md32/queue/discard_max_bytes
0
# cat /sys/block/md32/queue/discard_granularity
524288
# cat /sys/block/md32/queue/discard_zeroes_data
0
# cat /sys/block/md32/discard_alignment 
524288

# cat /sys/module/raid456/parameters/devices_handle_discard_safely 
N
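
As a side note: with 24 raid6 arrays per server, checking a knob like skip_copy
one array at a time gets tedious. A throwaway shell loop can sweep them; the
/sys/block/mdN/md/skip_copy layout is assumed to match the paths above, and the
root directory is parameterized only so the function can be exercised outside a
live server:

```shell
# Print the skip_copy setting for every md array under a sysfs root
# (defaults to /sys/block); assumed layout: <root>/mdN/md/skip_copy.
sweep_skip_copy() {
    root=${1:-/sys/block}
    for f in "$root"/md*/md/skip_copy; do
        [ -e "$f" ] || continue            # glob matched nothing
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

sweep_skip_copy        # on a live server: one line per array
```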


Thanks!

Stephane



* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-15 18:39       ` Shaohua Li
@ 2017-08-15 19:27         ` Stephane Thiell
  2017-08-17 22:58           ` Shaohua Li
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Thiell @ 2017-08-15 19:27 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Guoqing Jiang, linux-raid


> On Aug 15, 2017, at 11:39 AM, Shaohua Li <shli@kernel.org> wrote:
> 
> On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
>> 
>>> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
>>> 
>>> 
>>> 
>>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
>>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
>>>> 
>>>> I’m still struggling with this issue in raid5.c. I currently have one crash per week in average on a system using Lustre and 24 x 10-disk raid6 per server. It did happen with or without check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
>>>> 
>>>> Because sh->batch_head is only set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I haven’t found where this could come from yet. So perhaps sh->batch_head should simply be set to NULL in do_release_stripe(), before the stripe is added to the inactive list?
>>>> 
>>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
>>>> 
>>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to take sh->stripe_lock there?). I will get new hardware soon, so I will give it a try on a test system. Any suggestions welcome…
>>>> 
>>>> Quick analysis of the last vmcore:
>>>> 
>>>> crash> ps | grep ">"
>>>> ...
>>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
>>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
>>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
>>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
>>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
>>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
>>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
>>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
>>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
>>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
>>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
>>>> ...
>>>> 
>>>> crash>  bt 237033
>>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
>>>> #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
>>>> #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
>>>> #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
>>>> #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
>>>> #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
>>>> #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
>>>> #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
>>>> #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
>>>>    [exception RIP: raid5_get_active_stripe+1809]
>>>>    RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
>>>>    RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
>>>>    RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
>>>>    RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
>>>>    R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>>>    R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
>>>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>> #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
>>>> #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
>>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
>>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
>>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
>>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
>>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
>>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
>>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
>>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
>>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
>>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
>>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
>>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
>>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
>>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
>>>> 
>>>> 
>>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
>>>> 
>>>> md32’s (struct r5conf *) is 0xffff883df0516400
>>>> 
>>>> Our crashed task:
>>>> 
>>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
>>>> raid6-md32
>>>> 
>>>> Example for md32_raid6 task in release_inactive_stripe_list:
>>>> 
>>>> crash> bt 154637
>>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
>>>> #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
>>>> #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
>>>> #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
>>>> #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
>>>>    [exception RIP: _raw_spin_lock_irqsave+64]
>>>>    RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
>>>>    RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
>>>>    RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
>>>>    RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
>>>>    R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
>>>>    R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
>>>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>> --- <NMI exception stack> ---
>>>> #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
>>>> #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
>>>> #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
>>>> #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
>>>> #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
>>>> #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
>>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
>>>> 
>>>> 
>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
>>>>    [exception RIP: _raw_spin_lock_irq+63]
>>>>    [exception RIP: _raw_spin_lock_irq+63]
>>>>    [exception RIP: _raw_spin_lock_irq+58]
>>>>    [exception RIP: _raw_spin_lock_irq+63]
>>>>    [exception RIP: _raw_spin_lock_irq+58]
>>>>    [exception RIP: _raw_spin_lock_irq+63]
>>>>    [exception RIP: raid5_get_active_stripe+1809]
>>>> 
>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
>>>> #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>> #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>> #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>> #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>> #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>> #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>    [exception RIP: raid5_get_active_stripe+1809]
>>>> 
>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
>>>>    R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
>>>>    R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
>>>>    R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
>>>>    R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
>>>>    R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>>    R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>>    R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>>> 
>>>> 
>>>> The bad stripe_head (sh) is available at (usually in %r13):
>>>> 
>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
>>>> $98 = (struct r5conf *) 0xffff883df0516400
>>>> 
>>>> and batch_head is not NULL... that's the issue:
>>>> 
>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
>>>> $99 = (struct stripe_head *) 0xffff881e08d54d30
>>>> 
>>>> Full struct dump below. We can see that overwrite_disks is set to 0 and state is 0… but this batch_head not being set back to NULL makes it crash in init_stripe():
>>>> 
>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30)
>>>> $6 = {
>>>>  hash = {
>>>>    next = 0x0,
>>>>    pprev = 0x0
>>>>  },
>>>>  lru = {
>>>>    next = 0xffff881e08d54d40,
>>>>    prev = 0xffff881e08d54d40
>>>>  },
>>>>  release_list = {
>>>>    next = 0xffff881dfe405cc0
>>>>  },
>>>>  raid_conf = 0xffff883df0516400,
>>>>  generation = 0,
>>>>  sector = 3701224808,
>>>>  pd_idx = 1,
>>>>  qd_idx = 2,
>>>>  ddf_layout = 0,
>>>>  hash_lock_index = 5,
>>>>  state = 0,
>>>>  count = {
>>>>    counter = 0
>>>>  },
>>>>  bm_seq = 22708787,
>>>>  disks = 10,
>>>>  overwrite_disks = 0,
>>>>  check_state = check_state_idle,
>>>>  reconstruct_state = reconstruct_state_idle,
>>>>  stripe_lock = {
>>>>    {
>>>>      rlock = {
>>>>        raw_lock = {
>>>>          {
>>>>            head_tail = 3709263126,
>>>>            tickets = {
>>>>              head = 56598,
>>>>              tail = 56598
>>>>            }
>>>>          }
>>>>        }
>>>>      }
>>>>    }
>>>>  },
>>>>  cpu = 15,
>>>>  group = 0x0,
>>>>  batch_head = 0xffff881e08d54d30,
>>>>  batch_lock = {
>>>>    {
>>>>      rlock = {
>>>>        raw_lock = {
>>>>          {
>>>>            head_tail = 9961624,
>>>>            tickets = {
>>>>              head = 152,
>>>>              tail = 152
>>>>            }
>>>>          }
>>>>        }
>>>>      }
>>>>    }
>>>>  },
>>>>  batch_list = {
>>>>    next = 0xffff881dfe405d28,
>>>>    prev = 0xffff881e0e2dec98
>>>>  },
>>>>  log_io = 0x0,
>>>>  log_list = {
>>>>    next = 0x0,
>>>>    prev = 0x0
>>>>  },
>>>>  ops = {
>>>>    target = 0,
>>>>    target2 = 0,
>>>>    zero_sum_result = (unknown: 0)
>>>>  },
>>>>  dev = {{
>>>>      req = {
>>>>        bi_sector = 0,
>>>>        bi_next = 0x0,
>>>>        bi_bdev = 0x0,
>>>>        bi_flags = 1,
>>>>        bi_rw = 0,
>>>>        bi_vcnt = 0,
>>>>        bi_idx = 0,
>>>>        bi_phys_segments = 0,
>>>>        bi_size = 0,
>>>>        bi_seg_front_size = 0,
>>>>        bi_seg_back_size = 0,
>>>>        bi_end_io = 0x0,
>>>>        bi_private = 0xffff881e08d54d30,
>>>>        bi_ioc = 0x0,
>>>>        bi_css = 0x0,
>>>>        bi_integrity = 0x0,
>>>>        bi_max_vecs = 1,
>>>>        bi_cnt = {
>>>>          counter = 1
>>>>        },
>>>>        bi_io_vec = 0xffff881e08d54f00,
>>>>        bi_pool = 0x0,
>>>>        {
>>>>          bio_aux = 0x0,
>>>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>>>            rh_reserved1 = 0x0
>>>>          },
>>>>          {<No data fields>}
>>>>        },
>>>>        bi_inline_vecs = 0xffff881e08d54e78
>>>>      },
>>>>      rreq = {
>>>>        bi_sector = 0,
>>>>        bi_next = 0x0,
>>>>        bi_bdev = 0x0,
>>>>        bi_flags = 1,
>>>>        bi_rw = 0,
>>>>        bi_vcnt = 0,
>>>>        bi_idx = 0,
>>>>        bi_phys_segments = 0,
>>>>        bi_size = 0,
>>>>        bi_seg_front_size = 0,
>>>>        bi_seg_back_size = 0,
>>>>        bi_end_io = 0x0,
>>>>        bi_private = 0xffff881e08d54d30,
>>>>        bi_ioc = 0x0,
>>>>        bi_css = 0x0,
>>>>        bi_integrity = 0x0,
>>>>        bi_max_vecs = 1,
>>>>        bi_cnt = {
>>>>          counter = 1
>>>>        },
>>>>        bi_io_vec = 0xffff881e08d54f10,
>>>>        bi_pool = 0x0,
>>>>        {
>>>>          bio_aux = 0x0,
>>>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>>>            rh_reserved1 = 0x0
>>>>          },
>>>>          {<No data fields>}
>>>>        },
>>>>        bi_inline_vecs = 0xffff881e08d54f00
>>>>      },
>>>>      vec = {
>>>>        bv_page = 0xffffea0078236080,
>>>>        bv_len = 4096,
>>>>        bv_offset = 0
>>>>      },
>>>>      rvec = {
>>>>        bv_page = 0x0,
>>>>        bv_len = 0,
>>>>        bv_offset = 0
>>>>      },
>>>>      page = 0xffffea0078236080,
>>>>      orig_page = 0xffffea0078236080,
>>>>      toread = 0x0,
>>>>      read = 0x0,
>>>>      towrite = 0x0,
>>>>      written = 0xffff8824832a7c00,
>>>>      sector = 29609798632,
>>>>      flags = 24,
>>> 
>>> Seems the flags value is R5_OVERWRITE | R5_Insync; handle_stripe_clean_event
>>> only calls break_stripe_batch_list if do_endio is set to true.
>>> 
>>> if (head_sh->batch_head && do_endio)
>>>               break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
>>> 
>>> And do_endio is set to true if we meet the below condition.
>>> 
>>> if (!test_bit(R5_LOCKED, &dev->flags) &&
>>>                           (test_bit(R5_UPTODATE, &dev->flags) ||
>>>                            test_bit(R5_Discard, &dev->flags) ||
>>>                            test_bit(R5_SkipCopy, &dev->flags))) {
>>> 
>>> I guess this is why batch_head is not NULL, just my 2 cents.
>> 
>> 
>> That’s a very good point, thanks Guoqing!
>> 
>> I checked another similar crash dump, and found that the same flags field was 16 (R5_Insync alone), which doesn’t meet the condition for do_endio = true either.
>> 
>> The code below in raid5_end_write_request directly releases sh->batch_head; couldn’t that be an issue?
>> 
>> static void raid5_end_write_request(struct bio *bi)
>> {
>> 
>> ...
>> 
>>        raid5_release_stripe(sh);
>> 
>>        if (sh->batch_head && sh != sh->batch_head)
>>                raid5_release_stripe(sh->batch_head);
>> }
> 
> Can you dump the full stripe_head info for this crash dump?

Yes, of course!

R13: ffff881e53f5dca0

crash> p (struct stripe_head)*0xffff881e53f5dca0
$1 = {
  hash = {
    next = 0x0, 
    pprev = 0x0
  }, 
  lru = {
    next = 0xffff881e53f5dcb0, 
    prev = 0xffff881e53f5dcb0
  }, 
  release_list = {
    next = 0xffff881e63faec30
  }, 
  raid_conf = 0xffff881f9fb81400, 
  generation = 0, 
  sector = 2278766080, 
  pd_idx = 9, 
  qd_idx = 0, 
  ddf_layout = 0, 
  hash_lock_index = 0, 
  state = 0, 
  count = {
    counter = 0
  }, 
  bm_seq = 21153188, 
  disks = 10, 
  overwrite_disks = 0, 
  check_state = check_state_idle, 
  reconstruct_state = reconstruct_state_idle, 
  stripe_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 3483684772, 
            tickets = {
              head = 53156, 
              tail = 53156
            }
          }
        }
      }
    }
  }, 
  cpu = 12, 
  group = 0x0, 
  batch_head = 0xffff881e53f5dca0, 
  batch_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 1983149620, 
            tickets = {
              head = 30260, 
              tail = 30260
            }
          }
        }
      }
    }
  }, 
  batch_list = {
    next = 0xffff881e63faec98, 
    prev = 0xffff881e5c390088
  }, 
  log_io = 0x0, 
  log_list = {
    next = 0x0, 
    prev = 0x0
  }, 
  ops = {
    target = 0, 
    target2 = 0, 
    zero_sum_result = (unknown: 0)
  }, 
  dev = {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5de70, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5dde8
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5de80, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5de70
      }, 
      vec = {
        bv_page = 0xffffea00794fe000, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe000, 
      orig_page = 0xffffea00794fe000, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0x0, 
      sector = 0, 
      flags = 16, 
      log_checksum = 0
    }}
}



> Also stripe_head
> should include 10 dev for your 10-disk raid, but crash dump only show one, is
> it possible you can dump the info for the devs in the stripe?

Oh right, crash probably didn’t display it because the dev array is sized dynamically.

Let’s try this to print the 10 devs:

crash> p ((struct stripe_head)*0xffff881e53f5dca0).dev@10
$10 = 
 {{{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5de70, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5dde8
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5de80, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5de70
      }, 
      vec = {
        bv_page = 0xffffea00794fe000, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe000, 
      orig_page = 0xffffea00794fe000, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0x0, 
      sector = 0, 
      flags = 16, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5dfe8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5df60
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5dff8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5dfe8
      }, 
      vec = {
        bv_page = 0xffffea00794fe040, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe040, 
      orig_page = 0xffffea00794fe040, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230128640, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e160, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e0d8
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e170, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e160
      }, 
      vec = {
        bv_page = 0xffffea00794fe080, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe080, 
      orig_page = 0xffffea00794fe080, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230128768, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e2d8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e250
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e2e8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e2d8
      }, 
      vec = {
        bv_page = 0xffffea00794fe0c0, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe0c0, 
      orig_page = 0xffffea00794fe0c0, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230128896, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e450, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e3c8
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e460, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e450
      }, 
      vec = {
        bv_page = 0xffffea00794fe100, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe100, 
      orig_page = 0xffffea00794fe100, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230129024, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e5c8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e540
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e5d8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e5c8
      }, 
      vec = {
        bv_page = 0xffffea00794fe140, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe140, 
      orig_page = 0xffffea00794fe140, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230129152, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e740, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e6b8
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e750, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e740
      }, 
      vec = {
        bv_page = 0xffffea00794fe180, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe180, 
      orig_page = 0xffffea00794fe180, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230129280, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e8b8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e830
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5e8c8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e8b8
      }, 
      vec = {
        bv_page = 0xffffea00794fe1c0, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe1c0, 
      orig_page = 0xffffea00794fe1c0, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230129408, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5ea30, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5e9a8
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5ea40, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5ea30
      }, 
      vec = {
        bv_page = 0xffffea00794fe200, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe200, 
      orig_page = 0xffffea00794fe200, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff881fea351800, 
      sector = 18230129536, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5eba8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5eb20
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e53f5dca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e53f5ebb8, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e53f5eba8
      }, 
      vec = {
        bv_page = 0xffffea00794fe240, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00794fe240, 
      orig_page = 0xffffea00794fe240, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0x0, 
      sector = 0, 
      flags = 16, 
      log_checksum = 0
    }}}


---------------------------------------------------

And these are the stripe_head devs from the crash dump I sent yesterday:

R13: ffff881e08d54d30

crash> p ((struct stripe_head)*0xffff881e08d54d30).dev@10
$1 = 
 {{{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d54f00, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d54e78
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d54f10, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d54f00
      }, 
      vec = {
        bv_page = 0xffffea0078236080, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236080, 
      orig_page = 0xffffea0078236080, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609798632, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55078, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d54ff0
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55088, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55078
      }, 
      vec = {
        bv_page = 0xffffea00782360c0, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00782360c0, 
      orig_page = 0xffffea00782360c0, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0x0, 
      sector = 0, 
      flags = 16, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d551f0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55168
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55200, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d551f0
      }, 
      vec = {
        bv_page = 0xffffea0078236100, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236100, 
      orig_page = 0xffffea0078236100, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0x0, 
      sector = 0, 
      flags = 16, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55368, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d552e0
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55378, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55368
      }, 
      vec = {
        bv_page = 0xffffea0078236140, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236140, 
      orig_page = 0xffffea0078236140, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609797736, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d554e0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55458
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d554f0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d554e0
      }, 
      vec = {
        bv_page = 0xffffea0078236180, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236180, 
      orig_page = 0xffffea0078236180, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609797864, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55658, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d555d0
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55668, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55658
      }, 
      vec = {
        bv_page = 0xffffea00782361c0, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00782361c0, 
      orig_page = 0xffffea00782361c0, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609797992, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d557d0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55748
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d557e0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d557d0
      }, 
      vec = {
        bv_page = 0xffffea0078236200, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236200, 
      orig_page = 0xffffea0078236200, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609798120, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55948, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d558c0
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55958, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55948
      }, 
      vec = {
        bv_page = 0xffffea0078236240, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236240, 
      orig_page = 0xffffea0078236240, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609798248, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55ac0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55a38
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55ad0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55ac0
      }, 
      vec = {
        bv_page = 0xffffea0078236280, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea0078236280, 
      orig_page = 0xffffea0078236280, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609798376, 
      flags = 24, 
      log_checksum = 0
    }}, {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55c38, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55bb0
      }, 
      rreq = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e08d54d30, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e08d55c48, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e08d55c38
      }, 
      vec = {
        bv_page = 0xffffea00782362c0, 
        bv_len = 4096, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0xffffea00782362c0, 
      orig_page = 0xffffea00782362c0, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0xffff8824832a7c00, 
      sector = 29609798504, 
      flags = 24, 
      log_checksum = 0
    }}}


Thanks!
Stephane






* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-15 19:27         ` Stephane Thiell
@ 2017-08-17 22:58           ` Shaohua Li
  2017-08-18  4:13             ` Stephane Thiell
  0 siblings, 1 reply; 15+ messages in thread
From: Shaohua Li @ 2017-08-17 22:58 UTC (permalink / raw)
  To: Stephane Thiell; +Cc: Guoqing Jiang, linux-raid

On Tue, Aug 15, 2017 at 07:27:44PM +0000, Stephane Thiell wrote:
> 
> > On Aug 15, 2017, at 11:39 AM, Shaohua Li <shli@kernel.org> wrote:
> > 
> > On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
> >> 
> >>> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
> >>> 
> >>> 
> >>> 
> >>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
> >>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
> >>>> 
> >>>> I’m still struggling with this issue in raid5.c. I currently see about one crash per week on average on a system using Lustre and 24 x 10-disk raid6 per server. It has happened both with and without a check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
> >>>> 
> >>>> Because sh->batch_head is only set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I haven’t found where this could come from yet. So perhaps sh->batch_head should just be set to NULL in do_release_stripe(), before the stripe is added to the inactive list?
> >>>> 
> >>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
> >>>> 
> >>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to take sh->stripe_lock there?). I will get new hardware soon, so I will give it a try on a test system. Any suggestions are welcome…
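> >>>> 
> >>>> As a rough, untested sketch of that workaround (against the RHEL 3.10 drivers/md/raid5.c, so the exact context may differ), the change in init_stripe() would be something like keeping a warning instead of the BUG_ON and clearing the stale pointer:
> >>>> 
> >>>> -	BUG_ON(sh->batch_head);
> >>>> +	/* workaround: warn instead of crashing if a stale batch_head
> >>>> +	 * survived release, and clear it before reusing the stripe */
> >>>> +	if (WARN_ON_ONCE(sh->batch_head))
> >>>> +		sh->batch_head = NULL;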
> >>>> 
> >>>> Quick analysis of the last vmcore:
> >>>> 
> >>>> crash> ps | grep ">"
> >>>> ...
> >>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
> >>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
> >>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
> >>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
> >>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
> >>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
> >>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
> >>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
> >>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
> >>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
> >>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
> >>>> ...
> >>>> 
> >>>> crash>  bt 237033
> >>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
> >>>> #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
> >>>> #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
> >>>> #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
> >>>> #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
> >>>> #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
> >>>> #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
> >>>> #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
> >>>> #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
> >>>>    [exception RIP: raid5_get_active_stripe+1809]
> >>>>    RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
> >>>>    RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
> >>>>    RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
> >>>>    RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
> >>>>    R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>>    R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
> >>>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>> #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
> >>>> #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
> >>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
> >>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
> >>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
> >>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
> >>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
> >>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
> >>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
> >>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
> >>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
> >>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
> >>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
> >>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
> >>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
> >>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
> >>>> 
> >>>> 
> >>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
> >>>> 
> >>>> md32’s (struct r5conf *) is 0xffff883df0516400
> >>>> 
> >>>> Our crashed task:
> >>>> 
> >>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
> >>>> raid6-md32
> >>>> 
> >>>> Example for md32_raid6 task in release_inactive_stripe_list:
> >>>> 
> >>>> crash> bt 154637
> >>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
> >>>> #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
> >>>> #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
> >>>> #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
> >>>> #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
> >>>>    [exception RIP: _raw_spin_lock_irqsave+64]
> >>>>    RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
> >>>>    RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
> >>>>    RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
> >>>>    RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
> >>>>    R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
> >>>>    R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
> >>>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>> --- <NMI exception stack> ---
> >>>> #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
> >>>> #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
> >>>> #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
> >>>> #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
> >>>> #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
> >>>> #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
> >>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
> >>>> 
> >>>> 
> >>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
> >>>>    [exception RIP: _raw_spin_lock_irq+63]
> >>>>    [exception RIP: _raw_spin_lock_irq+63]
> >>>>    [exception RIP: _raw_spin_lock_irq+58]
> >>>>    [exception RIP: _raw_spin_lock_irq+63]
> >>>>    [exception RIP: _raw_spin_lock_irq+58]
> >>>>    [exception RIP: _raw_spin_lock_irq+63]
> >>>>    [exception RIP: raid5_get_active_stripe+1809]
> >>>> 
> >>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
> >>>> #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>> #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>> #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>> #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>> #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>> #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>    [exception RIP: raid5_get_active_stripe+1809]
> >>>> 
> >>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
> >>>>    R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
> >>>>    R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
> >>>>    R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
> >>>>    R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
> >>>>    R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>    R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>    R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>> 
> >>>> 
> >>>> The bad stripe_head (sh) is usually available in %r13:
> >>>> 
> >>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
> >>>> $98 = (struct r5conf *) 0xffff883df0516400
> >>>> 
> >>>> and batch_head is not NULL (in fact it points back to this same stripe_head)... that's the issue:
> >>>> 
> >>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
> >>>> $99 = (struct stripe_head *) 0xffff881e08d54d30
> >>>> 
> >>>> Full struct dump below. We can see that overwrite_disks is set to 0 and state is 0, but batch_head was never reset to NULL, which makes it crash in init_stripe():
> >>>> 
> >>>> crash> p ((struct stripe_head)*0xffff881e08d54d30)
> >>>> $6 = {
> >>>>  hash = {
> >>>>    next = 0x0,
> >>>>    pprev = 0x0
> >>>>  },
> >>>>  lru = {
> >>>>    next = 0xffff881e08d54d40,
> >>>>    prev = 0xffff881e08d54d40
> >>>>  },
> >>>>  release_list = {
> >>>>    next = 0xffff881dfe405cc0
> >>>>  },
> >>>>  raid_conf = 0xffff883df0516400,
> >>>>  generation = 0,
> >>>>  sector = 3701224808,
> >>>>  pd_idx = 1,
> >>>>  qd_idx = 2,
> >>>>  ddf_layout = 0,
> >>>>  hash_lock_index = 5,
> >>>>  state = 0,
> >>>>  count = {
> >>>>    counter = 0
> >>>>  },
> >>>>  bm_seq = 22708787,
> >>>>  disks = 10,
> >>>>  overwrite_disks = 0,
> >>>>  check_state = check_state_idle,
> >>>>  reconstruct_state = reconstruct_state_idle,
> >>>>  stripe_lock = {
> >>>>    {
> >>>>      rlock = {
> >>>>        raw_lock = {
> >>>>          {
> >>>>            head_tail = 3709263126,
> >>>>            tickets = {
> >>>>              head = 56598,
> >>>>              tail = 56598
> >>>>            }
> >>>>          }
> >>>>        }
> >>>>      }
> >>>>    }
> >>>>  },
> >>>>  cpu = 15,
> >>>>  group = 0x0,
> >>>>  batch_head = 0xffff881e08d54d30,
> >>>>  batch_lock = {
> >>>>    {
> >>>>      rlock = {
> >>>>        raw_lock = {
> >>>>          {
> >>>>            head_tail = 9961624,
> >>>>            tickets = {
> >>>>              head = 152,
> >>>>              tail = 152
> >>>>            }
> >>>>          }
> >>>>        }
> >>>>      }
> >>>>    }
> >>>>  },
> >>>>  batch_list = {
> >>>>    next = 0xffff881dfe405d28,
> >>>>    prev = 0xffff881e0e2dec98
> >>>>  },
> >>>>  log_io = 0x0,
> >>>>  log_list = {
> >>>>    next = 0x0,
> >>>>    prev = 0x0
> >>>>  },
> >>>>  ops = {
> >>>>    target = 0,
> >>>>    target2 = 0,
> >>>>    zero_sum_result = (unknown: 0)
> >>>>  },
> >>>>  dev = {{
> >>>>      req = {
> >>>>        bi_sector = 0,
> >>>>        bi_next = 0x0,
> >>>>        bi_bdev = 0x0,
> >>>>        bi_flags = 1,
> >>>>        bi_rw = 0,
> >>>>        bi_vcnt = 0,
> >>>>        bi_idx = 0,
> >>>>        bi_phys_segments = 0,
> >>>>        bi_size = 0,
> >>>>        bi_seg_front_size = 0,
> >>>>        bi_seg_back_size = 0,
> >>>>        bi_end_io = 0x0,
> >>>>        bi_private = 0xffff881e08d54d30,
> >>>>        bi_ioc = 0x0,
> >>>>        bi_css = 0x0,
> >>>>        bi_integrity = 0x0,
> >>>>        bi_max_vecs = 1,
> >>>>        bi_cnt = {
> >>>>          counter = 1
> >>>>        },
> >>>>        bi_io_vec = 0xffff881e08d54f00,
> >>>>        bi_pool = 0x0,
> >>>>        {
> >>>>          bio_aux = 0x0,
> >>>>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>>>            rh_reserved1 = 0x0
> >>>>          },
> >>>>          {<No data fields>}
> >>>>        },
> >>>>        bi_inline_vecs = 0xffff881e08d54e78
> >>>>      },
> >>>>      rreq = {
> >>>>        bi_sector = 0,
> >>>>        bi_next = 0x0,
> >>>>        bi_bdev = 0x0,
> >>>>        bi_flags = 1,
> >>>>        bi_rw = 0,
> >>>>        bi_vcnt = 0,
> >>>>        bi_idx = 0,
> >>>>        bi_phys_segments = 0,
> >>>>        bi_size = 0,
> >>>>        bi_seg_front_size = 0,
> >>>>        bi_seg_back_size = 0,
> >>>>        bi_end_io = 0x0,
> >>>>        bi_private = 0xffff881e08d54d30,
> >>>>        bi_ioc = 0x0,
> >>>>        bi_css = 0x0,
> >>>>        bi_integrity = 0x0,
> >>>>        bi_max_vecs = 1,
> >>>>        bi_cnt = {
> >>>>          counter = 1
> >>>>        },
> >>>>        bi_io_vec = 0xffff881e08d54f10,
> >>>>        bi_pool = 0x0,
> >>>>        {
> >>>>          bio_aux = 0x0,
> >>>>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>>>            rh_reserved1 = 0x0
> >>>>          },
> >>>>          {<No data fields>}
> >>>>        },
> >>>>        bi_inline_vecs = 0xffff881e08d54f00
> >>>>      },
> >>>>      vec = {
> >>>>        bv_page = 0xffffea0078236080,
> >>>>        bv_len = 4096,
> >>>>        bv_offset = 0
> >>>>      },
> >>>>      rvec = {
> >>>>        bv_page = 0x0,
> >>>>        bv_len = 0,
> >>>>        bv_offset = 0
> >>>>      },
> >>>>      page = 0xffffea0078236080,
> >>>>      orig_page = 0xffffea0078236080,
> >>>>      toread = 0x0,
> >>>>      read = 0x0,
> >>>>      towrite = 0x0,
> >>>>      written = 0xffff8824832a7c00,
> >>>>      sector = 29609798632,
> >>>>      flags = 24,
> >>> 
> >>> Seems the flags value is R5_OVERWRITE | R5_Insync, and handle_stripe_clean_event
> >>> only calls break_stripe_batch_list if do_endio is set to true.
> >>> 
> >>> if (head_sh->batch_head && do_endio)
> >>>         break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
> >>> 
> >>> And do_endio is only set to true if the condition below is met:
> >>> 
> >>> if (!test_bit(R5_LOCKED, &dev->flags) &&
> >>>                           (test_bit(R5_UPTODATE, &dev->flags) ||
> >>>                            test_bit(R5_Discard, &dev->flags) ||
> >>>                            test_bit(R5_SkipCopy, &dev->flags))) {
> >>> 
> >>> I guess that is why batch_head is not NULL, just my 2 cents.
> >> 
> >> 
> >> That’s a very good point, thanks Guoqing!
> >> 
> >> I checked another similar crash dump, and found that the same flags field was 16 (R5_Insync alone), which doesn’t meet the condition for do_endio = true either.
> >> 
> >> The code below in raid5_end_write_request directly releases sh->batch_head right after releasing sh itself, couldn’t that be an issue?
> >> 
> >> static void raid5_end_write_request(struct bio *bi)
> >> {
> >> 
> >> ...
> >> 
> >>        raid5_release_stripe(sh);
> >> 
> >>        if (sh->batch_head && sh != sh->batch_head)
> >>                raid5_release_stripe(sh->batch_head);
> >> }
> > 
> > Can you dump the full stripe_head info for this crash dump?
> 
> Yes, of course!
> 
> R13: ffff881e53f5dca0
> 
> crash> p (struct stripe_head)*0xffff881e53f5dca0
> $1 = {
>   hash = {
>     next = 0x0, 
>     pprev = 0x0
>   }, 
>   lru = {
>     next = 0xffff881e53f5dcb0, 
>     prev = 0xffff881e53f5dcb0
>   }, 
>   release_list = {
>     next = 0xffff881e63faec30
>   }, 
>   raid_conf = 0xffff881f9fb81400, 
>   generation = 0, 
>   sector = 2278766080, 
>   pd_idx = 9, 
>   qd_idx = 0, 
>   ddf_layout = 0, 
>   hash_lock_index = 0, 
>   state = 0, 
>   count = {
>     counter = 0
>   }, 
>   bm_seq = 21153188, 
>   disks = 10, 
>   overwrite_disks = 0, 
>   check_state = check_state_idle, 
>   reconstruct_state = reconstruct_state_idle, 
>   stripe_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           {
>             head_tail = 3483684772, 
>             tickets = {
>               head = 53156, 
>               tail = 53156
>             }
>           }
>         }
>       }
>     }
>   }, 
>   cpu = 12, 
>   group = 0x0, 
>   batch_head = 0xffff881e53f5dca0, 
>   batch_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           {
>             head_tail = 1983149620, 
>             tickets = {
>               head = 30260, 
>               tail = 30260
>             }
>           }
>         }
>       }
>     }
>   }, 
>   batch_list = {
>     next = 0xffff881e63faec98, 
>     prev = 0xffff881e5c390088
>   }, 
>   log_io = 0x0, 
>   log_list = {
>     next = 0x0, 
>     prev = 0x0
>   }, 
>   ops = {
>     target = 0, 
>     target2 = 0, 
>     zero_sum_result = (unknown: 0)
>   }, 
>   dev = {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5de70, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5dde8
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5de80, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5de70
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe000, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe000, 
>       orig_page = 0xffffea00794fe000, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0x0, 
>       sector = 0, 
>       flags = 16, 
>       log_checksum = 0
>     }}
> }
> 
> 
> 
> > Also stripe_head
> > should include 10 dev for your 10-disk raid, but crash dump only show one, is
> > it possible you can dump the info for the devs in the stripe?
> 
> Oh right, crash didn’t display it, probably because it is a dynamically allocated array.
> 
> Let’s try this to print the 10 devs:
> 
> crash> p ((struct stripe_head)*0xffff881e53f5dca0).dev@10
> $10 = 
>  {{{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5de70, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5dde8
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5de80, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5de70
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe000, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe000, 
>       orig_page = 0xffffea00794fe000, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0x0, 
>       sector = 0, 
>       flags = 16, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5dfe8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5df60
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5dff8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5dfe8
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe040, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe040, 
>       orig_page = 0xffffea00794fe040, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230128640, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e160, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e0d8
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e170, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e160
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe080, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe080, 
>       orig_page = 0xffffea00794fe080, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230128768, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e2d8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e250
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e2e8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e2d8
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe0c0, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe0c0, 
>       orig_page = 0xffffea00794fe0c0, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230128896, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e450, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e3c8
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e460, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e450
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe100, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe100, 
>       orig_page = 0xffffea00794fe100, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230129024, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e5c8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e540
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e5d8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e5c8
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe140, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe140, 
>       orig_page = 0xffffea00794fe140, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230129152, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e740, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e6b8
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e750, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e740
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe180, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe180, 
>       orig_page = 0xffffea00794fe180, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230129280, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e8b8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e830
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5e8c8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e8b8
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe1c0, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe1c0, 
>       orig_page = 0xffffea00794fe1c0, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230129408, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5ea30, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5e9a8
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5ea40, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5ea30
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe200, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe200, 
>       orig_page = 0xffffea00794fe200, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff881fea351800, 
>       sector = 18230129536, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5eba8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5eb20
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e53f5dca0, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e53f5ebb8, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e53f5eba8
>       }, 
>       vec = {
>         bv_page = 0xffffea00794fe240, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00794fe240, 
>       orig_page = 0xffffea00794fe240, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0x0, 
>       sector = 0, 
>       flags = 16, 
>       log_checksum = 0
>     }}}
> 
> 
> ---------------------------------------------------
> 
> And this is the stripe_head devs from the crash dump I sent yesterday:
> 
> R13: ffff881e08d54d30
> 
> crash> p ((struct stripe_head)*0xffff881e08d54d30).dev@10
> $1 = 
>  {{{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d54f00, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d54e78
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d54f10, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d54f00
>       }, 
>       vec = {
>         bv_page = 0xffffea0078236080, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea0078236080, 
>       orig_page = 0xffffea0078236080, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609798632, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55078, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d54ff0
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55088, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55078
>       }, 
>       vec = {
>         bv_page = 0xffffea00782360c0, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00782360c0, 
>       orig_page = 0xffffea00782360c0, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0x0, 
>       sector = 0, 
>       flags = 16, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d551f0, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55168
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55200, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d551f0
>       }, 
>       vec = {
>         bv_page = 0xffffea0078236100, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea0078236100, 
>       orig_page = 0xffffea0078236100, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0x0, 
>       sector = 0, 
>       flags = 16, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55368, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d552e0
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55378, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55368
>       }, 
>       vec = {
>         bv_page = 0xffffea0078236140, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea0078236140, 
>       orig_page = 0xffffea0078236140, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609797736, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d554e0, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55458
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d554f0, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d554e0
>       }, 
>       vec = {
>         bv_page = 0xffffea0078236180, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea0078236180, 
>       orig_page = 0xffffea0078236180, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609797864, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55658, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d555d0
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55668, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55658
>       }, 
>       vec = {
>         bv_page = 0xffffea00782361c0, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00782361c0, 
>       orig_page = 0xffffea00782361c0, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609797992, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d557d0, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55748
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d557e0, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d557d0
>       }, 
>       vec = {
>         bv_page = 0xffffea0078236200, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea0078236200, 
>       orig_page = 0xffffea0078236200, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609798120, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55948, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d558c0
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55958, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55948
>       }, 
>       vec = {
>         bv_page = 0xffffea0078236240, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea0078236240, 
>       orig_page = 0xffffea0078236240, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609798248, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55ac0, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55a38
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55ad0, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55ac0
>       }, 
>       vec = {
>         bv_page = 0xffffea0078236280, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea0078236280, 
>       orig_page = 0xffffea0078236280, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609798376, 
>       flags = 24, 
>       log_checksum = 0
>     }}, {{
>       req = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55c38, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55bb0
>       }, 
>       rreq = {
>         bi_sector = 0, 
>         bi_next = 0x0, 
>         bi_bdev = 0x0, 
>         bi_flags = 1, 
>         bi_rw = 0, 
>         bi_vcnt = 0, 
>         bi_idx = 0, 
>         bi_phys_segments = 0, 
>         bi_size = 0, 
>         bi_seg_front_size = 0, 
>         bi_seg_back_size = 0, 
>         bi_end_io = 0x0, 
>         bi_private = 0xffff881e08d54d30, 
>         bi_ioc = 0x0, 
>         bi_css = 0x0, 
>         bi_integrity = 0x0, 
>         bi_max_vecs = 1, 
>         bi_cnt = {
>           counter = 1
>         }, 
>         bi_io_vec = 0xffff881e08d55c48, 
>         bi_pool = 0x0, 
>         {
>           bio_aux = 0x0, 
>           __UNIQUE_ID_rh_kabi_hide15 = {
>             rh_reserved1 = 0x0
>           }, 
>           {<No data fields>}
>         }, 
>         bi_inline_vecs = 0xffff881e08d55c38
>       }, 
>       vec = {
>         bv_page = 0xffffea00782362c0, 
>         bv_len = 4096, 
>         bv_offset = 0
>       }, 
>       rvec = {
>         bv_page = 0x0, 
>         bv_len = 0, 
>         bv_offset = 0
>       }, 
>       page = 0xffffea00782362c0, 
>       orig_page = 0xffffea00782362c0, 
>       toread = 0x0, 
>       read = 0x0, 
>       towrite = 0x0, 
>       written = 0xffff8824832a7c00, 
>       sector = 29609798504, 
>       flags = 24, 
>       log_checksum = 0
>     }}}

Didn't find anything interesting yet, sorry. My current theory is a race
condition that makes the stripe a batch head while also adding it to another
batch list, but I can't find such a race.

So in the crashdump, we have something like this:

batch_list = {
    next = 0xffff881dfe405d28,
    prev = 0xffff881e0e2dec98
  },

Can you convert the next/prev pointers to stripes and dump their state? That
would probably help with debugging.

Thanks,
Shaohua


* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-17 22:58           ` Shaohua Li
@ 2017-08-18  4:13             ` Stephane Thiell
  2017-08-18  5:50               ` Shaohua Li
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Thiell @ 2017-08-18  4:13 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Guoqing Jiang, linux-raid

> On Aug 17, 2017, at 3:58 PM, Shaohua Li <shli@kernel.org> wrote:
> 
> On Tue, Aug 15, 2017 at 07:27:44PM +0000, Stephane Thiell wrote:
>> 
>>> On Aug 15, 2017, at 11:39 AM, Shaohua Li <shli@kernel.org> wrote:
>>> 
>>> On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
>>>> 
>>>>> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
>>>>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
>>>>>> 
>>>>>> I’m still struggling with this issue in raid5.c. I currently have one crash per week on average on a system using Lustre with 24 x 10-disk raid6 arrays per server. It has happened both with and without a check running. I now think there is a race where a released (batched) stripe head is added to conf->inactive_list without sh->batch_head being set back to NULL.
>>>>>> 
>>>>>> Because sh->batch_head is only set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I haven’t found where that call could come from yet. So perhaps sh->batch_head should simply be set to NULL in do_release_stripe(), before the stripe is added to the inactive list?
>>>>>> 
>>>>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
>>>>>> 
>>>>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL there (I don’t think I need to take sh->stripe_lock for that?). I will get new hardware soon, so I will try it on a test system. Any suggestions welcome…
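[The workaround described above could look roughly like the sketch below. The hunk context and placement are approximate (taken against a RHEL 3.10-era raid5.c, not verified against this exact kernel), and note that it only masks the suspected race rather than fixing it; WARN_ON_ONCE is used so the first occurrence still leaves a trace in the logs.]

```diff
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
-	BUG_ON(sh->batch_head);
+	/* Workaround: a batched stripe can apparently reach the
+	 * inactive list with batch_head still set; warn and clear it
+	 * instead of crashing. This masks the race, not a fix. */
+	if (WARN_ON_ONCE(sh->batch_head))
+		sh->batch_head = NULL;
```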
>>>>>> 
>>>>>> Quick analysis of the last vmcore:
>>>>>> 
>>>>>> crash> ps | grep ">"
>>>>>> ...
>>>>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
>>>>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
>>>>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
>>>>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
>>>>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
>>>>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
>>>>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
>>>>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
>>>>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
>>>>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
>>>>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
>>>>>> ...
>>>>>> 
>>>>>> crash>  bt 237033
>>>>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
>>>>>> #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
>>>>>> #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
>>>>>> #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
>>>>>> #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
>>>>>> #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
>>>>>> #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
>>>>>> #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
>>>>>> #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
>>>>>>   [exception RIP: raid5_get_active_stripe+1809]
>>>>>>   RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
>>>>>>   RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
>>>>>>   RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
>>>>>>   RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
>>>>>>   R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>   R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
>>>>>>   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>>> #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
>>>>>> #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
>>>>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
>>>>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
>>>>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
>>>>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
>>>>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
>>>>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
>>>>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
>>>>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
>>>>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
>>>>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
>>>>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
>>>>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
>>>>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
>>>>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
>>>>>> 
>>>>>> 
>>>>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
>>>>>> 
>>>>>> md32’s (struct r5conf *) is 0xffff883df0516400
>>>>>> 
>>>>>> Our crashed task:
>>>>>> 
>>>>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
>>>>>> raid6-md32
>>>>>> 
>>>>>> Example for md32_raid6 task in release_inactive_stripe_list:
>>>>>> 
>>>>>> crash> bt 154637
>>>>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
>>>>>> #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
>>>>>> #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
>>>>>> #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
>>>>>> #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
>>>>>>   [exception RIP: _raw_spin_lock_irqsave+64]
>>>>>>   RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
>>>>>>   RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
>>>>>>   RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
>>>>>>   RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
>>>>>>   R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
>>>>>>   R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
>>>>>>   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>>> --- <NMI exception stack> ---
>>>>>> #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
>>>>>> #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
>>>>>> #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
>>>>>> #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
>>>>>> #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
>>>>>> #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
>>>>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
>>>>>> 
>>>>>> 
>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
>>>>>>   [exception RIP: _raw_spin_lock_irq+63]
>>>>>>   [exception RIP: _raw_spin_lock_irq+63]
>>>>>>   [exception RIP: _raw_spin_lock_irq+58]
>>>>>>   [exception RIP: _raw_spin_lock_irq+63]
>>>>>>   [exception RIP: _raw_spin_lock_irq+58]
>>>>>>   [exception RIP: _raw_spin_lock_irq+63]
>>>>>>   [exception RIP: raid5_get_active_stripe+1809]
>>>>>> 
>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
>>>>>> #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>> #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>> #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>> #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>> #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>> #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>>   [exception RIP: raid5_get_active_stripe+1809]
>>>>>> 
>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
>>>>>>   R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>   R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>   R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>   R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>   R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>   R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>   R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>>>>> 
>>>>>> 
>>>>>> The bad stripe_head (sh) pointer can usually be found in %r13:
>>>>>> 
>>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
>>>>>> $98 = (struct r5conf *) 0xffff883df0516400
>>>>>> 
>>>>>> and batch_head is not NULL... that's the issue:
>>>>>> 
>>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
>>>>>> $99 = (struct stripe_head *) 0xffff881e08d54d30
>>>>>> 
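For context, the assertion that fires sits at the top of init_stripe() in drivers/md/raid5.c: a stripe_head taken from the inactive list must be fully detached before it can be re-initialized, and batch_head must already be NULL. A minimal standalone sketch of that invariant (abbreviated stand-in types, not the actual kernel code):

```c
#include <assert.h>
#include <stddef.h>

/* Abbreviated stand-in for the kernel's struct stripe_head. */
struct stripe_head {
    struct stripe_head *batch_head; /* head of the batch this sh belongs to, or NULL */
    int count;                      /* stand-in for the atomic_t refcount */
};

/* Mirrors the invariant checked by the BUG_ON()s at the top of
 * init_stripe(): a stripe pulled off the inactive list must have a
 * zero refcount and must not still be linked into a batch. */
static int stripe_reusable(const struct stripe_head *sh)
{
    return sh->count == 0 && sh->batch_head == NULL;
}
```

In the dump above, count is 0 but batch_head still points at the stripe itself, so the kernel's BUG_ON(sh->batch_head) fires when the stripe is recycled.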
>>>>>> Full struct dump below. We can see that overwrite_disks is set to 0 and state is 0… but batch_head is not NULL, which is what makes it crash in init_stripe():
>>>>>> 
>>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30)
>>>>>> $6 = {
>>>>>> hash = {
>>>>>>   next = 0x0,
>>>>>>   pprev = 0x0
>>>>>> },
>>>>>> lru = {
>>>>>>   next = 0xffff881e08d54d40,
>>>>>>   prev = 0xffff881e08d54d40
>>>>>> },
>>>>>> release_list = {
>>>>>>   next = 0xffff881dfe405cc0
>>>>>> },
>>>>>> raid_conf = 0xffff883df0516400,
>>>>>> generation = 0,
>>>>>> sector = 3701224808,
>>>>>> pd_idx = 1,
>>>>>> qd_idx = 2,
>>>>>> ddf_layout = 0,
>>>>>> hash_lock_index = 5,
>>>>>> state = 0,
>>>>>> count = {
>>>>>>   counter = 0
>>>>>> },
>>>>>> bm_seq = 22708787,
>>>>>> disks = 10,
>>>>>> overwrite_disks = 0,
>>>>>> check_state = check_state_idle,
>>>>>> reconstruct_state = reconstruct_state_idle,
>>>>>> stripe_lock = {
>>>>>>   {
>>>>>>     rlock = {
>>>>>>       raw_lock = {
>>>>>>         {
>>>>>>           head_tail = 3709263126,
>>>>>>           tickets = {
>>>>>>             head = 56598,
>>>>>>             tail = 56598
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>> },
>>>>>> cpu = 15,
>>>>>> group = 0x0,
>>>>>> batch_head = 0xffff881e08d54d30,
>>>>>> batch_lock = {
>>>>>>   {
>>>>>>     rlock = {
>>>>>>       raw_lock = {
>>>>>>         {
>>>>>>           head_tail = 9961624,
>>>>>>           tickets = {
>>>>>>             head = 152,
>>>>>>             tail = 152
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>> },
>>>>>> batch_list = {
>>>>>>   next = 0xffff881dfe405d28,
>>>>>>   prev = 0xffff881e0e2dec98
>>>>>> },
>>>>>> log_io = 0x0,
>>>>>> log_list = {
>>>>>>   next = 0x0,
>>>>>>   prev = 0x0
>>>>>> },
>>>>>> ops = {
>>>>>>   target = 0,
>>>>>>   target2 = 0,
>>>>>>   zero_sum_result = (unknown: 0)
>>>>>> },
>>>>>> dev = {{
>>>>>>     req = {
>>>>>>       bi_sector = 0,
>>>>>>       bi_next = 0x0,
>>>>>>       bi_bdev = 0x0,
>>>>>>       bi_flags = 1,
>>>>>>       bi_rw = 0,
>>>>>>       bi_vcnt = 0,
>>>>>>       bi_idx = 0,
>>>>>>       bi_phys_segments = 0,
>>>>>>       bi_size = 0,
>>>>>>       bi_seg_front_size = 0,
>>>>>>       bi_seg_back_size = 0,
>>>>>>       bi_end_io = 0x0,
>>>>>>       bi_private = 0xffff881e08d54d30,
>>>>>>       bi_ioc = 0x0,
>>>>>>       bi_css = 0x0,
>>>>>>       bi_integrity = 0x0,
>>>>>>       bi_max_vecs = 1,
>>>>>>       bi_cnt = {
>>>>>>         counter = 1
>>>>>>       },
>>>>>>       bi_io_vec = 0xffff881e08d54f00,
>>>>>>       bi_pool = 0x0,
>>>>>>       {
>>>>>>         bio_aux = 0x0,
>>>>>>         __UNIQUE_ID_rh_kabi_hide15 = {
>>>>>>           rh_reserved1 = 0x0
>>>>>>         },
>>>>>>         {<No data fields>}
>>>>>>       },
>>>>>>       bi_inline_vecs = 0xffff881e08d54e78
>>>>>>     },
>>>>>>     rreq = {
>>>>>>       bi_sector = 0,
>>>>>>       bi_next = 0x0,
>>>>>>       bi_bdev = 0x0,
>>>>>>       bi_flags = 1,
>>>>>>       bi_rw = 0,
>>>>>>       bi_vcnt = 0,
>>>>>>       bi_idx = 0,
>>>>>>       bi_phys_segments = 0,
>>>>>>       bi_size = 0,
>>>>>>       bi_seg_front_size = 0,
>>>>>>       bi_seg_back_size = 0,
>>>>>>       bi_end_io = 0x0,
>>>>>>       bi_private = 0xffff881e08d54d30,
>>>>>>       bi_ioc = 0x0,
>>>>>>       bi_css = 0x0,
>>>>>>       bi_integrity = 0x0,
>>>>>>       bi_max_vecs = 1,
>>>>>>       bi_cnt = {
>>>>>>         counter = 1
>>>>>>       },
>>>>>>       bi_io_vec = 0xffff881e08d54f10,
>>>>>>       bi_pool = 0x0,
>>>>>>       {
>>>>>>         bio_aux = 0x0,
>>>>>>         __UNIQUE_ID_rh_kabi_hide15 = {
>>>>>>           rh_reserved1 = 0x0
>>>>>>         },
>>>>>>         {<No data fields>}
>>>>>>       },
>>>>>>       bi_inline_vecs = 0xffff881e08d54f00
>>>>>>     },
>>>>>>     vec = {
>>>>>>       bv_page = 0xffffea0078236080,
>>>>>>       bv_len = 4096,
>>>>>>       bv_offset = 0
>>>>>>     },
>>>>>>     rvec = {
>>>>>>       bv_page = 0x0,
>>>>>>       bv_len = 0,
>>>>>>       bv_offset = 0
>>>>>>     },
>>>>>>     page = 0xffffea0078236080,
>>>>>>     orig_page = 0xffffea0078236080,
>>>>>>     toread = 0x0,
>>>>>>     read = 0x0,
>>>>>>     towrite = 0x0,
>>>>>>     written = 0xffff8824832a7c00,
>>>>>>     sector = 29609798632,
>>>>>>     flags = 24,
>>>>> 
>>>>> It seems the flags value is R5_OVERWRITE | R5_Insync, and handle_stripe_clean_event
>>>>> only calls break_stripe_batch_list if do_endio is set to true.
>>>>> 
>>>>> if (head_sh->batch_head && do_endio)
>>>>>              break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
>>>>> 
>>>>> And do_endio is set to true only if the condition below is met.
>>>>> 
>>>>> if (!test_bit(R5_LOCKED, &dev->flags) &&
>>>>>                          (test_bit(R5_UPTODATE, &dev->flags) ||
>>>>>                           test_bit(R5_Discard, &dev->flags) ||
>>>>>                           test_bit(R5_SkipCopy, &dev->flags))) {
>>>>> 
>>>>> I guess that is why batch_head is not NULL, just my 2 cents.
>>>> 
>>>> 
>>>> That’s a very good point, thanks Guoqing!
>>>> 
>>>> I checked another similar crash dump, and found that this same flags field was 16 (R5_Insync alone), which doesn’t meet the condition for do_endio = true either.
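To make the flag arithmetic explicit: in the r5dev flags enum of this era's raid5.h, R5_UPTODATE is bit 0, R5_LOCKED bit 1, R5_OVERWRITE bit 3 and R5_Insync bit 4, so flags = 24 decodes to R5_OVERWRITE | R5_Insync and flags = 16 to R5_Insync alone. A small standalone checker for the do_endio condition quoted earlier (the first few bit positions match raid5.h; the R5_Discard and R5_SkipCopy positions below are illustrative higher bits, not the real enum values):

```c
#include <assert.h>

/* Bit positions for the dev->flags field of struct r5dev.
 * UPTODATE/LOCKED/OVERWRITE/Insync match the 3.10-era enum order;
 * Discard/SkipCopy are placed at illustrative higher bits. */
enum {
    R5_UPTODATE  = 1 << 0,
    R5_LOCKED    = 1 << 1,
    R5_OVERWRITE = 1 << 3,
    R5_Insync    = 1 << 4,
    R5_Discard   = 1 << 23, /* illustrative position */
    R5_SkipCopy  = 1 << 24, /* illustrative position */
};

/* The do_endio test from handle_stripe_clean_event, as quoted in the
 * thread: the dev must be unlocked AND uptodate/discarded/skip-copied. */
static int dev_triggers_endio(unsigned long flags)
{
    return !(flags & R5_LOCKED) &&
           ((flags & R5_UPTODATE) ||
            (flags & R5_Discard) ||
            (flags & R5_SkipCopy));
}
```

Both observed values fail the test, i.e. dev_triggers_endio(24) and dev_triggers_endio(16) return 0, so do_endio stays false, break_stripe_batch_list() is never reached, and batch_head is left set.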
>>>> 
>>>> The code below in raid5_end_write_request directly releases sh->batch_head; couldn’t that be an issue?
>>>> 
>>>> static void raid5_end_write_request(struct bio *bi)
>>>> {
>>>> 
>>>> ...
>>>> 
>>>>       raid5_release_stripe(sh);
>>>> 
>>>>       if (sh->batch_head && sh != sh->batch_head)
>>>>               raid5_release_stripe(sh->batch_head);
>>>> }
>>> 
>>> Can you dump the full stripe_head info for this crash dump?
>> 
>> Yes, of course!
>> 
>> R13: ffff881e53f5dca0
>> 
>> crash> p (struct stripe_head)*0xffff881e53f5dca0
>> $1 = {
>>  hash = {
>>    next = 0x0, 
>>    pprev = 0x0
>>  }, 
>>  lru = {
>>    next = 0xffff881e53f5dcb0, 
>>    prev = 0xffff881e53f5dcb0
>>  }, 
>>  release_list = {
>>    next = 0xffff881e63faec30
>>  }, 
>>  raid_conf = 0xffff881f9fb81400, 
>>  generation = 0, 
>>  sector = 2278766080, 
>>  pd_idx = 9, 
>>  qd_idx = 0, 
>>  ddf_layout = 0, 
>>  hash_lock_index = 0, 
>>  state = 0, 
>>  count = {
>>    counter = 0
>>  }, 
>>  bm_seq = 21153188, 
>>  disks = 10, 
>>  overwrite_disks = 0, 
>>  check_state = check_state_idle, 
>>  reconstruct_state = reconstruct_state_idle, 
>>  stripe_lock = {
>>    {
>>      rlock = {
>>        raw_lock = {
>>          {
>>            head_tail = 3483684772, 
>>            tickets = {
>>              head = 53156, 
>>              tail = 53156
>>            }
>>          }
>>        }
>>      }
>>    }
>>  }, 
>>  cpu = 12, 
>>  group = 0x0, 
>>  batch_head = 0xffff881e53f5dca0, 
>>  batch_lock = {
>>    {
>>      rlock = {
>>        raw_lock = {
>>          {
>>            head_tail = 1983149620, 
>>            tickets = {
>>              head = 30260, 
>>              tail = 30260
>>            }
>>          }
>>        }
>>      }
>>    }
>>  }, 
>>  batch_list = {
>>    next = 0xffff881e63faec98, 
>>    prev = 0xffff881e5c390088
>>  }, 
>>  log_io = 0x0, 
>>  log_list = {
>>    next = 0x0, 
>>    prev = 0x0
>>  }, 
>>  ops = {
>>    target = 0, 
>>    target2 = 0, 
>>    zero_sum_result = (unknown: 0)
>>  }, 
>>  dev = {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5de70, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5dde8
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5de80, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5de70
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe000, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe000, 
>>      orig_page = 0xffffea00794fe000, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0x0, 
>>      sector = 0, 
>>      flags = 16, 
>>      log_checksum = 0
>>    }}
>> }
>> 
>> 
>> 
>>> Also stripe_head
>>> should include 10 dev for your 10-disk raid, but crash dump only show one, is
>>> it possible you can dump the info for the devs in the stripe?
>> 
>> Oh right, crash probably didn’t display it because the dev array is allocated dynamically.
>> 
>> Let’s try this to print the 10 devs:
>> 
>> crash> p ((struct stripe_head)*0xffff881e53f5dca0).dev@10
>> $10 = 
>> {{{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5de70, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5dde8
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5de80, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5de70
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe000, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe000, 
>>      orig_page = 0xffffea00794fe000, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0x0, 
>>      sector = 0, 
>>      flags = 16, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5dfe8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5df60
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5dff8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5dfe8
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe040, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe040, 
>>      orig_page = 0xffffea00794fe040, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230128640, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e160, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e0d8
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e170, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e160
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe080, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe080, 
>>      orig_page = 0xffffea00794fe080, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230128768, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e2d8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e250
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e2e8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e2d8
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe0c0, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe0c0, 
>>      orig_page = 0xffffea00794fe0c0, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230128896, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e450, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e3c8
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e460, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e450
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe100, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe100, 
>>      orig_page = 0xffffea00794fe100, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230129024, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e5c8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e540
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e5d8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e5c8
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe140, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe140, 
>>      orig_page = 0xffffea00794fe140, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230129152, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e740, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e6b8
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e750, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e740
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe180, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe180, 
>>      orig_page = 0xffffea00794fe180, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230129280, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e8b8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e830
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5e8c8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e8b8
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe1c0, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe1c0, 
>>      orig_page = 0xffffea00794fe1c0, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230129408, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5ea30, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5e9a8
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5ea40, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5ea30
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe200, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe200, 
>>      orig_page = 0xffffea00794fe200, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff881fea351800, 
>>      sector = 18230129536, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5eba8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5eb20
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e53f5dca0, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e53f5ebb8, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e53f5eba8
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00794fe240, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00794fe240, 
>>      orig_page = 0xffffea00794fe240, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0x0, 
>>      sector = 0, 
>>      flags = 16, 
>>      log_checksum = 0
>>    }}}
>> 
>> 
>> ---------------------------------------------------
>> 
>> And this is the stripe_head devs from the crash dump I sent yesterday:
>> 
>> R13: ffff881e08d54d30
>> 
>> crash> p ((struct stripe_head)*0xffff881e08d54d30).dev@10
>> $1 = 
>> {{{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d54f00, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d54e78
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d54f10, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d54f00
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea0078236080, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea0078236080, 
>>      orig_page = 0xffffea0078236080, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609798632, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55078, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d54ff0
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55088, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55078
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00782360c0, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00782360c0, 
>>      orig_page = 0xffffea00782360c0, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0x0, 
>>      sector = 0, 
>>      flags = 16, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d551f0, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55168
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55200, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d551f0
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea0078236100, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea0078236100, 
>>      orig_page = 0xffffea0078236100, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0x0, 
>>      sector = 0, 
>>      flags = 16, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55368, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d552e0
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55378, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55368
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea0078236140, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea0078236140, 
>>      orig_page = 0xffffea0078236140, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609797736, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d554e0, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55458
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d554f0, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d554e0
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea0078236180, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea0078236180, 
>>      orig_page = 0xffffea0078236180, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609797864, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55658, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d555d0
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55668, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55658
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00782361c0, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00782361c0, 
>>      orig_page = 0xffffea00782361c0, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609797992, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d557d0, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55748
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d557e0, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d557d0
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea0078236200, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea0078236200, 
>>      orig_page = 0xffffea0078236200, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609798120, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55948, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d558c0
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55958, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55948
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea0078236240, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea0078236240, 
>>      orig_page = 0xffffea0078236240, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609798248, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55ac0, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55a38
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55ad0, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55ac0
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea0078236280, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea0078236280, 
>>      orig_page = 0xffffea0078236280, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609798376, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}, {{
>>      req = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55c38, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55bb0
>>      }, 
>>      rreq = {
>>        bi_sector = 0, 
>>        bi_next = 0x0, 
>>        bi_bdev = 0x0, 
>>        bi_flags = 1, 
>>        bi_rw = 0, 
>>        bi_vcnt = 0, 
>>        bi_idx = 0, 
>>        bi_phys_segments = 0, 
>>        bi_size = 0, 
>>        bi_seg_front_size = 0, 
>>        bi_seg_back_size = 0, 
>>        bi_end_io = 0x0, 
>>        bi_private = 0xffff881e08d54d30, 
>>        bi_ioc = 0x0, 
>>        bi_css = 0x0, 
>>        bi_integrity = 0x0, 
>>        bi_max_vecs = 1, 
>>        bi_cnt = {
>>          counter = 1
>>        }, 
>>        bi_io_vec = 0xffff881e08d55c48, 
>>        bi_pool = 0x0, 
>>        {
>>          bio_aux = 0x0, 
>>          __UNIQUE_ID_rh_kabi_hide15 = {
>>            rh_reserved1 = 0x0
>>          }, 
>>          {<No data fields>}
>>        }, 
>>        bi_inline_vecs = 0xffff881e08d55c38
>>      }, 
>>      vec = {
>>        bv_page = 0xffffea00782362c0, 
>>        bv_len = 4096, 
>>        bv_offset = 0
>>      }, 
>>      rvec = {
>>        bv_page = 0x0, 
>>        bv_len = 0, 
>>        bv_offset = 0
>>      }, 
>>      page = 0xffffea00782362c0, 
>>      orig_page = 0xffffea00782362c0, 
>>      toread = 0x0, 
>>      read = 0x0, 
>>      towrite = 0x0, 
>>      written = 0xffff8824832a7c00, 
>>      sector = 29609798504, 
>>      flags = 24, 
>>      log_checksum = 0
>>    }}}
> 
> Didn't find anything interesting yet, sorry. There might be a race condition
> which makes the stripe a batch head and also adds it to another batch list;
> that's my current theory, but I can't find such a race condition.

Ok. In any case, thanks much for taking the time to look at this.

> 
> So in the crashdump, we have something like this:
> 
> batch_list = {
>    next = 0xffff881dfe405d28,
>    prev = 0xffff881e0e2dec98
>  },
> 
> Can you convert the next/prev to stripes and dump their state? That probably is
> helpful for debugging.

Results below. They look partially zeroed, and disks=0, so I’m not sure it makes sense to dump their devs.

Note: next and prev in the other crash look similar.

Here is ‘next’:

crash> p ((struct stripe_head)*0xffff881dfe405d28)
$5 = {
  hash = {
    next = 0xffff881e0e2dec98, 
    pprev = 0xffff881e08d54db8
  }, 
  lru = {
    next = 0x0, 
    prev = 0x0
  }, 
  release_list = {
    next = 0x0
  }, 
  raid_conf = 0x0, 
  generation = 0, 
  sector = 0, 
  pd_idx = 0, 
  qd_idx = 0, 
  ddf_layout = 0, 
  hash_lock_index = 0, 
  state = 0, 
  count = {
    counter = 1
  }, 
  bm_seq = 0, 
  disks = 0, 
  overwrite_disks = 0, 
  check_state = check_state_idle, 
  reconstruct_state = reconstruct_state_idle, 
  stripe_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0, 
            tickets = {
              head = 0, 
              tail = 0
            }
          }
        }
      }
    }
  }, 
  cpu = 0, 
  group = 0x0, 
  batch_head = 0x0, 
  batch_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 4265630880, 
            tickets = {
              head = 23712, 
              tail = 65088
            }
          }
        }
      }
    }
  }, 
  batch_list = {
    next = 0x0, 
    prev = 0x0
  }, 
  log_io = 0x0, 
  log_list = {
    next = 0x100000001, 
    prev = 0xffff881dfe405e70
  }, 
  ops = {
    target = 0, 
    target2 = 0, 
    zero_sum_result = (unknown: 0)
  }, 
  dev = {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881dfe405ca0, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881dfe405e80, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881dfe405e70
      }, 
      rreq = {
        bi_sector = 18446719886466550720, 
        bi_next = 0x1000, 
        bi_bdev = 0x0, 
        bi_flags = 0, 
        bi_rw = 18446719886466550720, 
        bi_vcnt = 3008, 
        bi_idx = 30713, 
        bi_phys_segments = 4294961664, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff8824832a7c00, 
        bi_ioc = 0x6e4e1abf0, 
        bi_css = 0x8, 
        bi_integrity = 0x0, 
        bi_max_vecs = 0, 
        bi_cnt = {
          counter = 0
        }, 
        bi_io_vec = 0x0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x1, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x1
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881dfe405ef8
      }, 
      vec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0x0, 
      orig_page = 0xffff881dfe405ca0, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0x100000001, 
      sector = 18446612261133901800, 
      flags = 0, 
      log_checksum = 0
    }}
}


And this is ‘prev’:

crash> p ((struct stripe_head)*0xffff881e0e2dec98)
$6 = {
  hash = {
    next = 0xffff881e08d54db8, 
    pprev = 0xffff881dfe405d28
  }, 
  lru = {
    next = 0x0, 
    prev = 0x0
  }, 
  release_list = {
    next = 0x0
  }, 
  raid_conf = 0x0, 
  generation = 0, 
  sector = 0, 
  pd_idx = 0, 
  qd_idx = 0, 
  ddf_layout = 0, 
  hash_lock_index = 0, 
  state = 0, 
  count = {
    counter = 1
  }, 
  bm_seq = 0, 
  disks = 0, 
  overwrite_disks = 0, 
  check_state = check_state_idle, 
  reconstruct_state = reconstruct_state_idle, 
  stripe_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0, 
            tickets = {
              head = 0, 
              tail = 0
            }
          }
        }
      }
    }
  }, 
  cpu = 0, 
  group = 0x0, 
  batch_head = 0x0, 
  batch_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 237890576, 
            tickets = {
              head = 60432, 
              tail = 3629
            }
          }
        }
      }
    }
  }, 
  batch_list = {
    next = 0x0, 
    prev = 0x0
  }, 
  log_io = 0x0, 
  log_list = {
    next = 0x100000001, 
    prev = 0xffff881e0e2dede0
  }, 
  ops = {
    target = 0, 
    target2 = 0, 
    zero_sum_result = (unknown: 0)
  }, 
  dev = {{
      req = {
        bi_sector = 0, 
        bi_next = 0x0, 
        bi_bdev = 0x0, 
        bi_flags = 1, 
        bi_rw = 0, 
        bi_vcnt = 0, 
        bi_idx = 0, 
        bi_phys_segments = 0, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff881e0e2dec10, 
        bi_ioc = 0x0, 
        bi_css = 0x0, 
        bi_integrity = 0x0, 
        bi_max_vecs = 1, 
        bi_cnt = {
          counter = 1
        }, 
        bi_io_vec = 0xffff881e0e2dedf0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x0, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x0
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e0e2dede0
      }, 
      rreq = {
        bi_sector = 18446719886470726848, 
        bi_next = 0x1000, 
        bi_bdev = 0x0, 
        bi_flags = 0, 
        bi_rw = 18446719886470726848, 
        bi_vcnt = 50368, 
        bi_idx = 30776, 
        bi_phys_segments = 4294961664, 
        bi_size = 0, 
        bi_seg_front_size = 0, 
        bi_seg_back_size = 0, 
        bi_end_io = 0x0, 
        bi_private = 0xffff8824832a7c00, 
        bi_ioc = 0x6e4e1abf8, 
        bi_css = 0x1b, 
        bi_integrity = 0x0, 
        bi_max_vecs = 0, 
        bi_cnt = {
          counter = 0
        }, 
        bi_io_vec = 0x0, 
        bi_pool = 0x0, 
        {
          bio_aux = 0x1, 
          __UNIQUE_ID_rh_kabi_hide15 = {
            rh_reserved1 = 0x1
          }, 
          {<No data fields>}
        }, 
        bi_inline_vecs = 0xffff881e0e2dee68
      }, 
      vec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      rvec = {
        bv_page = 0x0, 
        bv_len = 0, 
        bv_offset = 0
      }, 
      page = 0x0, 
      orig_page = 0xffff881e0e2dec10, 
      toread = 0x0, 
      read = 0x0, 
      towrite = 0x0, 
      written = 0x100000001, 
      sector = 18446612261401128792, 
      flags = 0, 
      log_checksum = 0
    }}
}

Thanks again,

Stephane


* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-18  4:13             ` Stephane Thiell
@ 2017-08-18  5:50               ` Shaohua Li
       [not found]                 ` <C45EC19A-1CF2-45BA-B907-BCD5B4E644F0@stanford.edu>
  0 siblings, 1 reply; 15+ messages in thread
From: Shaohua Li @ 2017-08-18  5:50 UTC (permalink / raw)
  To: Stephane Thiell; +Cc: Guoqing Jiang, linux-raid

On Fri, Aug 18, 2017 at 04:13:21AM +0000, Stephane Thiell wrote:
> > On Aug 17, 2017, at 3:58 PM, Shaohua Li <shli@kernel.org> wrote:
> > 
> > On Tue, Aug 15, 2017 at 07:27:44PM +0000, Stephane Thiell wrote:
> >> 
> >>> On Aug 15, 2017, at 11:39 AM, Shaohua Li <shli@kernel.org> wrote:
> >>> 
> >>> On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
> >>>> 
> >>>>> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
> >>>>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
> >>>>>> 
> >>>>>> I’m still struggling with this issue in raid5.c. I currently see about one crash per week on average, on a system using Lustre with 24 x 10-disk raid6 arrays per server. It happens with or without a check running. I now think there is a race at some point where a released (batched) stripe head is added to conf->inactive_list without sh->batch_head being set back to NULL.
> >>>>>> 
> >>>>>> Because sh->batch_head is only set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I haven’t found where that call could come from yet. So perhaps sh->batch_head should simply be set to NULL in do_release_stripe(), before the stripe is added to the inactive list?
> >>>>>> 
> >>>>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
> >>>>>> 
> >>>>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to take sh->stripe_lock there?). I will get new hardware soon, so I will give it a try on a test system. Any suggestions are welcome…
> >>>>>> 
> >>>>>> Quick analysis of the last vmcore:
> >>>>>> 
> >>>>>> crash> ps | grep ">"
> >>>>>> ...
> >>>>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
> >>>>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
> >>>>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
> >>>>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
> >>>>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
> >>>>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
> >>>>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
> >>>>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
> >>>>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
> >>>>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
> >>>>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
> >>>>>> ...
> >>>>>> 
> >>>>>> crash>  bt 237033
> >>>>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
> >>>>>> #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
> >>>>>> #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
> >>>>>> #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
> >>>>>> #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
> >>>>>> #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
> >>>>>> #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
> >>>>>> #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
> >>>>>> #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
> >>>>>>   [exception RIP: raid5_get_active_stripe+1809]
> >>>>>>   RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
> >>>>>>   RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
> >>>>>>   RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
> >>>>>>   RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
> >>>>>>   R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>   R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
> >>>>>>   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>>>> #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
> >>>>>> #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
> >>>>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
> >>>>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
> >>>>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
> >>>>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
> >>>>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
> >>>>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
> >>>>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
> >>>>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
> >>>>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
> >>>>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
> >>>>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
> >>>>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
> >>>>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
> >>>>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
> >>>>>> 
> >>>>>> 
> >>>>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
> >>>>>> 
> >>>>>> md32’s (struct r5conf *) is 0xffff883df0516400
> >>>>>> 
> >>>>>> Our crashed task:
> >>>>>> 
> >>>>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
> >>>>>> raid6-md32
> >>>>>> 
> >>>>>> Example for md32_raid6 task in release_inactive_stripe_list:
> >>>>>> 
> >>>>>> crash> bt 154637
> >>>>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
> >>>>>> #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
> >>>>>> #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
> >>>>>> #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
> >>>>>> #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
> >>>>>>   [exception RIP: _raw_spin_lock_irqsave+64]
> >>>>>>   RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
> >>>>>>   RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
> >>>>>>   RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
> >>>>>>   RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
> >>>>>>   R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
> >>>>>>   R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
> >>>>>>   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>>>> --- <NMI exception stack> ---
> >>>>>> #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
> >>>>>> #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
> >>>>>> #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
> >>>>>> #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
> >>>>>> #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
> >>>>>> #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
> >>>>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
> >>>>>> 
> >>>>>> 
> >>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
> >>>>>>   [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>   [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>   [exception RIP: _raw_spin_lock_irq+58]
> >>>>>>   [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>   [exception RIP: _raw_spin_lock_irq+58]
> >>>>>>   [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>   [exception RIP: raid5_get_active_stripe+1809]
> >>>>>> 
> >>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
> >>>>>> #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>> #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>> #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>> #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>> #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>> #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>   [exception RIP: raid5_get_active_stripe+1809]
> >>>>>> 
> >>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
> >>>>>>   R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>   R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>   R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>   R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>   R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>   R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>   R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>> 
> >>>>>> 
> >>>>>> The bad stripe_head (sh) is available at (usually in %r13):
> >>>>>> 
> >>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
> >>>>>> $98 = (struct r5conf *) 0xffff883df0516400
> >>>>>> 
> >>>>>> and batch_head is not NULL... that's the issue:
> >>>>>> 
> >>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).batch_head
> >>>>>> $99 = (struct stripe_head *) 0xffff881e08d54d30
> >>>>>> 
> >>>>>> Full struct dump below. We can see that overwrite_disks is set to 0 and state is 0… but batch_head not being set back to NULL makes it crash in init_stripe():
> >>>>>> 
> >>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30)
> >>>>>> $6 = {
> >>>>>> hash = {
> >>>>>>   next = 0x0,
> >>>>>>   pprev = 0x0
> >>>>>> },
> >>>>>> lru = {
> >>>>>>   next = 0xffff881e08d54d40,
> >>>>>>   prev = 0xffff881e08d54d40
> >>>>>> },
> >>>>>> release_list = {
> >>>>>>   next = 0xffff881dfe405cc0
> >>>>>> },
> >>>>>> raid_conf = 0xffff883df0516400,
> >>>>>> generation = 0,
> >>>>>> sector = 3701224808,
> >>>>>> pd_idx = 1,
> >>>>>> qd_idx = 2,
> >>>>>> ddf_layout = 0,
> >>>>>> hash_lock_index = 5,
> >>>>>> state = 0,
> >>>>>> count = {
> >>>>>>   counter = 0
> >>>>>> },
> >>>>>> bm_seq = 22708787,
> >>>>>> disks = 10,
> >>>>>> overwrite_disks = 0,
> >>>>>> check_state = check_state_idle,
> >>>>>> reconstruct_state = reconstruct_state_idle,
> >>>>>> stripe_lock = {
> >>>>>>   {
> >>>>>>     rlock = {
> >>>>>>       raw_lock = {
> >>>>>>         {
> >>>>>>           head_tail = 3709263126,
> >>>>>>           tickets = {
> >>>>>>             head = 56598,
> >>>>>>             tail = 56598
> >>>>>>           }
> >>>>>>         }
> >>>>>>       }
> >>>>>>     }
> >>>>>>   }
> >>>>>> },
> >>>>>> cpu = 15,
> >>>>>> group = 0x0,
> >>>>>> batch_head = 0xffff881e08d54d30,
> >>>>>> batch_lock = {
> >>>>>>   {
> >>>>>>     rlock = {
> >>>>>>       raw_lock = {
> >>>>>>         {
> >>>>>>           head_tail = 9961624,
> >>>>>>           tickets = {
> >>>>>>             head = 152,
> >>>>>>             tail = 152
> >>>>>>           }
> >>>>>>         }
> >>>>>>       }
> >>>>>>     }
> >>>>>>   }
> >>>>>> },
> >>>>>> batch_list = {
> >>>>>>   next = 0xffff881dfe405d28,
> >>>>>>   prev = 0xffff881e0e2dec98
> >>>>>> },
> >>>>>> log_io = 0x0,
> >>>>>> log_list = {
> >>>>>>   next = 0x0,
> >>>>>>   prev = 0x0
> >>>>>> },
> >>>>>> ops = {
> >>>>>>   target = 0,
> >>>>>>   target2 = 0,
> >>>>>>   zero_sum_result = (unknown: 0)
> >>>>>> },
> >>>>>> dev = {{
> >>>>>>     req = {
> >>>>>>       bi_sector = 0,
> >>>>>>       bi_next = 0x0,
> >>>>>>       bi_bdev = 0x0,
> >>>>>>       bi_flags = 1,
> >>>>>>       bi_rw = 0,
> >>>>>>       bi_vcnt = 0,
> >>>>>>       bi_idx = 0,
> >>>>>>       bi_phys_segments = 0,
> >>>>>>       bi_size = 0,
> >>>>>>       bi_seg_front_size = 0,
> >>>>>>       bi_seg_back_size = 0,
> >>>>>>       bi_end_io = 0x0,
> >>>>>>       bi_private = 0xffff881e08d54d30,
> >>>>>>       bi_ioc = 0x0,
> >>>>>>       bi_css = 0x0,
> >>>>>>       bi_integrity = 0x0,
> >>>>>>       bi_max_vecs = 1,
> >>>>>>       bi_cnt = {
> >>>>>>         counter = 1
> >>>>>>       },
> >>>>>>       bi_io_vec = 0xffff881e08d54f00,
> >>>>>>       bi_pool = 0x0,
> >>>>>>       {
> >>>>>>         bio_aux = 0x0,
> >>>>>>         __UNIQUE_ID_rh_kabi_hide15 = {
> >>>>>>           rh_reserved1 = 0x0
> >>>>>>         },
> >>>>>>         {<No data fields>}
> >>>>>>       },
> >>>>>>       bi_inline_vecs = 0xffff881e08d54e78
> >>>>>>     },
> >>>>>>     rreq = {
> >>>>>>       bi_sector = 0,
> >>>>>>       bi_next = 0x0,
> >>>>>>       bi_bdev = 0x0,
> >>>>>>       bi_flags = 1,
> >>>>>>       bi_rw = 0,
> >>>>>>       bi_vcnt = 0,
> >>>>>>       bi_idx = 0,
> >>>>>>       bi_phys_segments = 0,
> >>>>>>       bi_size = 0,
> >>>>>>       bi_seg_front_size = 0,
> >>>>>>       bi_seg_back_size = 0,
> >>>>>>       bi_end_io = 0x0,
> >>>>>>       bi_private = 0xffff881e08d54d30,
> >>>>>>       bi_ioc = 0x0,
> >>>>>>       bi_css = 0x0,
> >>>>>>       bi_integrity = 0x0,
> >>>>>>       bi_max_vecs = 1,
> >>>>>>       bi_cnt = {
> >>>>>>         counter = 1
> >>>>>>       },
> >>>>>>       bi_io_vec = 0xffff881e08d54f10,
> >>>>>>       bi_pool = 0x0,
> >>>>>>       {
> >>>>>>         bio_aux = 0x0,
> >>>>>>         __UNIQUE_ID_rh_kabi_hide15 = {
> >>>>>>           rh_reserved1 = 0x0
> >>>>>>         },
> >>>>>>         {<No data fields>}
> >>>>>>       },
> >>>>>>       bi_inline_vecs = 0xffff881e08d54f00
> >>>>>>     },
> >>>>>>     vec = {
> >>>>>>       bv_page = 0xffffea0078236080,
> >>>>>>       bv_len = 4096,
> >>>>>>       bv_offset = 0
> >>>>>>     },
> >>>>>>     rvec = {
> >>>>>>       bv_page = 0x0,
> >>>>>>       bv_len = 0,
> >>>>>>       bv_offset = 0
> >>>>>>     },
> >>>>>>     page = 0xffffea0078236080,
> >>>>>>     orig_page = 0xffffea0078236080,
> >>>>>>     toread = 0x0,
> >>>>>>     read = 0x0,
> >>>>>>     towrite = 0x0,
> >>>>>>     written = 0xffff8824832a7c00,
> >>>>>>     sector = 29609798632,
> >>>>>>     flags = 24,
> >>>>> 
> >>>>> Seems the flags value is R5_OVERWRITE | R5_Insync; handle_stripe_clean_event
> >>>>> only calls break_stripe_batch_list if do_endio is set to true.
> >>>>> 
> >>>>> if (head_sh->batch_head && do_endio)
> >>>>>              break_stripe_batch_list(head_sh, STRIPE_EXPAND_SYNC_FLAGS);
> >>>>> 
> >>>>> And do_endio is set to true only if the condition below is met.
> >>>>> 
> >>>>> if (!test_bit(R5_LOCKED, &dev->flags) &&
> >>>>>                          (test_bit(R5_UPTODATE, &dev->flags) ||
> >>>>>                           test_bit(R5_Discard, &dev->flags) ||
> >>>>>                           test_bit(R5_SkipCopy, &dev->flags))) {
> >>>>> 
> >>>>> I guess that is why batch_head is not NULL, just my 2 cents.
> >>>> 
> >>>> 
> >>>> That’s a very good point, thanks Guoqing!
> >>>> 
> >>>> I checked another similar crash dump and found that the flags value there was 16 (R5_Insync alone), which doesn’t meet the condition for having do_endio = true either.
> >>>> 
> >>>> The code below in raid5_end_write_request directly releases sh->batch_head; couldn’t that be an issue?
> >>>> 
> >>>> static void raid5_end_write_request(struct bio *bi)
> >>>> {
> >>>> 
> >>>> ...
> >>>> 
> >>>>       raid5_release_stripe(sh);
> >>>> 
> >>>>       if (sh->batch_head && sh != sh->batch_head)
> >>>>               raid5_release_stripe(sh->batch_head);
> >>>> }
> >>> 
> >>> Can you dump the full stripe_head info for this crash dump?
> >> 
> >> Yes, of course!
> >> 
> >> R13: ffff881e53f5dca0
> >> 
> >> crash> p (struct stripe_head)*0xffff881e53f5dca0
> >> $1 = {
> >>  hash = {
> >>    next = 0x0, 
> >>    pprev = 0x0
> >>  }, 
> >>  lru = {
> >>    next = 0xffff881e53f5dcb0, 
> >>    prev = 0xffff881e53f5dcb0
> >>  }, 
> >>  release_list = {
> >>    next = 0xffff881e63faec30
> >>  }, 
> >>  raid_conf = 0xffff881f9fb81400, 
> >>  generation = 0, 
> >>  sector = 2278766080, 
> >>  pd_idx = 9, 
> >>  qd_idx = 0, 
> >>  ddf_layout = 0, 
> >>  hash_lock_index = 0, 
> >>  state = 0, 
> >>  count = {
> >>    counter = 0
> >>  }, 
> >>  bm_seq = 21153188, 
> >>  disks = 10, 
> >>  overwrite_disks = 0, 
> >>  check_state = check_state_idle, 
> >>  reconstruct_state = reconstruct_state_idle, 
> >>  stripe_lock = {
> >>    {
> >>      rlock = {
> >>        raw_lock = {
> >>          {
> >>            head_tail = 3483684772, 
> >>            tickets = {
> >>              head = 53156, 
> >>              tail = 53156
> >>            }
> >>          }
> >>        }
> >>      }
> >>    }
> >>  }, 
> >>  cpu = 12, 
> >>  group = 0x0, 
> >>  batch_head = 0xffff881e53f5dca0, 
> >>  batch_lock = {
> >>    {
> >>      rlock = {
> >>        raw_lock = {
> >>          {
> >>            head_tail = 1983149620, 
> >>            tickets = {
> >>              head = 30260, 
> >>              tail = 30260
> >>            }
> >>          }
> >>        }
> >>      }
> >>    }
> >>  }, 
> >>  batch_list = {
> >>    next = 0xffff881e63faec98, 
> >>    prev = 0xffff881e5c390088
> >>  }, 
> >>  log_io = 0x0, 
> >>  log_list = {
> >>    next = 0x0, 
> >>    prev = 0x0
> >>  }, 
> >>  ops = {
> >>    target = 0, 
> >>    target2 = 0, 
> >>    zero_sum_result = (unknown: 0)
> >>  }, 
> >>  dev = {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5de70, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5dde8
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5de80, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5de70
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe000, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe000, 
> >>      orig_page = 0xffffea00794fe000, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0x0, 
> >>      sector = 0, 
> >>      flags = 16, 
> >>      log_checksum = 0
> >>    }}
> >> }
> >> 
> >> 
> >> 
> >>> Also stripe_head
> >>> should include 10 dev for your 10-disk raid, but crash dump only show one, is
> >>> it possible you can dump the info for the devs in the stripe?
> >> 
> >> Oh right, crash didn’t display it, probably because it is a dynamically allocated array.
> >> 
> >> Let’s try this to print the 10 devs:
> >> 
> >> crash> p ((struct stripe_head)*0xffff881e53f5dca0).dev@10
> >> $10 = 
> >> {{{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5de70, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5dde8
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5de80, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5de70
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe000, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe000, 
> >>      orig_page = 0xffffea00794fe000, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0x0, 
> >>      sector = 0, 
> >>      flags = 16, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5dfe8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5df60
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5dff8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5dfe8
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe040, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe040, 
> >>      orig_page = 0xffffea00794fe040, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230128640, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e160, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e0d8
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e170, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e160
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe080, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe080, 
> >>      orig_page = 0xffffea00794fe080, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230128768, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e2d8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e250
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e2e8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e2d8
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe0c0, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe0c0, 
> >>      orig_page = 0xffffea00794fe0c0, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230128896, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e450, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e3c8
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e460, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e450
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe100, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe100, 
> >>      orig_page = 0xffffea00794fe100, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230129024, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e5c8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e540
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e5d8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e5c8
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe140, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe140, 
> >>      orig_page = 0xffffea00794fe140, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230129152, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e740, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e6b8
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e750, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e740
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe180, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe180, 
> >>      orig_page = 0xffffea00794fe180, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230129280, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e8b8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e830
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5e8c8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e8b8
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe1c0, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe1c0, 
> >>      orig_page = 0xffffea00794fe1c0, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230129408, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5ea30, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5e9a8
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5ea40, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5ea30
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe200, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe200, 
> >>      orig_page = 0xffffea00794fe200, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff881fea351800, 
> >>      sector = 18230129536, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5eba8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5eb20
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e53f5dca0, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e53f5ebb8, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e53f5eba8
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00794fe240, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00794fe240, 
> >>      orig_page = 0xffffea00794fe240, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0x0, 
> >>      sector = 0, 
> >>      flags = 16, 
> >>      log_checksum = 0
> >>    }}}
> >> 
> >> 
> >> ---------------------------------------------------
> >> 
> >> And these are the stripe_head devs from the crash dump I sent yesterday:
> >> 
> >> R13: ffff881e08d54d30
> >> 
> >> crash> p ((struct stripe_head)*0xffff881e08d54d30).dev@10
> >> $1 = 
> >> {{{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d54f00, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d54e78
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d54f10, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d54f00
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea0078236080, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea0078236080, 
> >>      orig_page = 0xffffea0078236080, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609798632, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55078, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d54ff0
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55088, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55078
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00782360c0, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00782360c0, 
> >>      orig_page = 0xffffea00782360c0, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0x0, 
> >>      sector = 0, 
> >>      flags = 16, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d551f0, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55168
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55200, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d551f0
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea0078236100, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea0078236100, 
> >>      orig_page = 0xffffea0078236100, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0x0, 
> >>      sector = 0, 
> >>      flags = 16, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55368, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d552e0
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55378, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55368
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea0078236140, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea0078236140, 
> >>      orig_page = 0xffffea0078236140, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609797736, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d554e0, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55458
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d554f0, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d554e0
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea0078236180, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea0078236180, 
> >>      orig_page = 0xffffea0078236180, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609797864, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55658, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d555d0
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55668, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55658
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00782361c0, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00782361c0, 
> >>      orig_page = 0xffffea00782361c0, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609797992, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d557d0, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55748
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d557e0, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d557d0
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea0078236200, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea0078236200, 
> >>      orig_page = 0xffffea0078236200, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609798120, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55948, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d558c0
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55958, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55948
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea0078236240, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea0078236240, 
> >>      orig_page = 0xffffea0078236240, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609798248, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55ac0, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55a38
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55ad0, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55ac0
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea0078236280, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea0078236280, 
> >>      orig_page = 0xffffea0078236280, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609798376, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}, {{
> >>      req = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55c38, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55bb0
> >>      }, 
> >>      rreq = {
> >>        bi_sector = 0, 
> >>        bi_next = 0x0, 
> >>        bi_bdev = 0x0, 
> >>        bi_flags = 1, 
> >>        bi_rw = 0, 
> >>        bi_vcnt = 0, 
> >>        bi_idx = 0, 
> >>        bi_phys_segments = 0, 
> >>        bi_size = 0, 
> >>        bi_seg_front_size = 0, 
> >>        bi_seg_back_size = 0, 
> >>        bi_end_io = 0x0, 
> >>        bi_private = 0xffff881e08d54d30, 
> >>        bi_ioc = 0x0, 
> >>        bi_css = 0x0, 
> >>        bi_integrity = 0x0, 
> >>        bi_max_vecs = 1, 
> >>        bi_cnt = {
> >>          counter = 1
> >>        }, 
> >>        bi_io_vec = 0xffff881e08d55c48, 
> >>        bi_pool = 0x0, 
> >>        {
> >>          bio_aux = 0x0, 
> >>          __UNIQUE_ID_rh_kabi_hide15 = {
> >>            rh_reserved1 = 0x0
> >>          }, 
> >>          {<No data fields>}
> >>        }, 
> >>        bi_inline_vecs = 0xffff881e08d55c38
> >>      }, 
> >>      vec = {
> >>        bv_page = 0xffffea00782362c0, 
> >>        bv_len = 4096, 
> >>        bv_offset = 0
> >>      }, 
> >>      rvec = {
> >>        bv_page = 0x0, 
> >>        bv_len = 0, 
> >>        bv_offset = 0
> >>      }, 
> >>      page = 0xffffea00782362c0, 
> >>      orig_page = 0xffffea00782362c0, 
> >>      toread = 0x0, 
> >>      read = 0x0, 
> >>      towrite = 0x0, 
> >>      written = 0xffff8824832a7c00, 
> >>      sector = 29609798504, 
> >>      flags = 24, 
> >>      log_checksum = 0
> >>    }}}
> > 
> > Didn't find anything interesting yet, sorry. There might be a race condition
> > which makes the stripe a batch head and also adds it to another batch list;
> > that's my current theory, but I can't find such a race condition yet.
> 
> Ok. In any case, thanks much for taking the time to look at this.
> 
> > 
> > So in the crashdump, we have something like this:
> > 
> > batch_list = {
> >    next = 0xffff881dfe405d28,
> >    prev = 0xffff881e0e2dec98
> >  },
> > 
> > Can you convert the next/prev to stripes and dump their state? That probably is
> > helpful for debugging.
> 
> Results below. They do look partially zeroed, and disks=0, so I’m not sure it makes sense to dump their devs.
> 
> Note: next and prev in the other crash look similar.
> 
> Here is ‘next':
> 
> crash> p ((struct stripe_head)*0xffff881dfe405d28)

I think it should be something like this:

(struct stripe_head *)(0xffff881dfe405d28 - (unsigned long)&((struct stripe_head*)0)->batch_list)

the next/prev point to stripe_head->batch_list, not stripe_head

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG_ON(sh->batch_head) in init_stripe()
       [not found]                 ` <C45EC19A-1CF2-45BA-B907-BCD5B4E644F0@stanford.edu>
@ 2017-08-21 16:10                   ` Shaohua Li
  2017-08-21 17:35                     ` Stephane Thiell
  0 siblings, 1 reply; 15+ messages in thread
From: Shaohua Li @ 2017-08-21 16:10 UTC (permalink / raw)
  To: Stephane Thiell; +Cc: Guoqing Jiang, linux-raid

On Fri, Aug 18, 2017 at 07:00:20AM +0000, Stephane Thiell wrote:
> 
> > On Aug 17, 2017, at 10:50 PM, Shaohua Li <shli@kernel.org> wrote:
> > 
> > On Fri, Aug 18, 2017 at 04:13:21AM +0000, Stephane Thiell wrote:
> >>> On Aug 17, 2017, at 3:58 PM, Shaohua Li <shli@kernel.org> wrote:
> >>> 
> >>> On Tue, Aug 15, 2017 at 07:27:44PM +0000, Stephane Thiell wrote:
> >>>> 
> >>>>> On Aug 15, 2017, at 11:39 AM, Shaohua Li <shli@kernel.org> wrote:
> >>>>> 
> >>>>> On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
> >>>>>> 
> >>>>>>> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
> >>>>>>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
> >>>>>>>> 
> >>>>>>>> I’m still struggling with this issue in raid5.c. I currently have one crash per week on average on a system using Lustre and 24 x 10-disk raid6 per server. It has happened both with and without a check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
> >>>>>>>> 
> >>>>>>>> Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
> >>>>>>>> 
> >>>>>>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
> >>>>>>>> 
> >>>>>>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcomed…
> >>>>>>>> 
> >>>>>>>> Quick analysis of the last vmcore:
> >>>>>>>> 
> >>>>>>>> crash> ps | grep ">"
> >>>>>>>> ...
> >>>>>>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
> >>>>>>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
> >>>>>>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
> >>>>>>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
> >>>>>>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
> >>>>>>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
> >>>>>>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
> >>>>>>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
> >>>>>>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
> >>>>>>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
> >>>>>>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
> >>>>>>>> ...
> >>>>>>>> 
> >>>>>>>> crash>  bt 237033
> >>>>>>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
> >>>>>>>> #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
> >>>>>>>> #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
> >>>>>>>> #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
> >>>>>>>> #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
> >>>>>>>> #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
> >>>>>>>> #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
> >>>>>>>> #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
> >>>>>>>> #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
> >>>>>>>>  [exception RIP: raid5_get_active_stripe+1809]
> >>>>>>>>  RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
> >>>>>>>>  RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
> >>>>>>>>  RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
> >>>>>>>>  RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
> >>>>>>>>  R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>  R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
> >>>>>>>>  ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>>>>>> #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
> >>>>>>>> #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
> >>>>>>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
> >>>>>>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
> >>>>>>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
> >>>>>>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
> >>>>>>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
> >>>>>>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
> >>>>>>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
> >>>>>>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
> >>>>>>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
> >>>>>>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
> >>>>>>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
> >>>>>>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
> >>>>>>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
> >>>>>>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
> >>>>>>>> 
> >>>>>>>> md32’s (struct r5conf *) is 0xffff883df0516400
> >>>>>>>> 
> >>>>>>>> Our crashed task:
> >>>>>>>> 
> >>>>>>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
> >>>>>>>> raid6-md32
> >>>>>>>> 
> >>>>>>>> Example for md32_raid6 task in release_inactive_stripe_list:
> >>>>>>>> 
> >>>>>>>> crash> bt 154637
> >>>>>>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
> >>>>>>>> #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
> >>>>>>>> #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
> >>>>>>>> #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
> >>>>>>>> #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
> >>>>>>>>  [exception RIP: _raw_spin_lock_irqsave+64]
> >>>>>>>>  RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
> >>>>>>>>  RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
> >>>>>>>>  RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
> >>>>>>>>  RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
> >>>>>>>>  R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
> >>>>>>>>  R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
> >>>>>>>>  ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>>>>>> --- <NMI exception stack> ---
> >>>>>>>> #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
> >>>>>>>> #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
> >>>>>>>> #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
> >>>>>>>> #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
> >>>>>>>> #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
> >>>>>>>> #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
> >>>>>>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
> >>>>>>>>  [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>  [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>  [exception RIP: _raw_spin_lock_irq+58]
> >>>>>>>>  [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>  [exception RIP: _raw_spin_lock_irq+58]
> >>>>>>>>  [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>  [exception RIP: raid5_get_active_stripe+1809]
> >>>>>>>> 
> >>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
> >>>>>>>> #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>> #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>> #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>> #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>> #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>> #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>>  [exception RIP: raid5_get_active_stripe+1809]
> >>>>>>>> 
> >>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
> >>>>>>>>  R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>  R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>  R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>  R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>  R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>  R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>  R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> The bad stripe_head (sh) is available at (usually in %r13):
> >>>>>>>> 
> >>>>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
> >>>>>>>> $98 = (struct r5conf *) 0xffff883df0516400
> >>>>>>>> 
> >>>>>>>> and batch_head is not NULL... that's the issue:

Last mail appears not to have hit the list; resending.

The new dump info does reveal some useful details. I'm not sure if it's the issue you found,
but I did find a race condition. Please try the patch below and report back:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ed5cd705b985..35637fe34820 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -806,13 +806,19 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 		}
 
 		/*
+		 * We must assign batch_head of this stripe within the
+		 * batch_lock, otherwise clear_batch_ready of batch head
+		 * stripe could clear BATCH_READY bit of this stripe and this
+		 * stripe->batch_head doesn't get assigned, which could
+		 * confuse clear_batch_ready for this stripe
+		 */
+		sh->batch_head = head->batch_head;
+		/*
 		 * at this point, head's BATCH_READY could be cleared, but we
 		 * can still add the stripe to batch list
 		 */
 		list_add(&sh->batch_list, &head->batch_list);
 		spin_unlock(&head->batch_head->batch_lock);
-
-		sh->batch_head = head->batch_head;
 	} else {
 		head->batch_head = head;
 		sh->batch_head = head->batch_head;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-21 16:10                   ` Shaohua Li
@ 2017-08-21 17:35                     ` Stephane Thiell
  2017-08-21 17:46                       ` Shaohua Li
  0 siblings, 1 reply; 15+ messages in thread
From: Stephane Thiell @ 2017-08-21 17:35 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Guoqing Jiang, linux-raid


> On Aug 21, 2017, at 9:10 AM, Shaohua Li <shli@kernel.org> wrote:
> 
> On Fri, Aug 18, 2017 at 07:00:20AM +0000, Stephane Thiell wrote:
>> 
>>> On Aug 17, 2017, at 10:50 PM, Shaohua Li <shli@kernel.org> wrote:
>>> 
>>> On Fri, Aug 18, 2017 at 04:13:21AM +0000, Stephane Thiell wrote:
>>>>> On Aug 17, 2017, at 3:58 PM, Shaohua Li <shli@kernel.org> wrote:
>>>>> 
>>>>> On Tue, Aug 15, 2017 at 07:27:44PM +0000, Stephane Thiell wrote:
>>>>>> 
>>>>>>> On Aug 15, 2017, at 11:39 AM, Shaohua Li <shli@kernel.org> wrote:
>>>>>>> 
>>>>>>> On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
>>>>>>>> 
>>>>>>>>> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
>>>>>>>>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
>>>>>>>>>> 
>>>>>>>>>> I’m still struggling with this issue in raid5.c. I currently have one crash per week on average on a system using Lustre and 24 x 10-disk raid6 per server. It has happened both with and without a check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
>>>>>>>>>> 
>>>>>>>>>> Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
>>>>>>>>>> 
>>>>>>>>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
>>>>>>>>>> 
>>>>>>>>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcomed…
>>>>>>>>>> 
>>>>>>>>>> Quick analysis of the last vmcore:
>>>>>>>>>> 
>>>>>>>>>> crash> ps | grep ">"
>>>>>>>>>> ...
>>>>>>>>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
>>>>>>>>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
>>>>>>>>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
>>>>>>>>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
>>>>>>>>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
>>>>>>>>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
>>>>>>>>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
>>>>>>>>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
>>>>>>>>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
>>>>>>>>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
>>>>>>>>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
>>>>>>>>>> ...
>>>>>>>>>> 
>>>>>>>>>> crash>  bt 237033
>>>>>>>>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
>>>>>>>>>> #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
>>>>>>>>>> #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
>>>>>>>>>> #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
>>>>>>>>>> #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
>>>>>>>>>> #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
>>>>>>>>>> #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
>>>>>>>>>> #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
>>>>>>>>>> #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
>>>>>>>>>> [exception RIP: raid5_get_active_stripe+1809]
>>>>>>>>>> RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
>>>>>>>>>> RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
>>>>>>>>>> RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
>>>>>>>>>> RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
>>>>>>>>>> R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
>>>>>>>>>> ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>>>>>>> #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
>>>>>>>>>> #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
>>>>>>>>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
>>>>>>>>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
>>>>>>>>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
>>>>>>>>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
>>>>>>>>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
>>>>>>>>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
>>>>>>>>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
>>>>>>>>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
>>>>>>>>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
>>>>>>>>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
>>>>>>>>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
>>>>>>>>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
>>>>>>>>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
>>>>>>>>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
>>>>>>>>>> 
>>>>>>>>>> md32’s (struct r5conf *) is 0xffff883df0516400
>>>>>>>>>> 
>>>>>>>>>> Our crashed task:
>>>>>>>>>> 
>>>>>>>>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
>>>>>>>>>> raid6-md32
>>>>>>>>>> 
>>>>>>>>>> Example for md32_raid6 task in release_inactive_stripe_list:
>>>>>>>>>> 
>>>>>>>>>> crash> bt 154637
>>>>>>>>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
>>>>>>>>>> #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
>>>>>>>>>> #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
>>>>>>>>>> #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
>>>>>>>>>> #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
>>>>>>>>>> [exception RIP: _raw_spin_lock_irqsave+64]
>>>>>>>>>> RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
>>>>>>>>>> RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
>>>>>>>>>> RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
>>>>>>>>>> RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
>>>>>>>>>> R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
>>>>>>>>>> R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
>>>>>>>>>> ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>>>>>>> --- <NMI exception stack> ---
>>>>>>>>>> #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
>>>>>>>>>> #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
>>>>>>>>>> #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
>>>>>>>>>> #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
>>>>>>>>>> #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
>>>>>>>>>> #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
>>>>>>>>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
>>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
>>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
>>>>>>>>>> [exception RIP: _raw_spin_lock_irq+58]
>>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
>>>>>>>>>> [exception RIP: _raw_spin_lock_irq+58]
>>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
>>>>>>>>>> [exception RIP: raid5_get_active_stripe+1809]
>>>>>>>>>> 
>>>>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
>>>>>>>>>> #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>>>>>> #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>>>>>> #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>>>>>> #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>>>>>> #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>>>>>> #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
>>>>>>>>>> [exception RIP: raid5_get_active_stripe+1809]
>>>>>>>>>> 
>>>>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
>>>>>>>>>> R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The bad stripe_head (sh) is available at (usually in %r13):
>>>>>>>>>> 
>>>>>>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
>>>>>>>>>> $98 = (struct r5conf *) 0xffff883df0516400
>>>>>>>>>> 
>>>>>>>>>> and batch_head is not NULL... that's the issue:
> 
> Last mail appears not to have hit the list; resending.
> 
> The new dump info does reveal some useful details. I'm not sure if it's the issue you found,
> but I did find a race condition. Please try the patch below and report back:
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index ed5cd705b985..35637fe34820 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -806,13 +806,19 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
> 		}
> 
> 		/*
> +		 * We must assign batch_head of this stripe within the
> +		 * batch_lock, otherwise clear_batch_ready of batch head
> +		 * stripe could clear BATCH_READY bit of this stripe and this
> +		 * stripe->batch_head doesn't get assigned, which could
> +		 * confuse clear_batch_ready for this stripe
> +		 */
> +		sh->batch_head = head->batch_head;
> +		/*
> 		 * at this point, head's BATCH_READY could be cleared, but we
> 		 * can still add the stripe to batch list
> 		 */
> 		list_add(&sh->batch_list, &head->batch_list);
> 		spin_unlock(&head->batch_head->batch_lock);
> -
> -		sh->batch_head = head->batch_head;
> 	} else {
> 		head->batch_head = head;
> 		sh->batch_head = head->batch_head;

Awesome! I will apply your patch today on two of my Lustre servers and report back if I see another occurrence, or after some time if it doesn’t show up. We’ll need to wait for at least a couple of weeks to be sure this does actually fix the issue I’m seeing.

Thanks much!
Stephane


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-21 17:35                     ` Stephane Thiell
@ 2017-08-21 17:46                       ` Shaohua Li
  2017-09-05 17:54                         ` Stephane Thiell
  0 siblings, 1 reply; 15+ messages in thread
From: Shaohua Li @ 2017-08-21 17:46 UTC (permalink / raw)
  To: Stephane Thiell, masterprenium.lkml; +Cc: Guoqing Jiang, linux-raid

On Mon, Aug 21, 2017 at 05:35:58PM +0000, Stephane Thiell wrote:
> 
> > On Aug 21, 2017, at 9:10 AM, Shaohua Li <shli@kernel.org> wrote:
> > 
> > On Fri, Aug 18, 2017 at 07:00:20AM +0000, Stephane Thiell wrote:
> >> 
> >>> On Aug 17, 2017, at 10:50 PM, Shaohua Li <shli@kernel.org> wrote:
> >>> 
> >>> On Fri, Aug 18, 2017 at 04:13:21AM +0000, Stephane Thiell wrote:
> >>>>> On Aug 17, 2017, at 3:58 PM, Shaohua Li <shli@kernel.org> wrote:
> >>>>> 
> >>>>> On Tue, Aug 15, 2017 at 07:27:44PM +0000, Stephane Thiell wrote:
> >>>>>> 
> >>>>>>> On Aug 15, 2017, at 11:39 AM, Shaohua Li <shli@kernel.org> wrote:
> >>>>>>> 
> >>>>>>> On Tue, Aug 15, 2017 at 05:59:38PM +0000, Stephane Thiell wrote:
> >>>>>>>> 
> >>>>>>>>> On Aug 15, 2017, at 2:24 AM, Guoqing Jiang <gqjiang@suse.com> wrote:
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> On 08/14/2017 01:29 AM, Stephane Thiell wrote:
> >>>>>>>>>> Also known as "kernel BUG at drivers/md/raid5.c:527!"
> >>>>>>>>>> 
> >>>>>>>>>> I’m still struggling with this issue in raid5.c. I currently have one crash per week on average on a system using Lustre and 24 x 10-disk raid6 per server. It has happened both with and without a check running. I now think there is a race at some point where a released stripe head (batched) is added to conf->inactive_list without sh->batch_head being set back to NULL.
> >>>>>>>>>> 
> >>>>>>>>>> Because sh->batch_head is only being set back to NULL in a single function, break_stripe_batch_list(), my guess is that a call to raid5_release_stripe() on the head stripe is performed elsewhere. I didn’t find where this could come from yet. So perhaps sh->batch_head should be just set to NULL in do_release_stripe(), before being added to the inactive list?
> >>>>>>>>>> 
> >>>>>>>>>> Alternatively, I wonder if it would make more sense to set batch_head to NULL where sh->overwrite_disks is also set back to 0 (like in ops_run_biodrain() today).
> >>>>>>>>>> 
> >>>>>>>>>> Because sh->overwrite_disks is also set to 0 in init_stripe(), my current plan is to try this change, at least as a workaround: remove the BUG_ON(sh->batch_head) from init_stripe() and set sh->batch_head = NULL (I don’t think I need to lock on sh->stripe_lock there?). I will get new hardware soon so I will give it a try on a test system. Any suggestions welcome…
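[Editor's note: the workaround idea described above can be sketched as follows. This is purely illustrative — the struct is a stub with only the two fields discussed, not the kernel's real stripe_head, and the locking question raised in the mail is left open.]

```c
#include <assert.h>
#include <stddef.h>

/* Stub with just the fields discussed above (hypothetical). */
struct stripe_head {
    struct stripe_head *batch_head;
    int overwrite_disks;
};

/* Sketch of the proposed workaround: instead of BUG_ON(sh->batch_head),
 * init_stripe() would clear the stale pointer alongside the existing
 * overwrite_disks reset, tolerating a batch pointer leaked by a
 * previous user of the recycled stripe. */
static void init_stripe_workaround(struct stripe_head *sh)
{
    /* was: BUG_ON(sh->batch_head); */
    sh->batch_head = NULL;      /* tolerate a leaked batch pointer */
    sh->overwrite_disks = 0;    /* existing reset, shown for context */
}
```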
> >>>>>>>>>> 
> >>>>>>>>>> Quick analysis of the last vmcore:
> >>>>>>>>>> 
> >>>>>>>>>> crash> ps | grep ">"
> >>>>>>>>>> ...
> >>>>>>>>>>> 16119  16111  22  ffff881f93bace70  RU   0.0  150772  10748  e2label
> >>>>>>>>>>> 151448      2  34  ffff881ff753bec0  RU   0.0       0      0  [kiblnd_sd_00_00]
> >>>>>>>>>>> 151449      2   2  ffff881ff753edd0  RU   0.0       0      0  [kiblnd_sd_00_01]
> >>>>>>>>>>> 154637      2  29  ffff881f776b1f60  RU   0.0       0      0  [md32_raid6]
> >>>>>>>>>>> 163643      2  11  ffff8820d200bec0  UN   0.0       0      0  [ll_ost_io01_008]
> >>>>>>>>>>> 164083      2  13  ffff8820d2753ec0  UN   0.0       0      0  [ll_ost_io01_034]
> >>>>>>>>>>> 171389      2   5  ffff883fd072de20  UN   0.0       0      0  [ll_ost_io01_058]
> >>>>>>>>>>> 171391      2  17  ffff883fd072ce70  UN   0.0       0      0  [ll_ost_io01_060]
> >>>>>>>>>>> 171399      2  33  ffff883fd9e9ce70  UN   0.0       0      0  [ll_ost_io01_068]
> >>>>>>>>>>> 237019      2  15  ffff882715628fb0  UN   0.0       0      0  [ll_ost_io01_084]
> >>>>>>>>>>> 237033      2   3  ffff881fb0250fb0  UN   0.0       0      0  [ll_ost_io01_098]
> >>>>>>>>>> ...
> >>>>>>>>>> 
> >>>>>>>>>> crash>  bt 237033
> >>>>>>>>>> PID: 237033  TASK: ffff881fb0250fb0  CPU: 3   COMMAND: "ll_ost_io01_098"
> >>>>>>>>>> #0 [ffff8826fa3e7240] machine_kexec at ffffffff81059bdb
> >>>>>>>>>> #1 [ffff8826fa3e72a0] __crash_kexec at ffffffff811057c2
> >>>>>>>>>> #2 [ffff8826fa3e7370] crash_kexec at ffffffff811058b0
> >>>>>>>>>> #3 [ffff8826fa3e7388] oops_end at ffffffff8168fac8
> >>>>>>>>>> #4 [ffff8826fa3e73b0] die at ffffffff8102e93b
> >>>>>>>>>> #5 [ffff8826fa3e73e0] do_trap at ffffffff8168f180
> >>>>>>>>>> #6 [ffff8826fa3e7430] do_invalid_op at ffffffff8102b144
> >>>>>>>>>> #7 [ffff8826fa3e74e0] invalid_op at ffffffff81698b5e
> >>>>>>>>>> [exception RIP: raid5_get_active_stripe+1809]
> >>>>>>>>>> RIP: ffffffffa0fefd71  RSP: ffff8826fa3e7590  RFLAGS: 00010086
> >>>>>>>>>> RAX: 0000000000000000  RBX: ffff883df051641c  RCX: dead000000000200
> >>>>>>>>>> RDX: 0000000000000000  RSI: 0000000000000005  RDI: ffff881e08d54d30
> >>>>>>>>>> RBP: ffff8826fa3e7638   R8: ffff881e08d54d40   R9: 0000000000000080
> >>>>>>>>>> R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> R13: ffff881e08d54d30  R14: 0000000000000000  R15: 0000000000000000
> >>>>>>>>>> ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>>>>>>>> #8 [ffff8826fa3e7640] raid5_make_request at ffffffffa0feff55 [raid456]
> >>>>>>>>>> #9 [ffff8826fa3e7730] md_make_request at ffffffff814f838c
> >>>>>>>>>> #10 [ffff8826fa3e7790] generic_make_request at ffffffff812eedc9
> >>>>>>>>>> #11 [ffff8826fa3e77d8] submit_bio at ffffffff812eef11
> >>>>>>>>>> #12 [ffff8826fa3e7830] osd_submit_bio at ffffffffa10cdbcc [osd_ldiskfs]
> >>>>>>>>>> #13 [ffff8826fa3e7840] osd_do_bio at ffffffffa10d03c1 [osd_ldiskfs]
> >>>>>>>>>> #14 [ffff8826fa3e7960] osd_write_commit at ffffffffa10d0b94 [osd_ldiskfs]
> >>>>>>>>>> #15 [ffff8826fa3e7a00] ofd_commitrw_write at ffffffffa120c734 [ofd]
> >>>>>>>>>> #16 [ffff8826fa3e7a78] ofd_commitrw at ffffffffa120ff07 [ofd]
> >>>>>>>>>> #17 [ffff8826fa3e7b08] obd_commitrw at ffffffffa0c7dc22 [ptlrpc]
> >>>>>>>>>> #18 [ffff8826fa3e7b70] tgt_brw_write at ffffffffa0c55fc1 [ptlrpc]
> >>>>>>>>>> #19 [ffff8826fa3e7cd8] tgt_request_handle at ffffffffa0c52275 [ptlrpc]
> >>>>>>>>>> #20 [ffff8826fa3e7d20] ptlrpc_server_handle_request at ffffffffa0bfe1fb [ptlrpc]
> >>>>>>>>>> #21 [ffff8826fa3e7de8] ptlrpc_main at ffffffffa0c022b0 [ptlrpc]
> >>>>>>>>>> #22 [ffff8826fa3e7ec8] kthread at ffffffff810b0a4f
> >>>>>>>>>> #23 [ffff8826fa3e7f50] ret_from_fork at ffffffff81697218
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> After some disassembly work, I found out that all active tasks are working on the same md (all are raid6):
> >>>>>>>>>> 
> >>>>>>>>>> md32’s (struct r5conf *) is 0xffff883df0516400
> >>>>>>>>>> 
> >>>>>>>>>> Our crashed task:
> >>>>>>>>>> 
> >>>>>>>>>> crash> printf "%s", ((struct r5conf)*0xffff883df0516400).cache_name[0]
> >>>>>>>>>> raid6-md32
> >>>>>>>>>> 
> >>>>>>>>>> Example for md32_raid6 task in release_inactive_stripe_list:
> >>>>>>>>>> 
> >>>>>>>>>> crash> bt 154637
> >>>>>>>>>> PID: 154637  TASK: ffff881f776b1f60  CPU: 29  COMMAND: "md32_raid6"
> >>>>>>>>>> #0 [ffff883ffc985e48] crash_nmi_callback at ffffffff8104d2e2
> >>>>>>>>>> #1 [ffff883ffc985e58] nmi_handle at ffffffff8168fc77
> >>>>>>>>>> #2 [ffff883ffc985eb0] do_nmi at ffffffff8168fe83
> >>>>>>>>>> #3 [ffff883ffc985ef0] end_repeat_nmi at ffffffff8168f093
> >>>>>>>>>> [exception RIP: _raw_spin_lock_irqsave+64]
> >>>>>>>>>> RIP: ffffffff8168e5f0  RSP: ffff881e8d0cbc60  RFLAGS: 00000006
> >>>>>>>>>> RAX: 00000000000031bb  RBX: ffff883df0516740  RCX: 000000000000009a
> >>>>>>>>>> RDX: 0000000000000098  RSI: 000000000000009a  RDI: ffff883df051641c
> >>>>>>>>>> RBP: ffff881e8d0cbc60   R8: 0000000000000206   R9: 0000000000000000
> >>>>>>>>>> R10: 0000000000000000  R11: 0000000000000007  R12: ffff883df0516400
> >>>>>>>>>> R13: ffff883df051641c  R14: 0000000000000006  R15: 0000000000000005
> >>>>>>>>>> ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>>>>>>>> --- <NMI exception stack> ---
> >>>>>>>>>> #4 [ffff881e8d0cbc60] _raw_spin_lock_irqsave at ffffffff8168e5f0
> >>>>>>>>>> #5 [ffff881e8d0cbc68] release_inactive_stripe_list at ffffffffa0fe5500 [raid456]
> >>>>>>>>>> #6 [ffff881e8d0cbcb0] handle_active_stripes at ffffffffa0ff34c2 [raid456]
> >>>>>>>>>> #7 [ffff881e8d0cbd60] raid5d at ffffffffa0ff3c28 [raid456]
> >>>>>>>>>> #8 [ffff881e8d0cbe50] md_thread at ffffffff814fb745
> >>>>>>>>>> #9 [ffff881e8d0cbec8] kthread at ffffffff810b0a4f
> >>>>>>>>>> #10 [ffff881e8d0cbf50] ret_from_fork at ffffffff81697218
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "exception RIP:"
> >>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>>> [exception RIP: _raw_spin_lock_irq+58]
> >>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>>> [exception RIP: _raw_spin_lock_irq+58]
> >>>>>>>>>> [exception RIP: _raw_spin_lock_irq+63]
> >>>>>>>>>> [exception RIP: raid5_get_active_stripe+1809]
> >>>>>>>>>> 
> >>>>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep "raid5_get_active_stripe"
> >>>>>>>>>> #5 [ffff883bcee93588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>>>> #5 [ffff883ba4af7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>>>> #5 [ffff88390877b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>>>> #5 [ffff88390803b588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>>>> #5 [ffff88390831f588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>>>> #5 [ffff8826fa7f7588] raid5_get_active_stripe at ffffffffa0fef6cb [raid456]
> >>>>>>>>>> [exception RIP: raid5_get_active_stripe+1809]
> >>>>>>>>>> 
> >>>>>>>>>> crash> bt 163643 164083 171389 171391 171399 237019 237033 |grep R12
> >>>>>>>>>> R10: 0000000000000003  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> R10: 0000000000000001  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> R10: 0000000000000005  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> R10: 0000000000000007  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> R10: 0000000000000009  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> R10: 0000000000000000  R11: 0000000000000000  R12: ffff883df0516400
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> The bad stripe_head (sh) is available at (usually in %r13):
> >>>>>>>>>> 
> >>>>>>>>>> crash> p ((struct stripe_head)*0xffff881e08d54d30).raid_conf
> >>>>>>>>>> $98 = (struct r5conf *) 0xffff883df0516400
> >>>>>>>>>> 
> >>>>>>>>>> and batch_head is not NULL... that's the issue:
> > 
> > My last mail appears not to have hit the list, so I'm resending.
> > 
> > The new dump info does reveal some useful details. Not sure if it's the issue you found,
> > but I did find a race condition. Please try the patch below and report back:
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index ed5cd705b985..35637fe34820 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -806,13 +806,19 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
> > 		}
> > 
> > 		/*
> > +		 * We must assign batch_head of this stripe within the
> > +		 * batch_lock, otherwise clear_batch_ready of batch head
> > +		 * stripe could clear BATCH_READY bit of this stripe and this
> > +		 * stripe->batch_head doesn't get assigned, which could
> > +		 * confuse clear_batch_ready for this stripe
> > +		 */
> > +		sh->batch_head = head->batch_head;
> > +		/*
> > 		 * at this point, head's BATCH_READY could be cleared, but we
> > 		 * can still add the stripe to batch list
> > 		 */
> > 		list_add(&sh->batch_list, &head->batch_list);
> > 		spin_unlock(&head->batch_head->batch_lock);
> > -
> > -		sh->batch_head = head->batch_head;
> > 	} else {
> > 		head->batch_head = head;
> > 		sh->batch_head = head->batch_head;
> 
> Awesome! I will apply your patch today on two of my Lustre servers and report back if I see another occurrence, or after some time if it doesn’t show up. We’ll need to wait for at least a couple weeks to be sure this does actually fix the issue I’m seeing.

Thanks!
Cc'ing masterprenium too, who reported the issue before. Please check whether the patch fixes it.

Thanks,
Shaohua
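
[Editor's note: the interleaving the patch above guards against can be sketched as a minimal, deterministic single-threaded model. This is a hypothetical simplification — the struct, field, and function names only loosely mirror the kernel's, and the "threads" are replayed sequentially at the racy point.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the race: thread A (stripe_add_to_batch_list) links sh into
 * head's batch under batch_lock.  In the old code A assigned
 * sh->batch_head only AFTER dropping the lock, so thread B
 * (clear_batch_ready on the head) could run in that window, clear the
 * BATCH_READY bit of sh, and observe sh->batch_head still NULL. */

struct stripe {
    struct stripe *batch_head;
    bool batch_ready;    /* models STRIPE_BATCH_READY */
};

/* Returns what thread B observes for sh->batch_head when it runs in
 * the window between A's unlock and A's (possibly late) assignment. */
static bool batch_head_seen_by_clear_batch_ready(bool assign_under_lock)
{
    struct stripe head = { .batch_head = NULL, .batch_ready = true };
    struct stripe sh   = { .batch_head = NULL, .batch_ready = true };

    head.batch_head = &head;               /* head leads its own batch */

    /* thread A, holding head->batch_head->batch_lock */
    if (assign_under_lock)
        sh.batch_head = head.batch_head;   /* patched ordering */
    /* list_add(&sh->batch_list, &head->batch_list); -- elided */
    /* spin_unlock(&head->batch_head->batch_lock); */

    /* thread B runs in the window: clear_batch_ready on the batch */
    sh.batch_ready = false;
    bool seen = (sh.batch_head != NULL);

    /* thread A resumes; the old ordering assigns only now, too late */
    if (!assign_under_lock)
        sh.batch_head = head.batch_head;

    return seen;
}
```

With the patched ordering the model always shows a non-NULL batch_head at the moment BATCH_READY is cleared; with the old ordering it can be NULL, which is the confused state the patch comment describes.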

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: BUG_ON(sh->batch_head) in init_stripe()
  2017-08-21 17:46                       ` Shaohua Li
@ 2017-09-05 17:54                         ` Stephane Thiell
  0 siblings, 0 replies; 15+ messages in thread
From: Stephane Thiell @ 2017-09-05 17:54 UTC (permalink / raw)
  To: Shaohua Li; +Cc: masterprenium.lkml, Guoqing Jiang, linux-raid

> On Aug 21, 2017, at 10:46 AM, Shaohua Li <shli@kernel.org> wrote:
> <snip>
>>> The new dump info does reveal some useful details. Not sure if it's the issue you found,
>>> but I did find a race condition. Please try the patch below and report back:
>>> 
>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>> index ed5cd705b985..35637fe34820 100644
>>> --- a/drivers/md/raid5.c
>>> +++ b/drivers/md/raid5.c
>>> @@ -806,13 +806,19 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
>>> 		}
>>> 
>>> 		/*
>>> +		 * We must assign batch_head of this stripe within the
>>> +		 * batch_lock, otherwise clear_batch_ready of batch head
>>> +		 * stripe could clear BATCH_READY bit of this stripe and this
>>> +		 * stripe->batch_head doesn't get assigned, which could
>>> +		 * confuse clear_batch_ready for this stripe
>>> +		 */
>>> +		sh->batch_head = head->batch_head;
>>> +		/*
>>> 		 * at this point, head's BATCH_READY could be cleared, but we
>>> 		 * can still add the stripe to batch list
>>> 		 */
>>> 		list_add(&sh->batch_list, &head->batch_list);
>>> 		spin_unlock(&head->batch_head->batch_lock);
>>> -
>>> -		sh->batch_head = head->batch_head;
>>> 	} else {
>>> 		head->batch_head = head;
>>> 		sh->batch_head = head->batch_head;
>> 
>> Awesome! I will apply your patch today on two of my Lustre servers and report back if I see another occurrence, or after some time if it doesn’t show up. We’ll need to wait for at least a couple weeks to be sure this does actually fix the issue I’m seeing.
> 
> Thanks!
> Cc'ing masterprenium too, who reported the issue before. Please check whether the patch fixes it.

Shaohua, we have now been running with your patch for 15 days without any issue, on two Lustre servers that were never idle, under a mixed workload with checks running from time to time. Looking at the previous failures, it is very likely that this patch does actually fix our issue! I’ll update the list if needed. Thanks again.

Best,
Stephane


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-05 17:54 UTC | newest]

Thread overview: 15+ messages
2017-05-08 17:15 BUG_ON(sh->batch_head) in init_stripe() Stephane Thiell
2017-08-13 17:29 ` Stephane Thiell
2017-08-15  9:24   ` Guoqing Jiang
2017-08-15 17:59     ` Stephane Thiell
2017-08-15 18:39       ` Shaohua Li
2017-08-15 19:27         ` Stephane Thiell
2017-08-17 22:58           ` Shaohua Li
2017-08-18  4:13             ` Stephane Thiell
2017-08-18  5:50               ` Shaohua Li
     [not found]                 ` <C45EC19A-1CF2-45BA-B907-BCD5B4E644F0@stanford.edu>
2017-08-21 16:10                   ` Shaohua Li
2017-08-21 17:35                     ` Stephane Thiell
2017-08-21 17:46                       ` Shaohua Li
2017-09-05 17:54                         ` Stephane Thiell
2017-08-15 18:36     ` Shaohua Li
2017-08-15 19:07       ` Stephane Thiell
