* Hard CPU Lockup when accessing MD RAID5
@ 2016-04-12 21:54 Daniel Walker
  2016-04-13 17:00 ` Shaohua Li
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Walker @ 2016-04-12 21:54 UTC (permalink / raw)
  To: linux-raid

I'm having some issues with a brand-new Supermicro server that we have
running in production alongside a few other machines that are identical
to it.

The output from the netconsole attached to the server is here:

Apr 12 21:34:45  [75704.964946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
Apr 12 21:34:45  [75704.964973] Modules linked in: ipt_REJECT nf_reject_ipv4
    iptable_mangle tun netconsole configfs xt_multiport ip6table_filter
    ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding ext4
    crc16 mbcache jbd2 raid1 raid0 raid456 async_raid6_recov async_memcpy
    async_pq async_xor xor async_tx raid6_pq md_mod sr_mod cdrom usb_storage
    hid_generic usbhid hid sg sd_mod x86_pkg_temp_thermal coretemp
    crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_ssse3
    sha256_generic hmac iTCO_wdt iTCO_vendor_support drbg ansi_cprng
    aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ahci
    libahci sb_edac libata igb megaraid_sas xhci_pci ehci_pci i2c_algo_bit
    xhci_hcd ehci_hcd edac_core ptp mei_me lpc_ich i2c_i801 usbcore pps_core
    mfd_core mei usb_common i2c_core ioatdma scsi_mod dca ipmi_si
    ipmi_msghandler acpi_power_meter tpm_tis tpm processor button
Apr 12 21:34:45  [75704.965874] CPU: 6 PID: 25339 Comm: main Not tainted 4.4.1 #2
Apr 12 21:34:45  [75704.965916] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Apr 12 21:34:45  [75704.965979]  0000000000000000 ffffffff812abdf3 0000000000000000 ffffffff810cf5f5
Apr 12 21:34:45  [75704.966054]  ffff881ff2870000 ffffffff810fcea2 0000000000000001 ffff881fffcc5e58
Apr 12 21:34:45  [75704.966134]  ffff881fffccaf00 ffff881fffccb100 ffff881ff2870000 ffffffff8101bc63
Apr 12 21:34:45  [75704.966211] Call Trace:
Apr 12 21:34:45  [75704.966246]  <NMI>  [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
Apr 12 21:34:45  [75704.966297]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
Apr 12 21:34:45  [75704.966339]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
Apr 12 21:34:45  [75704.966384]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
Apr 12 21:34:45  [75704.966431]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
Apr 12 21:34:45  [75704.966474]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
Apr 12 21:34:45  [75704.966519]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
Apr 12 21:34:45  [75704.966560]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
Apr 12 21:34:45  [75704.966597]  [<ffffffff81007dfe>] ? do_nmi+0x1de/0x360
Apr 12 21:34:45  [75704.970603]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
Apr 12 21:34:45  [75704.970644]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45  [75704.970685]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45  [75704.970728]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45  [75704.970768]  <<EOE>>  [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
Apr 12 21:34:45  [75704.970838]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
Apr 12 21:34:45  [75704.970878]  [<ffffffff81151ec4>] ? kmem_cache_alloc+0xf4/0x120
Apr 12 21:34:45  [75704.970922]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
Apr 12 21:34:45  [75704.970969]  [<ffffffff81219fde>] ? xfs_map_buffer.isra.12+0x2e/0x60
Apr 12 21:34:45  [75704.971012]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
Apr 12 21:34:45  [75704.971052]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
Apr 12 21:34:45  [75704.971098]  [<ffffffff81113379>] ? release_pages+0xc9/0x270
Apr 12 21:34:45  [75704.971145]  [<ffffffff811a2c01>] ? do_mpage_readpage+0x2d1/0x640
Apr 12 21:34:45  [75704.971187]  [<ffffffff811a304d>] ? mpage_readpages+0xdd/0x130
Apr 12 21:34:45  [75704.971226]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
Apr 12 21:34:45  [75704.971267]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
Apr 12 21:34:45  [75704.971313]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
Apr 12 21:34:45  [75704.971354]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
Apr 12 21:34:45  [75704.971399]  [<ffffffff81105902>] ? pagecache_get_page+0x22/0x1a0
Apr 12 21:34:45  [75704.971441]  [<ffffffff8110768c>] ? filemap_fault+0x37c/0x400
Apr 12 21:34:45  [75704.971481]  [<ffffffff8122474b>] ? xfs_filemap_fault+0x3b/0x80
Apr 12 21:34:45  [75704.971526]  [<ffffffff8112d2da>] ? __do_fault+0x3a/0xc0
Apr 12 21:34:45  [75704.971564]  [<ffffffff81130883>] ? handle_mm_fault+0x1063/0x1650
Apr 12 21:34:45  [75704.971614]  [<ffffffff8103bdae>] ? __do_page_fault+0x11e/0x370
Apr 12 21:34:45  [75704.971653]  [<ffffffff811aa4ff>] ? SyS_epoll_wait+0x8f/0xd0
Apr 12 21:34:45  [75704.971694]  [<ffffffff8148f64f>] ? page_fault+0x1f/0x30
Apr 12 21:34:45  [75705.493640] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
Apr 12 21:34:45  [75705.493668] Modules linked in: ipt_REJECT nf_reject_ipv4
    iptable_mangle tun netconsole configfs xt_multiport ip6table_filter
    ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding ext4
    crc16 mbcache jbd2 raid1 raid0 raid456 async_raid6_recov async_memcpy
    async_pq async_xor xor async_tx raid6_pq md_mod sr_mod cdrom usb_storage
    hid_generic usbhid hid sg sd_mod x86_pkg_temp_thermal coretemp
    crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_ssse3
    sha256_generic hmac iTCO_wdt iTCO_vendor_support drbg ansi_cprng
    aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ahci
    libahci sb_edac libata igb megaraid_sas xhci_pci ehci_pci i2c_algo_bit
    xhci_hcd ehci_hcd edac_core ptp mei_me lpc_ich i2c_i801 usbcore pps_core
    mfd_core mei usb_common i2c_core ioatdma scsi_mod dca ipmi_si
    ipmi_msghandler acpi_power_meter tpm_tis tpm processor button
Apr 12 21:34:45  [75705.494688] CPU: 12 PID: 32350 Comm: main Not tainted 4.4.1 #2
Apr 12 21:34:45  [75705.494728] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Apr 12 21:34:45  [75705.494790]  0000000000000000 ffffffff812abdf3 0000000000000000 ffffffff810cf5f5
Apr 12 21:34:45  [75705.494886]  ffff883ff29a0000 ffffffff810fcea2 0000000000000001 ffff88407fc85e58
Apr 12 21:34:45  [75705.494976]  ffff88407fc8af00 ffff88407fc8b100 ffff883ff29a0000 ffffffff8101bc63
Apr 12 21:34:45  [75705.495064] Call Trace:
Apr 12 21:34:45  [75705.495094]  <NMI>  [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
Apr 12 21:34:45  [75705.495150]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
Apr 12 21:34:45  [75705.495193]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
Apr 12 21:34:45  [75705.495237]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
Apr 12 21:34:45  [75705.495284]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
Apr 12 21:34:45  [75705.495330]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
Apr 12 21:34:45  [75705.495373]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
Apr 12 21:34:45  [75705.495418]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
Apr 12 21:34:45  [75705.495458]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
Apr 12 21:34:45  [75705.495497]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
Apr 12 21:34:45  [75705.495540]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45  [75705.495581]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45  [75705.495621]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:45  [75705.495661]  <<EOE>>  [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
Apr 12 21:34:45  [75705.495733]  [<ffffffff81282d87>] ? blk_rq_init+0x87/0xa0
Apr 12 21:34:45  [75705.495771]  [<ffffffff81283e3c>] ? get_request+0x29c/0x6e0
Apr 12 21:34:45  [75705.495812]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
Apr 12 21:34:45  [75705.495853]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
Apr 12 21:34:45  [75705.495898]  [<ffffffff8128829e>] ? blk_queue_bio+0x15e/0x350
Apr 12 21:34:45  [75705.495937]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
Apr 12 21:34:45  [75705.495978]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
Apr 12 21:34:45  [75705.496018]  [<ffffffff811a215e>] ? mpage_bio_submit+0x1e/0x30
Apr 12 21:34:45  [75705.496057]  [<ffffffff811a3076>] ? mpage_readpages+0x106/0x130
Apr 12 21:34:45  [75705.496102]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
Apr 12 21:34:45  [75705.496144]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
Apr 12 21:34:45  [75705.496185]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
Apr 12 21:34:45  [75705.496227]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
Apr 12 21:34:45  [75705.496268]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
Apr 12 21:34:45  [75705.496307]  [<ffffffff811120eb>] ? force_page_cache_readahead+0x9b/0xe0
Apr 12 21:34:45  [75705.496352]  [<ffffffff8113f876>] ? madvise_willneed+0x76/0x140
Apr 12 21:34:45  [75705.496395]  [<ffffffff811301ce>] ? handle_mm_fault+0x9ae/0x1650
Apr 12 21:34:45  [75705.496437]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
Apr 12 21:34:45  [75705.496476]  [<ffffffff8113fc52>] ? SyS_madvise+0x312/0x6f0
Apr 12 21:34:45  [75705.496515]  [<ffffffff8148d9db>] ? entry_SYSCALL_64_fastpath+0x16/0x6e
Apr 12 21:34:47  [75707.118049] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
Apr 12 21:34:47  [75707.118078] Modules linked in: ipt_REJECT nf_reject_ipv4
    iptable_mangle tun netconsole configfs xt_multiport ip6table_filter
    ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding ext4
    crc16 mbcache jbd2 raid1 raid0 raid456 async_raid6_recov async_memcpy
    async_pq async_xor xor async_tx raid6_pq md_mod sr_mod cdrom usb_storage
    hid_generic usbhid hid sg sd_mod x86_pkg_temp_thermal coretemp
    crct10dif_pclmul crc32_pclmul crc32c_intel jitterentropy_rng sha256_ssse3
    sha256_generic hmac iTCO_wdt iTCO_vendor_support drbg ansi_cprng
    aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ahci
    libahci sb_edac libata igb megaraid_sas xhci_pci ehci_pci i2c_algo_bit
    xhci_hcd ehci_hcd edac_core ptp mei_me lpc_ich i2c_i801 usbcore pps_core
    mfd_core mei usb_common i2c_core ioatdma scsi_mod dca ipmi_si
    ipmi_msghandler acpi_power_meter tpm_tis tpm processor button
Apr 12 21:34:47  [75707.119088] CPU: 15 PID: 31940 Comm: main Not tainted 4.4.1 #2
Apr 12 21:34:47  [75707.119134] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Apr 12 21:34:47  [75707.119196]  0000000000000000 ffffffff812abdf3 0000000000000000 ffffffff810cf5f5
Apr 12 21:34:47  [75707.119277]  ffff883ff2a20000 ffffffff810fcea2 0000000000000001 ffff88407fce5e58
Apr 12 21:34:47  [75707.119360]  ffff88407fceaf00 ffff88407fceb100 ffff883ff2a20000 ffffffff8101bc63
Apr 12 21:34:47  [75707.119439] Call Trace:
Apr 12 21:34:47  [75707.119471]  <NMI>  [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
Apr 12 21:34:47  [75707.119527]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
Apr 12 21:34:47  [75707.119571]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
Apr 12 21:34:47  [75707.119614]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
Apr 12 21:34:47  [75707.119657]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
Apr 12 21:34:47  [75707.119703]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
Apr 12 21:34:47  [75707.119758]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
Apr 12 21:34:47  [75707.119800]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
Apr 12 21:34:47  [75707.119838]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
Apr 12 21:34:47  [75707.119878]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
Apr 12 21:34:47  [75707.119920]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:47  [75707.119962]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:47  [75707.120002]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
Apr 12 21:34:47  [75707.120042]  <<EOE>>  [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
Apr 12 21:34:47  [75707.120113]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
Apr 12 21:34:47  [75707.120152]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
Apr 12 21:34:47  [75707.120195]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
Apr 12 21:34:47  [75707.120236]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
Apr 12 21:34:47  [75707.120277]  [<ffffffff8112afaf>] ? workingset_refault+0x4f/0xa0
Apr 12 21:34:47  [75707.120320]  [<ffffffff811a215e>] ? mpage_bio_submit+0x1e/0x30
Apr 12 21:34:47  [75707.120359]  [<ffffffff811a3076>] ? mpage_readpages+0x106/0x130
Apr 12 21:34:47  [75707.120401]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
Apr 12 21:34:47  [75707.120439]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
Apr 12 21:34:47  [75707.120481]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
Apr 12 21:34:47  [75707.120523]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
Apr 12 21:34:47  [75707.120564]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
Apr 12 21:34:47  [75707.120602]  [<ffffffff811120c7>] ? force_page_cache_readahead+0x77/0xe0
Apr 12 21:34:47  [75707.120644]  [<ffffffff8113f876>] ? madvise_willneed+0x76/0x140
Apr 12 21:34:47  [75707.120683]  [<ffffffff811301ce>] ? handle_mm_fault+0x9ae/0x1650
Apr 12 21:34:47  [75707.120722]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
Apr 12 21:34:47  [75707.120760]  [<ffffffff8113fc52>] ? SyS_madvise+0x312/0x6f0
Apr 12 21:34:47  [75707.120799]  [<ffffffff8148d9db>] ? entry_SYSCALL_64_fastpath+0x16/0x6e

Once this starts, a couple of minutes go by and then the machine locks up
completely.

I have been unable to locate the problem. Can anyone point me in the
right direction?

Best regards


* Re: Hard CPU Lockup when accessing MD RAID5
  2016-04-12 21:54 Hard CPU Lockup when accessing MD RAID5 Daniel Walker
@ 2016-04-13 17:00 ` Shaohua Li
  2016-04-20  6:52   ` Daniel Walker
  0 siblings, 1 reply; 5+ messages in thread
From: Shaohua Li @ 2016-04-13 17:00 UTC (permalink / raw)
  To: Daniel Walker; +Cc: linux-raid

It looks like there is a deadlock while trying to take the device_lock or
hash_lock. Was anything abnormal printed before the NMI watchdog fired? What
is running on the machine? This also looks like an old kernel; is it possible
for you to try the latest kernel and report back?
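
In each trace the stuck CPU is spinning in queued_spin_lock_slowpath under
raid456's make_request, which is consistent with the suggestion that
several CPUs are contending for, or deadlocked on, the same raid5 locks. As
a hedged illustration only, the C sketch below shows the general rule that
prevents an ABBA deadlock between two locks: every path takes them in one
fixed order. The names hash_locks, device_lock, touch_stripe() and
run_workers() are hypothetical user-space stand-ins, pthread mutexes replace
kernel spinlocks, and the "hash lock first, then device_lock" order is an
assumption chosen for the sketch, not a statement about raid456's code.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for per-bucket stripe hash locks and a
 * single array-wide device lock; pthread mutexes replace kernel spinlocks. */
#define NR_HASH_LOCKS 8
#define MAX_WORKERS 16

static pthread_mutex_t hash_locks[NR_HASH_LOCKS];
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static long active_stripes;            /* shared counter, guarded by device_lock */
static unsigned long iters_per_worker; /* written before any worker starts */

/* Assumed ordering rule: hash lock first, device_lock second. No thread ever
 * waits for a hash lock while holding device_lock, so two threads can never
 * each hold one lock while waiting for the other. */
static void touch_stripe(unsigned long sector)
{
    pthread_mutex_t *hl = &hash_locks[sector % NR_HASH_LOCKS];

    pthread_mutex_lock(hl);            /* 1st: this stripe's hash bucket */
    pthread_mutex_lock(&device_lock);  /* 2nd: the array-wide lock */
    active_stripes++;
    pthread_mutex_unlock(&device_lock);
    pthread_mutex_unlock(hl);
}

static void *worker(void *arg)
{
    unsigned long base = (unsigned long)(uintptr_t)arg;

    for (unsigned long i = 0; i < iters_per_worker; i++)
        touch_stripe(base + i);
    return NULL;
}

/* Runs nthreads workers concurrently; returns the final counter, or -1 if
 * nthreads is out of range. */
long run_workers(int nthreads, unsigned long iters)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    for (int i = 0; i < NR_HASH_LOCKS; i++)
        pthread_mutex_init(&hash_locks[i], NULL);
    active_stripes = 0;
    iters_per_worker = iters;

    for (; started < nthreads; started++) {
        void *base = (void *)(uintptr_t)(started * 7u);
        if (pthread_create(&tids[started], NULL, worker, base) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    for (int i = 0; i < NR_HASH_LOCKS; i++)
        pthread_mutex_destroy(&hash_locks[i]);
    return active_stripes;
}
```

For example, run_workers(4, 1000) should return 4000 and terminate. If one
path instead took device_lock first and then a hash lock, two workers could
each end up holding one lock and waiting forever for the other; with kernel
spinlocks held with interrupts disabled, that kind of cycle can surface as
CPUs stuck in queued_spin_lock_slowpath and the NMI watchdog's hard LOCKUP
reports.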

Thanks,
Shaohua

On Tue, Apr 12, 2016 at 09:54:08PM +0000, Daniel Walker wrote:
> I'm having some issues with a brand-new Supermicro server that we have
> running in production alongside a few other machines that are identical
> to it.
> 
> The output from the netconsole attached to the server is here:
> Apr 12 21:34:45  [75705.496476]  [<ffffffff8113fc52>] ?
> SyS_madvise+0x312/0x6f0
> Apr 12 21:34:45  [75705.496515]  [<ffffffff8148d9db>] ?
> entry_SYSCALL_64_fastpath+0x16/0x6e
> Apr 12 21:34:47  [75707.118049] NMI watchdog: Watchdog detected hard LOCKUP
> on cpu 15
> Apr 12 21:34:47
> Apr 12 21:34:47  [75707.118078] Modules linked in:
> Apr 12 21:34:47   ipt_REJECT
> Apr 12 21:34:47   nf_reject_ipv4
> Apr 12 21:34:47   iptable_mangle
> Apr 12 21:34:47   tun
> Apr 12 21:34:47   netconsole
> Apr 12 21:34:47   configfs
> Apr 12 21:34:47   xt_multiport
> Apr 12 21:34:47   ip6table_filter
> Apr 12 21:34:47   ip6_tables
> Apr 12 21:34:47   iptable_filter
> Apr 12 21:34:47   ip_tables
> Apr 12 21:34:47   x_tables
> Apr 12 21:34:47   bridge
> Apr 12 21:34:47   stp
> Apr 12 21:34:47   llc
> Apr 12 21:34:47   bonding
> Apr 12 21:34:47   ext4
> Apr 12 21:34:47   crc16
> Apr 12 21:34:47   mbcache
> Apr 12 21:34:47   jbd2
> Apr 12 21:34:47   raid1
> Apr 12 21:34:47   raid0
> Apr 12 21:34:47   raid456
> Apr 12 21:34:47   async_raid6_recov
> Apr 12 21:34:47   async_memcpy
> Apr 12 21:34:47   async_pq
> Apr 12 21:34:47   async_xor
> Apr 12 21:34:47   xor
> Apr 12 21:34:47   async_tx
> Apr 12 21:34:47   raid6_pq
> Apr 12 21:34:47   md_mod
> Apr 12 21:34:47   sr_mod
> Apr 12 21:34:47   cdrom
> Apr 12 21:34:47   usb_storage
> Apr 12 21:34:47   hid_generic
> Apr 12 21:34:47   usbhid
> Apr 12 21:34:47   hid
> Apr 12 21:34:47   sg
> Apr 12 21:34:47   sd_mod
> Apr 12 21:34:47   x86_pkg_temp_thermal
> Apr 12 21:34:47   coretemp
> Apr 12 21:34:47   crct10dif_pclmul
> Apr 12 21:34:47   crc32_pclmul
> Apr 12 21:34:47   crc32c_intel
> Apr 12 21:34:47   jitterentropy_rng
> Apr 12 21:34:47   sha256_ssse3
> Apr 12 21:34:47   sha256_generic
> Apr 12 21:34:47   hmac
> Apr 12 21:34:47   iTCO_wdt
> Apr 12 21:34:47   iTCO_vendor_support
> Apr 12 21:34:47   drbg
> Apr 12 21:34:47   ansi_cprng
> Apr 12 21:34:47   aesni_intel
> Apr 12 21:34:47   aes_x86_64
> Apr 12 21:34:47   lrw
> Apr 12 21:34:47   gf128mul
> Apr 12 21:34:47   glue_helper
> Apr 12 21:34:47   ablk_helper
> Apr 12 21:34:47   cryptd
> Apr 12 21:34:47   ahci
> Apr 12 21:34:47   libahci
> Apr 12 21:34:47   sb_edac
> Apr 12 21:34:47   libata
> Apr 12 21:34:47   igb
> Apr 12 21:34:47   megaraid_sas
> Apr 12 21:34:47   xhci_pci
> Apr 12 21:34:47   ehci_pci
> Apr 12 21:34:47   i2c_algo_bit
> Apr 12 21:34:47   xhci_hcd
> Apr 12 21:34:47   ehci_hcd
> Apr 12 21:34:47   edac_core
> Apr 12 21:34:47   ptp
> Apr 12 21:34:47   mei_me
> Apr 12 21:34:47   lpc_ich
> Apr 12 21:34:47   i2c_i801
> Apr 12 21:34:47   usbcore
> Apr 12 21:34:47   pps_core
> Apr 12 21:34:47   mfd_core
> Apr 12 21:34:47   mei
> Apr 12 21:34:47   usb_common
> Apr 12 21:34:47   i2c_core
> Apr 12 21:34:47   ioatdma
> Apr 12 21:34:47   scsi_mod
> Apr 12 21:34:47   dca
> Apr 12 21:34:47   ipmi_si
> Apr 12 21:34:47   ipmi_msghandler
> Apr 12 21:34:47   acpi_power_meter
> Apr 12 21:34:47   tpm_tis
> Apr 12 21:34:47   tpm
> Apr 12 21:34:47   processor
> Apr 12 21:34:47   button
> Apr 12 21:34:47
> Apr 12 21:34:47  [75707.119088] CPU: 15 PID: 31940 Comm: main Not tainted
> 4.4.1 #2
> Apr 12 21:34:47  [75707.119134] Hardware name: Supermicro Super
> Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
> Apr 12 21:34:47  [75707.119196]  0000000000000000
> Apr 12 21:34:47   ffffffff812abdf3
> Apr 12 21:34:47   0000000000000000
> Apr 12 21:34:47   ffffffff810cf5f5
> Apr 12 21:34:47
> Apr 12 21:34:47  [75707.119277]  ffff883ff2a20000
> Apr 12 21:34:47   ffffffff810fcea2
> Apr 12 21:34:47   0000000000000001
> Apr 12 21:34:47   ffff88407fce5e58
> Apr 12 21:34:47
> Apr 12 21:34:47  [75707.119360]  ffff88407fceaf00
> Apr 12 21:34:47   ffff88407fceb100
> Apr 12 21:34:47   ffff883ff2a20000
> Apr 12 21:34:47   ffffffff8101bc63
> Apr 12 21:34:47
> Apr 12 21:34:47  [75707.119439] Call Trace:
> Apr 12 21:34:47  [75707.119471]  <NMI>
> Apr 12 21:34:47   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
> Apr 12 21:34:47  [75707.119527]  [<ffffffff810cf5f5>] ?
> watchdog_overflow_callback+0xb5/0xd0
> Apr 12 21:34:47  [75707.119571]  [<ffffffff810fcea2>] ?
> __perf_event_overflow+0x82/0x1c0
> Apr 12 21:34:47  [75707.119614]  [<ffffffff8101bc63>] ?
> intel_pmu_handle_irq+0x1c3/0x3e0
> Apr 12 21:34:47  [75707.119657]  [<ffffffff8113b5cb>] ?
> vunmap_page_range+0x1bb/0x320
> Apr 12 21:34:47  [75707.119703]  [<ffffffff813213e0>] ?
> ghes_copy_tofrom_phys+0x110/0x1d0
> Apr 12 21:34:47  [75707.119758]  [<ffffffff81014f53>] ?
> perf_event_nmi_handler+0x23/0x40
> Apr 12 21:34:47  [75707.119800]  [<ffffffff81007b85>] ?
> nmi_handle+0x65/0x100
> Apr 12 21:34:47  [75707.119838]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
> Apr 12 21:34:47  [75707.119878]  [<ffffffff8148f957>] ?
> end_repeat_nmi+0x1a/0x1e
> Apr 12 21:34:47  [75707.119920]  [<ffffffff810862ca>] ?
> queued_spin_lock_slowpath+0xea/0x150
> Apr 12 21:34:47  [75707.119962]  [<ffffffff810862ca>] ?
> queued_spin_lock_slowpath+0xea/0x150
> Apr 12 21:34:47  [75707.120002]  [<ffffffff810862ca>] ?
> queued_spin_lock_slowpath+0xea/0x150
> Apr 12 21:34:47  [75707.120042]  <<EOE>>
> Apr 12 21:34:47   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
> Apr 12 21:34:47  [75707.120113]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
> Apr 12 21:34:47  [75707.120152]  [<ffffffffa017632d>] ?
> md_make_request+0xdd/0x220 [md_mod]
> Apr 12 21:34:47  [75707.120195]  [<ffffffff8128691d>] ?
> generic_make_request+0xed/0x1d0
> Apr 12 21:34:47  [75707.120236]  [<ffffffff81286a5a>] ?
> submit_bio+0x5a/0x140
> Apr 12 21:34:47  [75707.120277]  [<ffffffff8112afaf>] ?
> workingset_refault+0x4f/0xa0
> Apr 12 21:34:47  [75707.120320]  [<ffffffff811a215e>] ?
> mpage_bio_submit+0x1e/0x30
> Apr 12 21:34:47  [75707.120359]  [<ffffffff811a3076>] ?
> mpage_readpages+0x106/0x130
> Apr 12 21:34:47  [75707.120401]  [<ffffffff8121b510>] ?
> __xfs_get_blocks+0x750/0x750
> Apr 12 21:34:47  [75707.120439]  [<ffffffff8121b510>] ?
> __xfs_get_blocks+0x750/0x750
> Apr 12 21:34:47  [75707.120481]  [<ffffffff8114ad45>] ?
> alloc_pages_current+0x85/0x110
> Apr 12 21:34:47  [75707.120523]  [<ffffffff81111d25>] ?
> __do_page_cache_readahead+0x165/0x1f0
> Apr 12 21:34:47  [75707.120564]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
> Apr 12 21:34:47  [75707.120602]  [<ffffffff811120c7>] ?
> force_page_cache_readahead+0x77/0xe0
> Apr 12 21:34:47  [75707.120644]  [<ffffffff8113f876>] ?
> madvise_willneed+0x76/0x140
> Apr 12 21:34:47  [75707.120683]  [<ffffffff811301ce>] ?
> handle_mm_fault+0x9ae/0x1650
> Apr 12 21:34:47  [75707.120722]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
> Apr 12 21:34:47  [75707.120760]  [<ffffffff8113fc52>] ?
> SyS_madvise+0x312/0x6f0
> Apr 12 21:34:47  [75707.120799]  [<ffffffff8148d9db>] ?
> entry_SYSCALL_64_fastpath+0x16/0x6e
> 
> Once this starts, a couple of minutes go by and then the machine locks up
> completely.
> 
> I have been unable to locate the problem. Can anyone point me in
> the right direction?
> 
> Best regards


* Re: Hard CPU Lockup when accessing MD RAID5
  2016-04-13 17:00 ` Shaohua Li
@ 2016-04-20  6:52   ` Daniel Walker
  2016-04-20 15:29     ` John Stoffel
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Walker @ 2016-04-20  6:52 UTC (permalink / raw)
  To: linux-raid

Hi,

I upgraded to the latest stable kernel (4.5.1) with debugging enabled,
but without any luck. This is what shows up in dmesg:


    [262448.558983] INFO: task php:13376 blocked for more than 120 seconds.
    [262448.559057]       Tainted: G        W       4.5.1 #1
    [262448.559092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [262448.559246] php             D ffff88001c297a18     0 13376  12277 0x00000000
    [262448.559519]  ffff88001c297a18 ffff881ff248c100 ffff880013e9b400 ffff881fea472000
    [262448.559603]  ffff88001c297ae8 ffff88001c298000 ffff881c5cac1b30 ffff880013e9b400
    [262448.560046]  0000000000020001 0000000545ea7820 ffff88001c297a30 ffffffff814d5690
    [262448.560485] Call Trace:
    [262448.560541]  [<ffffffff814d5690>] schedule+0x30/0x80
    [262448.560761]  [<ffffffff814d823e>] schedule_timeout+0x21e/0x2a0
    [262448.560828]  [<ffffffff81217c3d>] ? xfs_bmap_search_extents+0x7d/0x100
    [262448.561000]  [<ffffffff810902d9>] ? down_trylock+0x29/0x40
    [262448.561135]  [<ffffffff814d726f>] __down+0x5f/0xa0
    [262448.561268]  [<ffffffff8124bdd6>] ? _xfs_buf_find+0x156/0x350
    [262448.561347]  [<ffffffff8109032c>] down+0x3c/0x50
    [262448.561390]  [<ffffffff8124bbc7>] xfs_buf_lock+0x37/0xf0
    [262448.561435]  [<ffffffff8124bdd6>] _xfs_buf_find+0x156/0x350
    [262448.561557]  [<ffffffff8124bff5>] xfs_buf_get_map+0x25/0x280
    [262448.561603]  [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
    [262448.561666]  [<ffffffff8124cbe8>] xfs_buf_read_map+0x28/0x180
    [262448.561768]  [<ffffffff8127830b>] xfs_trans_read_buf_map+0xeb/0x300
    [262448.561809]  [<ffffffff8123f7da>] xfs_imap_to_bp+0x5a/0xc0
    [262448.561881]  [<ffffffff8125b7a5>] xfs_iunlink_remove+0x275/0x3a0
    [262448.561943]  [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
    [262448.561988]  [<ffffffff8125ec33>] xfs_ifree+0x33/0xd0
    [262448.562033]  [<ffffffff8125ed85>] xfs_inactive_ifree+0xb5/0x200
    [262448.562109]  [<ffffffff8125ef58>] xfs_inactive+0x88/0x110
    [262448.562296]  [<ffffffff81263f31>] xfs_fs_evict_inode+0xc1/0x110
    [262448.562344]  [<ffffffff811a42fb>] evict+0xbb/0x180
    [262448.562405]  [<ffffffff811a4bb3>] iput+0x193/0x200
    [262448.562483]  [<ffffffff811a08d2>] d_delete+0x122/0x160
    [262448.562520]  [<ffffffff81195b99>] vfs_rmdir+0xf9/0x120
    [262448.562559]  [<ffffffff81199d17>] do_rmdir+0x1b7/0x1d0
    [262448.562607]  [<ffffffff81001210>] ? exit_to_usermode_loop+0x90/0xb0
    [262448.562665]  [<ffffffff8119a921>] SyS_rmdir+0x11/0x20
    [262448.562891]  [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
    [262489.707201] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15

    [262489.707227] Modules linked in:
     ipt_MASQUERADE
     nf_nat_masquerade_ipv4
     iptable_nat
     nf_conntrack_ipv4
     nf_defrag_ipv4
     nf_nat_ipv4
     nf_nat
     nf_conntrack
     ipt_REJECT
     nf_reject_ipv4
     iptable_mangle
     netconsole
     configfs
     tun
     xt_multiport
     ip6table_filter
     ip6_tables
     iptable_filter
     ip_tables
     x_tables
     bridge
     stp
     llc
     bonding
     ext4
     crc16
     mbcache
     jbd2
     raid1
     raid0
     raid456
     async_raid6_recov
     async_memcpy
     async_pq
     async_xor
     xor
     async_tx
     raid6_pq
     md_mod
     sg
     sd_mod
     hid_generic
     usbhid
     hid
     x86_pkg_temp_thermal
     coretemp
     crct10dif_pclmul
     crc32_pclmul
     crc32c_intel
     ghash_clmulni_intel
     jitterentropy_rng
     sha256_ssse3
     iTCO_wdt
     sha256_generic
     iTCO_vendor_support
     hmac
     drbg
     xhci_pci
     ahci
     sb_edac
     ehci_pci
     ansi_cprng
     xhci_hcd
     ehci_hcd
     libahci
     i2c_i801
     edac_core
     lpc_ich
     mei_me
     mfd_core
     libata
     usbcore
     igb
     mei
     megaraid_sas
     i2c_algo_bit
     usb_common
     ptp
     aesni_intel
     pps_core
     aes_x86_64
     ioatdma
     lrw
     gf128mul
     glue_helper
     ablk_helper
     i2c_core
     scsi_mod
     dca
     cryptd
     ipmi_si
     ipmi_msghandler
     acpi_power_meter
     tpm_tis
     tpm
     processor
     button

    [262489.708066] CPU: 15 PID: 17535 Comm: kworker/u32:6 Tainted: G        W       4.5.1 #1
    [262489.708124] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
    [262489.708187] Workqueue: writeback wb_workfn (flush-9:7)
    [262489.708228]  0000000000000000 ffff88207fde5bd0 ffffffff812e00b8 0000000000000000
    [262489.708298]  0000000000000000 ffff88207fde5be8 ffffffff810dff1d ffff881ff2270000
    [262489.708368]  ffff88207fde5c20 ffffffff8110f8f8 0000000000000001 ffff88207fdeaf00
    [262489.708438] Call Trace:
    [262489.708467]  <NMI>  [<ffffffff812e00b8>] dump_stack+0x4d/0x65
    [262489.708512]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
    [262489.708552]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
    [262489.708589]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
    [262489.708627]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
    [262489.708666]  [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
    [262489.708703]  [<ffffffff811555fc>] ? unmap_kernel_range_noflush+0xc/0x10
    [262489.708748]  [<ffffffff8135a543>] ? ghes_copy_tofrom_phys+0x113/0x1e0
    [262489.708788]  [<ffffffff810359da>] ? native_apic_wait_icr_idle+0x1a/0x30
    [262489.708827]  [<ffffffff810096e0>] ? arch_irq_work_raise+0x30/0x40
    [262489.708865]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
    [262489.708902]  [<ffffffff81008121>] nmi_handle+0x61/0x110
    [262489.708939]  [<ffffffff810082e7>] do_nmi+0x117/0x3e0
    [262489.708975]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
    [262489.709013]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
    [262489.709051]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
    [262489.709089]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
    [262489.709125]  <<EOE>>  [<ffffffff812b9b98>] blk_flush_plug_list+0xa8/0x210
    [262489.709169]  [<ffffffff814d5de0>] ? bit_wait_timeout+0x70/0x70
    [262489.709206]  [<ffffffff814d4c04>] io_schedule_timeout+0x54/0x130
    [262489.709242]  [<ffffffff814d5df6>] bit_wait_io+0x16/0x60
    [262489.709277]  [<ffffffff814d5b59>] __wait_on_bit_lock+0x49/0xa0
    [262489.709314]  [<ffffffff81117fd0>] __lock_page+0xb0/0xc0
    [262489.709352]  [<ffffffff8108bdc0>] ? autoremove_wake_function+0x30/0x30
    [262489.709391]  [<ffffffff811250f0>] write_cache_pages+0x2f0/0x4d0
    [262489.709427]  [<ffffffff81122df0>] ? wb_position_ratio+0x1f0/0x1f0
    [262489.709465]  [<ffffffff8112530e>] generic_writepages+0x3e/0x60
    [262489.709502]  [<ffffffff81244c18>] xfs_vm_writepages+0x38/0x40
    [262489.709539]  [<ffffffff81125e29>] do_writepages+0x19/0x30
    [262489.709574]  [<ffffffff811b5c50>] __writeback_single_inode+0x40/0x310
    [262489.709612]  [<ffffffff811b6402>] writeback_sb_inodes+0x242/0x520
    [262489.709649]  [<ffffffff811b676a>] __writeback_inodes_wb+0x8a/0xc0
    [262489.709686]  [<ffffffff811b6a77>] wb_writeback+0x247/0x2d0
    [262489.709721]  [<ffffffff811b716f>] wb_workfn+0x20f/0x3c0
    [262489.709758]  [<ffffffff81067513>] process_one_work+0x143/0x400
    [262489.709795]  [<ffffffff81067cc1>] worker_thread+0x61/0x490
    [262489.709831]  [<ffffffff81067c60>] ? max_active_store+0x60/0x60
    [262489.709867]  [<ffffffff8106c926>] kthread+0xd6/0xf0
    [262489.709901]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
    [262489.709937]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
    [262489.709972]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
    [262491.022971] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0

    [262491.023470] Modules linked in:
     ipt_MASQUERADE
     nf_nat_masquerade_ipv4
     iptable_nat
     nf_conntrack_ipv4
     nf_defrag_ipv4
     nf_nat_ipv4
     nf_nat
     nf_conntrack
     ipt_REJECT
     nf_reject_ipv4
     iptable_mangle
     netconsole
     configfs
     tun
     xt_multiport
     ip6table_filter
     ip6_tables
     iptable_filter
     ip_tables
     x_tables
     bridge
     stp
     llc
     bonding
     ext4
     crc16
     mbcache
     jbd2
     raid1
     raid0
     raid456
     async_raid6_recov
     async_memcpy
     async_pq
     async_xor
     xor
     async_tx
     raid6_pq
     md_mod
     sg
     sd_mod
     hid_generic
     usbhid
     hid
     x86_pkg_temp_thermal
     coretemp
     crct10dif_pclmul
     crc32_pclmul
     crc32c_intel
     ghash_clmulni_intel
     jitterentropy_rng
     sha256_ssse3
     iTCO_wdt
     sha256_generic
     iTCO_vendor_support
     hmac
     drbg
     xhci_pci
     ahci
     sb_edac
     ehci_pci
     ansi_cprng
     xhci_hcd
     ehci_hcd
     libahci
     i2c_i801
     edac_core
     lpc_ich
     mei_me
     mfd_core
     libata
     usbcore
     igb
     mei
     megaraid_sas
     i2c_algo_bit
     usb_common
     ptp
     aesni_intel
     pps_core
     aes_x86_64
     ioatdma
     lrw
     gf128mul
     glue_helper
     ablk_helper
     i2c_core
     scsi_mod
     dca
     cryptd
     ipmi_si
     ipmi_msghandler
     acpi_power_meter
     tpm_tis
     tpm
     processor
     button

    [262491.029705] CPU: 0 PID: 1178 Comm: md7_raid5 Tainted: G        W       4.5.1 #1
    [262491.029776] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
    [262491.029849]  0000000000000000 ffff88207fc05bd0 ffffffff812e00b8 0000000000000000
    [262491.029988]  0000000000000000 ffff88207fc05be8 ffffffff810dff1d ffff881fff032000
    [262491.030124]  ffff88207fc05c20 ffffffff8110f8f8 0000000000000001 ffff88207fc0af00
    [262491.030260] Call Trace:
    [262491.030302]  <NMI>  [<ffffffff812e00b8>] dump_stack+0x4d/0x65
    [262491.030377]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
    [262491.030432]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
    [262491.030484]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
    [262491.030536]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
    [262491.030589]  [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
    [262491.030640]  [<ffffffff811555fc>] ? unmap_kernel_range_noflush+0xc/0x10
    [262491.030693]  [<ffffffff8135a543>] ? ghes_copy_tofrom_phys+0x113/0x1e0
    [262491.030745]  [<ffffffff8135a681>] ? ghes_read_estatus+0x71/0x140
    [262491.030797]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
    [262491.030849]  [<ffffffff81008121>] nmi_handle+0x61/0x110
    [262491.030898]  [<ffffffff810083d1>] do_nmi+0x201/0x3e0
    [262491.030949]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
    [262491.030998]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
    [262491.031050]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
    [262491.031102]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
    [262491.031153]  <<EOE>>  [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
    [262491.031225]  [<ffffffffa01db6b1>] raid5d+0x91/0x720 [raid456]
    [262491.031276]  [<ffffffff810a4a8a>] ? try_to_del_timer_sync+0x4a/0x60
    [262491.031328]  [<ffffffff810a4ae3>] ? del_timer_sync+0x43/0x50
    [262491.031377]  [<ffffffff814d816e>] ? schedule_timeout+0x14e/0x2a0
    [262491.031428]  [<ffffffff810a4830>] ? trace_event_raw_event_tick_stop+0x100/0x100
    [262491.031502]  [<ffffffffa017874b>] md_thread+0x12b/0x130 [md_mod]
    [262491.031555]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
    [262491.031605]  [<ffffffffa0178620>] ? find_pers+0x70/0x70 [md_mod]
    [262491.031656]  [<ffffffff8106c926>] kthread+0xd6/0xf0
    [262491.031704]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
    [262491.031753]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
    [262491.031802]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50

The server hosts plain VPSs. A few of them use it for rtorrent, which is
quite disk-intensive, but from what I can see, iowait is quite low.

Absolutely nothing is logged before the lockups. Everything runs fine,
and then it suddenly crashes. I'm beginning to think we might have a
hardware problem, but I'm having a hard time finding the actual issue.

Any ideas?

Best regards


On 13-04-2016 at 19:00, Shaohua Li wrote:
> It looks like there is a deadlock while trying to acquire the device_lock or
> hash_lock. Was anything abnormal printed before the NMI watchdog fired? What
> is running on the machine? This looks like an old kernel; is it possible for
> you to try the latest kernel and report back?
>
> Thanks,
> Shaohua
>
> On Tue, Apr 12, 2016 at 09:54:08PM +0000, Daniel Walker wrote:
>> I'm having some issues with a brand-new Supermicro server that we have
>> running in production alongside a few other machines that are identical
>> to it.
>>
>> The output from the netconsole attached to the server is here:
>>
>> Apr 12 21:34:45  [75704.964946] NMI watchdog: Watchdog detected hard LOCKUP
>> on cpu 6
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75704.964973] Modules linked in:
>> Apr 12 21:34:45   ipt_REJECT
>> Apr 12 21:34:45   nf_reject_ipv4
>> Apr 12 21:34:45   iptable_mangle
>> Apr 12 21:34:45   tun
>> Apr 12 21:34:45   netconsole
>> Apr 12 21:34:45   configfs
>> Apr 12 21:34:45   xt_multiport
>> Apr 12 21:34:45   ip6table_filter
>> Apr 12 21:34:45   ip6_tables
>> Apr 12 21:34:45   iptable_filter
>> Apr 12 21:34:45   ip_tables
>> Apr 12 21:34:45   x_tables
>> Apr 12 21:34:45   bridge
>> Apr 12 21:34:45   stp
>> Apr 12 21:34:45   llc
>> Apr 12 21:34:45   bonding
>> Apr 12 21:34:45   ext4
>> Apr 12 21:34:45   crc16
>> Apr 12 21:34:45   mbcache
>> Apr 12 21:34:45   jbd2
>> Apr 12 21:34:45   raid1
>> Apr 12 21:34:45   raid0
>> Apr 12 21:34:45   raid456
>> Apr 12 21:34:45   async_raid6_recov
>> Apr 12 21:34:45   async_memcpy
>> Apr 12 21:34:45   async_pq
>> Apr 12 21:34:45   async_xor
>> Apr 12 21:34:45   xor
>> Apr 12 21:34:45   async_tx
>> Apr 12 21:34:45   raid6_pq
>> Apr 12 21:34:45   md_mod
>> Apr 12 21:34:45   sr_mod
>> Apr 12 21:34:45   cdrom
>> Apr 12 21:34:45   usb_storage
>> Apr 12 21:34:45   hid_generic
>> Apr 12 21:34:45   usbhid
>> Apr 12 21:34:45   hid
>> Apr 12 21:34:45   sg
>> Apr 12 21:34:45   sd_mod
>> Apr 12 21:34:45   x86_pkg_temp_thermal
>> Apr 12 21:34:45   coretemp
>> Apr 12 21:34:45   crct10dif_pclmul
>> Apr 12 21:34:45   crc32_pclmul
>> Apr 12 21:34:45   crc32c_intel
>> Apr 12 21:34:45   jitterentropy_rng
>> Apr 12 21:34:45   sha256_ssse3
>> Apr 12 21:34:45   sha256_generic
>> Apr 12 21:34:45   hmac
>> Apr 12 21:34:45   iTCO_wdt
>> Apr 12 21:34:45   iTCO_vendor_support
>> Apr 12 21:34:45   drbg
>> Apr 12 21:34:45   ansi_cprng
>> Apr 12 21:34:45   aesni_intel
>> Apr 12 21:34:45   aes_x86_64
>> Apr 12 21:34:45   lrw
>> Apr 12 21:34:45   gf128mul
>> Apr 12 21:34:45   glue_helper
>> Apr 12 21:34:45   ablk_helper
>> Apr 12 21:34:45   cryptd
>> Apr 12 21:34:45   ahci
>> Apr 12 21:34:45   libahci
>> Apr 12 21:34:45   sb_edac
>> Apr 12 21:34:45   libata
>> Apr 12 21:34:45   igb
>> Apr 12 21:34:45   megaraid_sas
>> Apr 12 21:34:45   xhci_pci
>> Apr 12 21:34:45   ehci_pci
>> Apr 12 21:34:45   i2c_algo_bit
>> Apr 12 21:34:45   xhci_hcd
>> Apr 12 21:34:45   ehci_hcd
>> Apr 12 21:34:45   edac_core
>> Apr 12 21:34:45   ptp
>> Apr 12 21:34:45   mei_me
>> Apr 12 21:34:45   lpc_ich
>> Apr 12 21:34:45   i2c_i801
>> Apr 12 21:34:45   usbcore
>> Apr 12 21:34:45   pps_core
>> Apr 12 21:34:45   mfd_core
>> Apr 12 21:34:45   mei
>> Apr 12 21:34:45   usb_common
>> Apr 12 21:34:45   i2c_core
>> Apr 12 21:34:45   ioatdma
>> Apr 12 21:34:45   scsi_mod
>> Apr 12 21:34:45   dca
>> Apr 12 21:34:45   ipmi_si
>> Apr 12 21:34:45   ipmi_msghandler
>> Apr 12 21:34:45   acpi_power_meter
>> Apr 12 21:34:45   tpm_tis
>> Apr 12 21:34:45   tpm
>> Apr 12 21:34:45   processor
>> Apr 12 21:34:45   button
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75704.965874] CPU: 6 PID: 25339 Comm: main Not tainted
>> 4.4.1 #2
>> Apr 12 21:34:45  [75704.965916] Hardware name: Supermicro Super
>> Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>> Apr 12 21:34:45  [75704.965979]  0000000000000000
>> Apr 12 21:34:45   ffffffff812abdf3
>> Apr 12 21:34:45   0000000000000000
>> Apr 12 21:34:45   ffffffff810cf5f5
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75704.966054]  ffff881ff2870000
>> Apr 12 21:34:45   ffffffff810fcea2
>> Apr 12 21:34:45   0000000000000001
>> Apr 12 21:34:45   ffff881fffcc5e58
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75704.966134]  ffff881fffccaf00
>> Apr 12 21:34:45   ffff881fffccb100
>> Apr 12 21:34:45   ffff881ff2870000
>> Apr 12 21:34:45   ffffffff8101bc63
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75704.966211] Call Trace:
>> Apr 12 21:34:45  [75704.966246]  <NMI>
>> Apr 12 21:34:45   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>> Apr 12 21:34:45  [75704.966297]  [<ffffffff810cf5f5>] ?
>> watchdog_overflow_callback+0xb5/0xd0
>> Apr 12 21:34:45  [75704.966339]  [<ffffffff810fcea2>] ?
>> __perf_event_overflow+0x82/0x1c0
>> Apr 12 21:34:45  [75704.966384]  [<ffffffff8101bc63>] ?
>> intel_pmu_handle_irq+0x1c3/0x3e0
>> Apr 12 21:34:45  [75704.966431]  [<ffffffff8113b5cb>] ?
>> vunmap_page_range+0x1bb/0x320
>> Apr 12 21:34:45  [75704.966474]  [<ffffffff813213e0>] ?
>> ghes_copy_tofrom_phys+0x110/0x1d0
>> Apr 12 21:34:45  [75704.966519]  [<ffffffff81014f53>] ?
>> perf_event_nmi_handler+0x23/0x40
>> Apr 12 21:34:45  [75704.966560]  [<ffffffff81007b85>] ?
>> nmi_handle+0x65/0x100
>> Apr 12 21:34:45  [75704.966597]  [<ffffffff81007dfe>] ? do_nmi+0x1de/0x360
>> Apr 12 21:34:45  [75704.970603]  [<ffffffff8148f957>] ?
>> end_repeat_nmi+0x1a/0x1e
>> Apr 12 21:34:45  [75704.970644]  [<ffffffff810862ca>] ?
>> queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:45  [75704.970685]  [<ffffffff810862ca>] ?
>> queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:45  [75704.970728]  [<ffffffff810862ca>] ?
>> queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:45  [75704.970768]  <<EOE>>
>> Apr 12 21:34:45   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>> Apr 12 21:34:45  [75704.970838]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>> Apr 12 21:34:45  [75704.970878]  [<ffffffff81151ec4>] ?
>> kmem_cache_alloc+0xf4/0x120
>> Apr 12 21:34:45  [75704.970922]  [<ffffffffa017632d>] ?
>> md_make_request+0xdd/0x220 [md_mod]
>> Apr 12 21:34:45  [75704.970969]  [<ffffffff81219fde>] ?
>> xfs_map_buffer.isra.12+0x2e/0x60
>> Apr 12 21:34:45  [75704.971012]  [<ffffffff8128691d>] ?
>> generic_make_request+0xed/0x1d0
>> Apr 12 21:34:45  [75704.971052]  [<ffffffff81286a5a>] ?
>> submit_bio+0x5a/0x140
>> Apr 12 21:34:45  [75704.971098]  [<ffffffff81113379>] ?
>> release_pages+0xc9/0x270
>> Apr 12 21:34:45  [75704.971145]  [<ffffffff811a2c01>] ?
>> do_mpage_readpage+0x2d1/0x640
>> Apr 12 21:34:45  [75704.971187]  [<ffffffff811a304d>] ?
>> mpage_readpages+0xdd/0x130
>> Apr 12 21:34:45  [75704.971226]  [<ffffffff8121b510>] ?
>> __xfs_get_blocks+0x750/0x750
>> Apr 12 21:34:45  [75704.971267]  [<ffffffff8121b510>] ?
>> __xfs_get_blocks+0x750/0x750
>> Apr 12 21:34:45  [75704.971313]  [<ffffffff8114ad45>] ?
>> alloc_pages_current+0x85/0x110
>> Apr 12 21:34:45  [75704.971354]  [<ffffffff81111d25>] ?
>> __do_page_cache_readahead+0x165/0x1f0
>> Apr 12 21:34:45  [75704.971399]  [<ffffffff81105902>] ?
>> pagecache_get_page+0x22/0x1a0
>> Apr 12 21:34:45  [75704.971441]  [<ffffffff8110768c>] ?
>> filemap_fault+0x37c/0x400
>> Apr 12 21:34:45  [75704.971481]  [<ffffffff8122474b>] ?
>> xfs_filemap_fault+0x3b/0x80
>> Apr 12 21:34:45  [75704.971526]  [<ffffffff8112d2da>] ? __do_fault+0x3a/0xc0
>> Apr 12 21:34:45  [75704.971564]  [<ffffffff81130883>] ?
>> handle_mm_fault+0x1063/0x1650
>> Apr 12 21:34:45  [75704.971614]  [<ffffffff8103bdae>] ?
>> __do_page_fault+0x11e/0x370
>> Apr 12 21:34:45  [75704.971653]  [<ffffffff811aa4ff>] ?
>> SyS_epoll_wait+0x8f/0xd0
>> Apr 12 21:34:45  [75704.971694]  [<ffffffff8148f64f>] ? page_fault+0x1f/0x30
>> Apr 12 21:34:45  [75705.493640] NMI watchdog: Watchdog detected hard LOCKUP
>> on cpu 12
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75705.493668] Modules linked in:
>> Apr 12 21:34:45   ipt_REJECT
>> Apr 12 21:34:45   nf_reject_ipv4
>> Apr 12 21:34:45   iptable_mangle
>> Apr 12 21:34:45   tun
>> Apr 12 21:34:45   netconsole
>> Apr 12 21:34:45   configfs
>> Apr 12 21:34:45   xt_multiport
>> Apr 12 21:34:45   ip6table_filter
>> Apr 12 21:34:45   ip6_tables
>> Apr 12 21:34:45   iptable_filter
>> Apr 12 21:34:45   ip_tables
>> Apr 12 21:34:45   x_tables
>> Apr 12 21:34:45   bridge
>> Apr 12 21:34:45   stp
>> Apr 12 21:34:45   llc
>> Apr 12 21:34:45   bonding
>> Apr 12 21:34:45   ext4
>> Apr 12 21:34:45   crc16
>> Apr 12 21:34:45   mbcache
>> Apr 12 21:34:45   jbd2
>> Apr 12 21:34:45   raid1
>> Apr 12 21:34:45   raid0
>> Apr 12 21:34:45   raid456
>> Apr 12 21:34:45   async_raid6_recov
>> Apr 12 21:34:45   async_memcpy
>> Apr 12 21:34:45   async_pq
>> Apr 12 21:34:45   async_xor
>> Apr 12 21:34:45   xor
>> Apr 12 21:34:45   async_tx
>> Apr 12 21:34:45   raid6_pq
>> Apr 12 21:34:45   md_mod
>> Apr 12 21:34:45   sr_mod
>> Apr 12 21:34:45   cdrom
>> Apr 12 21:34:45   usb_storage
>> Apr 12 21:34:45   hid_generic
>> Apr 12 21:34:45   usbhid
>> Apr 12 21:34:45   hid
>> Apr 12 21:34:45   sg
>> Apr 12 21:34:45   sd_mod
>> Apr 12 21:34:45   x86_pkg_temp_thermal
>> Apr 12 21:34:45   coretemp
>> Apr 12 21:34:45   crct10dif_pclmul
>> Apr 12 21:34:45   crc32_pclmul
>> Apr 12 21:34:45   crc32c_intel
>> Apr 12 21:34:45   jitterentropy_rng
>> Apr 12 21:34:45   sha256_ssse3
>> Apr 12 21:34:45   sha256_generic
>> Apr 12 21:34:45   hmac
>> Apr 12 21:34:45   iTCO_wdt
>> Apr 12 21:34:45   iTCO_vendor_support
>> Apr 12 21:34:45   drbg
>> Apr 12 21:34:45   ansi_cprng
>> Apr 12 21:34:45   aesni_intel
>> Apr 12 21:34:45   aes_x86_64
>> Apr 12 21:34:45   lrw
>> Apr 12 21:34:45   gf128mul
>> Apr 12 21:34:45   glue_helper
>> Apr 12 21:34:45   ablk_helper
>> Apr 12 21:34:45   cryptd
>> Apr 12 21:34:45   ahci
>> Apr 12 21:34:45   libahci
>> Apr 12 21:34:45   sb_edac
>> Apr 12 21:34:45   libata
>> Apr 12 21:34:45   igb
>> Apr 12 21:34:45   megaraid_sas
>> Apr 12 21:34:45   xhci_pci
>> Apr 12 21:34:45   ehci_pci
>> Apr 12 21:34:45   i2c_algo_bit
>> Apr 12 21:34:45   xhci_hcd
>> Apr 12 21:34:45   ehci_hcd
>> Apr 12 21:34:45   edac_core
>> Apr 12 21:34:45   ptp
>> Apr 12 21:34:45   mei_me
>> Apr 12 21:34:45   lpc_ich
>> Apr 12 21:34:45   i2c_i801
>> Apr 12 21:34:45   usbcore
>> Apr 12 21:34:45   pps_core
>> Apr 12 21:34:45   mfd_core
>> Apr 12 21:34:45   mei
>> Apr 12 21:34:45   usb_common
>> Apr 12 21:34:45   i2c_core
>> Apr 12 21:34:45   ioatdma
>> Apr 12 21:34:45   scsi_mod
>> Apr 12 21:34:45   dca
>> Apr 12 21:34:45   ipmi_si
>> Apr 12 21:34:45   ipmi_msghandler
>> Apr 12 21:34:45   acpi_power_meter
>> Apr 12 21:34:45   tpm_tis
>> Apr 12 21:34:45   tpm
>> Apr 12 21:34:45   processor
>> Apr 12 21:34:45   button
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75705.494688] CPU: 12 PID: 32350 Comm: main Not tainted 4.4.1 #2
>> Apr 12 21:34:45  [75705.494728] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>> Apr 12 21:34:45  [75705.494790]  0000000000000000
>> Apr 12 21:34:45   ffffffff812abdf3
>> Apr 12 21:34:45   0000000000000000
>> Apr 12 21:34:45   ffffffff810cf5f5
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75705.494886]  ffff883ff29a0000
>> Apr 12 21:34:45   ffffffff810fcea2
>> Apr 12 21:34:45   0000000000000001
>> Apr 12 21:34:45   ffff88407fc85e58
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75705.494976]  ffff88407fc8af00
>> Apr 12 21:34:45   ffff88407fc8b100
>> Apr 12 21:34:45   ffff883ff29a0000
>> Apr 12 21:34:45   ffffffff8101bc63
>> Apr 12 21:34:45
>> Apr 12 21:34:45  [75705.495064] Call Trace:
>> Apr 12 21:34:45  [75705.495094]  <NMI>
>> Apr 12 21:34:45   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>> Apr 12 21:34:45  [75705.495150]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
>> Apr 12 21:34:45  [75705.495193]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
>> Apr 12 21:34:45  [75705.495237]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
>> Apr 12 21:34:45  [75705.495284]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
>> Apr 12 21:34:45  [75705.495330]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
>> Apr 12 21:34:45  [75705.495373]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
>> Apr 12 21:34:45  [75705.495418]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
>> Apr 12 21:34:45  [75705.495458]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
>> Apr 12 21:34:45  [75705.495497]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
>> Apr 12 21:34:45  [75705.495540]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:45  [75705.495581]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:45  [75705.495621]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:45  [75705.495661]  <<EOE>>
>> Apr 12 21:34:45   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>> Apr 12 21:34:45  [75705.495733]  [<ffffffff81282d87>] ? blk_rq_init+0x87/0xa0
>> Apr 12 21:34:45  [75705.495771]  [<ffffffff81283e3c>] ? get_request+0x29c/0x6e0
>> Apr 12 21:34:45  [75705.495812]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>> Apr 12 21:34:45  [75705.495853]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
>> Apr 12 21:34:45  [75705.495898]  [<ffffffff8128829e>] ? blk_queue_bio+0x15e/0x350
>> Apr 12 21:34:45  [75705.495937]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
>> Apr 12 21:34:45  [75705.495978]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
>> Apr 12 21:34:45  [75705.496018]  [<ffffffff811a215e>] ? mpage_bio_submit+0x1e/0x30
>> Apr 12 21:34:45  [75705.496057]  [<ffffffff811a3076>] ? mpage_readpages+0x106/0x130
>> Apr 12 21:34:45  [75705.496102]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>> Apr 12 21:34:45  [75705.496144]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>> Apr 12 21:34:45  [75705.496185]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
>> Apr 12 21:34:45  [75705.496227]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
>> Apr 12 21:34:45  [75705.496268]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
>> Apr 12 21:34:45  [75705.496307]  [<ffffffff811120eb>] ? force_page_cache_readahead+0x9b/0xe0
>> Apr 12 21:34:45  [75705.496352]  [<ffffffff8113f876>] ? madvise_willneed+0x76/0x140
>> Apr 12 21:34:45  [75705.496395]  [<ffffffff811301ce>] ? handle_mm_fault+0x9ae/0x1650
>> Apr 12 21:34:45  [75705.496437]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
>> Apr 12 21:34:45  [75705.496476]  [<ffffffff8113fc52>] ? SyS_madvise+0x312/0x6f0
>> Apr 12 21:34:45  [75705.496515]  [<ffffffff8148d9db>] ? entry_SYSCALL_64_fastpath+0x16/0x6e
>> Apr 12 21:34:47  [75707.118049] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
>> Apr 12 21:34:47
>> Apr 12 21:34:47  [75707.118078] Modules linked in:
>> Apr 12 21:34:47   ipt_REJECT
>> Apr 12 21:34:47   nf_reject_ipv4
>> Apr 12 21:34:47   iptable_mangle
>> Apr 12 21:34:47   tun
>> Apr 12 21:34:47   netconsole
>> Apr 12 21:34:47   configfs
>> Apr 12 21:34:47   xt_multiport
>> Apr 12 21:34:47   ip6table_filter
>> Apr 12 21:34:47   ip6_tables
>> Apr 12 21:34:47   iptable_filter
>> Apr 12 21:34:47   ip_tables
>> Apr 12 21:34:47   x_tables
>> Apr 12 21:34:47   bridge
>> Apr 12 21:34:47   stp
>> Apr 12 21:34:47   llc
>> Apr 12 21:34:47   bonding
>> Apr 12 21:34:47   ext4
>> Apr 12 21:34:47   crc16
>> Apr 12 21:34:47   mbcache
>> Apr 12 21:34:47   jbd2
>> Apr 12 21:34:47   raid1
>> Apr 12 21:34:47   raid0
>> Apr 12 21:34:47   raid456
>> Apr 12 21:34:47   async_raid6_recov
>> Apr 12 21:34:47   async_memcpy
>> Apr 12 21:34:47   async_pq
>> Apr 12 21:34:47   async_xor
>> Apr 12 21:34:47   xor
>> Apr 12 21:34:47   async_tx
>> Apr 12 21:34:47   raid6_pq
>> Apr 12 21:34:47   md_mod
>> Apr 12 21:34:47   sr_mod
>> Apr 12 21:34:47   cdrom
>> Apr 12 21:34:47   usb_storage
>> Apr 12 21:34:47   hid_generic
>> Apr 12 21:34:47   usbhid
>> Apr 12 21:34:47   hid
>> Apr 12 21:34:47   sg
>> Apr 12 21:34:47   sd_mod
>> Apr 12 21:34:47   x86_pkg_temp_thermal
>> Apr 12 21:34:47   coretemp
>> Apr 12 21:34:47   crct10dif_pclmul
>> Apr 12 21:34:47   crc32_pclmul
>> Apr 12 21:34:47   crc32c_intel
>> Apr 12 21:34:47   jitterentropy_rng
>> Apr 12 21:34:47   sha256_ssse3
>> Apr 12 21:34:47   sha256_generic
>> Apr 12 21:34:47   hmac
>> Apr 12 21:34:47   iTCO_wdt
>> Apr 12 21:34:47   iTCO_vendor_support
>> Apr 12 21:34:47   drbg
>> Apr 12 21:34:47   ansi_cprng
>> Apr 12 21:34:47   aesni_intel
>> Apr 12 21:34:47   aes_x86_64
>> Apr 12 21:34:47   lrw
>> Apr 12 21:34:47   gf128mul
>> Apr 12 21:34:47   glue_helper
>> Apr 12 21:34:47   ablk_helper
>> Apr 12 21:34:47   cryptd
>> Apr 12 21:34:47   ahci
>> Apr 12 21:34:47   libahci
>> Apr 12 21:34:47   sb_edac
>> Apr 12 21:34:47   libata
>> Apr 12 21:34:47   igb
>> Apr 12 21:34:47   megaraid_sas
>> Apr 12 21:34:47   xhci_pci
>> Apr 12 21:34:47   ehci_pci
>> Apr 12 21:34:47   i2c_algo_bit
>> Apr 12 21:34:47   xhci_hcd
>> Apr 12 21:34:47   ehci_hcd
>> Apr 12 21:34:47   edac_core
>> Apr 12 21:34:47   ptp
>> Apr 12 21:34:47   mei_me
>> Apr 12 21:34:47   lpc_ich
>> Apr 12 21:34:47   i2c_i801
>> Apr 12 21:34:47   usbcore
>> Apr 12 21:34:47   pps_core
>> Apr 12 21:34:47   mfd_core
>> Apr 12 21:34:47   mei
>> Apr 12 21:34:47   usb_common
>> Apr 12 21:34:47   i2c_core
>> Apr 12 21:34:47   ioatdma
>> Apr 12 21:34:47   scsi_mod
>> Apr 12 21:34:47   dca
>> Apr 12 21:34:47   ipmi_si
>> Apr 12 21:34:47   ipmi_msghandler
>> Apr 12 21:34:47   acpi_power_meter
>> Apr 12 21:34:47   tpm_tis
>> Apr 12 21:34:47   tpm
>> Apr 12 21:34:47   processor
>> Apr 12 21:34:47   button
>> Apr 12 21:34:47
>> Apr 12 21:34:47  [75707.119088] CPU: 15 PID: 31940 Comm: main Not tainted 4.4.1 #2
>> Apr 12 21:34:47  [75707.119134] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>> Apr 12 21:34:47  [75707.119196]  0000000000000000
>> Apr 12 21:34:47   ffffffff812abdf3
>> Apr 12 21:34:47   0000000000000000
>> Apr 12 21:34:47   ffffffff810cf5f5
>> Apr 12 21:34:47
>> Apr 12 21:34:47  [75707.119277]  ffff883ff2a20000
>> Apr 12 21:34:47   ffffffff810fcea2
>> Apr 12 21:34:47   0000000000000001
>> Apr 12 21:34:47   ffff88407fce5e58
>> Apr 12 21:34:47
>> Apr 12 21:34:47  [75707.119360]  ffff88407fceaf00
>> Apr 12 21:34:47   ffff88407fceb100
>> Apr 12 21:34:47   ffff883ff2a20000
>> Apr 12 21:34:47   ffffffff8101bc63
>> Apr 12 21:34:47
>> Apr 12 21:34:47  [75707.119439] Call Trace:
>> Apr 12 21:34:47  [75707.119471]  <NMI>
>> Apr 12 21:34:47   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>> Apr 12 21:34:47  [75707.119527]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
>> Apr 12 21:34:47  [75707.119571]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
>> Apr 12 21:34:47  [75707.119614]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
>> Apr 12 21:34:47  [75707.119657]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
>> Apr 12 21:34:47  [75707.119703]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
>> Apr 12 21:34:47  [75707.119758]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
>> Apr 12 21:34:47  [75707.119800]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
>> Apr 12 21:34:47  [75707.119838]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
>> Apr 12 21:34:47  [75707.119878]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
>> Apr 12 21:34:47  [75707.119920]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:47  [75707.119962]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:47  [75707.120002]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>> Apr 12 21:34:47  [75707.120042]  <<EOE>>
>> Apr 12 21:34:47   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>> Apr 12 21:34:47  [75707.120113]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>> Apr 12 21:34:47  [75707.120152]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
>> Apr 12 21:34:47  [75707.120195]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
>> Apr 12 21:34:47  [75707.120236]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
>> Apr 12 21:34:47  [75707.120277]  [<ffffffff8112afaf>] ? workingset_refault+0x4f/0xa0
>> Apr 12 21:34:47  [75707.120320]  [<ffffffff811a215e>] ? mpage_bio_submit+0x1e/0x30
>> Apr 12 21:34:47  [75707.120359]  [<ffffffff811a3076>] ? mpage_readpages+0x106/0x130
>> Apr 12 21:34:47  [75707.120401]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>> Apr 12 21:34:47  [75707.120439]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>> Apr 12 21:34:47  [75707.120481]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
>> Apr 12 21:34:47  [75707.120523]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
>> Apr 12 21:34:47  [75707.120564]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
>> Apr 12 21:34:47  [75707.120602]  [<ffffffff811120c7>] ? force_page_cache_readahead+0x77/0xe0
>> Apr 12 21:34:47  [75707.120644]  [<ffffffff8113f876>] ? madvise_willneed+0x76/0x140
>> Apr 12 21:34:47  [75707.120683]  [<ffffffff811301ce>] ? handle_mm_fault+0x9ae/0x1650
>> Apr 12 21:34:47  [75707.120722]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
>> Apr 12 21:34:47  [75707.120760]  [<ffffffff8113fc52>] ? SyS_madvise+0x312/0x6f0
>> Apr 12 21:34:47  [75707.120799]  [<ffffffff8148d9db>] ? entry_SYSCALL_64_fastpath+0x16/0x6e
>>
>> Once this starts, a couple of minutes go by and then the machine locks up
>> completely.
>>
>> I have been unable to locate the problem. Can anyone point me in
>> the right direction?
>>
>> Best regards


* Re: Hard CPU Lockup when accessing MD RAID5
  2016-04-20  6:52   ` Daniel Walker
@ 2016-04-20 15:29     ` John Stoffel
  2016-04-21 22:47       ` Daniel Walker
  0 siblings, 1 reply; 5+ messages in thread
From: John Stoffel @ 2016-04-20 15:29 UTC (permalink / raw)
  To: Daniel Walker; +Cc: linux-raid


Daniel,

This is one of those hard problems to diagnose.  Can you take the
system out of production and run some stress tests on it to see how it
does?
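If the box can be pulled from production, one option is a user-space load that roughly follows the paths seen in the traces quoted below: readahead triggered by madvise(MADV_WILLNEED), writeback forced by fsync, and rmdir. The following is a minimal sketch under stated assumptions, not a known reproducer: it assumes Python 3.8+ on Linux, a scratch directory on the md array (the path in the usage comment is hypothetical), and thread/iteration/size defaults picked arbitrarily. It only reports checksum mismatches and I/O errors; a hard lockup would still have to be caught on netconsole.

```python
import hashlib
import mmap
import os
import sys
import threading


def worker(base_dir: str, worker_id: int, iterations: int, size: int, errors: list) -> None:
    """Create a directory and file, fsync it, prefetch it, verify it, then delete both."""
    for i in range(iterations):
        subdir = os.path.join(base_dir, f"w{worker_id}_{i}")
        path = os.path.join(subdir, "data.bin")
        try:
            os.mkdir(subdir)
            payload = os.urandom(size)
            expected = hashlib.sha256(payload).hexdigest()
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # force writeback of the file data to the array
            with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                # MADV_WILLNEED asks the kernel to start readahead on the mapping.
                # It is Linux-specific and only exposed by Python 3.8+.
                if hasattr(mmap, "MADV_WILLNEED"):
                    m.madvise(mmap.MADV_WILLNEED)
                actual = hashlib.sha256(m[:]).hexdigest()
            if actual != expected:
                errors.append(f"checksum mismatch: {path}")
            os.remove(path)
            os.rmdir(subdir)
        except OSError as exc:
            errors.append(f"{path}: {exc}")
            return


def run_stress(base_dir: str, threads: int = 8, iterations: int = 100,
               size: int = 4 << 20) -> list:
    """Run `threads` workers concurrently; return error strings (empty list on success)."""
    errors: list = []
    pool = [threading.Thread(target=worker, args=(base_dir, t, iterations, size, errors))
            for t in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python3 md_stress.py /mnt/md7-scratch
    target = sys.argv[1]
    os.makedirs(target, exist_ok=True)
    problems = run_stress(target)
    print(f"{len(problems)} problems")
    for p in problems:
        print(p)
```

Raising the thread count toward the number of logical CPUs increases contention on the array, which seems more relevant here than raw throughput.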

Have you updated all the firmware on the board?  Have you disabled
hyperthreading as well?  Is there any overclocking or stuff like that
happening?  If so, go back to the BIOS "safe" defaults.
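After changing the hyperthreading setting in the BIOS, one way to confirm the result from Linux is to read each CPU's topology/thread_siblings_list in sysfs. With SMT active, a core's list names more than one logical CPU (for example "0,16"). Below is a minimal sketch, assuming Python 3 and sysfs mounted at the usual /sys path; the helper names are made up for illustration.

```python
import glob
import os


def parse_cpulist(text: str) -> set:
    """Parse a kernel cpulist such as '0,16' or '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def smt_enabled(sibling_lists) -> bool:
    """SMT (hyperthreading) is active if any CPU shares its core with another CPU."""
    return any(len(parse_cpulist(s)) > 1 for s in sibling_lists)


def read_sibling_lists(root: str = "/sys/devices/system/cpu") -> list:
    """Read thread_siblings_list for every CPU directory under root that exposes one."""
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    lists = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    siblings = read_sibling_lists()
    if siblings:
        print("hyperthreading on" if smt_enabled(siblings) else "hyperthreading off")
```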

Do you have another system with the same hardware that's working fine
in the same type of setup?  If so, that does point to hardware.

Is your power supply maxed out or near the limits?  Maybe you're
getting a slight under-voltage?  Not likely... but you never know.

And why is the kernel tainted?  Are you adding in third party modules?
If so, remove them completely from the system.  Supermicro systems don't
generally require anything like that in my experience.
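On the taint question: in the 4.5.1 logs quoted below the field reads "Tainted: G        W", while the earlier 4.4.1 traces say "Not tainted". Going by the flag meanings in the kernel's tainted-kernels documentation, "G" means no proprietary module is loaded and "W" means the kernel had already emitted a warning, so the first "WARNING:" in dmesg may be worth finding. Below is a minimal decoder sketch, assuming the 16 flags (bits 0 to 15) defined in 4.x kernels; newer kernels add more bits, which this sketch ignores. It renders a mask the same way as the "Tainted:" field and lists the flags that are set.

```python
import os

# Taint flags in bit order (bits 0-15), as documented for 4.x kernels.
TAINT_FLAGS = [
    ("P", "proprietary module loaded"),
    ("F", "module was force-loaded"),
    ("S", "SMP kernel on hardware not certified for SMP"),
    ("R", "module was force-unloaded"),
    ("M", "machine check exception occurred"),
    ("B", "bad page referenced or unexpected page flags"),
    ("U", "taint requested by user space"),
    ("D", "kernel died recently (OOPS or BUG)"),
    ("A", "ACPI table overridden"),
    ("W", "kernel issued a warning"),
    ("C", "staging driver loaded"),
    ("I", "platform firmware bug workaround applied"),
    ("O", "out-of-tree module loaded"),
    ("E", "unsigned module loaded"),
    ("L", "soft lockup occurred"),
    ("K", "kernel has been live patched"),
]


def taint_string(mask: int) -> str:
    """Render like the 'Tainted:' field: 'G' if bit 0 is clear, a space for other clear bits."""
    chars = []
    for bit, (letter, _) in enumerate(TAINT_FLAGS):
        if mask & (1 << bit):
            chars.append(letter)
        else:
            chars.append("G" if bit == 0 else " ")
    return "".join(chars)


def explain(mask: int) -> list:
    """Return one human-readable line per flag set in the mask."""
    return [f"{letter}: {desc}"
            for bit, (letter, desc) in enumerate(TAINT_FLAGS) if mask & (1 << bit)]


if __name__ == "__main__" and os.path.exists("/proc/sys/kernel/tainted"):
    with open("/proc/sys/kernel/tainted") as f:
        current = int(f.read())
    print(repr(taint_string(current)))
    for line in explain(current):
        print(line)
```

A mask of 512 (only bit 9 set) renders as "G", eight spaces, "W" and six spaces, which matches the spacing in the "Tainted:" lines below.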

Is it some of the extra monitoring modules you have installed?

Good luck!
John



>>>>> "Daniel" == Daniel Walker <admin@ftwinc.net> writes:

Daniel> Hi,

Daniel> I upgraded the kernel to the latest stable release (4.5.1) with debugging
Daniel> enabled, without any luck. This is what shows up in dmesg:


Daniel>     [262448.558983] INFO: task php:13376 blocked for more than 120 seconds.
Daniel>     [262448.559057]       Tainted: G        W       4.5.1 #1
Daniel>     [262448.559092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Daniel>     [262448.559246] php             D
Daniel>      ffff88001c297a18
Daniel>         0 13376  12277 0x00000000
Daniel>     [262448.559519]  ffff88001c297a18
Daniel>      ffff881ff248c100
Daniel>      ffff880013e9b400
Daniel>      ffff881fea472000

Daniel>     [262448.559603]  ffff88001c297ae8
Daniel>      ffff88001c298000
Daniel>      ffff881c5cac1b30
Daniel>      ffff880013e9b400

Daniel>     [262448.560046]  0000000000020001
Daniel>      0000000545ea7820
Daniel>      ffff88001c297a30
Daniel>      ffffffff814d5690

Daniel>     [262448.560485] Call Trace:
Daniel>     [262448.560541]  [<ffffffff814d5690>] schedule+0x30/0x80
Daniel>     [262448.560761]  [<ffffffff814d823e>] schedule_timeout+0x21e/0x2a0
Daniel>     [262448.560828]  [<ffffffff81217c3d>] ? xfs_bmap_search_extents+0x7d/0x100
Daniel>     [262448.561000]  [<ffffffff810902d9>] ? down_trylock+0x29/0x40
Daniel>     [262448.561135]  [<ffffffff814d726f>] __down+0x5f/0xa0
Daniel>     [262448.561268]  [<ffffffff8124bdd6>] ? _xfs_buf_find+0x156/0x350
Daniel>     [262448.561347]  [<ffffffff8109032c>] down+0x3c/0x50
Daniel>     [262448.561390]  [<ffffffff8124bbc7>] xfs_buf_lock+0x37/0xf0
Daniel>     [262448.561435]  [<ffffffff8124bdd6>] _xfs_buf_find+0x156/0x350
Daniel>     [262448.561557]  [<ffffffff8124bff5>] xfs_buf_get_map+0x25/0x280
Daniel>     [262448.561603]  [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
Daniel>     [262448.561666]  [<ffffffff8124cbe8>] xfs_buf_read_map+0x28/0x180
Daniel>     [262448.561768]  [<ffffffff8127830b>] xfs_trans_read_buf_map+0xeb/0x300
Daniel>     [262448.561809]  [<ffffffff8123f7da>] xfs_imap_to_bp+0x5a/0xc0
Daniel>     [262448.561881]  [<ffffffff8125b7a5>] xfs_iunlink_remove+0x275/0x3a0
Daniel>     [262448.561943]  [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
Daniel>     [262448.561988]  [<ffffffff8125ec33>] xfs_ifree+0x33/0xd0
Daniel>     [262448.562033]  [<ffffffff8125ed85>] xfs_inactive_ifree+0xb5/0x200
Daniel>     [262448.562109]  [<ffffffff8125ef58>] xfs_inactive+0x88/0x110
Daniel>     [262448.562296]  [<ffffffff81263f31>] xfs_fs_evict_inode+0xc1/0x110
Daniel>     [262448.562344]  [<ffffffff811a42fb>] evict+0xbb/0x180
Daniel>     [262448.562405]  [<ffffffff811a4bb3>] iput+0x193/0x200
Daniel>     [262448.562483]  [<ffffffff811a08d2>] d_delete+0x122/0x160
Daniel>     [262448.562520]  [<ffffffff81195b99>] vfs_rmdir+0xf9/0x120
Daniel>     [262448.562559]  [<ffffffff81199d17>] do_rmdir+0x1b7/0x1d0
Daniel>     [262448.562607]  [<ffffffff81001210>] ? exit_to_usermode_loop+0x90/0xb0
Daniel>     [262448.562665]  [<ffffffff8119a921>] SyS_rmdir+0x11/0x20
Daniel>     [262448.562891]  [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
Daniel>     [262489.707201] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15

Daniel>     [262489.707227] Modules linked in:
Daniel>      ipt_MASQUERADE
Daniel>      nf_nat_masquerade_ipv4
Daniel>      iptable_nat
Daniel>      nf_conntrack_ipv4
Daniel>      nf_defrag_ipv4
Daniel>      nf_nat_ipv4
Daniel>      nf_nat
Daniel>      nf_conntrack
Daniel>      ipt_REJECT
Daniel>      nf_reject_ipv4
Daniel>      iptable_mangle
Daniel>      netconsole
Daniel>      configfs
Daniel>      tun
Daniel>      xt_multiport
Daniel>      ip6table_filter
Daniel>      ip6_tables
Daniel>      iptable_filter
Daniel>      ip_tables
Daniel>      x_tables
Daniel>      bridge
Daniel>      stp
Daniel>      llc
Daniel>      bonding
Daniel>      ext4
Daniel>      crc16
Daniel>      mbcache
Daniel>      jbd2
Daniel>      raid1
Daniel>      raid0
Daniel>      raid456
Daniel>      async_raid6_recov
Daniel>      async_memcpy
Daniel>      async_pq
Daniel>      async_xor
Daniel>      xor
Daniel>      async_tx
Daniel>      raid6_pq
Daniel>      md_mod
Daniel>      sg
Daniel>      sd_mod
Daniel>      hid_generic
Daniel>      usbhid
Daniel>      hid
Daniel>      x86_pkg_temp_thermal
Daniel>      coretemp
Daniel>      crct10dif_pclmul
Daniel>      crc32_pclmul
Daniel>      crc32c_intel
Daniel>      ghash_clmulni_intel
Daniel>      jitterentropy_rng
Daniel>      sha256_ssse3
Daniel>      iTCO_wdt
Daniel>      sha256_generic
Daniel>      iTCO_vendor_support
Daniel>      hmac
Daniel>      drbg
Daniel>      xhci_pci
Daniel>      ahci
Daniel>      sb_edac
Daniel>      ehci_pci
Daniel>      ansi_cprng
Daniel>      xhci_hcd
Daniel>      ehci_hcd
Daniel>      libahci
Daniel>      i2c_i801
Daniel>      edac_core
Daniel>      lpc_ich
Daniel>      mei_me
Daniel>      mfd_core
Daniel>      libata
Daniel>      usbcore
Daniel>      igb
Daniel>      mei
Daniel>      megaraid_sas
Daniel>      i2c_algo_bit
Daniel>      usb_common
Daniel>      ptp
Daniel>      aesni_intel
Daniel>      pps_core
Daniel>      aes_x86_64
Daniel>      ioatdma
Daniel>      lrw
Daniel>      gf128mul
Daniel>      glue_helper
Daniel>      ablk_helper
Daniel>      i2c_core
Daniel>      scsi_mod
Daniel>      dca
Daniel>      cryptd
Daniel>      ipmi_si
Daniel>      ipmi_msghandler
Daniel>      acpi_power_meter
Daniel>      tpm_tis
Daniel>      tpm
Daniel>      processor
Daniel>      button

Daniel>     [262489.708066] CPU: 15 PID: 17535 Comm: kworker/u32:6 Tainted: G        W       4.5.1 #1
Daniel>     [262489.708124] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Daniel>     [262489.708187] Workqueue: writeback wb_workfn
Daniel>      (flush-9:7)

Daniel>     [262489.708228]  0000000000000000
Daniel>      ffff88207fde5bd0
Daniel>      ffffffff812e00b8
Daniel>      0000000000000000

Daniel>     [262489.708298]  0000000000000000
Daniel>      ffff88207fde5be8
Daniel>      ffffffff810dff1d
Daniel>      ffff881ff2270000

Daniel>     [262489.708368]  ffff88207fde5c20
Daniel>      ffffffff8110f8f8
Daniel>      0000000000000001
Daniel>      ffff88207fdeaf00

Daniel>     [262489.708438] Call Trace:
Daniel>     [262489.708467]  <NMI>
Daniel>      [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Daniel>     [262489.708512]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
Daniel>     [262489.708552]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
Daniel>     [262489.708589]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
Daniel>     [262489.708627]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
Daniel>     [262489.708666]  [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
Daniel>     [262489.708703]  [<ffffffff811555fc>] ? unmap_kernel_range_noflush+0xc/0x10
Daniel>     [262489.708748]  [<ffffffff8135a543>] ? ghes_copy_tofrom_phys+0x113/0x1e0
Daniel>     [262489.708788]  [<ffffffff810359da>] ? native_apic_wait_icr_idle+0x1a/0x30
Daniel>     [262489.708827]  [<ffffffff810096e0>] ? arch_irq_work_raise+0x30/0x40
Daniel>     [262489.708865]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
Daniel>     [262489.708902]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Daniel>     [262489.708939]  [<ffffffff810082e7>] do_nmi+0x117/0x3e0
Daniel>     [262489.708975]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
Daniel>     [262489.709013]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
Daniel>     [262489.709051]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
Daniel>     [262489.709089]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130 [raid456]
Daniel>     [262489.709125]  <<EOE>>
Daniel>      [<ffffffff812b9b98>] blk_flush_plug_list+0xa8/0x210
Daniel>     [262489.709169]  [<ffffffff814d5de0>] ? bit_wait_timeout+0x70/0x70
Daniel>     [262489.709206]  [<ffffffff814d4c04>] io_schedule_timeout+0x54/0x130
Daniel>     [262489.709242]  [<ffffffff814d5df6>] bit_wait_io+0x16/0x60
Daniel>     [262489.709277]  [<ffffffff814d5b59>] __wait_on_bit_lock+0x49/0xa0
Daniel>     [262489.709314]  [<ffffffff81117fd0>] __lock_page+0xb0/0xc0
Daniel>     [262489.709352]  [<ffffffff8108bdc0>] ? autoremove_wake_function+0x30/0x30
Daniel>     [262489.709391]  [<ffffffff811250f0>] write_cache_pages+0x2f0/0x4d0
Daniel>     [262489.709427]  [<ffffffff81122df0>] ? wb_position_ratio+0x1f0/0x1f0
Daniel>     [262489.709465]  [<ffffffff8112530e>] generic_writepages+0x3e/0x60
Daniel>     [262489.709502]  [<ffffffff81244c18>] xfs_vm_writepages+0x38/0x40
Daniel>     [262489.709539]  [<ffffffff81125e29>] do_writepages+0x19/0x30
Daniel>     [262489.709574]  [<ffffffff811b5c50>] __writeback_single_inode+0x40/0x310
Daniel>     [262489.709612]  [<ffffffff811b6402>] writeback_sb_inodes+0x242/0x520
Daniel>     [262489.709649]  [<ffffffff811b676a>] __writeback_inodes_wb+0x8a/0xc0
Daniel>     [262489.709686]  [<ffffffff811b6a77>] wb_writeback+0x247/0x2d0
Daniel>     [262489.709721]  [<ffffffff811b716f>] wb_workfn+0x20f/0x3c0
Daniel>     [262489.709758]  [<ffffffff81067513>] process_one_work+0x143/0x400
Daniel>     [262489.709795]  [<ffffffff81067cc1>] worker_thread+0x61/0x490
Daniel>     [262489.709831]  [<ffffffff81067c60>] ? max_active_store+0x60/0x60
Daniel>     [262489.709867]  [<ffffffff8106c926>] kthread+0xd6/0xf0
Daniel>     [262489.709901]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
Daniel>     [262489.709937]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
Daniel>     [262489.709972]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
Daniel>     [262491.022971] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0

Daniel>     [262491.023470] Modules linked in:
Daniel>      ipt_MASQUERADE
Daniel>      nf_nat_masquerade_ipv4
Daniel>      iptable_nat
Daniel>      nf_conntrack_ipv4
Daniel>      nf_defrag_ipv4
Daniel>      nf_nat_ipv4
Daniel>      nf_nat
Daniel>      nf_conntrack
Daniel>      ipt_REJECT
Daniel>      nf_reject_ipv4
Daniel>      iptable_mangle
Daniel>      netconsole
Daniel>      configfs
Daniel>      tun
Daniel>      xt_multiport
Daniel>      ip6table_filter
Daniel>      ip6_tables
Daniel>      iptable_filter
Daniel>      ip_tables
Daniel>      x_tables
Daniel>      bridge
Daniel>      stp
Daniel>      llc
Daniel>      bonding
Daniel>      ext4
Daniel>      crc16
Daniel>      mbcache
Daniel>      jbd2
Daniel>      raid1
Daniel>      raid0
Daniel>      raid456
Daniel>      async_raid6_recov
Daniel>      async_memcpy
Daniel>      async_pq
Daniel>      async_xor
Daniel>      xor
Daniel>      async_tx
Daniel>      raid6_pq
Daniel>      md_mod
Daniel>      sg
Daniel>      sd_mod
Daniel>      hid_generic
Daniel>      usbhid
Daniel>      hid
Daniel>      x86_pkg_temp_thermal
Daniel>      coretemp
Daniel>      crct10dif_pclmul
Daniel>      crc32_pclmul
Daniel>      crc32c_intel
Daniel>      ghash_clmulni_intel
Daniel>      jitterentropy_rng
Daniel>      sha256_ssse3
Daniel>      iTCO_wdt
Daniel>      sha256_generic
Daniel>      iTCO_vendor_support
Daniel>      hmac
Daniel>      drbg
Daniel>      xhci_pci
Daniel>      ahci
Daniel>      sb_edac
Daniel>      ehci_pci
Daniel>      ansi_cprng
Daniel>      xhci_hcd
Daniel>      ehci_hcd
Daniel>      libahci
Daniel>      i2c_i801
Daniel>      edac_core
Daniel>      lpc_ich
Daniel>      mei_me
Daniel>      mfd_core
Daniel>      libata
Daniel>      usbcore
Daniel>      igb
Daniel>      mei
Daniel>      megaraid_sas
Daniel>      i2c_algo_bit
Daniel>      usb_common
Daniel>      ptp
Daniel>      aesni_intel
Daniel>      pps_core
Daniel>      aes_x86_64
Daniel>      ioatdma
Daniel>      lrw
Daniel>      gf128mul
Daniel>      glue_helper
Daniel>      ablk_helper
Daniel>      i2c_core
Daniel>      scsi_mod
Daniel>      dca
Daniel>      cryptd
Daniel>      ipmi_si
Daniel>      ipmi_msghandler
Daniel>      acpi_power_meter
Daniel>      tpm_tis
Daniel>      tpm
Daniel>      processor
Daniel>      button

Daniel>     [262491.029705] CPU: 0 PID: 1178 Comm: md7_raid5 Tainted: G        W       4.5.1 #1
Daniel>     [262491.029776] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
Daniel>     [262491.029849]  0000000000000000
Daniel>      ffff88207fc05bd0
Daniel>      ffffffff812e00b8
Daniel>      0000000000000000

Daniel>     [262491.029988]  0000000000000000
Daniel>      ffff88207fc05be8
Daniel>      ffffffff810dff1d
Daniel>      ffff881fff032000

Daniel>     [262491.030124]  ffff88207fc05c20
Daniel>      ffffffff8110f8f8
Daniel>      0000000000000001
Daniel>      ffff88207fc0af00

Daniel>     [262491.030260] Call Trace:
Daniel>     [262491.030302]  <NMI>
Daniel>      [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Daniel>     [262491.030377]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
Daniel>     [262491.030432]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
Daniel>     [262491.030484]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
Daniel>     [262491.030536]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
Daniel>     [262491.030589]  [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
Daniel>     [262491.030640]  [<ffffffff811555fc>] ? unmap_kernel_range_noflush+0xc/0x10
Daniel>     [262491.030693]  [<ffffffff8135a543>] ? ghes_copy_tofrom_phys+0x113/0x1e0
Daniel>     [262491.030745]  [<ffffffff8135a681>] ? ghes_read_estatus+0x71/0x140
Daniel>     [262491.030797]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
Daniel>     [262491.030849]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Daniel>     [262491.030898]  [<ffffffff810083d1>] do_nmi+0x201/0x3e0
Daniel>     [262491.030949]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
Daniel>     [262491.030998]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
Daniel>     [262491.031050]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
Daniel>     [262491.031102]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
Daniel>     [262491.031153]  <<EOE>>
Daniel>      [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
Daniel>     [262491.031225]  [<ffffffffa01db6b1>] raid5d+0x91/0x720 [raid456]
Daniel>     [262491.031276]  [<ffffffff810a4a8a>] ? try_to_del_timer_sync+0x4a/0x60
Daniel>     [262491.031328]  [<ffffffff810a4ae3>] ? del_timer_sync+0x43/0x50
Daniel>     [262491.031377]  [<ffffffff814d816e>] ? schedule_timeout+0x14e/0x2a0
Daniel>     [262491.031428]  [<ffffffff810a4830>] ? trace_event_raw_event_tick_stop+0x100/0x100
Daniel>     [262491.031502]  [<ffffffffa017874b>] md_thread+0x12b/0x130 [md_mod]
Daniel>     [262491.031555]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
Daniel>     [262491.031605]  [<ffffffffa0178620>] ? find_pers+0x70/0x70 [md_mod]
Daniel>     [262491.031656]  [<ffffffff8106c926>] kthread+0xd6/0xf0
Daniel>     [262491.031704]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
Daniel>     [262491.031753]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
Daniel>     [262491.031802]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50

Daniel> The server hosts plain VPSes; a few of them run rtorrent, which is quite
Daniel> disk-intensive, but from what I can see iowait is quite low.

Daniel> There is absolutely nothing logged before the lockups. Everything is
Daniel> running fine and then it suddenly crashes. I'm beginning to think we
Daniel> might have a hardware problem, but I'm having a hard time finding the
Daniel> actual issue.

Daniel> Any ideas?

Daniel> Best regards
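About the observation that iowait is low: it can be measured directly from two samples of the aggregate "cpu" line in /proc/stat, whose fifth value is iowait. Low iowait does not by itself rule out the lockups above, though. A CPU spinning in queued_spin_lock_slowpath is busy in the kernel and is counted as system time, not iowait. Below is a minimal sketch, assuming Python 3 on Linux; the helper names and the command-line interval argument are illustrative only.

```python
import os
import sys
import time


def parse_cpu_line(line: str) -> list:
    """Return user, nice, system, idle, iowait, irq, softirq, steal from the 'cpu' line."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # guest and guest_nice (fields 9-10) are already included in user/nice, so skip them.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [a - b for b, a in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval: float) -> float:
    """Read /proc/stat twice, `interval` seconds apart, and return the iowait percentage."""
    with open("/proc/stat") as f:
        before = f.readline()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.readline()
    return iowait_percent(before, after)


if __name__ == "__main__" and len(sys.argv) > 1 and os.path.exists("/proc/stat"):
    # Hypothetical invocation: python3 iowait.py 5
    print(f"iowait: {sample_iowait(float(sys.argv[1])):.1f}%")
```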


Daniel> Den 13-04-2016 kl. 19:00 skrev Shaohua Li:
>> It looks like there is a deadlock trying to acquire the device_lock or hash_lock.
>> Was anything abnormal printed before the NMI watchdog fired? What is running on
>> the machine? This looks like an old kernel; is it possible for you to try the
>> latest kernel and report back?
>> 
>> Thanks,
>> Shaohua
>> 
>> On Tue, Apr 12, 2016 at 09:54:08PM +0000, Daniel Walker wrote:
>>> I'm having some issues with a brand-new Supermicro server that we run
>>> in production alongside a few other machines that are identical to this
>>> server.
>>> 
>>> The output from the netconsole attached to the server is here:
>>> 
>>> Apr 12 21:34:45  [75704.964946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75704.964973] Modules linked in:
>>> Apr 12 21:34:45   ipt_REJECT
>>> Apr 12 21:34:45   nf_reject_ipv4
>>> Apr 12 21:34:45   iptable_mangle
>>> Apr 12 21:34:45   tun
>>> Apr 12 21:34:45   netconsole
>>> Apr 12 21:34:45   configfs
>>> Apr 12 21:34:45   xt_multiport
>>> Apr 12 21:34:45   ip6table_filter
>>> Apr 12 21:34:45   ip6_tables
>>> Apr 12 21:34:45   iptable_filter
>>> Apr 12 21:34:45   ip_tables
>>> Apr 12 21:34:45   x_tables
>>> Apr 12 21:34:45   bridge
>>> Apr 12 21:34:45   stp
>>> Apr 12 21:34:45   llc
>>> Apr 12 21:34:45   bonding
>>> Apr 12 21:34:45   ext4
>>> Apr 12 21:34:45   crc16
>>> Apr 12 21:34:45   mbcache
>>> Apr 12 21:34:45   jbd2
>>> Apr 12 21:34:45   raid1
>>> Apr 12 21:34:45   raid0
>>> Apr 12 21:34:45   raid456
>>> Apr 12 21:34:45   async_raid6_recov
>>> Apr 12 21:34:45   async_memcpy
>>> Apr 12 21:34:45   async_pq
>>> Apr 12 21:34:45   async_xor
>>> Apr 12 21:34:45   xor
>>> Apr 12 21:34:45   async_tx
>>> Apr 12 21:34:45   raid6_pq
>>> Apr 12 21:34:45   md_mod
>>> Apr 12 21:34:45   sr_mod
>>> Apr 12 21:34:45   cdrom
>>> Apr 12 21:34:45   usb_storage
>>> Apr 12 21:34:45   hid_generic
>>> Apr 12 21:34:45   usbhid
>>> Apr 12 21:34:45   hid
>>> Apr 12 21:34:45   sg
>>> Apr 12 21:34:45   sd_mod
>>> Apr 12 21:34:45   x86_pkg_temp_thermal
>>> Apr 12 21:34:45   coretemp
>>> Apr 12 21:34:45   crct10dif_pclmul
>>> Apr 12 21:34:45   crc32_pclmul
>>> Apr 12 21:34:45   crc32c_intel
>>> Apr 12 21:34:45   jitterentropy_rng
>>> Apr 12 21:34:45   sha256_ssse3
>>> Apr 12 21:34:45   sha256_generic
>>> Apr 12 21:34:45   hmac
>>> Apr 12 21:34:45   iTCO_wdt
>>> Apr 12 21:34:45   iTCO_vendor_support
>>> Apr 12 21:34:45   drbg
>>> Apr 12 21:34:45   ansi_cprng
>>> Apr 12 21:34:45   aesni_intel
>>> Apr 12 21:34:45   aes_x86_64
>>> Apr 12 21:34:45   lrw
>>> Apr 12 21:34:45   gf128mul
>>> Apr 12 21:34:45   glue_helper
>>> Apr 12 21:34:45   ablk_helper
>>> Apr 12 21:34:45   cryptd
>>> Apr 12 21:34:45   ahci
>>> Apr 12 21:34:45   libahci
>>> Apr 12 21:34:45   sb_edac
>>> Apr 12 21:34:45   libata
>>> Apr 12 21:34:45   igb
>>> Apr 12 21:34:45   megaraid_sas
>>> Apr 12 21:34:45   xhci_pci
>>> Apr 12 21:34:45   ehci_pci
>>> Apr 12 21:34:45   i2c_algo_bit
>>> Apr 12 21:34:45   xhci_hcd
>>> Apr 12 21:34:45   ehci_hcd
>>> Apr 12 21:34:45   edac_core
>>> Apr 12 21:34:45   ptp
>>> Apr 12 21:34:45   mei_me
>>> Apr 12 21:34:45   lpc_ich
>>> Apr 12 21:34:45   i2c_i801
>>> Apr 12 21:34:45   usbcore
>>> Apr 12 21:34:45   pps_core
>>> Apr 12 21:34:45   mfd_core
>>> Apr 12 21:34:45   mei
>>> Apr 12 21:34:45   usb_common
>>> Apr 12 21:34:45   i2c_core
>>> Apr 12 21:34:45   ioatdma
>>> Apr 12 21:34:45   scsi_mod
>>> Apr 12 21:34:45   dca
>>> Apr 12 21:34:45   ipmi_si
>>> Apr 12 21:34:45   ipmi_msghandler
>>> Apr 12 21:34:45   acpi_power_meter
>>> Apr 12 21:34:45   tpm_tis
>>> Apr 12 21:34:45   tpm
>>> Apr 12 21:34:45   processor
>>> Apr 12 21:34:45   button
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75704.965874] CPU: 6 PID: 25339 Comm: main Not tainted 4.4.1 #2
>>> Apr 12 21:34:45  [75704.965916] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>>> Apr 12 21:34:45  [75704.965979]  0000000000000000
>>> Apr 12 21:34:45   ffffffff812abdf3
>>> Apr 12 21:34:45   0000000000000000
>>> Apr 12 21:34:45   ffffffff810cf5f5
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75704.966054]  ffff881ff2870000
>>> Apr 12 21:34:45   ffffffff810fcea2
>>> Apr 12 21:34:45   0000000000000001
>>> Apr 12 21:34:45   ffff881fffcc5e58
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75704.966134]  ffff881fffccaf00
>>> Apr 12 21:34:45   ffff881fffccb100
>>> Apr 12 21:34:45   ffff881ff2870000
>>> Apr 12 21:34:45   ffffffff8101bc63
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75704.966211] Call Trace:
>>> Apr 12 21:34:45  [75704.966246]  <NMI>
>>> Apr 12 21:34:45   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>>> Apr 12 21:34:45  [75704.966297]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
>>> Apr 12 21:34:45  [75704.966339]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
>>> Apr 12 21:34:45  [75704.966384]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
>>> Apr 12 21:34:45  [75704.966431]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
>>> Apr 12 21:34:45  [75704.966474]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
>>> Apr 12 21:34:45  [75704.966519]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
>>> Apr 12 21:34:45  [75704.966560]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
>>> Apr 12 21:34:45  [75704.966597]  [<ffffffff81007dfe>] ? do_nmi+0x1de/0x360
>>> Apr 12 21:34:45  [75704.970603]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
>>> Apr 12 21:34:45  [75704.970644]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:45  [75704.970685]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:45  [75704.970728]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:45  [75704.970768]  <<EOE>>
>>> Apr 12 21:34:45   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>>> Apr 12 21:34:45  [75704.970838]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>>> Apr 12 21:34:45  [75704.970878]  [<ffffffff81151ec4>] ? kmem_cache_alloc+0xf4/0x120
>>> Apr 12 21:34:45  [75704.970922]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
>>> Apr 12 21:34:45  [75704.970969]  [<ffffffff81219fde>] ? xfs_map_buffer.isra.12+0x2e/0x60
>>> Apr 12 21:34:45  [75704.971012]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
>>> Apr 12 21:34:45  [75704.971052]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
>>> Apr 12 21:34:45  [75704.971098]  [<ffffffff81113379>] ? release_pages+0xc9/0x270
>>> Apr 12 21:34:45  [75704.971145]  [<ffffffff811a2c01>] ? do_mpage_readpage+0x2d1/0x640
>>> Apr 12 21:34:45  [75704.971187]  [<ffffffff811a304d>] ? mpage_readpages+0xdd/0x130
>>> Apr 12 21:34:45  [75704.971226]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>> Apr 12 21:34:45  [75704.971267]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>> Apr 12 21:34:45  [75704.971313]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
>>> Apr 12 21:34:45  [75704.971354]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
>>> Apr 12 21:34:45  [75704.971399]  [<ffffffff81105902>] ? pagecache_get_page+0x22/0x1a0
>>> Apr 12 21:34:45  [75704.971441]  [<ffffffff8110768c>] ? filemap_fault+0x37c/0x400
>>> Apr 12 21:34:45  [75704.971481]  [<ffffffff8122474b>] ? xfs_filemap_fault+0x3b/0x80
>>> Apr 12 21:34:45  [75704.971526]  [<ffffffff8112d2da>] ? __do_fault+0x3a/0xc0
>>> Apr 12 21:34:45  [75704.971564]  [<ffffffff81130883>] ? handle_mm_fault+0x1063/0x1650
>>> Apr 12 21:34:45  [75704.971614]  [<ffffffff8103bdae>] ? __do_page_fault+0x11e/0x370
>>> Apr 12 21:34:45  [75704.971653]  [<ffffffff811aa4ff>] ? SyS_epoll_wait+0x8f/0xd0
>>> Apr 12 21:34:45  [75704.971694]  [<ffffffff8148f64f>] ? page_fault+0x1f/0x30
>>> Apr 12 21:34:45  [75705.493640] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75705.493668] Modules linked in:
>>> Apr 12 21:34:45   ipt_REJECT
>>> Apr 12 21:34:45   nf_reject_ipv4
>>> Apr 12 21:34:45   iptable_mangle
>>> Apr 12 21:34:45   tun
>>> Apr 12 21:34:45   netconsole
>>> Apr 12 21:34:45   configfs
>>> Apr 12 21:34:45   xt_multiport
>>> Apr 12 21:34:45   ip6table_filter
>>> Apr 12 21:34:45   ip6_tables
>>> Apr 12 21:34:45   iptable_filter
>>> Apr 12 21:34:45   ip_tables
>>> Apr 12 21:34:45   x_tables
>>> Apr 12 21:34:45   bridge
>>> Apr 12 21:34:45   stp
>>> Apr 12 21:34:45   llc
>>> Apr 12 21:34:45   bonding
>>> Apr 12 21:34:45   ext4
>>> Apr 12 21:34:45   crc16
>>> Apr 12 21:34:45   mbcache
>>> Apr 12 21:34:45   jbd2
>>> Apr 12 21:34:45   raid1
>>> Apr 12 21:34:45   raid0
>>> Apr 12 21:34:45   raid456
>>> Apr 12 21:34:45   async_raid6_recov
>>> Apr 12 21:34:45   async_memcpy
>>> Apr 12 21:34:45   async_pq
>>> Apr 12 21:34:45   async_xor
>>> Apr 12 21:34:45   xor
>>> Apr 12 21:34:45   async_tx
>>> Apr 12 21:34:45   raid6_pq
>>> Apr 12 21:34:45   md_mod
>>> Apr 12 21:34:45   sr_mod
>>> Apr 12 21:34:45   cdrom
>>> Apr 12 21:34:45   usb_storage
>>> Apr 12 21:34:45   hid_generic
>>> Apr 12 21:34:45   usbhid
>>> Apr 12 21:34:45   hid
>>> Apr 12 21:34:45   sg
>>> Apr 12 21:34:45   sd_mod
>>> Apr 12 21:34:45   x86_pkg_temp_thermal
>>> Apr 12 21:34:45   coretemp
>>> Apr 12 21:34:45   crct10dif_pclmul
>>> Apr 12 21:34:45   crc32_pclmul
>>> Apr 12 21:34:45   crc32c_intel
>>> Apr 12 21:34:45   jitterentropy_rng
>>> Apr 12 21:34:45   sha256_ssse3
>>> Apr 12 21:34:45   sha256_generic
>>> Apr 12 21:34:45   hmac
>>> Apr 12 21:34:45   iTCO_wdt
>>> Apr 12 21:34:45   iTCO_vendor_support
>>> Apr 12 21:34:45   drbg
>>> Apr 12 21:34:45   ansi_cprng
>>> Apr 12 21:34:45   aesni_intel
>>> Apr 12 21:34:45   aes_x86_64
>>> Apr 12 21:34:45   lrw
>>> Apr 12 21:34:45   gf128mul
>>> Apr 12 21:34:45   glue_helper
>>> Apr 12 21:34:45   ablk_helper
>>> Apr 12 21:34:45   cryptd
>>> Apr 12 21:34:45   ahci
>>> Apr 12 21:34:45   libahci
>>> Apr 12 21:34:45   sb_edac
>>> Apr 12 21:34:45   libata
>>> Apr 12 21:34:45   igb
>>> Apr 12 21:34:45   megaraid_sas
>>> Apr 12 21:34:45   xhci_pci
>>> Apr 12 21:34:45   ehci_pci
>>> Apr 12 21:34:45   i2c_algo_bit
>>> Apr 12 21:34:45   xhci_hcd
>>> Apr 12 21:34:45   ehci_hcd
>>> Apr 12 21:34:45   edac_core
>>> Apr 12 21:34:45   ptp
>>> Apr 12 21:34:45   mei_me
>>> Apr 12 21:34:45   lpc_ich
>>> Apr 12 21:34:45   i2c_i801
>>> Apr 12 21:34:45   usbcore
>>> Apr 12 21:34:45   pps_core
>>> Apr 12 21:34:45   mfd_core
>>> Apr 12 21:34:45   mei
>>> Apr 12 21:34:45   usb_common
>>> Apr 12 21:34:45   i2c_core
>>> Apr 12 21:34:45   ioatdma
>>> Apr 12 21:34:45   scsi_mod
>>> Apr 12 21:34:45   dca
>>> Apr 12 21:34:45   ipmi_si
>>> Apr 12 21:34:45   ipmi_msghandler
>>> Apr 12 21:34:45   acpi_power_meter
>>> Apr 12 21:34:45   tpm_tis
>>> Apr 12 21:34:45   tpm
>>> Apr 12 21:34:45   processor
>>> Apr 12 21:34:45   button
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75705.494688] CPU: 12 PID: 32350 Comm: main Not tainted 4.4.1 #2
>>> Apr 12 21:34:45  [75705.494728] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>>> Apr 12 21:34:45  [75705.494790]  0000000000000000
>>> Apr 12 21:34:45   ffffffff812abdf3
>>> Apr 12 21:34:45   0000000000000000
>>> Apr 12 21:34:45   ffffffff810cf5f5
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75705.494886]  ffff883ff29a0000
>>> Apr 12 21:34:45   ffffffff810fcea2
>>> Apr 12 21:34:45   0000000000000001
>>> Apr 12 21:34:45   ffff88407fc85e58
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75705.494976]  ffff88407fc8af00
>>> Apr 12 21:34:45   ffff88407fc8b100
>>> Apr 12 21:34:45   ffff883ff29a0000
>>> Apr 12 21:34:45   ffffffff8101bc63
>>> Apr 12 21:34:45
>>> Apr 12 21:34:45  [75705.495064] Call Trace:
>>> Apr 12 21:34:45  [75705.495094]  <NMI>
>>> Apr 12 21:34:45   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>>> Apr 12 21:34:45  [75705.495150]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
>>> Apr 12 21:34:45  [75705.495193]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
>>> Apr 12 21:34:45  [75705.495237]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
>>> Apr 12 21:34:45  [75705.495284]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
>>> Apr 12 21:34:45  [75705.495330]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
>>> Apr 12 21:34:45  [75705.495373]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
>>> Apr 12 21:34:45  [75705.495418]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
>>> Apr 12 21:34:45  [75705.495458]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
>>> Apr 12 21:34:45  [75705.495497]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
>>> Apr 12 21:34:45  [75705.495540]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:45  [75705.495581]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:45  [75705.495621]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:45  [75705.495661]  <<EOE>>
>>> Apr 12 21:34:45   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>>> Apr 12 21:34:45  [75705.495733]  [<ffffffff81282d87>] ? blk_rq_init+0x87/0xa0
>>> Apr 12 21:34:45  [75705.495771]  [<ffffffff81283e3c>] ? get_request+0x29c/0x6e0
>>> Apr 12 21:34:45  [75705.495812]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>>> Apr 12 21:34:45  [75705.495853]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
>>> Apr 12 21:34:45  [75705.495898]  [<ffffffff8128829e>] ? blk_queue_bio+0x15e/0x350
>>> Apr 12 21:34:45  [75705.495937]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
>>> Apr 12 21:34:45  [75705.495978]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
>>> Apr 12 21:34:45  [75705.496018]  [<ffffffff811a215e>] ? mpage_bio_submit+0x1e/0x30
>>> Apr 12 21:34:45  [75705.496057]  [<ffffffff811a3076>] ? mpage_readpages+0x106/0x130
>>> Apr 12 21:34:45  [75705.496102]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>> Apr 12 21:34:45  [75705.496144]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>> Apr 12 21:34:45  [75705.496185]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
>>> Apr 12 21:34:45  [75705.496227]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
>>> Apr 12 21:34:45  [75705.496268]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
>>> Apr 12 21:34:45  [75705.496307]  [<ffffffff811120eb>] ? force_page_cache_readahead+0x9b/0xe0
>>> Apr 12 21:34:45  [75705.496352]  [<ffffffff8113f876>] ? madvise_willneed+0x76/0x140
>>> Apr 12 21:34:45  [75705.496395]  [<ffffffff811301ce>] ? handle_mm_fault+0x9ae/0x1650
>>> Apr 12 21:34:45  [75705.496437]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
>>> Apr 12 21:34:45  [75705.496476]  [<ffffffff8113fc52>] ? SyS_madvise+0x312/0x6f0
>>> Apr 12 21:34:45  [75705.496515]  [<ffffffff8148d9db>] ? entry_SYSCALL_64_fastpath+0x16/0x6e
>>> Apr 12 21:34:47  [75707.118049] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
>>> Apr 12 21:34:47
>>> Apr 12 21:34:47  [75707.118078] Modules linked in:
>>> Apr 12 21:34:47   ipt_REJECT
>>> Apr 12 21:34:47   nf_reject_ipv4
>>> Apr 12 21:34:47   iptable_mangle
>>> Apr 12 21:34:47   tun
>>> Apr 12 21:34:47   netconsole
>>> Apr 12 21:34:47   configfs
>>> Apr 12 21:34:47   xt_multiport
>>> Apr 12 21:34:47   ip6table_filter
>>> Apr 12 21:34:47   ip6_tables
>>> Apr 12 21:34:47   iptable_filter
>>> Apr 12 21:34:47   ip_tables
>>> Apr 12 21:34:47   x_tables
>>> Apr 12 21:34:47   bridge
>>> Apr 12 21:34:47   stp
>>> Apr 12 21:34:47   llc
>>> Apr 12 21:34:47   bonding
>>> Apr 12 21:34:47   ext4
>>> Apr 12 21:34:47   crc16
>>> Apr 12 21:34:47   mbcache
>>> Apr 12 21:34:47   jbd2
>>> Apr 12 21:34:47   raid1
>>> Apr 12 21:34:47   raid0
>>> Apr 12 21:34:47   raid456
>>> Apr 12 21:34:47   async_raid6_recov
>>> Apr 12 21:34:47   async_memcpy
>>> Apr 12 21:34:47   async_pq
>>> Apr 12 21:34:47   async_xor
>>> Apr 12 21:34:47   xor
>>> Apr 12 21:34:47   async_tx
>>> Apr 12 21:34:47   raid6_pq
>>> Apr 12 21:34:47   md_mod
>>> Apr 12 21:34:47   sr_mod
>>> Apr 12 21:34:47   cdrom
>>> Apr 12 21:34:47   usb_storage
>>> Apr 12 21:34:47   hid_generic
>>> Apr 12 21:34:47   usbhid
>>> Apr 12 21:34:47   hid
>>> Apr 12 21:34:47   sg
>>> Apr 12 21:34:47   sd_mod
>>> Apr 12 21:34:47   x86_pkg_temp_thermal
>>> Apr 12 21:34:47   coretemp
>>> Apr 12 21:34:47   crct10dif_pclmul
>>> Apr 12 21:34:47   crc32_pclmul
>>> Apr 12 21:34:47   crc32c_intel
>>> Apr 12 21:34:47   jitterentropy_rng
>>> Apr 12 21:34:47   sha256_ssse3
>>> Apr 12 21:34:47   sha256_generic
>>> Apr 12 21:34:47   hmac
>>> Apr 12 21:34:47   iTCO_wdt
>>> Apr 12 21:34:47   iTCO_vendor_support
>>> Apr 12 21:34:47   drbg
>>> Apr 12 21:34:47   ansi_cprng
>>> Apr 12 21:34:47   aesni_intel
>>> Apr 12 21:34:47   aes_x86_64
>>> Apr 12 21:34:47   lrw
>>> Apr 12 21:34:47   gf128mul
>>> Apr 12 21:34:47   glue_helper
>>> Apr 12 21:34:47   ablk_helper
>>> Apr 12 21:34:47   cryptd
>>> Apr 12 21:34:47   ahci
>>> Apr 12 21:34:47   libahci
>>> Apr 12 21:34:47   sb_edac
>>> Apr 12 21:34:47   libata
>>> Apr 12 21:34:47   igb
>>> Apr 12 21:34:47   megaraid_sas
>>> Apr 12 21:34:47   xhci_pci
>>> Apr 12 21:34:47   ehci_pci
>>> Apr 12 21:34:47   i2c_algo_bit
>>> Apr 12 21:34:47   xhci_hcd
>>> Apr 12 21:34:47   ehci_hcd
>>> Apr 12 21:34:47   edac_core
>>> Apr 12 21:34:47   ptp
>>> Apr 12 21:34:47   mei_me
>>> Apr 12 21:34:47   lpc_ich
>>> Apr 12 21:34:47   i2c_i801
>>> Apr 12 21:34:47   usbcore
>>> Apr 12 21:34:47   pps_core
>>> Apr 12 21:34:47   mfd_core
>>> Apr 12 21:34:47   mei
>>> Apr 12 21:34:47   usb_common
>>> Apr 12 21:34:47   i2c_core
>>> Apr 12 21:34:47   ioatdma
>>> Apr 12 21:34:47   scsi_mod
>>> Apr 12 21:34:47   dca
>>> Apr 12 21:34:47   ipmi_si
>>> Apr 12 21:34:47   ipmi_msghandler
>>> Apr 12 21:34:47   acpi_power_meter
>>> Apr 12 21:34:47   tpm_tis
>>> Apr 12 21:34:47   tpm
>>> Apr 12 21:34:47   processor
>>> Apr 12 21:34:47   button
>>> Apr 12 21:34:47
>>> Apr 12 21:34:47  [75707.119088] CPU: 15 PID: 31940 Comm: main Not tainted 4.4.1 #2
>>> Apr 12 21:34:47  [75707.119134] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>>> Apr 12 21:34:47  [75707.119196]  0000000000000000
>>> Apr 12 21:34:47   ffffffff812abdf3
>>> Apr 12 21:34:47   0000000000000000
>>> Apr 12 21:34:47   ffffffff810cf5f5
>>> Apr 12 21:34:47
>>> Apr 12 21:34:47  [75707.119277]  ffff883ff2a20000
>>> Apr 12 21:34:47   ffffffff810fcea2
>>> Apr 12 21:34:47   0000000000000001
>>> Apr 12 21:34:47   ffff88407fce5e58
>>> Apr 12 21:34:47
>>> Apr 12 21:34:47  [75707.119360]  ffff88407fceaf00
>>> Apr 12 21:34:47   ffff88407fceb100
>>> Apr 12 21:34:47   ffff883ff2a20000
>>> Apr 12 21:34:47   ffffffff8101bc63
>>> Apr 12 21:34:47
>>> Apr 12 21:34:47  [75707.119439] Call Trace:
>>> Apr 12 21:34:47  [75707.119471]  <NMI>
>>> Apr 12 21:34:47   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>>> Apr 12 21:34:47  [75707.119527]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
>>> Apr 12 21:34:47  [75707.119571]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
>>> Apr 12 21:34:47  [75707.119614]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
>>> Apr 12 21:34:47  [75707.119657]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
>>> Apr 12 21:34:47  [75707.119703]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
>>> Apr 12 21:34:47  [75707.119758]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
>>> Apr 12 21:34:47  [75707.119800]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
>>> Apr 12 21:34:47  [75707.119838]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
>>> Apr 12 21:34:47  [75707.119878]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
>>> Apr 12 21:34:47  [75707.119920]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:47  [75707.119962]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:47  [75707.120002]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>> Apr 12 21:34:47  [75707.120042]  <<EOE>>
>>> Apr 12 21:34:47   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>>> Apr 12 21:34:47  [75707.120113]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>>> Apr 12 21:34:47  [75707.120152]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
>>> Apr 12 21:34:47  [75707.120195]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
>>> Apr 12 21:34:47  [75707.120236]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
>>> Apr 12 21:34:47  [75707.120277]  [<ffffffff8112afaf>] ? workingset_refault+0x4f/0xa0
>>> Apr 12 21:34:47  [75707.120320]  [<ffffffff811a215e>] ? mpage_bio_submit+0x1e/0x30
>>> Apr 12 21:34:47  [75707.120359]  [<ffffffff811a3076>] ? mpage_readpages+0x106/0x130
>>> Apr 12 21:34:47  [75707.120401]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>> Apr 12 21:34:47  [75707.120439]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>> Apr 12 21:34:47  [75707.120481]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
>>> Apr 12 21:34:47  [75707.120523]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
>>> Apr 12 21:34:47  [75707.120564]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
>>> Apr 12 21:34:47  [75707.120602]  [<ffffffff811120c7>] ? force_page_cache_readahead+0x77/0xe0
>>> Apr 12 21:34:47  [75707.120644]  [<ffffffff8113f876>] ? madvise_willneed+0x76/0x140
>>> Apr 12 21:34:47  [75707.120683]  [<ffffffff811301ce>] ? handle_mm_fault+0x9ae/0x1650
>>> Apr 12 21:34:47  [75707.120722]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
>>> Apr 12 21:34:47  [75707.120760]  [<ffffffff8113fc52>] ? SyS_madvise+0x312/0x6f0
>>> Apr 12 21:34:47  [75707.120799]  [<ffffffff8148d9db>] ? entry_SYSCALL_64_fastpath+0x16/0x6e
>>> 
>>> Once this starts, a couple of minutes go by and then the machine locks up
>>> completely.
>>> 
>>> I have been unable to locate the problem. Can anyone point me in
>>> the right direction?
>>> 
>>> Best regards
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Hard CPU Lockup when accessing MD RAID5
  2016-04-20 15:29     ` John Stoffel
@ 2016-04-21 22:47       ` Daniel Walker
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Walker @ 2016-04-21 22:47 UTC (permalink / raw)
  To: linux-raid

Hi,

Well, things have gone from bad to worse, in my eyes.

We have had the following hardware replaced: chassis, motherboard, CPUs,
RAM, SAS cable, SAS controller and PSUs. Basically we are down to
just the hard drives, and it is still crashing.

This is a rather long one :)

Apr 21 23:55:19  [  785.975018] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
Apr 21 23:55:19
Apr 21 23:55:19  [  785.975110] Modules linked in:
Apr 21 23:55:19   iptable_mangle
Apr 21 23:55:19   netconsole
Apr 21 23:55:19   configfs
Apr 21 23:55:19   tun
Apr 21 23:55:19   xt_multiport
Apr 21 23:55:19   ip6table_filter
Apr 21 23:55:19   ip6_tables
Apr 21 23:55:19   iptable_filter
Apr 21 23:55:19   ip_tables
Apr 21 23:55:19   x_tables
Apr 21 23:55:19   bridge
Apr 21 23:55:19   stp
Apr 21 23:55:19   llc
Apr 21 23:55:19   bonding
Apr 21 23:55:19   ext4
Apr 21 23:55:19   crc16
Apr 21 23:55:19   mbcache
Apr 21 23:55:19   jbd2
Apr 21 23:55:19   raid1
Apr 21 23:55:19   raid0
Apr 21 23:55:19   raid456
Apr 21 23:55:19   async_raid6_recov
Apr 21 23:55:19   async_memcpy
Apr 21 23:55:19   async_pq
Apr 21 23:55:19   async_xor
Apr 21 23:55:19   xor
Apr 21 23:55:19   async_tx
Apr 21 23:55:19   raid6_pq
Apr 21 23:55:19   md_mod
Apr 21 23:55:19   sg
Apr 21 23:55:19   sd_mod
Apr 21 23:55:19   hid_generic
Apr 21 23:55:19   usbhid
Apr 21 23:55:19   hid
Apr 21 23:55:19   iTCO_wdt
Apr 21 23:55:19   iTCO_vendor_support
Apr 21 23:55:19   x86_pkg_temp_thermal
Apr 21 23:55:19   intel_powerclamp
Apr 21 23:55:19   coretemp
Apr 21 23:55:19   crct10dif_pclmul
Apr 21 23:55:19   crc32_pclmul
Apr 21 23:55:19   crc32c_intel
Apr 21 23:55:19   ghash_clmulni_intel
Apr 21 23:55:19   cryptd
Apr 21 23:55:19   xhci_pci
Apr 21 23:55:19   ahci
Apr 21 23:55:19   igb
Apr 21 23:55:19   ehci_pci
Apr 21 23:55:19   i2c_algo_bit
Apr 21 23:55:19   xhci_hcd
Apr 21 23:55:19   ptp
Apr 21 23:55:19   ehci_hcd
Apr 21 23:55:19   libahci
Apr 21 23:55:19   mpt3sas
Apr 21 23:55:19   sb_edac
Apr 21 23:55:19   i2c_i801
Apr 21 23:55:19   pps_core
Apr 21 23:55:19   edac_core
Apr 21 23:55:19   mei_me
Apr 21 23:55:19   raid_class
Apr 21 23:55:19   lpc_ich
Apr 21 23:55:19   libata
Apr 21 23:55:19   scsi_transport_sas
Apr 21 23:55:19   usbcore
Apr 21 23:55:19   mfd_core
Apr 21 23:55:19   mei
Apr 21 23:55:19   usb_common
Apr 21 23:55:19   i2c_core
Apr 21 23:55:19   ioatdma
Apr 21 23:55:19   scsi_mod
Apr 21 23:55:19   dca
Apr 21 23:55:19   ipmi_si
Apr 21 23:55:19   ipmi_msghandler
Apr 21 23:55:19   acpi_power_meter
Apr 21 23:55:19   acpi_pad
Apr 21 23:55:19   tpm_tis
Apr 21 23:55:19   tpm
Apr 21 23:55:19   processor
Apr 21 23:55:19   button
Apr 21 23:55:19
Apr 21 23:55:19  [  785.980450] CPU: 1 PID: 14630 Comm: kworker/u65:2 Not tainted 4.5.1 #1
Apr 21 23:55:19  [  785.980528] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 1.0b 01/29/2015
Apr 21 23:55:19  [  785.980616] Workqueue: writeback wb_workfn
Apr 21 23:55:19   (flush-9:11)
Apr 21 23:55:19
Apr 21 23:55:19  [  785.980818]  0000000000000000
Apr 21 23:55:19   ffff881fffc25bd0
Apr 21 23:55:19   ffffffff812e00b8
Apr 21 23:55:19   0000000000000000
Apr 21 23:55:19
Apr 21 23:55:19  [  785.981148]  0000000000000000
Apr 21 23:55:19   ffff881fffc25be8
Apr 21 23:55:19   ffffffff810dff1d
Apr 21 23:55:19   ffff881ff2cc0000
Apr 21 23:55:19
Apr 21 23:55:19  [  785.981479]  ffff881fffc25c20
Apr 21 23:55:19   ffffffff8110f8f8
Apr 21 23:55:19   0000000000000001
Apr 21 23:55:19   ffff881fffc2af00
Apr 21 23:55:19
Apr 21 23:55:19  [  785.981810] Call Trace:
Apr 21 23:55:19  [  785.981897]  <NMI>
Apr 21 23:55:19   [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Apr 21 23:55:19  [  785.982065]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
Apr 21 23:55:19  [  785.982165]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
Apr 21 23:55:19  [  785.982261]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
Apr 21 23:55:19  [  785.982358]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
Apr 21 23:55:19  [  785.982458]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
Apr 21 23:55:19  [  785.982554]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Apr 21 23:55:19  [  785.982648]  [<ffffffff810082e7>] do_nmi+0x117/0x3e0
Apr 21 23:55:19  [  785.982746]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
Apr 21 23:55:19  [  785.982844]  [<ffffffffa01c4084>] ? __release_stripe+0x4/0x20 [raid456]
Apr 21 23:55:19  [  785.982941]  [<ffffffffa01c4084>] ? __release_stripe+0x4/0x20 [raid456]
Apr 21 23:55:19  [  785.983038]  [<ffffffffa01c4084>] ? __release_stripe+0x4/0x20 [raid456]
Apr 21 23:55:19  [  785.983134]  <<EOE>>
Apr 21 23:55:19   [<ffffffffa01c560b>] ? raid5_unplug+0x8b/0x130 [raid456]
Apr 21 23:55:19  [  785.983316]  [<ffffffff812b9b98>] blk_flush_plug_list+0xa8/0x210
Apr 21 23:55:19  [  785.983411]  [<ffffffff812ba0a4>] blk_finish_plug+0x24/0x40
Apr 21 23:55:19  [  785.983506]  [<ffffffff811b69a2>] wb_writeback+0x172/0x2d0
Apr 21 23:55:19  [  785.983600]  [<ffffffff811b716f>] wb_workfn+0x20f/0x3c0
Apr 21 23:55:19  [  785.983698]  [<ffffffff81067513>] process_one_work+0x143/0x400
Apr 21 23:55:19  [  785.983793]  [<ffffffff81067cc1>] worker_thread+0x61/0x490
Apr 21 23:55:19  [  785.983888]  [<ffffffff81067c60>] ? max_active_store+0x60/0x60
Apr 21 23:55:19  [  785.983983]  [<ffffffff81067c60>] ? max_active_store+0x60/0x60
Apr 21 23:55:19  [  785.984078]  [<ffffffff8106c926>] kthread+0xd6/0xf0
Apr 21 23:55:19  [  785.984171]  [<ffffffff810011f6>] ? exit_to_usermode_loop+0x76/0xb0
Apr 21 23:55:19  [  785.984266]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
Apr 21 23:55:19  [  785.984361]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
Apr 21 23:55:19  [  785.984454]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
Apr 21 23:55:21  [  787.840894] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13
Apr 21 23:55:21
Apr 21 23:55:21  [  787.840993] Modules linked in:
Apr 21 23:55:21   iptable_mangle
Apr 21 23:55:21   netconsole
Apr 21 23:55:21   configfs
Apr 21 23:55:21   tun
Apr 21 23:55:21   xt_multiport
Apr 21 23:55:21   ip6table_filter
Apr 21 23:55:21   ip6_tables
Apr 21 23:55:21   iptable_filter
Apr 21 23:55:21   ip_tables
Apr 21 23:55:21   x_tables
Apr 21 23:55:21   bridge
Apr 21 23:55:21   stp
Apr 21 23:55:21   llc
Apr 21 23:55:21   bonding
Apr 21 23:55:21   ext4
Apr 21 23:55:21   crc16
Apr 21 23:55:21   mbcache
Apr 21 23:55:21   jbd2
Apr 21 23:55:21   raid1
Apr 21 23:55:21   raid0
Apr 21 23:55:21   raid456
Apr 21 23:55:21   async_raid6_recov
Apr 21 23:55:21   async_memcpy
Apr 21 23:55:21   async_pq
Apr 21 23:55:21   async_xor
Apr 21 23:55:21   xor
Apr 21 23:55:21   async_tx
Apr 21 23:55:21   raid6_pq
Apr 21 23:55:21   md_mod
Apr 21 23:55:21   sg
Apr 21 23:55:21   sd_mod
Apr 21 23:55:21   hid_generic
Apr 21 23:55:21   usbhid
Apr 21 23:55:21   hid
Apr 21 23:55:21   iTCO_wdt
Apr 21 23:55:21   iTCO_vendor_support
Apr 21 23:55:21   x86_pkg_temp_thermal
Apr 21 23:55:21   intel_powerclamp
Apr 21 23:55:21   coretemp
Apr 21 23:55:21   crct10dif_pclmul
Apr 21 23:55:21   crc32_pclmul
Apr 21 23:55:21   crc32c_intel
Apr 21 23:55:21   ghash_clmulni_intel
Apr 21 23:55:21   cryptd
Apr 21 23:55:21   xhci_pci
Apr 21 23:55:21   ahci
Apr 21 23:55:21   igb
Apr 21 23:55:21   ehci_pci
Apr 21 23:55:21   i2c_algo_bit
Apr 21 23:55:21   xhci_hcd
Apr 21 23:55:21   ptp
Apr 21 23:55:21   ehci_hcd
Apr 21 23:55:21   libahci
Apr 21 23:55:21   mpt3sas
Apr 21 23:55:21   sb_edac
Apr 21 23:55:21   i2c_i801
Apr 21 23:55:21   pps_core
Apr 21 23:55:21   edac_core
Apr 21 23:55:21   mei_me
Apr 21 23:55:21   raid_class
Apr 21 23:55:21   lpc_ich
Apr 21 23:55:21   libata
Apr 21 23:55:21   scsi_transport_sas
Apr 21 23:55:21   usbcore
Apr 21 23:55:21   mfd_core
Apr 21 23:55:21   mei
Apr 21 23:55:21   usb_common
Apr 21 23:55:21   i2c_core
Apr 21 23:55:21   ioatdma
Apr 21 23:55:21   scsi_mod
Apr 21 23:55:21   dca
Apr 21 23:55:21   ipmi_si
Apr 21 23:55:21   ipmi_msghandler
Apr 21 23:55:21   acpi_power_meter
Apr 21 23:55:21   acpi_pad
Apr 21 23:55:21   tpm_tis
Apr 21 23:55:21   tpm
Apr 21 23:55:21   processor
Apr 21 23:55:21   button
Apr 21 23:55:21
Apr 21 23:55:21  [  787.848156] CPU: 13 PID: 16848 Comm: rtorrent main Not tainted 4.5.1 #1
Apr 21 23:55:21  [  787.848270] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 1.0b 01/29/2015
Apr 21 23:55:21  [  787.848403]  0000000000000000
Apr 21 23:55:21   ffff88407fca5bd0
Apr 21 23:55:21   ffffffff812e00b8
Apr 21 23:55:21   0000000000000000
Apr 21 23:55:21
Apr 21 23:55:21  [  787.848857]  0000000000000000
Apr 21 23:55:21   ffff88407fca5be8
Apr 21 23:55:21   ffffffff810dff1d
Apr 21 23:55:21   ffff883fea688000
Apr 21 23:55:21
Apr 21 23:55:21  [  787.849321]  ffff88407fca5c20
Apr 21 23:55:21   ffffffff8110f8f8
Apr 21 23:55:21   0000000000000001
Apr 21 23:55:21   ffff88407fcaaf00
Apr 21 23:55:21
Apr 21 23:55:21  [  787.849780] Call Trace:
Apr 21 23:55:21  [  787.849891]  <NMI>
Apr 21 23:55:21   [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Apr 21 23:55:21  [  787.850091]  [<ffffffff810dff1d>] 
watchdog_overflow_callback+0xdd/0xf0
Apr 21 23:55:21  [  787.850211]  [<ffffffff8110f8f8>] 
__perf_event_overflow+0x88/0x1d0
Apr 21 23:55:21  [  787.850326]  [<ffffffff811103e4>] 
perf_event_overflow+0x14/0x20
Apr 21 23:55:21  [  787.850441]  [<ffffffff8101e320>] 
intel_pmu_handle_irq+0x1d0/0x4a0
Apr 21 23:55:21  [  787.850564]  [<ffffffff810162d8>] 
perf_event_nmi_handler+0x28/0x50
Apr 21 23:55:21  [  787.850677]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Apr 21 23:55:21  [  787.850788]  [<ffffffff810083d1>] do_nmi+0x201/0x3e0
Apr 21 23:55:21  [  787.850910]  [<ffffffff814dae97>] 
end_repeat_nmi+0x1a/0x1e
Apr 21 23:55:21  [  787.851024]  [<ffffffff81090cc5>] ? 
queued_spin_lock_slowpath+0xf5/0x170
Apr 21 23:55:21  [  787.851142]  [<ffffffff81090cc5>] ? 
queued_spin_lock_slowpath+0xf5/0x170
Apr 21 23:55:21  [  787.851255]  [<ffffffff81090cc5>] ? 
queued_spin_lock_slowpath+0xf5/0x170
Apr 21 23:55:21  [  787.851367]  <<EOE>>
Apr 21 23:55:21   [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
Apr 21 23:55:21  [  787.851565]  [<ffffffffa01cd5d4>] 
raid5_make_request+0x6d4/0xce0 [raid456]
Apr 21 23:55:21  [  787.851680]  [<ffffffff812b824f>] ? 
generic_make_request+0x1f/0x1c0
Apr 21 23:55:21  [  787.851793]  [<ffffffff812bdc23>] ? 
blk_queue_split+0xb3/0x530
Apr 21 23:55:21  [  787.851907]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
Apr 21 23:55:21  [  787.852021]  [<ffffffffa0110e43>] 
md_make_request+0xd3/0x210 [md_mod]
Apr 21 23:55:21  [  787.852135]  [<ffffffff81244923>] ? 
xfs_map_buffer.isra.15+0x33/0x60
Apr 21 23:55:21  [  787.852248]  [<ffffffff812b8319>] 
generic_make_request+0xe9/0x1c0
Apr 21 23:55:21  [  787.852365]  [<ffffffff812b8452>] submit_bio+0x62/0x150
Apr 21 23:55:21  [  787.852479]  [<ffffffff811c6f41>] do_mpage_readpage+0x2a1/0x6a0
Apr 21 23:55:21  [  787.852593]  [<ffffffff811286d9>] ? lru_cache_add+0x9/0x10
Apr 21 23:55:21  [  787.852704]  [<ffffffff811c7450>] mpage_readpages+0x110/0x170
Apr 21 23:55:21  [  787.852815]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:21  [  787.852927]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:21  [  787.853040]  [<ffffffff8116633d>] ? alloc_pages_current+0x8d/0x110
Apr 21 23:55:21  [  787.853152]  [<ffffffff812442f3>] xfs_vm_readpages+0x33/0x80
Apr 21 23:55:21  [  787.853265]  [<ffffffff81126585>] __do_page_cache_readahead+0x165/0x210
Apr 21 23:55:21  [  787.853381]  [<ffffffffa02cc397>] ? br_dev_xmit+0x137/0x1d0 [bridge]
Apr 21 23:55:21  [  787.853496]  [<ffffffff8111b1c7>] filemap_fault+0x427/0x4d0
Apr 21 23:55:21  [  787.853607]  [<ffffffff814d756d>] ? down_read+0xd/0x20
Apr 21 23:55:21  [  787.853719]  [<ffffffff8124fe20>] xfs_filemap_fault+0x40/0xa0
Apr 21 23:55:21  [  787.853833]  [<ffffffff81144fcd>] __do_fault+0x5d/0x110
Apr 21 23:55:21  [  787.853945]  [<ffffffff81148e34>] handle_mm_fault+0x1154/0x1b00
Apr 21 23:55:21  [  787.854058]  [<ffffffff81042ee1>] __do_page_fault+0x121/0x360
Apr 21 23:55:21  [  787.854170]  [<ffffffff8104315c>] do_page_fault+0xc/0x10
Apr 21 23:55:21  [  787.854282]  [<ffffffff814dab8f>] page_fault+0x1f/0x30
Apr 21 23:55:21  [  787.854395]  [<ffffffff812ec4f2>] ? copy_user_enhanced_fast_string+0x2/0x10
Apr 21 23:55:21  [  787.854510]  [<ffffffff812f25bc>] ? copy_from_iter+0x7c/0x260
Apr 21 23:55:21  [  787.854622]  [<ffffffff8143a448>] tcp_sendmsg+0xaa8/0xae0
Apr 21 23:55:21  [  787.854736]  [<ffffffff814631d0>] inet_sendmsg+0x60/0x90
Apr 21 23:55:21  [  787.854847]  [<ffffffff813d4da3>] sock_sendmsg+0x33/0x40
Apr 21 23:55:21  [  787.854959]  [<ffffffff813d51cf>] SYSC_sendto+0xef/0x170
Apr 21 23:55:21  [  787.855071]  [<ffffffff811363e8>] ? vm_mmap_pgoff+0x98/0xc0
Apr 21 23:55:21  [  787.855185]  [<ffffffff8114e075>] ? SyS_mmap_pgoff+0xe5/0x270
Apr 21 23:55:21  [  787.855297]  [<ffffffff813d5bc9>] SyS_sendto+0x9/0x10
Apr 21 23:55:21  [  787.855409]  [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
Apr 21 23:55:21  [  788.267238] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
Apr 21 23:55:21
Apr 21 23:55:21  [  788.267327] Modules linked in:
Apr 21 23:55:21   iptable_mangle
Apr 21 23:55:21   netconsole
Apr 21 23:55:21   configfs
Apr 21 23:55:21   tun
Apr 21 23:55:21   xt_multiport
Apr 21 23:55:21   ip6table_filter
Apr 21 23:55:21   ip6_tables
Apr 21 23:55:21   iptable_filter
Apr 21 23:55:21   ip_tables
Apr 21 23:55:21   x_tables
Apr 21 23:55:21   bridge
Apr 21 23:55:21   stp
Apr 21 23:55:21   llc
Apr 21 23:55:21   bonding
Apr 21 23:55:21   ext4
Apr 21 23:55:21   crc16
Apr 21 23:55:21   mbcache
Apr 21 23:55:21   jbd2
Apr 21 23:55:21   raid1
Apr 21 23:55:21   raid0
Apr 21 23:55:21   raid456
Apr 21 23:55:21   async_raid6_recov
Apr 21 23:55:21   async_memcpy
Apr 21 23:55:21   async_pq
Apr 21 23:55:21   async_xor
Apr 21 23:55:21   xor
Apr 21 23:55:21   async_tx
Apr 21 23:55:21   raid6_pq
Apr 21 23:55:21   md_mod
Apr 21 23:55:21   sg
Apr 21 23:55:21   sd_mod
Apr 21 23:55:21   hid_generic
Apr 21 23:55:21   usbhid
Apr 21 23:55:21   hid
Apr 21 23:55:21   iTCO_wdt
Apr 21 23:55:21   iTCO_vendor_support
Apr 21 23:55:21   x86_pkg_temp_thermal
Apr 21 23:55:21   intel_powerclamp
Apr 21 23:55:21   coretemp
Apr 21 23:55:21   crct10dif_pclmul
Apr 21 23:55:21   crc32_pclmul
Apr 21 23:55:21   crc32c_intel
Apr 21 23:55:21   ghash_clmulni_intel
Apr 21 23:55:21   cryptd
Apr 21 23:55:21   xhci_pci
Apr 21 23:55:21   ahci
Apr 21 23:55:21   igb
Apr 21 23:55:21   ehci_pci
Apr 21 23:55:21   i2c_algo_bit
Apr 21 23:55:21   xhci_hcd
Apr 21 23:55:21   ptp
Apr 21 23:55:21   ehci_hcd
Apr 21 23:55:21   libahci
Apr 21 23:55:21   mpt3sas
Apr 21 23:55:21   sb_edac
Apr 21 23:55:21   i2c_i801
Apr 21 23:55:21   pps_core
Apr 21 23:55:21   edac_core
Apr 21 23:55:21   mei_me
Apr 21 23:55:21   raid_class
Apr 21 23:55:21   lpc_ich
Apr 21 23:55:21   libata
Apr 21 23:55:21   scsi_transport_sas
Apr 21 23:55:21   usbcore
Apr 21 23:55:21   mfd_core
Apr 21 23:55:21   mei
Apr 21 23:55:21   usb_common
Apr 21 23:55:21   i2c_core
Apr 21 23:55:21   ioatdma
Apr 21 23:55:21   scsi_mod
Apr 21 23:55:21   dca
Apr 21 23:55:21   ipmi_si
Apr 21 23:55:21   ipmi_msghandler
Apr 21 23:55:21   acpi_power_meter
Apr 21 23:55:21   acpi_pad
Apr 21 23:55:21   tpm_tis
Apr 21 23:55:21   tpm
Apr 21 23:55:21   processor
Apr 21 23:55:21   button
Apr 21 23:55:21
Apr 21 23:55:21  [  788.273235] CPU: 6 PID: 12760 Comm: rtorrent main Not tainted 4.5.1 #1
Apr 21 23:55:21  [  788.273337] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 1.0b 01/29/2015
Apr 21 23:55:21  [  788.273454]  0000000000000000
Apr 21 23:55:21   ffff881fffcc5bd0
Apr 21 23:55:21   ffffffff812e00b8
Apr 21 23:55:21   0000000000000000
Apr 21 23:55:21
Apr 21 23:55:21  [  788.273827]  0000000000000000
Apr 21 23:55:21   ffff881fffcc5be8
Apr 21 23:55:21   ffffffff810dff1d
Apr 21 23:55:21   ffff881ff2fc8000
Apr 21 23:55:21
Apr 21 23:55:21  [  788.274193]  ffff881fffcc5c20
Apr 21 23:55:21   ffffffff8110f8f8
Apr 21 23:55:21   0000000000000001
Apr 21 23:55:21   ffff881fffccaf00
Apr 21 23:55:21
Apr 21 23:55:21  [  788.274564] Call Trace:
Apr 21 23:55:21  [  788.274650]  <NMI>
Apr 21 23:55:21   [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Apr 21 23:55:21  [  788.274815]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
Apr 21 23:55:21  [  788.274913]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
Apr 21 23:55:21  [  788.275010]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
Apr 21 23:55:21  [  788.275106]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
Apr 21 23:55:21  [  788.275203]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
Apr 21 23:55:21  [  788.275299]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Apr 21 23:55:21  [  788.275392]  [<ffffffff810082e7>] do_nmi+0x117/0x3e0
Apr 21 23:55:21  [  788.275487]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
Apr 21 23:55:21  [  788.275582]  [<ffffffff81090cc5>] ? queued_spin_lock_slowpath+0xf5/0x170
Apr 21 23:55:21  [  788.275678]  [<ffffffff81090cc5>] ? queued_spin_lock_slowpath+0xf5/0x170
Apr 21 23:55:21  [  788.275773]  [<ffffffff81090cc5>] ? queued_spin_lock_slowpath+0xf5/0x170
Apr 21 23:55:21  [  788.275868]  <<EOE>>
Apr 21 23:55:21   [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
Apr 21 23:55:21  [  788.276030]  [<ffffffffa01cd5d4>] raid5_make_request+0x6d4/0xce0 [raid456]
Apr 21 23:55:21  [  788.276128]  [<ffffffff812b824f>] ? generic_make_request+0x1f/0x1c0
Apr 21 23:55:21  [  788.276225]  [<ffffffff812bdc23>] ? blk_queue_split+0xb3/0x530
Apr 21 23:55:21  [  788.276321]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
Apr 21 23:55:21  [  788.276416]  [<ffffffffa0110e43>] md_make_request+0xd3/0x210 [md_mod]
Apr 21 23:55:21  [  788.276512]  [<ffffffff812b8319>] generic_make_request+0xe9/0x1c0
Apr 21 23:55:21  [  788.276607]  [<ffffffff812b8452>] submit_bio+0x62/0x150
Apr 21 23:55:21  [  788.276702]  [<ffffffff81127e05>] ? __pagevec_lru_add_fn+0x105/0x1e0
Apr 21 23:55:21  [  788.276798]  [<ffffffff811c6f90>] do_mpage_readpage+0x2f0/0x6a0
Apr 21 23:55:21  [  788.276893]  [<ffffffff811286d9>] ? lru_cache_add+0x9/0x10
Apr 21 23:55:21  [  788.276986]  [<ffffffff811c7450>] mpage_readpages+0x110/0x170
Apr 21 23:55:21  [  788.277081]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:21  [  788.277175]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:21  [  788.277271]  [<ffffffff8116633d>] ? alloc_pages_current+0x8d/0x110
Apr 21 23:55:21  [  788.277366]  [<ffffffff812442f3>] xfs_vm_readpages+0x33/0x80
Apr 21 23:55:21  [  788.277460]  [<ffffffff81126585>] __do_page_cache_readahead+0x165/0x210
Apr 21 23:55:21  [  788.277557]  [<ffffffff8111b1c7>] filemap_fault+0x427/0x4d0
Apr 21 23:55:21  [  788.277651]  [<ffffffff814d756d>] ? down_read+0xd/0x20
Apr 21 23:55:21  [  788.277744]  [<ffffffff8124fe20>] xfs_filemap_fault+0x40/0xa0
Apr 21 23:55:21  [  788.277840]  [<ffffffff81144fcd>] __do_fault+0x5d/0x110
Apr 21 23:55:21  [  788.277933]  [<ffffffff81148e34>] handle_mm_fault+0x1154/0x1b00
Apr 21 23:55:21  [  788.278029]  [<ffffffff81042ee1>] __do_page_fault+0x121/0x360
Apr 21 23:55:21  [  788.278123]  [<ffffffff8104315c>] do_page_fault+0xc/0x10
Apr 21 23:55:21  [  788.278216]  [<ffffffff814dab8f>] page_fault+0x1f/0x30
Apr 21 23:55:21  [  788.278311]  [<ffffffff812ec4f2>] ? copy_user_enhanced_fast_string+0x2/0x10
Apr 21 23:55:21  [  788.278410]  [<ffffffff812f25bc>] ? copy_from_iter+0x7c/0x260
Apr 21 23:55:21  [  788.278505]  [<ffffffff81439f78>] tcp_sendmsg+0x5d8/0xae0
Apr 21 23:55:21  [  788.278600]  [<ffffffff814631d0>] inet_sendmsg+0x60/0x90
Apr 21 23:55:21  [  788.278694]  [<ffffffff813d4da3>] sock_sendmsg+0x33/0x40
Apr 21 23:55:21  [  788.278787]  [<ffffffff813d51cf>] SYSC_sendto+0xef/0x170
Apr 21 23:55:21  [  788.278880]  [<ffffffff813d5bc9>] SyS_sendto+0x9/0x10
Apr 21 23:55:21  [  788.278973]  [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
Apr 21 23:55:23  [  790.117129] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
Apr 21 23:55:23
Apr 21 23:55:23  [  790.117222] Modules linked in:
Apr 21 23:55:23   iptable_mangle
Apr 21 23:55:23   netconsole
Apr 21 23:55:23   configfs
Apr 21 23:55:23   tun
Apr 21 23:55:23   xt_multiport
Apr 21 23:55:23   ip6table_filter
Apr 21 23:55:23   ip6_tables
Apr 21 23:55:23   iptable_filter
Apr 21 23:55:23   ip_tables
Apr 21 23:55:23   x_tables
Apr 21 23:55:23   bridge
Apr 21 23:55:23   stp
Apr 21 23:55:23   llc
Apr 21 23:55:23   bonding
Apr 21 23:55:23   ext4
Apr 21 23:55:23   crc16
Apr 21 23:55:23   mbcache
Apr 21 23:55:23   jbd2
Apr 21 23:55:23   raid1
Apr 21 23:55:23   raid0
Apr 21 23:55:23   raid456
Apr 21 23:55:23   async_raid6_recov
Apr 21 23:55:23   async_memcpy
Apr 21 23:55:23   async_pq
Apr 21 23:55:23   async_xor
Apr 21 23:55:23   xor
Apr 21 23:55:23   async_tx
Apr 21 23:55:23   raid6_pq
Apr 21 23:55:23   md_mod
Apr 21 23:55:23   sg
Apr 21 23:55:23   sd_mod
Apr 21 23:55:23   hid_generic
Apr 21 23:55:23   usbhid
Apr 21 23:55:23   hid
Apr 21 23:55:23   iTCO_wdt
Apr 21 23:55:23   iTCO_vendor_support
Apr 21 23:55:23   x86_pkg_temp_thermal
Apr 21 23:55:23   intel_powerclamp
Apr 21 23:55:23   coretemp
Apr 21 23:55:23   crct10dif_pclmul
Apr 21 23:55:23   crc32_pclmul
Apr 21 23:55:23   crc32c_intel
Apr 21 23:55:23   ghash_clmulni_intel
Apr 21 23:55:23   cryptd
Apr 21 23:55:23   xhci_pci
Apr 21 23:55:23   ahci
Apr 21 23:55:23   igb
Apr 21 23:55:23   ehci_pci
Apr 21 23:55:23   i2c_algo_bit
Apr 21 23:55:23   xhci_hcd
Apr 21 23:55:23   ptp
Apr 21 23:55:23   ehci_hcd
Apr 21 23:55:23   libahci
Apr 21 23:55:23   mpt3sas
Apr 21 23:55:23   sb_edac
Apr 21 23:55:23   i2c_i801
Apr 21 23:55:23   pps_core
Apr 21 23:55:23   edac_core
Apr 21 23:55:23   mei_me
Apr 21 23:55:23   raid_class
Apr 21 23:55:23   lpc_ich
Apr 21 23:55:23   libata
Apr 21 23:55:23   scsi_transport_sas
Apr 21 23:55:23   usbcore
Apr 21 23:55:23   mfd_core
Apr 21 23:55:23   mei
Apr 21 23:55:23   usb_common
Apr 21 23:55:23   i2c_core
Apr 21 23:55:23   ioatdma
Apr 21 23:55:23   scsi_mod
Apr 21 23:55:23   dca
Apr 21 23:55:23   ipmi_si
Apr 21 23:55:23   ipmi_msghandler
Apr 21 23:55:23   acpi_power_meter
Apr 21 23:55:23   acpi_pad
Apr 21 23:55:23   tpm_tis
Apr 21 23:55:23   tpm
Apr 21 23:55:23   processor
Apr 21 23:55:23   button
Apr 21 23:55:23
Apr 21 23:55:23  [  790.127050] CPU: 3 PID: 785 Comm: md11_raid5 Not tainted 4.5.1 #1
Apr 21 23:55:23  [  790.127145] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 1.0b 01/29/2015
Apr 21 23:55:23  [  790.127261]  0000000000000000
Apr 21 23:55:23   ffff881fffc65bd0
Apr 21 23:55:23   ffffffff812e00b8
Apr 21 23:55:23   0000000000000000
Apr 21 23:55:23
Apr 21 23:55:23  [  790.127630]  0000000000000000
Apr 21 23:55:23   ffff881fffc65be8
Apr 21 23:55:23   ffffffff810dff1d
Apr 21 23:55:23   ffff881ff2f10000
Apr 21 23:55:23
Apr 21 23:55:23  [  790.127999]  ffff881fffc65c20
Apr 21 23:55:23   ffffffff8110f8f8
Apr 21 23:55:23   0000000000000001
Apr 21 23:55:23   ffff881fffc6af00
Apr 21 23:55:23
Apr 21 23:55:23  [  790.128365] Call Trace:
Apr 21 23:55:23  [  790.128451]  <NMI>
Apr 21 23:55:23   [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Apr 21 23:55:23  [  790.128620]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
Apr 21 23:55:23  [  790.128720]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
Apr 21 23:55:23  [  790.128816]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
Apr 21 23:55:23  [  790.128912]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
Apr 21 23:55:23  [  790.129012]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
Apr 21 23:55:23  [  790.129111]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Apr 21 23:55:23  [  790.129211]  [<ffffffff810083d1>] do_nmi+0x201/0x3e0
Apr 21 23:55:23  [  790.129308]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
Apr 21 23:55:23  [  790.129403]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
Apr 21 23:55:23  [  790.129499]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
Apr 21 23:55:23  [  790.129600]  [<ffffffff81090d23>] ? queued_spin_lock_slowpath+0x153/0x170
Apr 21 23:55:23  [  790.129696]  <<EOE>>
Apr 21 23:55:23   [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
Apr 21 23:55:23  [  790.129865]  [<ffffffffa01d031b>] handle_active_stripes.isra.55+0x1ab/0x4b0 [raid456]
Apr 21 23:55:23  [  790.129982]  [<ffffffffa01d0aa9>] raid5d+0x489/0x720 [raid456]
Apr 21 23:55:23  [  790.130081]  [<ffffffff810a4830>] ? trace_event_raw_event_tick_stop+0x100/0x100
Apr 21 23:55:23  [  790.130200]  [<ffffffffa011074b>] md_thread+0x12b/0x130 [md_mod]
Apr 21 23:55:23  [  790.130299]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
Apr 21 23:55:23  [  790.130398]  [<ffffffffa0110620>] ? find_pers+0x70/0x70 [md_mod]
Apr 21 23:55:23  [  790.130494]  [<ffffffff8106c926>] kthread+0xd6/0xf0
Apr 21 23:55:23  [  790.130586]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
Apr 21 23:55:23  [  790.130683]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
Apr 21 23:55:23  [  790.130780]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
Apr 21 23:55:25  [  791.957594] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
Apr 21 23:55:25
Apr 21 23:55:25  [  791.958139] Modules linked in:
Apr 21 23:55:25   iptable_mangle
Apr 21 23:55:25   netconsole
Apr 21 23:55:25   configfs
Apr 21 23:55:25   tun
Apr 21 23:55:25   xt_multiport
Apr 21 23:55:25   ip6table_filter
Apr 21 23:55:25   ip6_tables
Apr 21 23:55:25   iptable_filter
Apr 21 23:55:25   ip_tables
Apr 21 23:55:25   x_tables
Apr 21 23:55:25   bridge
Apr 21 23:55:25   stp
Apr 21 23:55:25   llc
Apr 21 23:55:25   bonding
Apr 21 23:55:25   ext4
Apr 21 23:55:25   crc16
Apr 21 23:55:25   mbcache
Apr 21 23:55:25   jbd2
Apr 21 23:55:25   raid1
Apr 21 23:55:25   raid0
Apr 21 23:55:25   raid456
Apr 21 23:55:25   async_raid6_recov
Apr 21 23:55:25   async_memcpy
Apr 21 23:55:25   async_pq
Apr 21 23:55:25   async_xor
Apr 21 23:55:25   xor
Apr 21 23:55:25   async_tx
Apr 21 23:55:25   raid6_pq
Apr 21 23:55:25   md_mod
Apr 21 23:55:25   sg
Apr 21 23:55:25   sd_mod
Apr 21 23:55:25   hid_generic
Apr 21 23:55:25   usbhid
Apr 21 23:55:25   hid
Apr 21 23:55:25   iTCO_wdt
Apr 21 23:55:25   iTCO_vendor_support
Apr 21 23:55:25   x86_pkg_temp_thermal
Apr 21 23:55:25   intel_powerclamp
Apr 21 23:55:25   coretemp
Apr 21 23:55:25   crct10dif_pclmul
Apr 21 23:55:25   crc32_pclmul
Apr 21 23:55:25   crc32c_intel
Apr 21 23:55:25   ghash_clmulni_intel
Apr 21 23:55:25   cryptd
Apr 21 23:55:25   xhci_pci
Apr 21 23:55:25   ahci
Apr 21 23:55:25   igb
Apr 21 23:55:25   ehci_pci
Apr 21 23:55:25   i2c_algo_bit
Apr 21 23:55:25   xhci_hcd
Apr 21 23:55:25   ptp
Apr 21 23:55:25   ehci_hcd
Apr 21 23:55:25   libahci
Apr 21 23:55:25   mpt3sas
Apr 21 23:55:25   sb_edac
Apr 21 23:55:25   i2c_i801
Apr 21 23:55:25   pps_core
Apr 21 23:55:25   edac_core
Apr 21 23:55:25   mei_me
Apr 21 23:55:25   raid_class
Apr 21 23:55:25   lpc_ich
Apr 21 23:55:25   libata
Apr 21 23:55:25   scsi_transport_sas
Apr 21 23:55:25   usbcore
Apr 21 23:55:25   mfd_core
Apr 21 23:55:25   mei
Apr 21 23:55:25   usb_common
Apr 21 23:55:25   i2c_core
Apr 21 23:55:25   ioatdma
Apr 21 23:55:25   scsi_mod
Apr 21 23:55:25   dca
Apr 21 23:55:25   ipmi_si
Apr 21 23:55:25   ipmi_msghandler
Apr 21 23:55:25   acpi_power_meter
Apr 21 23:55:25   acpi_pad
Apr 21 23:55:25   tpm_tis
Apr 21 23:55:25   tpm
Apr 21 23:55:25   processor
Apr 21 23:55:25   button
Apr 21 23:55:25
Apr 21 23:55:25  [  791.964341] CPU: 17 PID: 18101 Comm: rtorrent main Not tainted 4.5.1 #1
Apr 21 23:55:25  [  791.964443] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 1.0b 01/29/2015
Apr 21 23:55:25  [  791.964567]  0000000000000000
Apr 21 23:55:25   ffff881fffd25bd0
Apr 21 23:55:25   ffffffff812e00b8
Apr 21 23:55:25   0000000000000000
Apr 21 23:55:25
Apr 21 23:55:25  [  791.964968]  0000000000000000
Apr 21 23:55:25   ffff881fffd25be8
Apr 21 23:55:25   ffffffff810dff1d
Apr 21 23:55:25   ffff881ff2890000
Apr 21 23:55:25
Apr 21 23:55:25  [  791.965369]  ffff881fffd25c20
Apr 21 23:55:25   ffffffff8110f8f8
Apr 21 23:55:25   0000000000000001
Apr 21 23:55:25   ffff881fffd2af00
Apr 21 23:55:25
Apr 21 23:55:25  [  791.965773] Call Trace:
Apr 21 23:55:25  [  791.965867]  <NMI>
Apr 21 23:55:25   [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Apr 21 23:55:25  [  791.966053]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
Apr 21 23:55:25  [  791.966161]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
Apr 21 23:55:25  [  791.966264]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
Apr 21 23:55:25  [  791.966368]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
Apr 21 23:55:25  [  791.966473]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
Apr 21 23:55:25  [  791.966577]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Apr 21 23:55:25  [  791.966677]  [<ffffffff810083d1>] do_nmi+0x201/0x3e0
Apr 21 23:55:25  [  791.966778]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
Apr 21 23:55:25  [  791.966881]  [<ffffffff81090cd9>] ? queued_spin_lock_slowpath+0x109/0x170
Apr 21 23:55:25  [  791.966984]  [<ffffffff81090cd9>] ? queued_spin_lock_slowpath+0x109/0x170
Apr 21 23:55:25  [  791.967088]  [<ffffffff81090cd9>] ? queued_spin_lock_slowpath+0x109/0x170
Apr 21 23:55:25  [  791.967197]  <<EOE>>
Apr 21 23:55:25   [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
Apr 21 23:55:25  [  791.967376]  [<ffffffffa01cd5d4>] raid5_make_request+0x6d4/0xce0 [raid456]
Apr 21 23:55:25  [  791.967484]  [<ffffffff81217c3d>] ? xfs_bmap_search_extents+0x7d/0x100
Apr 21 23:55:25  [  791.967590]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
Apr 21 23:55:25  [  791.967693]  [<ffffffffa0110e43>] md_make_request+0xd3/0x210 [md_mod]
Apr 21 23:55:25  [  791.967799]  [<ffffffff812b8319>] generic_make_request+0xe9/0x1c0
Apr 21 23:55:25  [  791.967903]  [<ffffffff812b8452>] submit_bio+0x62/0x150
Apr 21 23:55:25  [  791.968006]  [<ffffffff81127e05>] ? __pagevec_lru_add_fn+0x105/0x1e0
Apr 21 23:55:25  [  791.968110]  [<ffffffff811c6f90>] do_mpage_readpage+0x2f0/0x6a0
Apr 21 23:55:25  [  791.968213]  [<ffffffff811286d9>] ? lru_cache_add+0x9/0x10
Apr 21 23:55:25  [  791.968314]  [<ffffffff811c7450>] mpage_readpages+0x110/0x170
Apr 21 23:55:25  [  791.968420]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:25  [  791.968522]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:25  [  791.968626]  [<ffffffff8116633d>] ? alloc_pages_current+0x8d/0x110
Apr 21 23:55:25  [  791.968912]  [<ffffffff812442f3>] xfs_vm_readpages+0x33/0x80
Apr 21 23:55:25  [  791.969015]  [<ffffffff81126585>] __do_page_cache_readahead+0x165/0x210
Apr 21 23:55:25  [  791.969121]  [<ffffffff8111b1c7>] filemap_fault+0x427/0x4d0
Apr 21 23:55:25  [  791.969223]  [<ffffffff814d756d>] ? down_read+0xd/0x20
Apr 21 23:55:25  [  791.969325]  [<ffffffff8124fe20>] xfs_filemap_fault+0x40/0xa0
Apr 21 23:55:25  [  791.969429]  [<ffffffff81144fcd>] __do_fault+0x5d/0x110
Apr 21 23:55:25  [  791.969531]  [<ffffffff81148e34>] handle_mm_fault+0x1154/0x1b00
Apr 21 23:55:25  [  791.969635]  [<ffffffff810a49bf>] ? lock_timer_base.isra.34+0x4f/0x70
Apr 21 23:55:25  [  791.969741]  [<ffffffff81042ee1>] __do_page_fault+0x121/0x360
Apr 21 23:55:25  [  791.969842]  [<ffffffff8104315c>] do_page_fault+0xc/0x10
Apr 21 23:55:25  [  791.969944]  [<ffffffff814dab8f>] page_fault+0x1f/0x30
Apr 21 23:55:25  [  791.970047]  [<ffffffff812ec4f2>] ? copy_user_enhanced_fast_string+0x2/0x10
Apr 21 23:55:25  [  791.970152]  [<ffffffff812f25bc>] ? copy_from_iter+0x7c/0x260
Apr 21 23:55:25  [  791.970255]  [<ffffffff8143a448>] tcp_sendmsg+0xaa8/0xae0
Apr 21 23:55:25  [  791.970359]  [<ffffffff814631d0>] inet_sendmsg+0x60/0x90
Apr 21 23:55:25  [  791.970462]  [<ffffffff813d4da3>] sock_sendmsg+0x33/0x40
Apr 21 23:55:25  [  791.970562]  [<ffffffff813d51cf>] SYSC_sendto+0xef/0x170
Apr 21 23:55:25  [  791.970664]  [<ffffffff81042efe>] ? __do_page_fault+0x13e/0x360
Apr 21 23:55:25  [  791.970766]  [<ffffffff813d5bc9>] SyS_sendto+0x9/0x10
Apr 21 23:55:25  [  791.970868]  [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
Apr 21 23:55:26  [  793.219426] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
Apr 21 23:55:26
Apr 21 23:55:26  [  793.219517] Modules linked in:
Apr 21 23:55:26   iptable_mangle
Apr 21 23:55:26   netconsole
Apr 21 23:55:26   configfs
Apr 21 23:55:26   tun
Apr 21 23:55:26   xt_multiport
Apr 21 23:55:26   ip6table_filter
Apr 21 23:55:26   ip6_tables
Apr 21 23:55:26   iptable_filter
Apr 21 23:55:26   ip_tables
Apr 21 23:55:26   x_tables
Apr 21 23:55:26   bridge
Apr 21 23:55:26   stp
Apr 21 23:55:26   llc
Apr 21 23:55:26   bonding
Apr 21 23:55:26   ext4
Apr 21 23:55:26   crc16
Apr 21 23:55:26   mbcache
Apr 21 23:55:26   jbd2
Apr 21 23:55:26   raid1
Apr 21 23:55:26   raid0
Apr 21 23:55:26   raid456
Apr 21 23:55:26   async_raid6_recov
Apr 21 23:55:26   async_memcpy
Apr 21 23:55:26   async_pq
Apr 21 23:55:26   async_xor
Apr 21 23:55:26   xor
Apr 21 23:55:26   async_tx
Apr 21 23:55:26   raid6_pq
Apr 21 23:55:26   md_mod
Apr 21 23:55:26   sg
Apr 21 23:55:26   sd_mod
Apr 21 23:55:26   hid_generic
Apr 21 23:55:26   usbhid
Apr 21 23:55:26   hid
Apr 21 23:55:26   iTCO_wdt
Apr 21 23:55:26   iTCO_vendor_support
Apr 21 23:55:26   x86_pkg_temp_thermal
Apr 21 23:55:26   intel_powerclamp
Apr 21 23:55:26   coretemp
Apr 21 23:55:26   crct10dif_pclmul
Apr 21 23:55:26   crc32_pclmul
Apr 21 23:55:26   crc32c_intel
Apr 21 23:55:26   ghash_clmulni_intel
Apr 21 23:55:26   cryptd
Apr 21 23:55:26   xhci_pci
Apr 21 23:55:26   ahci
Apr 21 23:55:26   igb
Apr 21 23:55:26   ehci_pci
Apr 21 23:55:26   i2c_algo_bit
Apr 21 23:55:26   xhci_hcd
Apr 21 23:55:26   ptp
Apr 21 23:55:26   ehci_hcd
Apr 21 23:55:26   libahci
Apr 21 23:55:26   mpt3sas
Apr 21 23:55:26   sb_edac
Apr 21 23:55:26   i2c_i801
Apr 21 23:55:26   pps_core
Apr 21 23:55:26   edac_core
Apr 21 23:55:26   mei_me
Apr 21 23:55:26   raid_class
Apr 21 23:55:26   lpc_ich
Apr 21 23:55:26   libata
Apr 21 23:55:26   scsi_transport_sas
Apr 21 23:55:26   usbcore
Apr 21 23:55:26   mfd_core
Apr 21 23:55:26   mei
Apr 21 23:55:26   usb_common
Apr 21 23:55:26   i2c_core
Apr 21 23:55:26   ioatdma
Apr 21 23:55:26   scsi_mod
Apr 21 23:55:26   dca
Apr 21 23:55:26   ipmi_si
Apr 21 23:55:26   ipmi_msghandler
Apr 21 23:55:26   acpi_power_meter
Apr 21 23:55:26   acpi_pad
Apr 21 23:55:26   tpm_tis
Apr 21 23:55:26   tpm
Apr 21 23:55:26   processor
Apr 21 23:55:26   button
Apr 21 23:55:26
Apr 21 23:55:26  [  793.224979] CPU: 0 PID: 17378 Comm: rtorrent main Not tainted 4.5.1 #1
Apr 21 23:55:26  [  793.225075] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 1.0b 01/29/2015
Apr 21 23:55:26  [  793.225190]  0000000000000000
Apr 21 23:55:26   ffff881fffc05bd0
Apr 21 23:55:26   ffffffff812e00b8
Apr 21 23:55:26   0000000000000000
Apr 21 23:55:26
Apr 21 23:55:26  [  793.225552]  0000000000000000
Apr 21 23:55:26   ffff881fffc05be8
Apr 21 23:55:26   ffffffff810dff1d
Apr 21 23:55:26   ffff881fff832c00
Apr 21 23:55:26
Apr 21 23:55:26  [  793.225915]  ffff881fffc05c20
Apr 21 23:55:26   ffffffff8110f8f8
Apr 21 23:55:26   0000000000000001
Apr 21 23:55:26   ffff881fffc0af00
Apr 21 23:55:26
Apr 21 23:55:26  [  793.226277] Call Trace:
Apr 21 23:55:26  [  793.226363]  <NMI>
Apr 21 23:55:26   [<ffffffff812e00b8>] dump_stack+0x4d/0x65
Apr 21 23:55:26  [  793.226812]  [<ffffffff810dff1d>] watchdog_overflow_callback+0xdd/0xf0
Apr 21 23:55:26  [  793.226916]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
Apr 21 23:55:26  [  793.227014]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
Apr 21 23:55:26  [  793.227112]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
Apr 21 23:55:26  [  793.227210]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
Apr 21 23:55:26  [  793.227309]  [<ffffffff81008121>] nmi_handle+0x61/0x110
Apr 21 23:55:26  [  793.227405]  [<ffffffff810082e7>] do_nmi+0x117/0x3e0
Apr 21 23:55:26  [  793.227503]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
Apr 21 23:55:26  [  793.227600]  [<ffffffff81090cc1>] ? queued_spin_lock_slowpath+0xf1/0x170
Apr 21 23:55:26  [  793.227700]  [<ffffffff81090cc1>] ? queued_spin_lock_slowpath+0xf1/0x170
Apr 21 23:55:26  [  793.227797]  [<ffffffff81090cc1>] ? queued_spin_lock_slowpath+0xf1/0x170
Apr 21 23:55:26  [  793.227895]  <<EOE>>
Apr 21 23:55:26   [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
Apr 21 23:55:26  [  793.228071]  [<ffffffffa01cd5d4>] raid5_make_request+0x6d4/0xce0 [raid456]
Apr 21 23:55:26  [  793.228171]  [<ffffffff8111b520>] ? mempool_alloc_slab+0x10/0x20
Apr 21 23:55:26  [  793.228270]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
Apr 21 23:55:26  [  793.228368]  [<ffffffffa0110e43>] md_make_request+0xd3/0x210 [md_mod]
Apr 21 23:55:26  [  793.228468]  [<ffffffff812b8319>] generic_make_request+0xe9/0x1c0
Apr 21 23:55:26  [  793.228564]  [<ffffffff812b8452>] submit_bio+0x62/0x150
Apr 21 23:55:26  [  793.228663]  [<ffffffff811c6425>] mpage_bio_submit+0x25/0x30
Apr 21 23:55:26  [  793.228759]  [<ffffffff811c7489>] mpage_readpages+0x149/0x170
Apr 21 23:55:26  [  793.228858]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:26  [  793.228953]  [<ffffffff81246040>] ? __xfs_get_blocks+0x810/0x810
Apr 21 23:55:26  [  793.229065]  [<ffffffff8116633d>] ? alloc_pages_current+0x8d/0x110
Apr 21 23:55:26  [  793.229168]  [<ffffffff812442f3>] xfs_vm_readpages+0x33/0x80
Apr 21 23:55:26  [  793.229265]  [<ffffffff81126585>] __do_page_cache_readahead+0x165/0x210
Apr 21 23:55:26  [  793.229368]  [<ffffffffa02cc397>] ? br_dev_xmit+0x137/0x1d0 [bridge]
Apr 21 23:55:26  [  793.229465]  [<ffffffff8111b1c7>] filemap_fault+0x427/0x4d0
Apr 21 23:55:26  [  793.229561]  [<ffffffff814d756d>] ? down_read+0xd/0x20
Apr 21 23:55:26  [  793.229656]  [<ffffffff8124fe20>] xfs_filemap_fault+0x40/0xa0
Apr 21 23:55:26  [  793.229754]  [<ffffffff81144fcd>] __do_fault+0x5d/0x110
Apr 21 23:55:26  [  793.229849]  [<ffffffff81148e34>] handle_mm_fault+0x1154/0x1b00
Apr 21 23:55:26  [  793.229947]  [<ffffffff81042ee1>] __do_page_fault+0x121/0x360
Apr 21 23:55:26  [  793.230042]  [<ffffffff8104315c>] do_page_fault+0xc/0x10
Apr 21 23:55:26  [  793.230137]  [<ffffffff814dab8f>] page_fault+0x1f/0x30
Apr 21 23:55:26  [  793.230233]  [<ffffffff812ec4f2>] ? copy_user_enhanced_fast_string+0x2/0x10
Apr 21 23:55:26  [  793.230332]  [<ffffffff812f25bc>] ? copy_from_iter+0x7c/0x260
Apr 21 23:55:26  [  793.230429]  [<ffffffff81439f78>] tcp_sendmsg+0x5d8/0xae0
Apr 21 23:55:26  [  793.230524]  [<ffffffff8114c8e1>] ? __vma_link_file+0x41/0x50
Apr 21 23:55:26  [  793.230622]  [<ffffffff814631d0>] inet_sendmsg+0x60/0x90
Apr 21 23:55:26  [  793.230717]  [<ffffffff813d4da3>] sock_sendmsg+0x33/0x40
Apr 21 23:55:26  [  793.230811]  [<ffffffff813d51cf>] SYSC_sendto+0xef/0x170
Apr 21 23:55:26  [  793.230907]  [<ffffffff811363e8>] ? vm_mmap_pgoff+0x98/0xc0
Apr 21 23:55:26  [  793.231003]  [<ffffffff8114e075>] ? SyS_mmap_pgoff+0xe5/0x270
Apr 21 23:55:26  [  793.231098]  [<ffffffff813d5bc9>] SyS_sendto+0x9/0x10
Apr 21 23:55:26  [  793.231192]  [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
Apr 21 23:55:27  [  793.895422] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
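
To see the pattern across a capture like this one, it can help to pull out, for each hard-LOCKUP report, the CPU, the task name, and the function that called the spinning lock primitive. The Python sketch below is a minimal illustration under stated assumptions: every received line starts with a syslog-style "Mon DD HH:MM:SS" timestamp added by the netconsole receiver, as in this capture, and lines without that prefix are continuations wrapped by the mail client. `summarize_lockups` and `tally_callers` are hypothetical helpers written for this example, not part of any kernel or mdadm tooling.

```python
import re
from collections import Counter

# Assumption: each captured line begins with a syslog-style timestamp such as
# "Apr 21 23:55:21" that the netconsole receiver prepended.
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")
LOCKUP = re.compile(r"Watchdog detected hard\s+LOCKUP on cpu (\d+)")
TASK = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (.+?)\s+(?:Not tainted|Tainted)")
# Group 1 is the "? " marker for unreliable frames, group 2 the symbol name.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x")


def summarize_lockups(lines):
    """Return one (cpu, comm, caller) tuple per hard-LOCKUP report.

    'caller' is the second reliable frame after the <<EOE>> marker, i.e. the
    function that called the lock primitive the CPU was spinning in.
    Fields that cannot be found (e.g. in a truncated report) are None.
    """
    # Strip timestamps and join with spaces so wrapped frames line up again.
    text = " ".join(PREFIX.sub("", line).strip() for line in lines)
    starts = [m.start() for m in LOCKUP.finditer(text)] + [len(text)]
    reports = []
    for begin, end in zip(starts, starts[1:]):
        chunk = text[begin:end]
        cpu = int(LOCKUP.match(chunk).group(1))
        task = TASK.search(chunk)
        comm = task.group(3) if task else None
        caller = None
        eoe = chunk.find("<<EOE>>")
        if eoe != -1:
            # Skip "?" frames: they are stale stack entries, not the call chain.
            frames = [m.group(2) for m in FRAME.finditer(chunk, eoe)
                      if not m.group(1)]
            if len(frames) >= 2:
                caller = frames[1]
        reports.append((cpu, comm, caller))
    return reports


def tally_callers(reports):
    """Count how many lockup reports share each caller."""
    return Counter(caller for _, _, caller in reports)
```

In the complete reports in this capture, the caller would come out as raid5_make_request for the three "rtorrent main" tasks (CPUs 6, 17 and 0) and as handle_active_stripes.isra.55 for the md11_raid5 thread on CPU 3; the CPU 4 report is cut off, so if the input stops there its task and caller are returned as None.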

We are not using any additional modules for monitoring the servers, other
than plain ping alerts in case a server stops responding.

We have tried loading the optimized defaults in the BIOS. The current
motherboard is running an older BIOS version, just for testing, and the
problem is identical.

I just cannot find the problem here; the server keeps locking up.

For now, I have taken it out of production and I am moving data off its
arrays. It currently has six RAID5 arrays; I will move data between them
one at a time and re-create each mdadm array and its filesystem, to see
whether the problem lies there.

Any other ideas?

Best regards
Daniel

On 20-04-2016 at 17:29, John Stoffel wrote:
> Daniel,
>
> This is one of those hard problems to diagnose.  Can you take the
> system out of production and run some stress tests on it to see how it
> does?
>
> Have you updated all the firmware on the board?  Have you disabled
> hyperthreading as well?  Is there any overclocking or stuff like that
> happening?  If so, go back to the BIOS "safe" defaults.
>
> Do you have another system with the same hardware that's working fine
> in the same type of setup?  Then that does point to hardware.
>
> Is your power supply maxed out or near the limits?  Maybe you're
> getting a slight under-voltage?  Not likely... but you never know.
>
> And why is the kernel tainted?  Are you adding in third party modules?
> If so, remove them completely from the system.  SuperMicros don't
> generally require anything like that in my experience.
>
> Is it some of the extra monitoring modules you have installed?
>
> Good luck!
> John
>
>
>
>>>>>> "Daniel" == Daniel Walker <admin@ftwinc.net> writes:
> Daniel> Hi,
>
> Daniel> I upgraded the kernel to the latest stable release (4.5.1) with debugging
> Daniel> enabled, without any luck. This is the output in dmesg:
>
>
> Daniel>     [262448.558983] INFO: task php:13376 blocked for more than 120 seconds.
> Daniel>     [262448.559057]       Tainted: G        W       4.5.1 #1
> Daniel>     [262448.559092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Daniel>     [262448.559246] php             D
> Daniel>      ffff88001c297a18
> Daniel>         0 13376  12277 0x00000000
> Daniel>     [262448.559519]  ffff88001c297a18
> Daniel>      ffff881ff248c100
> Daniel>      ffff880013e9b400
> Daniel>      ffff881fea472000
>
> Daniel>     [262448.559603]  ffff88001c297ae8
> Daniel>      ffff88001c298000
> Daniel>      ffff881c5cac1b30
> Daniel>      ffff880013e9b400
>
> Daniel>     [262448.560046]  0000000000020001
> Daniel>      0000000545ea7820
> Daniel>      ffff88001c297a30
> Daniel>      ffffffff814d5690
>
> Daniel>     [262448.560485] Call Trace:
> Daniel>     [262448.560541]  [<ffffffff814d5690>] schedule+0x30/0x80
> Daniel>     [262448.560761]  [<ffffffff814d823e>] schedule_timeout+0x21e/0x2a0
> Daniel>     [262448.560828]  [<ffffffff81217c3d>] ? xfs_bmap_search_extents+0x7d/0x100
> Daniel>     [262448.561000]  [<ffffffff810902d9>] ? down_trylock+0x29/0x40
> Daniel>     [262448.561135]  [<ffffffff814d726f>] __down+0x5f/0xa0
> Daniel>     [262448.561268]  [<ffffffff8124bdd6>] ? _xfs_buf_find+0x156/0x350
> Daniel>     [262448.561347]  [<ffffffff8109032c>] down+0x3c/0x50
> Daniel>     [262448.561390]  [<ffffffff8124bbc7>] xfs_buf_lock+0x37/0xf0
> Daniel>     [262448.561435]  [<ffffffff8124bdd6>] _xfs_buf_find+0x156/0x350
> Daniel>     [262448.561557]  [<ffffffff8124bff5>] xfs_buf_get_map+0x25/0x280
> Daniel>     [262448.561603]  [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
> Daniel>     [262448.561666]  [<ffffffff8124cbe8>] xfs_buf_read_map+0x28/0x180
> Daniel>     [262448.561768]  [<ffffffff8127830b>] xfs_trans_read_buf_map+0xeb/0x300
> Daniel>     [262448.561809]  [<ffffffff8123f7da>] xfs_imap_to_bp+0x5a/0xc0
> Daniel>     [262448.561881]  [<ffffffff8125b7a5>] xfs_iunlink_remove+0x275/0x3a0
> Daniel>     [262448.561943]  [<ffffffff81268f4b>] ? kmem_zone_alloc+0x7b/0x120
> Daniel>     [262448.561988]  [<ffffffff8125ec33>] xfs_ifree+0x33/0xd0
> Daniel>     [262448.562033]  [<ffffffff8125ed85>] xfs_inactive_ifree+0xb5/0x200
> Daniel>     [262448.562109]  [<ffffffff8125ef58>] xfs_inactive+0x88/0x110
> Daniel>     [262448.562296]  [<ffffffff81263f31>] xfs_fs_evict_inode+0xc1/0x110
> Daniel>     [262448.562344]  [<ffffffff811a42fb>] evict+0xbb/0x180
> Daniel>     [262448.562405]  [<ffffffff811a4bb3>] iput+0x193/0x200
> Daniel>     [262448.562483]  [<ffffffff811a08d2>] d_delete+0x122/0x160
> Daniel>     [262448.562520]  [<ffffffff81195b99>] vfs_rmdir+0xf9/0x120
> Daniel>     [262448.562559]  [<ffffffff81199d17>] do_rmdir+0x1b7/0x1d0
> Daniel>     [262448.562607]  [<ffffffff81001210>] ? exit_to_usermode_loop+0x90/0xb0
> Daniel>     [262448.562665]  [<ffffffff8119a921>] SyS_rmdir+0x11/0x20
> Daniel>     [262448.562891]  [<ffffffff814d8f1b>] entry_SYSCALL_64_fastpath+0x16/0x6e
> Daniel>     [262489.707201] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
>
> Daniel>     [262489.707227] Modules linked in:
> Daniel>      ipt_MASQUERADE
> Daniel>      nf_nat_masquerade_ipv4
> Daniel>      iptable_nat
> Daniel>      nf_conntrack_ipv4
> Daniel>      nf_defrag_ipv4
> Daniel>      nf_nat_ipv4
> Daniel>      nf_nat
> Daniel>      nf_conntrack
> Daniel>      ipt_REJECT
> Daniel>      nf_reject_ipv4
> Daniel>      iptable_mangle
> Daniel>      netconsole
> Daniel>      configfs
> Daniel>      tun
> Daniel>      xt_multiport
> Daniel>      ip6table_filter
> Daniel>      ip6_tables
> Daniel>      iptable_filter
> Daniel>      ip_tables
> Daniel>      x_tables
> Daniel>      bridge
> Daniel>      stp
> Daniel>      llc
> Daniel>      bonding
> Daniel>      ext4
> Daniel>      crc16
> Daniel>      mbcache
> Daniel>      jbd2
> Daniel>      raid1
> Daniel>      raid0
> Daniel>      raid456
> Daniel>      async_raid6_recov
> Daniel>      async_memcpy
> Daniel>      async_pq
> Daniel>      async_xor
> Daniel>      xor
> Daniel>      async_tx
> Daniel>      raid6_pq
> Daniel>      md_mod
> Daniel>      sg
> Daniel>      sd_mod
> Daniel>      hid_generic
> Daniel>      usbhid
> Daniel>      hid
> Daniel>      x86_pkg_temp_thermal
> Daniel>      coretemp
> Daniel>      crct10dif_pclmul
> Daniel>      crc32_pclmul
> Daniel>      crc32c_intel
> Daniel>      ghash_clmulni_intel
> Daniel>      jitterentropy_rng
> Daniel>      sha256_ssse3
> Daniel>      iTCO_wdt
> Daniel>      sha256_generic
> Daniel>      iTCO_vendor_support
> Daniel>      hmac
> Daniel>      drbg
> Daniel>      xhci_pci
> Daniel>      ahci
> Daniel>      sb_edac
> Daniel>      ehci_pci
> Daniel>      ansi_cprng
> Daniel>      xhci_hcd
> Daniel>      ehci_hcd
> Daniel>      libahci
> Daniel>      i2c_i801
> Daniel>      edac_core
> Daniel>      lpc_ich
> Daniel>      mei_me
> Daniel>      mfd_core
> Daniel>      libata
> Daniel>      usbcore
> Daniel>      igb
> Daniel>      mei
> Daniel>      megaraid_sas
> Daniel>      i2c_algo_bit
> Daniel>      usb_common
> Daniel>      ptp
> Daniel>      aesni_intel
> Daniel>      pps_core
> Daniel>      aes_x86_64
> Daniel>      ioatdma
> Daniel>      lrw
> Daniel>      gf128mul
> Daniel>      glue_helper
> Daniel>      ablk_helper
> Daniel>      i2c_core
> Daniel>      scsi_mod
> Daniel>      dca
> Daniel>      cryptd
> Daniel>      ipmi_si
> Daniel>      ipmi_msghandler
> Daniel>      acpi_power_meter
> Daniel>      tpm_tis
> Daniel>      tpm
> Daniel>      processor
> Daniel>      button
>
> Daniel>     [262489.708066] CPU: 15 PID: 17535 Comm: kworker/u32:6 Tainted:
> Daniel> G        W       4.5.1 #1
> Daniel>     [262489.708124] Hardware name: Supermicro Super Server/X10DRi-LN4+,
> Daniel> BIOS 2.0 12/17/2015
> Daniel>     [262489.708187] Workqueue: writeback wb_workfn
> Daniel>      (flush-9:7)
>
> Daniel>     [262489.708228]  0000000000000000
> Daniel>      ffff88207fde5bd0
> Daniel>      ffffffff812e00b8
> Daniel>      0000000000000000
>
> Daniel>     [262489.708298]  0000000000000000
> Daniel>      ffff88207fde5be8
> Daniel>      ffffffff810dff1d
> Daniel>      ffff881ff2270000
>
> Daniel>     [262489.708368]  ffff88207fde5c20
> Daniel>      ffffffff8110f8f8
> Daniel>      0000000000000001
> Daniel>      ffff88207fdeaf00
>
> Daniel>     [262489.708438] Call Trace:
> Daniel>     [262489.708467]  <NMI>
> Daniel>      [<ffffffff812e00b8>] dump_stack+0x4d/0x65
> Daniel>     [262489.708512]  [<ffffffff810dff1d>]
> Daniel> watchdog_overflow_callback+0xdd/0xf0
> Daniel>     [262489.708552]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
> Daniel>     [262489.708589]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
> Daniel>     [262489.708627]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
> Daniel>     [262489.708666]  [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
> Daniel>     [262489.708703]  [<ffffffff811555fc>] ?
> Daniel> unmap_kernel_range_noflush+0xc/0x10
> Daniel>     [262489.708748]  [<ffffffff8135a543>] ?
> Daniel> ghes_copy_tofrom_phys+0x113/0x1e0
> Daniel>     [262489.708788]  [<ffffffff810359da>] ?
> Daniel> native_apic_wait_icr_idle+0x1a/0x30
> Daniel>     [262489.708827]  [<ffffffff810096e0>] ? arch_irq_work_raise+0x30/0x40
> Daniel>     [262489.708865]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
> Daniel>     [262489.708902]  [<ffffffff81008121>] nmi_handle+0x61/0x110
> Daniel>     [262489.708939]  [<ffffffff810082e7>] do_nmi+0x117/0x3e0
> Daniel>     [262489.708975]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
> Daniel>     [262489.709013]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130
> Daniel> [raid456]
> Daniel>     [262489.709051]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130
> Daniel> [raid456]
> Daniel>     [262489.709089]  [<ffffffffa01d05f0>] ? raid5_unplug+0x70/0x130
> Daniel> [raid456]
> Daniel>     [262489.709125]  <<EOE>>
> Daniel>      [<ffffffff812b9b98>] blk_flush_plug_list+0xa8/0x210
> Daniel>     [262489.709169]  [<ffffffff814d5de0>] ? bit_wait_timeout+0x70/0x70
> Daniel>     [262489.709206]  [<ffffffff814d4c04>] io_schedule_timeout+0x54/0x130
> Daniel>     [262489.709242]  [<ffffffff814d5df6>] bit_wait_io+0x16/0x60
> Daniel>     [262489.709277]  [<ffffffff814d5b59>] __wait_on_bit_lock+0x49/0xa0
> Daniel>     [262489.709314]  [<ffffffff81117fd0>] __lock_page+0xb0/0xc0
> Daniel>     [262489.709352]  [<ffffffff8108bdc0>] ?
> Daniel> autoremove_wake_function+0x30/0x30
> Daniel>     [262489.709391]  [<ffffffff811250f0>] write_cache_pages+0x2f0/0x4d0
> Daniel>     [262489.709427]  [<ffffffff81122df0>] ? wb_position_ratio+0x1f0/0x1f0
> Daniel>     [262489.709465]  [<ffffffff8112530e>] generic_writepages+0x3e/0x60
> Daniel>     [262489.709502]  [<ffffffff81244c18>] xfs_vm_writepages+0x38/0x40
> Daniel>     [262489.709539]  [<ffffffff81125e29>] do_writepages+0x19/0x30
> Daniel>     [262489.709574]  [<ffffffff811b5c50>]
> Daniel> __writeback_single_inode+0x40/0x310
> Daniel>     [262489.709612]  [<ffffffff811b6402>] writeback_sb_inodes+0x242/0x520
> Daniel>     [262489.709649]  [<ffffffff811b676a>] __writeback_inodes_wb+0x8a/0xc0
> Daniel>     [262489.709686]  [<ffffffff811b6a77>] wb_writeback+0x247/0x2d0
> Daniel>     [262489.709721]  [<ffffffff811b716f>] wb_workfn+0x20f/0x3c0
> Daniel>     [262489.709758]  [<ffffffff81067513>] process_one_work+0x143/0x400
> Daniel>     [262489.709795]  [<ffffffff81067cc1>] worker_thread+0x61/0x490
> Daniel>     [262489.709831]  [<ffffffff81067c60>] ? max_active_store+0x60/0x60
> Daniel>     [262489.709867]  [<ffffffff8106c926>] kthread+0xd6/0xf0
> Daniel>     [262489.709901]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
> Daniel>     [262489.709937]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
> Daniel>     [262489.709972]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
> Daniel>     [262491.022971] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
>
> Daniel>     [262491.023470] Modules linked in:
> Daniel>      ipt_MASQUERADE
> Daniel>      nf_nat_masquerade_ipv4
> Daniel>      iptable_nat
> Daniel>      nf_conntrack_ipv4
> Daniel>      nf_defrag_ipv4
> Daniel>      nf_nat_ipv4
> Daniel>      nf_nat
> Daniel>      nf_conntrack
> Daniel>      ipt_REJECT
> Daniel>      nf_reject_ipv4
> Daniel>      iptable_mangle
> Daniel>      netconsole
> Daniel>      configfs
> Daniel>      tun
> Daniel>      xt_multiport
> Daniel>      ip6table_filter
> Daniel>      ip6_tables
> Daniel>      iptable_filter
> Daniel>      ip_tables
> Daniel>      x_tables
> Daniel>      bridge
> Daniel>      stp
> Daniel>      llc
> Daniel>      bonding
> Daniel>      ext4
> Daniel>      crc16
> Daniel>      mbcache
> Daniel>      jbd2
> Daniel>      raid1
> Daniel>      raid0
> Daniel>      raid456
> Daniel>      async_raid6_recov
> Daniel>      async_memcpy
> Daniel>      async_pq
> Daniel>      async_xor
> Daniel>      xor
> Daniel>      async_tx
> Daniel>      raid6_pq
> Daniel>      md_mod
> Daniel>      sg
> Daniel>      sd_mod
> Daniel>      hid_generic
> Daniel>      usbhid
> Daniel>      hid
> Daniel>      x86_pkg_temp_thermal
> Daniel>      coretemp
> Daniel>      crct10dif_pclmul
> Daniel>      crc32_pclmul
> Daniel>      crc32c_intel
> Daniel>      ghash_clmulni_intel
> Daniel>      jitterentropy_rng
> Daniel>      sha256_ssse3
> Daniel>      iTCO_wdt
> Daniel>      sha256_generic
> Daniel>      iTCO_vendor_support
> Daniel>      hmac
> Daniel>      drbg
> Daniel>      xhci_pci
> Daniel>      ahci
> Daniel>      sb_edac
> Daniel>      ehci_pci
> Daniel>      ansi_cprng
> Daniel>      xhci_hcd
> Daniel>      ehci_hcd
> Daniel>      libahci
> Daniel>      i2c_i801
> Daniel>      edac_core
> Daniel>      lpc_ich
> Daniel>      mei_me
> Daniel>      mfd_core
> Daniel>      libata
> Daniel>      usbcore
> Daniel>      igb
> Daniel>      mei
> Daniel>      megaraid_sas
> Daniel>      i2c_algo_bit
> Daniel>      usb_common
> Daniel>      ptp
> Daniel>      aesni_intel
> Daniel>      pps_core
> Daniel>      aes_x86_64
> Daniel>      ioatdma
> Daniel>      lrw
> Daniel>      gf128mul
> Daniel>      glue_helper
> Daniel>      ablk_helper
> Daniel>      i2c_core
> Daniel>      scsi_mod
> Daniel>      dca
> Daniel>      cryptd
> Daniel>      ipmi_si
> Daniel>      ipmi_msghandler
> Daniel>      acpi_power_meter
> Daniel>      tpm_tis
> Daniel>      tpm
> Daniel>      processor
> Daniel>      button
>
> Daniel>     [262491.029705] CPU: 0 PID: 1178 Comm: md7_raid5 Tainted: G
> Daniel> W       4.5.1 #1
> Daniel>     [262491.029776] Hardware name: Supermicro Super Server/X10DRi-LN4+,
> Daniel> BIOS 2.0 12/17/2015
> Daniel>     [262491.029849]  0000000000000000
> Daniel>      ffff88207fc05bd0
> Daniel>      ffffffff812e00b8
> Daniel>      0000000000000000
>
> Daniel>     [262491.029988]  0000000000000000
> Daniel>      ffff88207fc05be8
> Daniel>      ffffffff810dff1d
> Daniel>      ffff881fff032000
>
> Daniel>     [262491.030124]  ffff88207fc05c20
> Daniel>      ffffffff8110f8f8
> Daniel>      0000000000000001
> Daniel>      ffff88207fc0af00
>
> Daniel>     [262491.030260] Call Trace:
> Daniel>     [262491.030302]  <NMI>
> Daniel>      [<ffffffff812e00b8>] dump_stack+0x4d/0x65
> Daniel>     [262491.030377]  [<ffffffff810dff1d>]
> Daniel> watchdog_overflow_callback+0xdd/0xf0
> Daniel>     [262491.030432]  [<ffffffff8110f8f8>] __perf_event_overflow+0x88/0x1d0
> Daniel>     [262491.030484]  [<ffffffff811103e4>] perf_event_overflow+0x14/0x20
> Daniel>     [262491.030536]  [<ffffffff8101e320>] intel_pmu_handle_irq+0x1d0/0x4a0
> Daniel>     [262491.030589]  [<ffffffff81155481>] ? vunmap_page_range+0x1a1/0x310
> Daniel>     [262491.030640]  [<ffffffff811555fc>] ?
> Daniel> unmap_kernel_range_noflush+0xc/0x10
> Daniel>     [262491.030693]  [<ffffffff8135a543>] ?
> Daniel> ghes_copy_tofrom_phys+0x113/0x1e0
> Daniel>     [262491.030745]  [<ffffffff8135a681>] ? ghes_read_estatus+0x71/0x140
> Daniel>     [262491.030797]  [<ffffffff810162d8>] perf_event_nmi_handler+0x28/0x50
> Daniel>     [262491.030849]  [<ffffffff81008121>] nmi_handle+0x61/0x110
> Daniel>     [262491.030898]  [<ffffffff810083d1>] do_nmi+0x201/0x3e0
> Daniel>     [262491.030949]  [<ffffffff814dae97>] end_repeat_nmi+0x1a/0x1e
> Daniel>     [262491.030998]  [<ffffffff81090d23>] ?
> Daniel> queued_spin_lock_slowpath+0x153/0x170
> Daniel>     [262491.031050]  [<ffffffff81090d23>] ?
> Daniel> queued_spin_lock_slowpath+0x153/0x170
> Daniel>     [262491.031102]  [<ffffffff81090d23>] ?
> Daniel> queued_spin_lock_slowpath+0x153/0x170
> Daniel>     [262491.031153]  <<EOE>>
> Daniel>      [<ffffffff814d8c6c>] _raw_spin_lock_irq+0x1c/0x20
> Daniel>     [262491.031225]  [<ffffffffa01db6b1>] raid5d+0x91/0x720 [raid456]
> Daniel>     [262491.031276]  [<ffffffff810a4a8a>] ? try_to_del_timer_sync+0x4a/0x60
> Daniel>     [262491.031328]  [<ffffffff810a4ae3>] ? del_timer_sync+0x43/0x50
> Daniel>     [262491.031377]  [<ffffffff814d816e>] ? schedule_timeout+0x14e/0x2a0
> Daniel>     [262491.031428]  [<ffffffff810a4830>] ?
> Daniel> trace_event_raw_event_tick_stop+0x100/0x100
> Daniel>     [262491.031502]  [<ffffffffa017874b>] md_thread+0x12b/0x130 [md_mod]
> Daniel>     [262491.031555]  [<ffffffff8108bd90>] ? wait_woken+0x80/0x80
> Daniel>     [262491.031605]  [<ffffffffa0178620>] ? find_pers+0x70/0x70 [md_mod]
> Daniel>     [262491.031656]  [<ffffffff8106c926>] kthread+0xd6/0xf0
> Daniel>     [262491.031704]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
> Daniel>     [262491.031753]  [<ffffffff814d92af>] ret_from_fork+0x3f/0x70
> Daniel>     [262491.031802]  [<ffffffff8106c850>] ? kthread_park+0x50/0x50
>
> Daniel> The server is hosting plain VPSes; a few of them use it for
> Daniel> rtorrent, which is quite disk intensive, but from what I can see the
> Daniel> iowait is quite low.
>
> Daniel> There's absolutely nothing logged before the lockups. Everything is
> Daniel> running fine and then it suddenly crashes. I'm beginning to think we
> Daniel> might have a hardware problem, but I'm having a hard time finding the
> Daniel> actual issue.
>
> Daniel> Any ideas?
>
> Daniel> Best regards
>
>
> Daniel> On 13-04-2016 at 19:00, Shaohua Li wrote:
>>> It looks like there is a deadlock trying to acquire the device_lock or hash_lock.
>>> Was anything abnormal printed before the NMI watchdog? What is running on the
>>> machine? This looks like an old kernel; is it possible for you to try the latest
>>> kernel and report back?
>>>
>>> Thanks,
>>> Shaohua
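Shaohua's questions (what was running, and which lock each CPU was stuck on) are easier to answer if every hard LOCKUP report is reduced to one line. What follows is a minimal Python sketch, not an existing tool: `summarize_lockups`, `Lockup` and the regexes are hypothetical, and they assume the quoting (`> Daniel>`, `>>>>`) and netconsole timestamp formats seen in this thread. The sketch rejoins mail-wrapped lines. For each report it then records the CPU, the `Comm:` task, the last symbol before `<<EOE>>` (usually a hint of where the NMI interrupted the CPU; `?` entries are not guaranteed to be reliable) and the first task-context frame after it.

```python
import re
from typing import List, NamedTuple

# Hypothetical helper. Assumed prefix formats, taken from this thread:
# mail quote markers ("> Daniel>", ">>>>") and netconsole syslog
# timestamps ("Apr 12 21:34:45").
PREFIX = re.compile(r"^[> ]*(?:\w+>)?\s*(?:[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d)?\s*")
# A kernel stack symbol such as "raid5_unplug+0x70/0x130".
SYMBOL = re.compile(r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
MARKER = "NMI watchdog: Watchdog detected hard LOCKUP on cpu"


class Lockup(NamedTuple):
    cpu: int
    comm: str
    interrupted: str  # last symbol before <<EOE>> (NMI-time hint)
    task_frame: str   # first symbol after <<EOE>> (task-context caller)


def summarize_lockups(log: str) -> List[Lockup]:
    """Return one Lockup per hard LOCKUP report found in `log`."""
    # Drop quote/syslog prefixes and rejoin mail-wrapped lines with single spaces.
    cleaned = (PREFIX.sub("", line).strip() for line in log.splitlines())
    stream = " ".join(part for part in cleaned if part)
    reports = []
    # Everything before the first marker is not part of any report.
    for chunk in stream.split(MARKER)[1:]:
        cpu = re.match(r"\s*(\d+)", chunk)
        comm = re.search(r"Comm: (\S+)", chunk)
        nmi_part, _, task_part = chunk.partition("<<EOE>>")
        nmi_syms = SYMBOL.findall(nmi_part)
        task_syms = SYMBOL.findall(task_part)
        reports.append(Lockup(
            cpu=int(cpu.group(1)) if cpu else -1,
            comm=comm.group(1) if comm else "?",
            interrupted=nmi_syms[-1] if nmi_syms else "?",
            task_frame=task_syms[0] if task_syms else "?",
        ))
    return reports
```

It could be run over the saved netconsole output, e.g. `summarize_lockups(open("netconsole.log").read())` (the filename is hypothetical). For the reports quoted in this thread it would show the md7_raid5 thread and the `main` processes all interrupted in queued_spin_lock_slowpath, which fits the lock-contention reading above.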
>>>
>>> On Tue, Apr 12, 2016 at 09:54:08PM +0000, Daniel Walker wrote:
>>>> I'm having some issues on a brand new Supermicro server that we have running
>>>> in production alongside a few other machines which are identical to this
>>>> server.
>>>>
>>>> The output from the netconsole attached to the server is here:
>>>>
>>>> Apr 12 21:34:45  [75704.964946] NMI watchdog: Watchdog detected hard LOCKUP
>>>> on cpu 6
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75704.964973] Modules linked in:
>>>> Apr 12 21:34:45   ipt_REJECT
>>>> Apr 12 21:34:45   nf_reject_ipv4
>>>> Apr 12 21:34:45   iptable_mangle
>>>> Apr 12 21:34:45   tun
>>>> Apr 12 21:34:45   netconsole
>>>> Apr 12 21:34:45   configfs
>>>> Apr 12 21:34:45   xt_multiport
>>>> Apr 12 21:34:45   ip6table_filter
>>>> Apr 12 21:34:45   ip6_tables
>>>> Apr 12 21:34:45   iptable_filter
>>>> Apr 12 21:34:45   ip_tables
>>>> Apr 12 21:34:45   x_tables
>>>> Apr 12 21:34:45   bridge
>>>> Apr 12 21:34:45   stp
>>>> Apr 12 21:34:45   llc
>>>> Apr 12 21:34:45   bonding
>>>> Apr 12 21:34:45   ext4
>>>> Apr 12 21:34:45   crc16
>>>> Apr 12 21:34:45   mbcache
>>>> Apr 12 21:34:45   jbd2
>>>> Apr 12 21:34:45   raid1
>>>> Apr 12 21:34:45   raid0
>>>> Apr 12 21:34:45   raid456
>>>> Apr 12 21:34:45   async_raid6_recov
>>>> Apr 12 21:34:45   async_memcpy
>>>> Apr 12 21:34:45   async_pq
>>>> Apr 12 21:34:45   async_xor
>>>> Apr 12 21:34:45   xor
>>>> Apr 12 21:34:45   async_tx
>>>> Apr 12 21:34:45   raid6_pq
>>>> Apr 12 21:34:45   md_mod
>>>> Apr 12 21:34:45   sr_mod
>>>> Apr 12 21:34:45   cdrom
>>>> Apr 12 21:34:45   usb_storage
>>>> Apr 12 21:34:45   hid_generic
>>>> Apr 12 21:34:45   usbhid
>>>> Apr 12 21:34:45   hid
>>>> Apr 12 21:34:45   sg
>>>> Apr 12 21:34:45   sd_mod
>>>> Apr 12 21:34:45   x86_pkg_temp_thermal
>>>> Apr 12 21:34:45   coretemp
>>>> Apr 12 21:34:45   crct10dif_pclmul
>>>> Apr 12 21:34:45   crc32_pclmul
>>>> Apr 12 21:34:45   crc32c_intel
>>>> Apr 12 21:34:45   jitterentropy_rng
>>>> Apr 12 21:34:45   sha256_ssse3
>>>> Apr 12 21:34:45   sha256_generic
>>>> Apr 12 21:34:45   hmac
>>>> Apr 12 21:34:45   iTCO_wdt
>>>> Apr 12 21:34:45   iTCO_vendor_support
>>>> Apr 12 21:34:45   drbg
>>>> Apr 12 21:34:45   ansi_cprng
>>>> Apr 12 21:34:45   aesni_intel
>>>> Apr 12 21:34:45   aes_x86_64
>>>> Apr 12 21:34:45   lrw
>>>> Apr 12 21:34:45   gf128mul
>>>> Apr 12 21:34:45   glue_helper
>>>> Apr 12 21:34:45   ablk_helper
>>>> Apr 12 21:34:45   cryptd
>>>> Apr 12 21:34:45   ahci
>>>> Apr 12 21:34:45   libahci
>>>> Apr 12 21:34:45   sb_edac
>>>> Apr 12 21:34:45   libata
>>>> Apr 12 21:34:45   igb
>>>> Apr 12 21:34:45   megaraid_sas
>>>> Apr 12 21:34:45   xhci_pci
>>>> Apr 12 21:34:45   ehci_pci
>>>> Apr 12 21:34:45   i2c_algo_bit
>>>> Apr 12 21:34:45   xhci_hcd
>>>> Apr 12 21:34:45   ehci_hcd
>>>> Apr 12 21:34:45   edac_core
>>>> Apr 12 21:34:45   ptp
>>>> Apr 12 21:34:45   mei_me
>>>> Apr 12 21:34:45   lpc_ich
>>>> Apr 12 21:34:45   i2c_i801
>>>> Apr 12 21:34:45   usbcore
>>>> Apr 12 21:34:45   pps_core
>>>> Apr 12 21:34:45   mfd_core
>>>> Apr 12 21:34:45   mei
>>>> Apr 12 21:34:45   usb_common
>>>> Apr 12 21:34:45   i2c_core
>>>> Apr 12 21:34:45   ioatdma
>>>> Apr 12 21:34:45   scsi_mod
>>>> Apr 12 21:34:45   dca
>>>> Apr 12 21:34:45   ipmi_si
>>>> Apr 12 21:34:45   ipmi_msghandler
>>>> Apr 12 21:34:45   acpi_power_meter
>>>> Apr 12 21:34:45   tpm_tis
>>>> Apr 12 21:34:45   tpm
>>>> Apr 12 21:34:45   processor
>>>> Apr 12 21:34:45   button
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75704.965874] CPU: 6 PID: 25339 Comm: main Not tainted
>>>> 4.4.1 #2
>>>> Apr 12 21:34:45  [75704.965916] Hardware name: Supermicro Super
>>>> Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>>>> Apr 12 21:34:45  [75704.965979]  0000000000000000
>>>> Apr 12 21:34:45   ffffffff812abdf3
>>>> Apr 12 21:34:45   0000000000000000
>>>> Apr 12 21:34:45   ffffffff810cf5f5
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75704.966054]  ffff881ff2870000
>>>> Apr 12 21:34:45   ffffffff810fcea2
>>>> Apr 12 21:34:45   0000000000000001
>>>> Apr 12 21:34:45   ffff881fffcc5e58
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75704.966134]  ffff881fffccaf00
>>>> Apr 12 21:34:45   ffff881fffccb100
>>>> Apr 12 21:34:45   ffff881ff2870000
>>>> Apr 12 21:34:45   ffffffff8101bc63
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75704.966211] Call Trace:
>>>> Apr 12 21:34:45  [75704.966246]  <NMI>
>>>> Apr 12 21:34:45   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>>>> Apr 12 21:34:45  [75704.966297]  [<ffffffff810cf5f5>] ?
>>>> watchdog_overflow_callback+0xb5/0xd0
>>>> Apr 12 21:34:45  [75704.966339]  [<ffffffff810fcea2>] ?
>>>> __perf_event_overflow+0x82/0x1c0
>>>> Apr 12 21:34:45  [75704.966384]  [<ffffffff8101bc63>] ?
>>>> intel_pmu_handle_irq+0x1c3/0x3e0
>>>> Apr 12 21:34:45  [75704.966431]  [<ffffffff8113b5cb>] ?
>>>> vunmap_page_range+0x1bb/0x320
>>>> Apr 12 21:34:45  [75704.966474]  [<ffffffff813213e0>] ?
>>>> ghes_copy_tofrom_phys+0x110/0x1d0
>>>> Apr 12 21:34:45  [75704.966519]  [<ffffffff81014f53>] ?
>>>> perf_event_nmi_handler+0x23/0x40
>>>> Apr 12 21:34:45  [75704.966560]  [<ffffffff81007b85>] ?
>>>> nmi_handle+0x65/0x100
>>>> Apr 12 21:34:45  [75704.966597]  [<ffffffff81007dfe>] ? do_nmi+0x1de/0x360
>>>> Apr 12 21:34:45  [75704.970603]  [<ffffffff8148f957>] ?
>>>> end_repeat_nmi+0x1a/0x1e
>>>> Apr 12 21:34:45  [75704.970644]  [<ffffffff810862ca>] ?
>>>> queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:45  [75704.970685]  [<ffffffff810862ca>] ?
>>>> queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:45  [75704.970728]  [<ffffffff810862ca>] ?
>>>> queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:45  [75704.970768]  <<EOE>>
>>>> Apr 12 21:34:45   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>>>> Apr 12 21:34:45  [75704.970838]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>>>> Apr 12 21:34:45  [75704.970878]  [<ffffffff81151ec4>] ?
>>>> kmem_cache_alloc+0xf4/0x120
>>>> Apr 12 21:34:45  [75704.970922]  [<ffffffffa017632d>] ?
>>>> md_make_request+0xdd/0x220 [md_mod]
>>>> Apr 12 21:34:45  [75704.970969]  [<ffffffff81219fde>] ?
>>>> xfs_map_buffer.isra.12+0x2e/0x60
>>>> Apr 12 21:34:45  [75704.971012]  [<ffffffff8128691d>] ?
>>>> generic_make_request+0xed/0x1d0
>>>> Apr 12 21:34:45  [75704.971052]  [<ffffffff81286a5a>] ?
>>>> submit_bio+0x5a/0x140
>>>> Apr 12 21:34:45  [75704.971098]  [<ffffffff81113379>] ?
>>>> release_pages+0xc9/0x270
>>>> Apr 12 21:34:45  [75704.971145]  [<ffffffff811a2c01>] ?
>>>> do_mpage_readpage+0x2d1/0x640
>>>> Apr 12 21:34:45  [75704.971187]  [<ffffffff811a304d>] ?
>>>> mpage_readpages+0xdd/0x130
>>>> Apr 12 21:34:45  [75704.971226]  [<ffffffff8121b510>] ?
>>>> __xfs_get_blocks+0x750/0x750
>>>> Apr 12 21:34:45  [75704.971267]  [<ffffffff8121b510>] ?
>>>> __xfs_get_blocks+0x750/0x750
>>>> Apr 12 21:34:45  [75704.971313]  [<ffffffff8114ad45>] ?
>>>> alloc_pages_current+0x85/0x110
>>>> Apr 12 21:34:45  [75704.971354]  [<ffffffff81111d25>] ?
>>>> __do_page_cache_readahead+0x165/0x1f0
>>>> Apr 12 21:34:45  [75704.971399]  [<ffffffff81105902>] ?
>>>> pagecache_get_page+0x22/0x1a0
>>>> Apr 12 21:34:45  [75704.971441]  [<ffffffff8110768c>] ?
>>>> filemap_fault+0x37c/0x400
>>>> Apr 12 21:34:45  [75704.971481]  [<ffffffff8122474b>] ?
>>>> xfs_filemap_fault+0x3b/0x80
>>>> Apr 12 21:34:45  [75704.971526]  [<ffffffff8112d2da>] ? __do_fault+0x3a/0xc0
>>>> Apr 12 21:34:45  [75704.971564]  [<ffffffff81130883>] ?
>>>> handle_mm_fault+0x1063/0x1650
>>>> Apr 12 21:34:45  [75704.971614]  [<ffffffff8103bdae>] ?
>>>> __do_page_fault+0x11e/0x370
>>>> Apr 12 21:34:45  [75704.971653]  [<ffffffff811aa4ff>] ?
>>>> SyS_epoll_wait+0x8f/0xd0
>>>> Apr 12 21:34:45  [75704.971694]  [<ffffffff8148f64f>] ? page_fault+0x1f/0x30
>>>> Apr 12 21:34:45  [75705.493640] NMI watchdog: Watchdog detected hard LOCKUP
>>>> on cpu 12
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75705.493668] Modules linked in:
>>>> Apr 12 21:34:45   ipt_REJECT
>>>> Apr 12 21:34:45   nf_reject_ipv4
>>>> Apr 12 21:34:45   iptable_mangle
>>>> Apr 12 21:34:45   tun
>>>> Apr 12 21:34:45   netconsole
>>>> Apr 12 21:34:45   configfs
>>>> Apr 12 21:34:45   xt_multiport
>>>> Apr 12 21:34:45   ip6table_filter
>>>> Apr 12 21:34:45   ip6_tables
>>>> Apr 12 21:34:45   iptable_filter
>>>> Apr 12 21:34:45   ip_tables
>>>> Apr 12 21:34:45   x_tables
>>>> Apr 12 21:34:45   bridge
>>>> Apr 12 21:34:45   stp
>>>> Apr 12 21:34:45   llc
>>>> Apr 12 21:34:45   bonding
>>>> Apr 12 21:34:45   ext4
>>>> Apr 12 21:34:45   crc16
>>>> Apr 12 21:34:45   mbcache
>>>> Apr 12 21:34:45   jbd2
>>>> Apr 12 21:34:45   raid1
>>>> Apr 12 21:34:45   raid0
>>>> Apr 12 21:34:45   raid456
>>>> Apr 12 21:34:45   async_raid6_recov
>>>> Apr 12 21:34:45   async_memcpy
>>>> Apr 12 21:34:45   async_pq
>>>> Apr 12 21:34:45   async_xor
>>>> Apr 12 21:34:45   xor
>>>> Apr 12 21:34:45   async_tx
>>>> Apr 12 21:34:45   raid6_pq
>>>> Apr 12 21:34:45   md_mod
>>>> Apr 12 21:34:45   sr_mod
>>>> Apr 12 21:34:45   cdrom
>>>> Apr 12 21:34:45   usb_storage
>>>> Apr 12 21:34:45   hid_generic
>>>> Apr 12 21:34:45   usbhid
>>>> Apr 12 21:34:45   hid
>>>> Apr 12 21:34:45   sg
>>>> Apr 12 21:34:45   sd_mod
>>>> Apr 12 21:34:45   x86_pkg_temp_thermal
>>>> Apr 12 21:34:45   coretemp
>>>> Apr 12 21:34:45   crct10dif_pclmul
>>>> Apr 12 21:34:45   crc32_pclmul
>>>> Apr 12 21:34:45   crc32c_intel
>>>> Apr 12 21:34:45   jitterentropy_rng
>>>> Apr 12 21:34:45   sha256_ssse3
>>>> Apr 12 21:34:45   sha256_generic
>>>> Apr 12 21:34:45   hmac
>>>> Apr 12 21:34:45   iTCO_wdt
>>>> Apr 12 21:34:45   iTCO_vendor_support
>>>> Apr 12 21:34:45   drbg
>>>> Apr 12 21:34:45   ansi_cprng
>>>> Apr 12 21:34:45   aesni_intel
>>>> Apr 12 21:34:45   aes_x86_64
>>>> Apr 12 21:34:45   lrw
>>>> Apr 12 21:34:45   gf128mul
>>>> Apr 12 21:34:45   glue_helper
>>>> Apr 12 21:34:45   ablk_helper
>>>> Apr 12 21:34:45   cryptd
>>>> Apr 12 21:34:45   ahci
>>>> Apr 12 21:34:45   libahci
>>>> Apr 12 21:34:45   sb_edac
>>>> Apr 12 21:34:45   libata
>>>> Apr 12 21:34:45   igb
>>>> Apr 12 21:34:45   megaraid_sas
>>>> Apr 12 21:34:45   xhci_pci
>>>> Apr 12 21:34:45   ehci_pci
>>>> Apr 12 21:34:45   i2c_algo_bit
>>>> Apr 12 21:34:45   xhci_hcd
>>>> Apr 12 21:34:45   ehci_hcd
>>>> Apr 12 21:34:45   edac_core
>>>> Apr 12 21:34:45   ptp
>>>> Apr 12 21:34:45   mei_me
>>>> Apr 12 21:34:45   lpc_ich
>>>> Apr 12 21:34:45   i2c_i801
>>>> Apr 12 21:34:45   usbcore
>>>> Apr 12 21:34:45   pps_core
>>>> Apr 12 21:34:45   mfd_core
>>>> Apr 12 21:34:45   mei
>>>> Apr 12 21:34:45   usb_common
>>>> Apr 12 21:34:45   i2c_core
>>>> Apr 12 21:34:45   ioatdma
>>>> Apr 12 21:34:45   scsi_mod
>>>> Apr 12 21:34:45   dca
>>>> Apr 12 21:34:45   ipmi_si
>>>> Apr 12 21:34:45   ipmi_msghandler
>>>> Apr 12 21:34:45   acpi_power_meter
>>>> Apr 12 21:34:45   tpm_tis
>>>> Apr 12 21:34:45   tpm
>>>> Apr 12 21:34:45   processor
>>>> Apr 12 21:34:45   button
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75705.494688] CPU: 12 PID: 32350 Comm: main Not tainted
>>>> 4.4.1 #2
>>>> Apr 12 21:34:45  [75705.494728] Hardware name: Supermicro Super
>>>> Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>>>> Apr 12 21:34:45  [75705.494790]  0000000000000000
>>>> Apr 12 21:34:45   ffffffff812abdf3
>>>> Apr 12 21:34:45   0000000000000000
>>>> Apr 12 21:34:45   ffffffff810cf5f5
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75705.494886]  ffff883ff29a0000
>>>> Apr 12 21:34:45   ffffffff810fcea2
>>>> Apr 12 21:34:45   0000000000000001
>>>> Apr 12 21:34:45   ffff88407fc85e58
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75705.494976]  ffff88407fc8af00
>>>> Apr 12 21:34:45   ffff88407fc8b100
>>>> Apr 12 21:34:45   ffff883ff29a0000
>>>> Apr 12 21:34:45   ffffffff8101bc63
>>>> Apr 12 21:34:45
>>>> Apr 12 21:34:45  [75705.495064] Call Trace:
>>>> Apr 12 21:34:45  [75705.495094]  <NMI>
>>>> Apr 12 21:34:45   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>>>> Apr 12 21:34:45  [75705.495150]  [<ffffffff810cf5f5>] ?
>>>> watchdog_overflow_callback+0xb5/0xd0
>>>> Apr 12 21:34:45  [75705.495193]  [<ffffffff810fcea2>] ?
>>>> __perf_event_overflow+0x82/0x1c0
>>>> Apr 12 21:34:45  [75705.495237]  [<ffffffff8101bc63>] ?
>>>> intel_pmu_handle_irq+0x1c3/0x3e0
>>>> Apr 12 21:34:45  [75705.495284]  [<ffffffff8113b5cb>] ?
>>>> vunmap_page_range+0x1bb/0x320
>>>> Apr 12 21:34:45  [75705.495330]  [<ffffffff813213e0>] ?
>>>> ghes_copy_tofrom_phys+0x110/0x1d0
>>>> Apr 12 21:34:45  [75705.495373]  [<ffffffff81014f53>] ?
>>>> perf_event_nmi_handler+0x23/0x40
>>>> Apr 12 21:34:45  [75705.495418]  [<ffffffff81007b85>] ?
>>>> nmi_handle+0x65/0x100
>>>> Apr 12 21:34:45  [75705.495458]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
>>>> Apr 12 21:34:45  [75705.495497]  [<ffffffff8148f957>] ?
>>>> end_repeat_nmi+0x1a/0x1e
>>>> Apr 12 21:34:45  [75705.495540]  [<ffffffff810862ca>] ?
>>>> queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:45  [75705.495581]  [<ffffffff810862ca>] ?
>>>> queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:45  [75705.495621]  [<ffffffff810862ca>] ?
>>>> queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:45  [75705.495661]  <<EOE>>
>>>> Apr 12 21:34:45   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>>>> Apr 12 21:34:45  [75705.495733]  [<ffffffff81282d87>] ?
>>>> blk_rq_init+0x87/0xa0
>>>> Apr 12 21:34:45  [75705.495771]  [<ffffffff81283e3c>] ?
>>>> get_request+0x29c/0x6e0
>>>> Apr 12 21:34:45  [75705.495812]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>>>> Apr 12 21:34:45  [75705.495853]  [<ffffffffa017632d>] ?
>>>> md_make_request+0xdd/0x220 [md_mod]
>>>> Apr 12 21:34:45  [75705.495898]  [<ffffffff8128829e>] ?
>>>> blk_queue_bio+0x15e/0x350
>>>> Apr 12 21:34:45  [75705.495937]  [<ffffffff8128691d>] ?
>>>> generic_make_request+0xed/0x1d0
>>>> Apr 12 21:34:45  [75705.495978]  [<ffffffff81286a5a>] ?
>>>> submit_bio+0x5a/0x140
>>>> Apr 12 21:34:45  [75705.496018]  [<ffffffff811a215e>] ?
>>>> mpage_bio_submit+0x1e/0x30
>>>> Apr 12 21:34:45  [75705.496057]  [<ffffffff811a3076>] ?
>>>> mpage_readpages+0x106/0x130
>>>> Apr 12 21:34:45  [75705.496102]  [<ffffffff8121b510>] ?
>>>> __xfs_get_blocks+0x750/0x750
>>>> Apr 12 21:34:45  [75705.496144]  [<ffffffff8121b510>] ?
>>>> __xfs_get_blocks+0x750/0x750
>>>> Apr 12 21:34:45  [75705.496185]  [<ffffffff8114ad45>] ?
>>>> alloc_pages_current+0x85/0x110
>>>> Apr 12 21:34:45  [75705.496227]  [<ffffffff81111d25>] ?
>>>> __do_page_cache_readahead+0x165/0x1f0
>>>> Apr 12 21:34:45  [75705.496268]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
>>>> Apr 12 21:34:45  [75705.496307]  [<ffffffff811120eb>] ?
>>>> force_page_cache_readahead+0x9b/0xe0
>>>> Apr 12 21:34:45  [75705.496352]  [<ffffffff8113f876>] ?
>>>> madvise_willneed+0x76/0x140
>>>> Apr 12 21:34:45  [75705.496395]  [<ffffffff811301ce>] ?
>>>> handle_mm_fault+0x9ae/0x1650
>>>> Apr 12 21:34:45  [75705.496437]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
>>>> Apr 12 21:34:45  [75705.496476]  [<ffffffff8113fc52>] ?
>>>> SyS_madvise+0x312/0x6f0
>>>> Apr 12 21:34:45  [75705.496515]  [<ffffffff8148d9db>] ?
>>>> entry_SYSCALL_64_fastpath+0x16/0x6e
>>>> Apr 12 21:34:47  [75707.118049] NMI watchdog: Watchdog detected hard LOCKUP
>>>> on cpu 15
>>>> Apr 12 21:34:47
>>>> Apr 12 21:34:47  [75707.118078] Modules linked in:
>>>> Apr 12 21:34:47   ipt_REJECT
>>>> Apr 12 21:34:47   nf_reject_ipv4
>>>> Apr 12 21:34:47   iptable_mangle
>>>> Apr 12 21:34:47   tun
>>>> Apr 12 21:34:47   netconsole
>>>> Apr 12 21:34:47   configfs
>>>> Apr 12 21:34:47   xt_multiport
>>>> Apr 12 21:34:47   ip6table_filter
>>>> Apr 12 21:34:47   ip6_tables
>>>> Apr 12 21:34:47   iptable_filter
>>>> Apr 12 21:34:47   ip_tables
>>>> Apr 12 21:34:47   x_tables
>>>> Apr 12 21:34:47   bridge
>>>> Apr 12 21:34:47   stp
>>>> Apr 12 21:34:47   llc
>>>> Apr 12 21:34:47   bonding
>>>> Apr 12 21:34:47   ext4
>>>> Apr 12 21:34:47   crc16
>>>> Apr 12 21:34:47   mbcache
>>>> Apr 12 21:34:47   jbd2
>>>> Apr 12 21:34:47   raid1
>>>> Apr 12 21:34:47   raid0
>>>> Apr 12 21:34:47   raid456
>>>> Apr 12 21:34:47   async_raid6_recov
>>>> Apr 12 21:34:47   async_memcpy
>>>> Apr 12 21:34:47   async_pq
>>>> Apr 12 21:34:47   async_xor
>>>> Apr 12 21:34:47   xor
>>>> Apr 12 21:34:47   async_tx
>>>> Apr 12 21:34:47   raid6_pq
>>>> Apr 12 21:34:47   md_mod
>>>> Apr 12 21:34:47   sr_mod
>>>> Apr 12 21:34:47   cdrom
>>>> Apr 12 21:34:47   usb_storage
>>>> Apr 12 21:34:47   hid_generic
>>>> Apr 12 21:34:47   usbhid
>>>> Apr 12 21:34:47   hid
>>>> Apr 12 21:34:47   sg
>>>> Apr 12 21:34:47   sd_mod
>>>> Apr 12 21:34:47   x86_pkg_temp_thermal
>>>> Apr 12 21:34:47   coretemp
>>>> Apr 12 21:34:47   crct10dif_pclmul
>>>> Apr 12 21:34:47   crc32_pclmul
>>>> Apr 12 21:34:47   crc32c_intel
>>>> Apr 12 21:34:47   jitterentropy_rng
>>>> Apr 12 21:34:47   sha256_ssse3
>>>> Apr 12 21:34:47   sha256_generic
>>>> Apr 12 21:34:47   hmac
>>>> Apr 12 21:34:47   iTCO_wdt
>>>> Apr 12 21:34:47   iTCO_vendor_support
>>>> Apr 12 21:34:47   drbg
>>>> Apr 12 21:34:47   ansi_cprng
>>>> Apr 12 21:34:47   aesni_intel
>>>> Apr 12 21:34:47   aes_x86_64
>>>> Apr 12 21:34:47   lrw
>>>> Apr 12 21:34:47   gf128mul
>>>> Apr 12 21:34:47   glue_helper
>>>> Apr 12 21:34:47   ablk_helper
>>>> Apr 12 21:34:47   cryptd
>>>> Apr 12 21:34:47   ahci
>>>> Apr 12 21:34:47   libahci
>>>> Apr 12 21:34:47   sb_edac
>>>> Apr 12 21:34:47   libata
>>>> Apr 12 21:34:47   igb
>>>> Apr 12 21:34:47   megaraid_sas
>>>> Apr 12 21:34:47   xhci_pci
>>>> Apr 12 21:34:47   ehci_pci
>>>> Apr 12 21:34:47   i2c_algo_bit
>>>> Apr 12 21:34:47   xhci_hcd
>>>> Apr 12 21:34:47   ehci_hcd
>>>> Apr 12 21:34:47   edac_core
>>>> Apr 12 21:34:47   ptp
>>>> Apr 12 21:34:47   mei_me
>>>> Apr 12 21:34:47   lpc_ich
>>>> Apr 12 21:34:47   i2c_i801
>>>> Apr 12 21:34:47   usbcore
>>>> Apr 12 21:34:47   pps_core
>>>> Apr 12 21:34:47   mfd_core
>>>> Apr 12 21:34:47   mei
>>>> Apr 12 21:34:47   usb_common
>>>> Apr 12 21:34:47   i2c_core
>>>> Apr 12 21:34:47   ioatdma
>>>> Apr 12 21:34:47   scsi_mod
>>>> Apr 12 21:34:47   dca
>>>> Apr 12 21:34:47   ipmi_si
>>>> Apr 12 21:34:47   ipmi_msghandler
>>>> Apr 12 21:34:47   acpi_power_meter
>>>> Apr 12 21:34:47   tpm_tis
>>>> Apr 12 21:34:47   tpm
>>>> Apr 12 21:34:47   processor
>>>> Apr 12 21:34:47   button
>>>> Apr 12 21:34:47
>>>> Apr 12 21:34:47  [75707.119088] CPU: 15 PID: 31940 Comm: main Not tainted
>>>> 4.4.1 #2
>>>> Apr 12 21:34:47  [75707.119134] Hardware name: Supermicro Super
>>>> Server/X10DRi-LN4+, BIOS 2.0 12/17/2015
>>>> Apr 12 21:34:47  [75707.119196]  0000000000000000
>>>> Apr 12 21:34:47   ffffffff812abdf3
>>>> Apr 12 21:34:47   0000000000000000
>>>> Apr 12 21:34:47   ffffffff810cf5f5
>>>> Apr 12 21:34:47
>>>> Apr 12 21:34:47  [75707.119277]  ffff883ff2a20000
>>>> Apr 12 21:34:47   ffffffff810fcea2
>>>> Apr 12 21:34:47   0000000000000001
>>>> Apr 12 21:34:47   ffff88407fce5e58
>>>> Apr 12 21:34:47
>>>> Apr 12 21:34:47  [75707.119360]  ffff88407fceaf00
>>>> Apr 12 21:34:47   ffff88407fceb100
>>>> Apr 12 21:34:47   ffff883ff2a20000
>>>> Apr 12 21:34:47   ffffffff8101bc63
>>>> Apr 12 21:34:47
>>>> Apr 12 21:34:47  [75707.119439] Call Trace:
>>>> Apr 12 21:34:47  [75707.119471]  <NMI>
>>>> Apr 12 21:34:47   [<ffffffff812abdf3>] ? dump_stack+0x40/0x5d
>>>> Apr 12 21:34:47  [75707.119527]  [<ffffffff810cf5f5>] ? watchdog_overflow_callback+0xb5/0xd0
>>>> Apr 12 21:34:47  [75707.119571]  [<ffffffff810fcea2>] ? __perf_event_overflow+0x82/0x1c0
>>>> Apr 12 21:34:47  [75707.119614]  [<ffffffff8101bc63>] ? intel_pmu_handle_irq+0x1c3/0x3e0
>>>> Apr 12 21:34:47  [75707.119657]  [<ffffffff8113b5cb>] ? vunmap_page_range+0x1bb/0x320
>>>> Apr 12 21:34:47  [75707.119703]  [<ffffffff813213e0>] ? ghes_copy_tofrom_phys+0x110/0x1d0
>>>> Apr 12 21:34:47  [75707.119758]  [<ffffffff81014f53>] ? perf_event_nmi_handler+0x23/0x40
>>>> Apr 12 21:34:47  [75707.119800]  [<ffffffff81007b85>] ? nmi_handle+0x65/0x100
>>>> Apr 12 21:34:47  [75707.119838]  [<ffffffff81007d2e>] ? do_nmi+0x10e/0x360
>>>> Apr 12 21:34:47  [75707.119878]  [<ffffffff8148f957>] ? end_repeat_nmi+0x1a/0x1e
>>>> Apr 12 21:34:47  [75707.119920]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:47  [75707.119962]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:47  [75707.120002]  [<ffffffff810862ca>] ? queued_spin_lock_slowpath+0xea/0x150
>>>> Apr 12 21:34:47  [75707.120042]  <<EOE>>
>>>> Apr 12 21:34:47   [<ffffffffa01b413b>] ? make_request+0x60b/0xbd0 [raid456]
>>>> Apr 12 21:34:47  [75707.120113]  [<ffffffff810815c0>] ? wait_woken+0x80/0x80
>>>> Apr 12 21:34:47  [75707.120152]  [<ffffffffa017632d>] ? md_make_request+0xdd/0x220 [md_mod]
>>>> Apr 12 21:34:47  [75707.120195]  [<ffffffff8128691d>] ? generic_make_request+0xed/0x1d0
>>>> Apr 12 21:34:47  [75707.120236]  [<ffffffff81286a5a>] ? submit_bio+0x5a/0x140
>>>> Apr 12 21:34:47  [75707.120277]  [<ffffffff8112afaf>] ? workingset_refault+0x4f/0xa0
>>>> Apr 12 21:34:47  [75707.120320]  [<ffffffff811a215e>] ? mpage_bio_submit+0x1e/0x30
>>>> Apr 12 21:34:47  [75707.120359]  [<ffffffff811a3076>] ? mpage_readpages+0x106/0x130
>>>> Apr 12 21:34:47  [75707.120401]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>>> Apr 12 21:34:47  [75707.120439]  [<ffffffff8121b510>] ? __xfs_get_blocks+0x750/0x750
>>>> Apr 12 21:34:47  [75707.120481]  [<ffffffff8114ad45>] ? alloc_pages_current+0x85/0x110
>>>> Apr 12 21:34:47  [75707.120523]  [<ffffffff81111d25>] ? __do_page_cache_readahead+0x165/0x1f0
>>>> Apr 12 21:34:47  [75707.120564]  [<ffffffff811344f5>] ? vma_link+0x75/0xb0
>>>> Apr 12 21:34:47  [75707.120602]  [<ffffffff811120c7>] ? force_page_cache_readahead+0x77/0xe0
>>>> Apr 12 21:34:47  [75707.120644]  [<ffffffff8113f876>] ? madvise_willneed+0x76/0x140
>>>> Apr 12 21:34:47  [75707.120683]  [<ffffffff811301ce>] ? handle_mm_fault+0x9ae/0x1650
>>>> Apr 12 21:34:47  [75707.120722]  [<ffffffff81133dcb>] ? find_vma+0x5b/0x70
>>>> Apr 12 21:34:47  [75707.120760]  [<ffffffff8113fc52>] ? SyS_madvise+0x312/0x6f0
>>>> Apr 12 21:34:47  [75707.120799]  [<ffffffff8148d9db>] ? entry_SYSCALL_64_fastpath+0x16/0x6e
>>>>
>>>> Once this starts, a couple of minutes go by and then the machine locks up
>>>> completely.
>>>>
>>>> I have been unable to locate the problem. Can anyone point me in
>>>> the right direction?
>>>>
>>>> Best regards
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
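The call trace quoted above reaches raid456 `make_request` from user space via `SyS_madvise` → `madvise_willneed` → `force_page_cache_readahead`, that is, a process asking the kernel to prefetch a file-backed mapping with `MADV_WILLNEED`. The following is a minimal Python sketch that issues that same request, assuming Python 3.8+ on Linux (for `mmap.madvise`) and a file that lives on the md RAID5 array. The helper name `willneed_readahead` and any path passed to it are hypothetical; nothing in this thread confirms that this call alone reproduces the lockup.

```python
import mmap
import os


def willneed_readahead(path: str) -> int:
    """Map `path` read-only and advise the kernel with MADV_WILLNEED.

    Hypothetical helper (not from the thread). On Linux the madvise call
    enters madvise_willneed(), which starts readahead of the mapped file
    pages. Returns the number of bytes advised, or 0 for an empty file,
    since mmap cannot map a zero-length file.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # mmap() rejects zero-length mappings
        # Read-only shared mapping covering the whole file.
        with mmap.mmap(f.fileno(), size, prot=mmap.PROT_READ) as m:
            # Same advice as the SyS_madvise frame in the trace.
            m.madvise(mmap.MADV_WILLNEED)
    return size


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, willneed_readahead(p))
```

For example, running the script with a path on the array (the path is hypothetical) issues one `madvise(MADV_WILLNEED)` per file argument. The read-only mapping keeps the sketch from writing to the array.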

Thread overview: 5+ messages
2016-04-12 21:54 Hard CPU Lockup when accessing MD RAID5 Daniel Walker
2016-04-13 17:00 ` Shaohua Li
2016-04-20  6:52   ` Daniel Walker
2016-04-20 15:29     ` John Stoffel
2016-04-21 22:47       ` Daniel Walker