* kernel panic / NULL pointer dereference
@ 2012-05-10 15:45 Bernd Schubert
  2012-05-10 16:17 ` Eric Sandeen
  2012-05-10 16:43   ` Bernd Schubert
  0 siblings, 2 replies; 20+ messages in thread
From: Bernd Schubert @ 2012-05-10 15:45 UTC (permalink / raw)
  To: linux-xfs

Hi all,

I'm just playing with an SRP-connected NetApp system and just got an
XFS-related kernel panic. I guess it is due to large I/O (32 MiB); at least
it only came up after enabling 32 MiB device max_sectors.
As the tests are running in a RHEL6 image, and as I needed at least 2.6.39
to get a large srp_tablesize with SRP, I simply installed the latest Oracle
UEK kernel. If needed I'm going to update to a vanilla version.


> May 10 17:31:49 sgi01 kernel: XFS (sdb): Mounting Filesystem
> May 10 17:31:49 sgi01 kernel: XFS (sdb): Ending clean mount
> May 10 17:33:00 sgi01 kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
> May 10 17:33:00 sgi01 kernel: IP: [<ffffffffa07f5483>] xfs_alloc_ioend_bio+0x33/0x50 [xfs]
> May 10 17:33:00 sgi01 kernel: PGD 0
> May 10 17:33:00 sgi01 kernel: Oops: 0002 [#1] SMP
> May 10 17:33:00 sgi01 kernel: CPU 16
> May 10 17:33:00 sgi01 kernel: Modules linked in: xfs ib_srp scsi_dh_rdac scsi_transport_srp ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_connt
> rack ip6table_filter ip6_tables ib_ucm iw_cxgb4 iw_cxgb3 rdma_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib mlx4_en mlx4_core i
> b_mthca ib_mad ib_core dm_round_robin isci libsas sg microcode qla2xxx scsi_transport_fc scsi_tgt pcspkr ghes hed wmi i2c_i801 i2c_core iTCO_wdt iTCO_vend
> or_support qla3xxx cciss e1000e megaraid_sas aacraid aic79xx aic7xxx ata_piix mptspi scsi_transport_spi mptsas mptscsih mptbase arcmsr sata_nv sata_svw 3w
> _9xxx 3w_xxxx bnx2 forcedeth ext4 jbd2 ext3 jbd mbcache sata_sil tg3 e1000 nfs lockd fscache auth_rpcgss nfs_acl sunrpc sd_mod crc_t10dif mpt2sas scsi_tra
> nsport_sas raid_class ahci libahci igb dca dm_multipath dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi c
> xgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: ib_srp]
> May 10 17:33:00 sgi01 kernel:
> May 10 17:33:00 sgi01 kernel: Pid: 76245, comm: flush-8:16 Not tainted 2.6.39-100.6.1.el6uek.x86_64 #1 SGI.COM SUMMIT/S2600GZ
> May 10 17:33:00 sgi01 kernel: RIP: 0010:[<ffffffffa07f5483>]  [<ffffffffa07f5483>] xfs_alloc_ioend_bio+0x33/0x50 [xfs]
> May 10 17:33:00 sgi01 kernel: RSP: 0018:ffff8806687ff8b0  EFLAGS: 00010206
> May 10 17:33:00 sgi01 kernel: RAX: 0000000000000000 RBX: ffff8807a9e3b5a8 RCX: ffff88080ed51b80
> May 10 17:33:00 sgi01 kernel: RDX: 00000000006c6800 RSI: ffff880669c34780 RDI: 0000000000000282
> May 10 17:33:00 sgi01 kernel: RBP: ffff8806687ff8c0 R08: 1e00000000000000 R09: 0000000000000002
> May 10 17:33:00 sgi01 kernel: R10: ffff88083ffece00 R11: 000000000000006c R12: 0000000000000000
> May 10 17:33:00 sgi01 kernel: R13: ffff88080b422f28 R14: ffff8806687ffd20 R15: 0000000000000000
> May 10 17:33:00 sgi01 kernel: FS:  0000000000000000(0000) GS:ffff88083f700000(0000) knlGS:0000000000000000
> May 10 17:33:00 sgi01 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> May 10 17:33:00 sgi01 kernel: CR2: 0000000000000000 CR3: 0000000001761000 CR4: 00000000000406e0
> May 10 17:33:00 sgi01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> May 10 17:33:00 sgi01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May 10 17:33:00 sgi01 kernel: Process flush-8:16 (pid: 76245, threadinfo ffff8806687fe000, task ffff8806687d2400)
> May 10 17:33:00 sgi01 kernel: Stack:
> May 10 17:33:00 sgi01 kernel: ffffea00133f70f0 ffff8807a9e3b5a8 ffff8806687ff910 ffffffffa07f561e
> May 10 17:33:00 sgi01 kernel: ffffea00133f69f0 ffffea00133f5d78 0000000000000000 ffff8807a9e3b5a8
> May 10 17:33:00 sgi01 kernel: ffffea0012ba90a8 ffff8806e0137190 0000000000000000 ffff88080b422f28
> May 10 17:33:00 sgi01 kernel: Call Trace:
> May 10 17:33:00 sgi01 kernel: [<ffffffffa07f561e>] xfs_submit_ioend+0xfe/0x110 [xfs]
> May 10 17:33:00 sgi01 kernel: [<ffffffffa07f696b>] xfs_vm_writepage+0x26b/0x510 [xfs]
> May 10 17:33:00 sgi01 kernel: [<ffffffff81112377>] __writepage+0x17/0x40
> May 10 17:33:00 sgi01 kernel: [<ffffffff81113696>] write_cache_pages+0x246/0x520
> May 10 17:33:00 sgi01 kernel: [<ffffffff81112360>] ? set_page_dirty+0x70/0x70
> May 10 17:33:00 sgi01 kernel: [<ffffffff811139c1>] generic_writepages+0x51/0x80
> May 10 17:33:00 sgi01 kernel: [<ffffffffa07f537d>] xfs_vm_writepages+0x5d/0x80 [xfs]
> May 10 17:33:00 sgi01 kernel: [<ffffffff81113a11>] do_writepages+0x21/0x40
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118df2e>] writeback_single_inode+0x10e/0x270
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118e333>] writeback_sb_inodes+0xe3/0x1b0
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118e4a4>] writeback_inodes_wb+0xa4/0x170
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118e863>] wb_writeback+0x2f3/0x430
> May 10 17:33:00 sgi01 kernel: [<ffffffff814fb28f>] ? _raw_spin_lock_irqsave+0x2f/0x40
> May 10 17:33:00 sgi01 kernel: [<ffffffff811129ba>] ? determine_dirtyable_memory+0x1a/0x30
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118eafb>] wb_do_writeback+0x15b/0x280
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118ecca>] bdi_writeback_thread+0xaa/0x270
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118ec20>] ? wb_do_writeback+0x280/0x280
> May 10 17:33:00 sgi01 kernel: [<ffffffff81089ef6>] kthread+0x96/0xa0
> May 10 17:33:00 sgi01 kernel: [<ffffffff815046a4>] kernel_thread_helper+0x4/0x10
> May 10 17:33:00 sgi01 kernel: [<ffffffff81089e60>] ? kthread_worker_fn+0x1a0/0x1a0
> May 10 17:33:00 sgi01 kernel: [<ffffffff815046a0>] ? gs_change+0x13/0x13
> May 10 17:33:00 sgi01 kernel: Code: 08 66 66 66 66 90 48 89 fb 48 8b 7f 30 e8 56 3a 9a e0 bf 10 00 00 00 89 c6 e8 ca 56 9a e0 48 8b 53 20 48 c1 ea 09 48 0f af 53 18
> May 10 17:33:00 sgi01 kernel: RIP  [<ffffffffa07f5483>] xfs_alloc_ioend_bio+0x33/0x50 [xfs]
> May 10 17:33:00 sgi01 kernel: RSP <ffff8806687ff8b0>
> May 10 17:33:00 sgi01 kernel: CR2: 0000000000000000
> May 10 17:33:00 sgi01 kernel: ---[ end trace e6b492c98aa66902 ]---
> May 10 17:33:00 sgi01 kernel: Kernel panic - not syncing: Fatal exception
> May 10 17:33:00 sgi01 kernel: Pid: 76245, comm: flush-8:16 Tainted: G      D     2.6.39-100.6.1.el6uek.x86_64 #1
> May 10 17:33:00 sgi01 kernel: Call Trace:
> May 10 17:33:00 sgi01 kernel: [<ffffffff814f83c6>] panic+0x91/0x1a8



Any idea, or do I need to dig in myself?

Thanks,
Bernd


* Re: kernel panic / NULL pointer dereference
  2012-05-10 15:45 kernel panic / NULL pointer dereference Bernd Schubert
@ 2012-05-10 16:17 ` Eric Sandeen
  2012-05-10 16:43   ` Bernd Schubert
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Sandeen @ 2012-05-10 16:17 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-xfs

On 5/10/12 10:45 AM, Bernd Schubert wrote:
> Hi all,
> 
> I'm just playing with an SRP-connected NetApp system and just got an XFS-related kernel panic. I guess it is due to large I/O (32 MiB); at least it only came up after enabling 32 MiB device max_sectors.
> As the tests are running in a RHEL6 image, and as I needed at least 2.6.39 to get a large srp_tablesize with SRP, I simply installed the latest Oracle UEK kernel. If needed I'm going to update to a vanilla version.
> 
> 
>> May 10 17:31:49 sgi01 kernel: XFS (sdb): Mounting Filesystem
>> May 10 17:31:49 sgi01 kernel: XFS (sdb): Ending clean mount
>> May 10 17:33:00 sgi01 kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
>> May 10 17:33:00 sgi01 kernel: IP: [<ffffffffa07f5483>] xfs_alloc_ioend_bio+0x33/0x50 [xfs]

You'll probably need to disassemble that yourself to be sure where it blew up, but I'm guessing bio_alloc() failed.  Upstream, with GFP_NOIO, it's not supposed to happen thanks to mempools:

 *      If %__GFP_WAIT is set, then bio_alloc will always be able to allocate
 *      a bio. This is due to the mempool guarantees. To make this work, callers
 *      must never allocate more than 1 bio at a time from this pool. Callers
 *      that need to allocate more than 1 bio must always submit the previously
 *      allocated bio for IO before attempting to allocate a new one. Failure to
 *      do so can cause livelocks under memory pressure.

But, I don't know what's in your oracle kernel... can you hit it upstream?

-Eric
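
For context, here is roughly what the helper named in the oops looked like at
the time. This is a sketch reconstructed from the trace and the discussion in
this thread, not verbatim 2.6.39 source; the point is that bio_alloc()'s
return value is never checked:

	/*
	 * Sketch of xfs_alloc_ioend_bio() (reconstructed, not verbatim):
	 * the bio is sized from bio_get_nr_vecs() and used immediately,
	 * so a NULL return from bio_alloc() faults on the first store.
	 */
	STATIC struct bio *
	xfs_alloc_ioend_bio(
		struct buffer_head	*bh)
	{
		int		nvecs = bio_get_nr_vecs(bh->b_bdev);
		struct bio	*bio = bio_alloc(GFP_NOIO, nvecs);

		ASSERT(bio->bi_private == NULL);	/* compiled out on production builds */
		bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);	/* faults if bio is NULL */
		bio->bi_bdev = bh->b_bdev;
		return bio;
	}

The first store through a NULL bio is a write access, which matches the
"Oops: 0002" code in the log above.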


* Re: kernel panic / NULL pointer dereference
  2012-05-10 15:45 kernel panic / NULL pointer dereference Bernd Schubert
@ 2012-05-10 16:43   ` Bernd Schubert
  1 sibling, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2012-05-10 16:43 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

On 05/10/2012 05:45 PM, Bernd Schubert wrote:
> Hi all,
> 
> I'm just playing with an SRP-connected NetApp system and just got an
> XFS-related kernel panic. I guess it is due to large I/O (32 MiB); at least
> it only came up after enabling 32 MiB device max_sectors.
> As the tests are running in a RHEL6 image, and as I needed at least 2.6.39
> to get a large srp_tablesize with SRP, I simply installed the latest Oracle
> UEK kernel. If needed I'm going to update to a vanilla version.
> 
> 
>> May 10 17:31:49 sgi01 kernel: XFS (sdb): Mounting Filesystem
>> May 10 17:31:49 sgi01 kernel: XFS (sdb): Ending clean mount
>> May 10 17:33:00 sgi01 kernel: BUG: unable to handle kernel NULL 
>> pointer dereference at (null)
>> May 10 17:33:00 sgi01 kernel: IP: [<ffffffffa07f5483>] 
>> xfs_alloc_ioend_bio+0x33/0x50 [xfs]

Oh, there is a bio allocation path that returns NULL:

bvec_alloc_bs(gfp_mask, nr_iovecs, ...) => NULL when nr_iovecs > BIO_MAX_PAGES
bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
bio_alloc(GFP_NOIO, nvecs)
xfs_alloc_ioend_bio()

And nvecs/nr_iovecs is obtained from bio_get_nr_vecs(), which does not check
against BIO_MAX_PAGES. Of course, all of that only happens with large I/O
sizes, which is exactly what I'm doing.
As xfs_alloc_ioend_bio() uses GFP_NOIO, it does not expect bio_alloc() to
fail, but since I'm trying to send large I/Os I guess that is exactly what
happens here.


> May 10 17:33:00 sgi01 kernel: [<ffffffffa07f561e>] xfs_submit_ioend+0xfe/0x110 [xfs]
> May 10 17:33:00 sgi01 kernel: [<ffffffffa07f696b>] xfs_vm_writepage+0x26b/0x510 [xfs]
> May 10 17:33:00 sgi01 kernel: [<ffffffff81112377>] __writepage+0x17/0x40
> May 10 17:33:00 sgi01 kernel: [<ffffffff81113696>] write_cache_pages+0x246/0x520
> May 10 17:33:00 sgi01 kernel: [<ffffffff81112360>] ? set_page_dirty+0x70/0x70
> May 10 17:33:00 sgi01 kernel: [<ffffffff811139c1>] generic_writepages+0x51/0x80
> May 10 17:33:00 sgi01 kernel: [<ffffffffa07f537d>] xfs_vm_writepages+0x5d/0x80 [xfs]
> May 10 17:33:00 sgi01 kernel: [<ffffffff81113a11>] do_writepages+0x21/0x40
> May 10 17:33:00 sgi01 kernel: [<ffffffff8118df2e>] writeback_single_inode+0x10e/0x270
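
To put numbers on the failure condition, here is a standalone userspace
check. Assumed rather than taken from the thread: 4 KiB pages and
BIO_MAX_PAGES = 256, the usual values for kernels of that era; max_sectors
(32 MiB) and max_segments (2048) are the values reported in this thread:

	#include <stdio.h>

	int main(void)
	{
		unsigned max_sectors   = (32u << 20) >> 9; /* 32 MiB in 512-byte sectors = 65536 */
		unsigned max_segments  = 2048;             /* limit reported later in this thread */
		unsigned bio_max_pages = 256;              /* assumed BIO_MAX_PAGES of that era */
		unsigned page_sectors  = 4096 >> 9;        /* sectors per 4 KiB page = 8 */

		/* the pre-patch bio_get_nr_vecs() computation, from the patch posted below */
		unsigned nr_iovecs = max_sectors / page_sectors + 1;  /* 8193 */
		if (nr_iovecs > max_segments)
			nr_iovecs = max_segments;                     /* 2048 */

		printf("nr_iovecs = %u, BIO_MAX_PAGES = %u -> %s\n",
		       nr_iovecs, bio_max_pages,
		       nr_iovecs > bio_max_pages ? "bio_alloc() returns NULL" : "fits");
		return 0;
	}

With these values bio_get_nr_vecs() asks for 2048 biovecs, eight times the
largest biovec pool, so bvec_alloc_bs() refuses and bio_alloc() returns NULL.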


* [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-10 16:43   ` Bernd Schubert
@ 2012-05-11 13:49     ` Bernd Schubert
  -1 siblings, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2012-05-11 13:49 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, linux-xfs, sandeen, Kent Overstreet, Tejun Heo,
	Jens Axboe

>>> May 10 17:31:49 sgi01 kernel: XFS (sdb): Mounting Filesystem
>>> May 10 17:31:49 sgi01 kernel: XFS (sdb): Ending clean mount
>>> May 10 17:33:00 sgi01 kernel: BUG: unable to handle kernel NULL
>>> pointer dereference at (null)
>>> May 10 17:33:00 sgi01 kernel: IP: [<ffffffffa07f5483>]
>>> xfs_alloc_ioend_bio+0x33/0x50 [xfs]
> 
> Oh, there is a bio allocation path that returns NULL:
> 
> bvec_alloc_bs(gfp_mask, nr_iovecs, ...) => NULL when nr_iovecs > BIO_MAX_PAGES
> bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
> bio_alloc(GFP_NOIO, nvecs)
> xfs_alloc_ioend_bio()
> 
> And nvecs/nr_iovecs is obtained from bio_get_nr_vecs(), which does not check
> against BIO_MAX_PAGES. Of course, all of that only happens with large I/O
> sizes, which is exactly what I'm doing.
> As xfs_alloc_ioend_bio() uses GFP_NOIO, it does not expect bio_alloc() to
> fail, but since I'm trying to send large I/Os I guess that is exactly what
> happens here.

I see that Kent already fixed an overflow issue
in commit 5abebfdd02450fa1349daacf242e70b3736581e3. But even with this commit,
bio_get_nr_vecs() still only checks against queue_max_segments(). As we have a
maximum of 2048 segments, that does not help much here.
After cherry-picking 5abebfdd02450fa1349daacf242e70b3736581e3 and applying the
patch below, I didn't run into panics / NULL pointer dereferences anymore.


bio: bio_get_nr_vecs() must not return more than BIO_MAX_PAGES

From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>

The value returned by bio_get_nr_vecs() is passed down via bio_alloc() to
bvec_alloc_bs(), which fails the bio allocation if
nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
unexpected bio allocation failure.
Limiting to queue_max_segments() is not sufficient, as max_segments
also might be very large.

bvec_alloc_bs(gfp_mask, nr_iovecs, ...) => NULL when nr_iovecs > BIO_MAX_PAGES
bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
bio_alloc(GFP_NOIO, nvecs)
xfs_alloc_ioend_bio()


Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
 fs/bio.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/bio.c b/fs/bio.c
index e453924..84da885 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -505,9 +505,14 @@ EXPORT_SYMBOL(bio_clone);
 int bio_get_nr_vecs(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
-	return min_t(unsigned,
+	int nr_pages;
+
+	nr_pages = min_t(unsigned,
 		     queue_max_segments(q),
 		     queue_max_sectors(q) / (PAGE_SIZE >> 9) + 1);
+
+	return min_t(unsigned, nr_pages, BIO_MAX_PAGES);
+
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);
 
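
For what it's worth, clamping here is safe rather than lossy because callers
treat bio_get_nr_vecs() as a sizing hint, not a promise. A sketch of that
consumer pattern (hypothetical helper and names, not code from this thread):

	static int submit_pages(struct block_device *bdev, sector_t sector,
				struct page **pages, int nr_pages)
	{
		int i = 0;

		while (i < nr_pages) {
			int nvecs = min(bio_get_nr_vecs(bdev), nr_pages - i);
			struct bio *bio = bio_alloc(GFP_NOIO, nvecs);

			bio->bi_bdev = bdev;
			bio->bi_sector = sector + i * (PAGE_SIZE >> 9);

			/* bio_add_page() returns 0 once the bio is full */
			while (i < nr_pages &&
			       bio_add_page(bio, pages[i], PAGE_SIZE, 0))
				i++;

			submit_bio(WRITE, bio);	/* completion handling omitted */
		}
		return 0;
	}

So capping the hint at BIO_MAX_PAGES merely splits a 32 MiB writeback into
more bios instead of failing the allocation outright.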


* Re: [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-11 13:49     ` Bernd Schubert
@ 2012-05-11 14:06       ` Jeff Moyer
  -1 siblings, 0 replies; 20+ messages in thread
From: Jeff Moyer @ 2012-05-11 14:06 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, linux-xfs, sandeen, Kent Overstreet, Tejun Heo,
	Jens Axboe

Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> writes:

> diff --git a/fs/bio.c b/fs/bio.c
> index e453924..84da885 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -505,9 +505,14 @@ EXPORT_SYMBOL(bio_clone);
>  int bio_get_nr_vecs(struct block_device *bdev)
>  {
>  	struct request_queue *q = bdev_get_queue(bdev);
> -	return min_t(unsigned,
> +	int nr_pages;

Looks like a corrupt patch.


* Re: [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-11 14:06       ` Jeff Moyer
@ 2012-05-11 14:31         ` Bernd Schubert
  -1 siblings, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2012-05-11 14:31 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: linux-fsdevel, linux-xfs, sandeen, Kent Overstreet, Tejun Heo,
	Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 606 bytes --]

On 05/11/2012 04:06 PM, Jeff Moyer wrote:
> Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> writes:
>
>> diff --git a/fs/bio.c b/fs/bio.c
>> index e453924..84da885 100644
>> --- a/fs/bio.c
>> +++ b/fs/bio.c
>> @@ -505,9 +505,14 @@ EXPORT_SYMBOL(bio_clone);
>>   int bio_get_nr_vecs(struct block_device *bdev)
>>   {
>>   	struct request_queue *q = bdev_get_queue(bdev);
>> -	return min_t(unsigned,
>> +	int nr_pages;
>
> Looks like a corrupt patch.

What do you actually mean? An issue caused by Thunderbird? I just saved the 
mail in my sent folder and it looks fine to me. Just to be sure, the patch 
is attached.


Thanks,
Bernd

[-- Attachment #2: fix-bio-nrvec.patch --]
[-- Type: text/x-patch, Size: 1242 bytes --]

bio: bio_get_nr_vecs() must not return more than BIO_MAX_PAGES

From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>

The value returned by bio_get_nr_vecs() is passed down via bio_alloc() to
bvec_alloc_bs(), which fails the bio allocation if
nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
unexpected bio allocation failure.
Limiting to queue_max_segments() is not sufficient, as max_segments
also might be very large.

bvec_alloc_bs(gfp_mask, nr_iovecs, ...) => NULL when nr_iovecs > BIO_MAX_PAGES
bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
bio_alloc(GFP_NOIO, nvecs)
xfs_alloc_ioend_bio()


Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
 fs/bio.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/bio.c b/fs/bio.c
index e453924..84da885 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -505,9 +505,14 @@ EXPORT_SYMBOL(bio_clone);
 int bio_get_nr_vecs(struct block_device *bdev)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
-	return min_t(unsigned,
+	int nr_pages;
+
+	nr_pages = min_t(unsigned,
 		     queue_max_segments(q),
 		     queue_max_sectors(q) / (PAGE_SIZE >> 9) + 1);
+
+	return min_t(unsigned, nr_pages, BIO_MAX_PAGES);
+
 }
 EXPORT_SYMBOL(bio_get_nr_vecs);
 


* Re: [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-11 14:06       ` Jeff Moyer
@ 2012-05-11 14:36         ` Jens Axboe
  -1 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2012-05-11 14:36 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Bernd Schubert, linux-fsdevel, linux-xfs, sandeen,
	Kent Overstreet, Tejun Heo

On 05/11/2012 04:06 PM, Jeff Moyer wrote:
> Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> writes:
> 
>> diff --git a/fs/bio.c b/fs/bio.c
>> index e453924..84da885 100644
>> --- a/fs/bio.c
>> +++ b/fs/bio.c
>> @@ -505,9 +505,14 @@ EXPORT_SYMBOL(bio_clone);
>>  int bio_get_nr_vecs(struct block_device *bdev)
>>  {
>>  	struct request_queue *q = bdev_get_queue(bdev);
>> -	return min_t(unsigned,
>> +	int nr_pages;
> 
> Looks like a corrupt patch.

It's fine, I think you are misreading the added and removed lines :-)

-- 
Jens Axboe



* Re: [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-11 13:49     ` Bernd Schubert
@ 2012-05-11 14:36       ` Jens Axboe
  -1 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2012-05-11 14:36 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, linux-xfs, sandeen, Kent Overstreet, Tejun Heo

On 05/11/2012 03:49 PM, Bernd Schubert wrote:
> The value returned by bio_get_nr_vecs() is passed down via bio_alloc() to
> bvec_alloc_bs(), which fails the bio allocation if
> nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
> unexpected bio allocation failure.
> Limiting to queue_max_segments() is not sufficient, as max_segments
> also might be very large.
> 
> bvec_alloc_bs(gfp_mask, nr_iovecs, ...) => NULL when nr_iovecs > BIO_MAX_PAGES
> bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
> bio_alloc(GFP_NOIO, nvecs)
> xfs_alloc_ioend_bio()

Thanks, looks sane. Applied.

-- 
Jens Axboe



* Re: [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-11 14:36       ` Jens Axboe
@ 2012-05-11 14:44         ` Bernd Schubert
  -1 siblings, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2012-05-11 14:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-xfs, sandeen, Kent Overstreet, Tejun Heo

On 05/11/2012 04:36 PM, Jens Axboe wrote:
> On 05/11/2012 03:49 PM, Bernd Schubert wrote:
>> The value returned by bio_get_nr_vecs() is passed down via bio_alloc() to
>> bvec_alloc_bs(), which fails the bio allocation if
>> nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
>> unexpected bio allocation failure.
>> Limiting to queue_max_segments() is not sufficient, as max_segments
>> also might be very large.
>>
>> bvec_alloc_bs(gfp_mask, nr_iovecs, ...) => NULL when nr_iovecs > BIO_MAX_PAGES
>> bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
>> bio_alloc(GFP_NOIO, nvecs)
>> xfs_alloc_ioend_bio()
>
> Thanks, looks sane. Applied.
>

Great, thanks! Should we CC linux-stable for commit 
5abebfdd02450fa1349daacf242e70b3736581e3 and this one, as I got a hard 
kernel panic?


Thanks,
Bernd


* Re: [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-11 14:44         ` Bernd Schubert
@ 2012-05-11 14:45           ` Jens Axboe
  -1 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2012-05-11 14:45 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: linux-fsdevel, linux-xfs, sandeen, Kent Overstreet, Tejun Heo

On 05/11/2012 04:44 PM, Bernd Schubert wrote:
> On 05/11/2012 04:36 PM, Jens Axboe wrote:
>> On 05/11/2012 03:49 PM, Bernd Schubert wrote:
>>> The value returned by bio_get_nr_vecs() is passed down via bio_alloc() to
>>> bvec_alloc_bs(), which fails the bio allocation if
>>> nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
>>> unexpected bio allocation failure.
>>> Limiting to queue_max_segments() is not sufficient, as max_segments
>>> also might be very large.
>>>
>>> bvec_alloc_bs(gfp_mask, nr_iovecs, ...) => NULL when nr_iovecs > BIO_MAX_PAGES
>>> bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
>>> bio_alloc(GFP_NOIO, nvecs)
>>> xfs_alloc_ioend_bio()
>>
>> Thanks, looks sane. Applied.
>>
> 
> Great, thanks! Should we CC linux-stable for commit
> 5abebfdd02450fa1349daacf242e70b3736581e3 and this one, as I got a hard
> kernel panic?

Yes, that's a good idea. I've amended the commit now to include stable.

-- 
Jens Axboe



* Re: [PATCH] bio allocation failure due to bio_get_nr_vecs()
  2012-05-11 14:36         ` Jens Axboe
@ 2012-05-11 16:29           ` Jeff Moyer
  -1 siblings, 0 replies; 20+ messages in thread
From: Jeff Moyer @ 2012-05-11 16:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bernd Schubert, linux-fsdevel, linux-xfs, sandeen,
	Kent Overstreet, Tejun Heo

Jens Axboe <axboe@kernel.dk> writes:

> On 05/11/2012 04:06 PM, Jeff Moyer wrote:
>> Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> writes:
>> 
>>> diff --git a/fs/bio.c b/fs/bio.c
>>> index e453924..84da885 100644
>>> --- a/fs/bio.c
>>> +++ b/fs/bio.c
>>> @@ -505,9 +505,14 @@ EXPORT_SYMBOL(bio_clone);
>>>  int bio_get_nr_vecs(struct block_device *bdev)
>>>  {
>>>  	struct request_queue *q = bdev_get_queue(bdev);
>>> -	return min_t(unsigned,
>>> +	int nr_pages;
>> 
>> Looks like a corrupt patch.
>
> It's fine, I think you are misreading the added and removed lines :-)

Whoops, sorry!

