* kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
@ 2011-12-08 13:43 Albert Strasheim
       [not found] ` <CALfB72BHLmN_vkpz9j7mi4yQWaYfJOojqopJ6d4911vuf3WRAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Albert Strasheim @ 2011-12-08 13:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello all

We've hit a kernel panic running 3.1.1-2.fc16.x86_64 on Fedora 16 when
registering 144 buffers of 32 MB each.

We're using libibverbs-1.1.5-5.fc16.x86_64.

Any help would be appreciated. Would a firmware upgrade make a difference?

ibv_devinfo output:

hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.8.600
        node_guid:                      0002:c903:0010:df98
        sys_image_guid:                 0002:c903:0010:df9b
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xB0
        board_id:                       MT_0FC0110009
        phys_port_cnt:                  2

Regards

Albert

[  597.407974] ------------[ cut here ]------------
[  597.412843] kernel BUG at drivers/iommu/intel-iommu.c:1767!
[  597.418652] invalid opcode: 0000 [#1] SMP
[  597.423114] CPU 0
[  597.424993] Modules linked in: binfmt_misc ses enclosure mlx4_ib mlx4_en microcode serio_raw joydev i2c_i801 iTCO_wdt iTCO_vendor_support ioatdma igb mpt2sas mlx4_core scsi_transport_sas raid_class i7core_edac edac_core dca w83795 w83627ehf hwmon_vid coretemp adm1021 i2c_core ib_ipoib ib_cm ib_addr ib_sa ib_uverbs ib_umad ib_mad ib_core ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler [last unloaded: scsi_wait_scan]
[  597.467309]
[  597.469040] Pid: 3789, comm: flowrouter Not tainted 3.1.1-2.fc16.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
[  597.479379] RIP: 0010:[<ffffffff813c0542>]  [<ffffffff813c0542>] __domain_mapping+0x41/0x251
[  597.488304] RSP: 0018:ffff8814ac599bf8  EFLAGS: 00010206
[  597.493849] RAX: 000000000fffffff RBX: ffff881674b93018 RCX: 0000000000000024
[  597.501210] RDX: ffff881674b93018 RSI: ffffffffffffff80 RDI: ffff88178dc04e00
[  597.508575] RBP: ffff8814ac599c68 R08: 000000000000007f R09: 0000000000000003
[  597.515936] R10: 00000000000162b7 R11: 0000000000016268 R12: ffff881674b93018
[  597.523297] R13: 000000000000007f R14: ffffffffffffff80 R15: 000000000000007f
[  597.530663] FS:  00007f6e3cb67700(0000) GS:ffff8817dfc00000(0000) knlGS:0000000000000000
[  597.539170] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  597.545154] CR2: 00007f6e34003038 CR3: 0000002e4b14f000 CR4: 00000000000006f0
[  597.552510] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  597.559870] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  597.567230] Process flowrouter (pid: 3789, threadinfo ffff8814ac598000, task ffff88168c82c590)
[  597.576263] Stack:
[  597.578513]  ffff88168c82c590 000000000000007f ffff88178dc04e00 ffff88178dc04e00
[  597.586585]  0000fffffffff000 000000000000007f 0000000000000000 ffffffffffffff80
[  597.594647]  000000000000007f ffff881674b93018 ffff88178dc04e00 000000000000007f
[  597.602700] Call Trace:
[  597.605387]  [<ffffffff813c1bef>] intel_map_sg+0x15c/0x1d6
[  597.611108]  [<ffffffffa003bbe6>] ib_umem_get+0x317/0x42d [ib_core]
[  597.617612]  [<ffffffffa0162305>] mlx4_ib_reg_user_mr+0x79/0x15b [mlx4_ib]
[  597.624710]  [<ffffffff81043ff3>] ? should_resched+0xe/0x2d
[  597.630512]  [<ffffffff814b5ad5>] ? _cond_resched+0xe/0x22
[  597.636232]  [<ffffffffa0062fe6>] ib_uverbs_reg_mr+0x144/0x29a [ib_uverbs]
[  597.643334]  [<ffffffffa00613c1>] ib_uverbs_write+0xb6/0xc1 [ib_uverbs]
[  597.650177]  [<ffffffff81129186>] vfs_write+0xac/0xf3
[  597.655463]  [<ffffffff81129375>] sys_write+0x4a/0x6e
[  597.660744]  [<ffffffff814bd902>] system_call_fastpath+0x16/0x1b
[  597.666973] Code: 48 89 4d c0 48 89 7d a8 49 89 d4 6b 4f 4c 09 48 89 75 c8 4d 89 c7 83 c1 12 83 f9 3f 7f 0f 4a 8d 44 06 ff 48 d3 e8 48 85 c0 74 02 <0f> 0b 41 f6 c1 03 0f 84 e9 01 00 00 41 81 e1 03 08 00 00 45 31
[  597.690327] RIP  [<ffffffff813c0542>] __domain_mapping+0x41/0x251
[  597.696704]  RSP <ffff8814ac599bf8>
[  597.700601] ---[ end trace bd543b01b0d3c89e ]---
[  597.705549] ------------[ cut here ]------------
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 13:43 kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64 Albert Strasheim
@ 2011-12-08 17:47     ` Roland Dreier
  0 siblings, 0 replies; 18+ messages in thread
From: Roland Dreier @ 2011-12-08 17:47 UTC (permalink / raw)
  To: Albert Strasheim; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

 > Any help would be appreciated. Would a firmware upgrade make a difference?

Almost certainly not a firmware issue.

 > [  597.412843] kernel BUG at drivers/iommu/intel-iommu.c:1767!

So this is

        BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);

in __domain_mapping() I believe.  And we have:

        int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;

How much RAM does your system have?

I don't know too much about the low-level VT-d details, but is it
possible the setup code is choosing too small a "guest address width"
to cover all your memory?
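
To make the check concrete, here is a minimal user-space sketch of the
same arithmetic. It assumes agaw_to_width() is 30 + agaw * 9 and that
VTD_PAGE_SHIFT is 12; both are believed to match the 3.1 source but
should be treated as assumptions. The helpers addr_width_for_agaw() and
range_exceeds_width() are names made up for illustration, not kernel
functions. The sample values in the final comment come from reading the
register dump in the original report (RCX = 0x24 as addr_width, RSI as
iov_pfn, R8 as nr_pages), which is an inference from the x86-64 calling
convention and the Code: bytes, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 4 KiB VT-d pages, 9 address bits per page-table level. */
#define VTD_PAGE_SHIFT 12
#define LEVEL_STRIDE   9
#define BITS_PER_LONG  64

/* Assumed form of agaw_to_width(): width in bits of the domain's address space. */
static int agaw_to_width(int agaw)
{
        assert(agaw >= 0);
        return 30 + agaw * LEVEL_STRIDE;
}

/* Hypothetical helper: width of a page frame number, in bits. */
static int addr_width_for_agaw(int agaw)
{
        return agaw_to_width(agaw) - VTD_PAGE_SHIFT;
}

/*
 * Hypothetical helper mirroring the BUG_ON() condition: nonzero when the
 * last pfn of the range has bits set at or above addr_width.
 */
static int range_exceeds_width(int agaw, uint64_t iov_pfn, uint64_t nr_pages)
{
        int addr_width = addr_width_for_agaw(agaw);

        return addr_width < BITS_PER_LONG &&
               ((iov_pfn + nr_pages - 1) >> addr_width) != 0;
}

/*
 * Values inferred from the oops: agaw 2 gives addr_width 36, and
 * range_exceeds_width(2, 0xffffffffffffff80ULL, 0x7f) is nonzero because
 * (0xffffffffffffff80 + 0x7f - 1) >> 36 == 0x0fffffff, which matches RAX.
 */
```

If that reading of the registers is right, iov_pfn is already far above
a 36-bit pfn before nr_pages is added.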

drivers/net/ethernet/mellanox/mlx4/main.c has

        err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
        if (err) {
                dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA mask.\n");
                err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
                if (err) {
                        dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n");
                        goto err_release_regions;
                }
        }
        err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
        if (err) {
                dev_warn(&pdev->dev, "Warning: couldn't set 64-bit "
                         "consistent PCI DMA mask.\n");
                err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
                if (err) {
                        dev_err(&pdev->dev, "Can't set consistent PCI DMA mask, "
                                "aborting.\n");
                        goto err_release_regions;
                }
        }

Do you see any warnings in your kernel log about setting the PCI DMA mask?
(In any case, that should only affect bus addresses, not "guest addresses".)

David, any idea?

 - R.

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 17:47     ` Roland Dreier
@ 2011-12-08 17:56         ` Albert Strasheim
  -1 siblings, 0 replies; 18+ messages in thread
From: Albert Strasheim @ 2011-12-08 17:56 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

Hello

On Thu, Dec 8, 2011 at 7:47 PM, Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> wrote:
>  > [  597.412843] kernel BUG at drivers/iommu/intel-iommu.c:1767!
> So this is
>        BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);
>
> in __domain_mapping() I believe.  And we have:
>        int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
> How much RAM does you system have?

192 GB.

A few gigabytes are reserved for 2 MB huge pages, if that matters.

> I don't know too much about the low-level VT-d details, but is it
> possible the setup code
> is choosing a too small "guest address width" to cover all your memory?
>
> drivers/net/ethernet/mellanox/mlx4/main.c has
>
>        err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
>        if (err) {
>                dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA mask.\n");
>                err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
>                if (err) {
>                        dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n");
>                        goto err_release_regions;
>                }
>        }
>        err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
>        if (err) {
>                dev_warn(&pdev->dev, "Warning: couldn't set 64-bit "
>                         "consistent PCI DMA mask.\n");
>                err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
>                if (err) {
>                        dev_err(&pdev->dev, "Can't set consistent PCI DMA mask, "
>                                "aborting.\n");
>                        goto err_release_regions;
>                }
>        }
>
> do you see any warnings in your kernel log about setting the PCI DMA mask?
> (in any case that should only affect bus addresses, not "guest addresses")

I don't see anything like what you mentioned here.

From the SAS controller:

mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (198141716 kB)

Not seeing anything about PCI DMA errors.
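
As a rough sanity check on the numbers above, the following user-space
sketch works out how many address bits the reported total memory needs
and whether a 32-bit or 64-bit mask would cover it. dma_bit_mask() is a
function-style stand-in for the kernel's DMA_BIT_MASK() macro;
bits_needed() and mask_covers() are hypothetical helpers. The
calculation assumes memory is contiguous from address 0, which real
machines with memory holes do not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Function form of DMA_BIT_MASK(n): a mask with the low n bits set. */
static uint64_t dma_bit_mask(int n)
{
        assert(n > 0 && n <= 64);
        return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Hypothetical helper: address bits needed to reach the last of `bytes` bytes. */
static int bits_needed(uint64_t bytes)
{
        int bits = 0;

        if (bytes == 0)
                return 0;
        while (bits < 64 && ((bytes - 1) >> bits) != 0)
                bits++;
        return bits;
}

/* Hypothetical helper: nonzero if every byte address below `bytes` fits in mask. */
static int mask_covers(uint64_t mask, uint64_t bytes)
{
        return bytes == 0 || bytes - 1 <= mask;
}
```

With the mpt2sas figure of 198141716 kB, bits_needed(198141716ULL * 1024)
gives 38, so a 32-bit mask would not cover all of RAM while a 64-bit
mask does.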

I think the BIOS has VT-d enabled. dmesg says:

PCI-DMA: Intel(R) Virtualization Technology for Directed I/O

I can send you a full dmesg if that would help.

Regards

Albert

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 17:56         ` Albert Strasheim
@ 2011-12-08 18:29             ` Roland Dreier
  -1 siblings, 0 replies; 18+ messages in thread
From: Roland Dreier @ 2011-12-08 18:29 UTC (permalink / raw)
  To: Albert Strasheim; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I think the BIOS has VT-d enabled. dmesg says:
>
> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O

Yes, you're crashing in the VT-d code.

If you don't care about that, you can boot with the kernel parameter
"intel_iommu=off" as a workaround, but it would be nice to get to the
bottom of this.

 - R.

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 18:29             ` Roland Dreier
@ 2011-12-08 18:31                 ` Albert Strasheim
  -1 siblings, 0 replies; 18+ messages in thread
From: Albert Strasheim @ 2011-12-08 18:31 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

Hello

On Thu, Dec 8, 2011 at 8:29 PM, Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> wrote:
> On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> I think the BIOS has VT-d enabled. dmesg says:
>> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> Yes, you're crashing in the VT-d code.
> If you don't care about that, you can boot with the kernel parameter
> "intel_iommu=off"
> as a workaround, but it would be nice to get to the bottom of this.

I'm happy to test any patches or provide more information.

Regards

Albert

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 18:31                 ` Albert Strasheim
@ 2011-12-20 10:47                     ` Albert Strasheim
  -1 siblings, 0 replies; 18+ messages in thread
From: Albert Strasheim @ 2011-12-20 10:47 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

Any news on this one?

Regards

Albert

On Thu, Dec 8, 2011 at 8:31 PM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Thu, Dec 8, 2011 at 8:29 PM, Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> wrote:
>> On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> I think the BIOS has VT-d enabled. dmesg says:
>>> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
>> Yes, you're crashing in the VT-d code.
>> If you don't care about that, you can boot with the kernel parameter
>> "intel_iommu=off"
>> as a workaround, but it would be nice to get to the bottom of this.
>
> I'm happy to test any patches or provide more information.
>
> Regards
>
> Albert

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-20 10:47                     ` Albert Strasheim
@ 2012-01-19  8:57                         ` Albert Strasheim
  -1 siblings, 0 replies; 18+ messages in thread
From: Albert Strasheim @ 2012-01-19  8:57 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

Hello again

On Tue, Dec 20, 2011 at 12:47 PM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Any news on this one?
> Regards
> Albert
> On Thu, Dec 8, 2011 at 8:31 PM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Thu, Dec 8, 2011 at 8:29 PM, Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> wrote:
>>> On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> I think the BIOS has VT-d enabled. dmesg says:
>>>> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
>>> Yes, you're crashing in the VT-d code.
>>> If you don't care about that, you can boot with the kernel parameter
>>> "intel_iommu=off"
>>> as a workaround, but it would be nice to get to the bottom of this.
>> I'm happy to test any patches or provide more information.

Just checking up on this issue. Is there any further testing or
information we can provide to help make a fix happen?

Regards

Albert

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2012-01-19  8:57                         ` Albert Strasheim
@ 2012-01-20  8:23                             ` Roland Dreier
  -1 siblings, 0 replies; 18+ messages in thread
From: Roland Dreier @ 2012-01-20  8:23 UTC (permalink / raw)
  To: Albert Strasheim; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Just checking up on this issue. Is there any further testing or
> information we can provide to help make a fix happen?

I'm not likely to be much help on VT-d issues, but maybe it
would be useful to dump all the values in the BUG_ON if it's
going to trigger, i.e. just before

       BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);

add

       if (addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width)
               pr_err("VT-d BUG! addr_width %d < %d (iov_pfn 0x%lx nr_pages %lu)\n",
                      addr_width, BITS_PER_LONG, iov_pfn, nr_pages);

and report what that prints.

 - R.

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2012-01-20  8:23                             ` Roland Dreier
@ 2012-01-20 19:02                                 ` Albert Strasheim
  -1 siblings, 0 replies; 18+ messages in thread
From: Albert Strasheim @ 2012-01-20 19:02 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, LKML, David Woodhouse

Hello

On Fri, Jan 20, 2012 at 10:23 AM, Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> wrote:
> On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim <fullung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Just checking up on this issue. Is there any further testing or
>> information we can provide to help make a fix happen?
> I'm not likely to be much help on VT-d issues, but maybe it
> would be useful to dump all the values in the BUG_ON if its
> going to trigger, ie just before

Just retested with 3.2.1-1.fc16.x86_64 and the bug seems to be gone.

I confirmed that my test program triggers the bug on 3.1.1-1.fc16.x86_64.

It seems a bunch of IOMMU fixes went in on 9 and 10 January, so they
seem to have fixed this problem in 3.2.

Thanks!

Regards

Albert

* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2012-01-20 19:02                                 ` Albert Strasheim
  (?)
@ 2012-01-24 15:00                                 ` Josh Boyer
  -1 siblings, 0 replies; 18+ messages in thread
From: Josh Boyer @ 2012-01-24 15:00 UTC (permalink / raw)
  To: Albert Strasheim; +Cc: Roland Dreier, linux-rdma, LKML, David Woodhouse

On Fri, Jan 20, 2012 at 2:02 PM, Albert Strasheim <fullung@gmail.com> wrote:
> Hello
>
> On Fri, Jan 20, 2012 at 10:23 AM, Roland Dreier <roland@purestorage.com> wrote:
>> On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim <fullung@gmail.com> wrote:
>>> Just checking up on this issue. Is there any further testing or
>>> information we can provide to help make a fix happen?
>> I'm not likely to be much help on VT-d issues, but maybe it
>> would be useful to dump all the values in the BUG_ON if its
>> going to trigger, ie just before
>
> Just retested with 3.2.1-1.fc16.x86_64 and the bug seems to be gone.
>
> I confirmed that my test program triggers the bug on 3.1.1-1.fc16.x86_64.
>
> It seems a bunch of IOMMU fixes went in on 9 and 10 January, so it
> seems to have fixed this problem in 3.2.

Possibly not.  Fedora disabled the Intel IOMMU by default in 3.1.6 (ish) and
newer, so if your problem was related to the IOMMU being enabled that might
explain why it went away.

You might try booting with intel_iommu=on and see if it really is gone.
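
One way to check which setting a given boot actually used is to look
for intel_iommu= on the kernel command line (readable from
/proc/cmdline). The sketch below is a hypothetical user-space helper,
not kernel code: cmdline_value() and intel_iommu_requested() are names
invented here, and it only recognizes the plain "on" and "off" values.
When the parameter is absent, the kernel's built-in default applies,
which this helper cannot see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the value of the last
 * "key=value" token in a whitespace-separated command line, or NULL if
 * the key is absent. *len receives the length of the value.
 */
static const char *cmdline_value(const char *cmdline, const char *key,
                                 size_t *len)
{
        const char *found = NULL;
        const char *p = cmdline;
        size_t klen;

        assert(cmdline != NULL && key != NULL && len != NULL);
        klen = strlen(key);
        while (*p) {
                size_t tok;

                while (*p == ' ' || *p == '\n')  /* skip separators */
                        p++;
                tok = strcspn(p, " \n");         /* length of this token */
                if (tok > klen && strncmp(p, key, klen) == 0 &&
                    p[klen] == '=') {
                        found = p + klen + 1;
                        *len = tok - klen - 1;
                }
                p += tok;
        }
        return found;
}

/* Returns 1 for intel_iommu=on, 0 for intel_iommu=off, -1 otherwise. */
static int intel_iommu_requested(const char *cmdline)
{
        size_t len = 0;
        const char *v = cmdline_value(cmdline, "intel_iommu", &len);

        if (v == NULL)
                return -1;
        if (len == 2 && strncmp(v, "on", 2) == 0)
                return 1;
        if (len == 3 && strncmp(v, "off", 3) == 0)
                return 0;
        return -1;
}
```

Feeding it the contents of /proc/cmdline should return 1 only when
intel_iommu=on was passed explicitly. The "PCI-DMA: Intel(R)
Virtualization Technology for Directed I/O" line in dmesg, quoted
earlier in the thread, remains the more direct sign that VT-d is active.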

josh
