linux-kernel.vger.kernel.org archive mirror
* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
       [not found] <CALfB72BHLmN_vkpz9j7mi4yQWaYfJOojqopJ6d4911vuf3WRAA@mail.gmail.com>
@ 2011-12-08 17:47 ` Roland Dreier
  2011-12-08 17:56   ` Albert Strasheim
  0 siblings, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2011-12-08 17:47 UTC
  To: Albert Strasheim; +Cc: linux-rdma, LKML, David Woodhouse

 > Any help would be appreciated. Would a firmware upgrade make a difference?

Almost certainly not a firmware issue.

 > [  597.412843] kernel BUG at drivers/iommu/intel-iommu.c:1767!

So this is

        BUG_ON(addr_width < BITS_PER_LONG &&
               (iov_pfn + nr_pages - 1) >> addr_width);

in __domain_mapping() I believe.  And we have:

        int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;

How much RAM does your system have?

I don't know too much about the low-level VT-d details, but is it
possible the setup code is choosing a "guest address width" that is
too small to cover all your memory?
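
For illustration, the check quoted above can be restated as a small
userspace C sketch. This is not the kernel code: mapping_exceeds_domain()
is a hypothetical helper, and the constants assume 4 KiB VT-d pages
(VTD_PAGE_SHIFT of 12) and a 64-bit long. The domain_width argument
stands in for whatever agaw_to_width(domain->agaw) returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4 KiB VT-d pages and 64-bit longs, as on x86_64. */
#define VTD_PAGE_SHIFT 12
#define BITS_PER_LONG  64

/*
 * Hypothetical userspace restatement of the BUG_ON condition.
 * domain_width is the domain's address width in bits.
 * Returns true when the last page frame of the mapping does not fit
 * in (domain_width - VTD_PAGE_SHIFT) bits.
 */
static bool mapping_exceeds_domain(int domain_width,
                                   uint64_t iov_pfn,
                                   uint64_t nr_pages)
{
        int addr_width = domain_width - VTD_PAGE_SHIFT;

        assert(addr_width > 0);
        assert(nr_pages > 0);

        return addr_width < BITS_PER_LONG &&
               ((iov_pfn + nr_pages - 1) >> addr_width) != 0;
}
```

With a 39-bit domain, addr_width is 27, so any mapping whose last page
frame number reaches 1 << 27 trips the check; with a 48-bit domain the
limit is 1 << 36.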

drivers/net/ethernet/mellanox/mlx4/main.c has

        err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
        if (err) {
                dev_warn(&pdev->dev,
                         "Warning: couldn't set 64-bit PCI DMA mask.\n");
                err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
                if (err) {
                        dev_err(&pdev->dev,
                                "Can't set PCI DMA mask, aborting.\n");
                        goto err_release_regions;
                }
        }
        err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
        if (err) {
                dev_warn(&pdev->dev, "Warning: couldn't set 64-bit "
                         "consistent PCI DMA mask.\n");
                err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
                if (err) {
                        dev_err(&pdev->dev,
                                "Can't set consistent PCI DMA mask, "
                                "aborting.\n");
                        goto err_release_regions;
                }
        }

Do you see any warnings in your kernel log about setting the PCI DMA mask?
(In any case that should only affect bus addresses, not "guest addresses".)
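
As a rough illustration of what the two masks in that driver code
permit, here is a userspace sketch. DMA_BIT_MASK is written out the way
the kernel headers define it, to the best of my knowledge;
fits_dma_mask() is a hypothetical helper and not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n bits set; n == 64 is special-cased to avoid
 * shifting a 64-bit value by 64. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true if bus address addr is reachable by a
 * device limited to the given DMA mask. */
static bool fits_dma_mask(uint64_t addr, uint64_t mask)
{
        return (addr & ~mask) == 0;
}
```

An address just below 192 GiB, for example, passes the 64-bit mask but
not the 32-bit one, which is why the driver tries the 64-bit mask first.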

David, any idea?

 - R.


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 17:47 ` kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64 Roland Dreier
@ 2011-12-08 17:56   ` Albert Strasheim
  2011-12-08 18:29     ` Roland Dreier
  0 siblings, 1 reply; 9+ messages in thread
From: Albert Strasheim @ 2011-12-08 17:56 UTC
  To: Roland Dreier; +Cc: linux-rdma, LKML, David Woodhouse

Hello

On Thu, Dec 8, 2011 at 7:47 PM, Roland Dreier <roland@purestorage.com> wrote:
>  > [  597.412843] kernel BUG at drivers/iommu/intel-iommu.c:1767!
> So this is
>        BUG_ON(addr_width < BITS_PER_LONG &&
>               (iov_pfn + nr_pages - 1) >> addr_width);
>
> in __domain_mapping() I believe.  And we have:
>        int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
> How much RAM does your system have?

192 GB.

A few gigabytes are reserved for 2 MB huge pages, if that matters.
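
As a back-of-the-envelope check on the address-width question, the
sketch below computes how many address bits are needed to cover a given
amount of memory. bits_to_address() is a hypothetical helper, and it
assumes RAM starts at address 0 with no holes, which real machines do
not guarantee, so the result is only a lower bound.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of address bits needed to address
 * `bytes` bytes starting at 0, i.e. the bit length of (bytes - 1).
 */
static int bits_to_address(uint64_t bytes)
{
        uint64_t last;
        int bits = 0;

        assert(bytes > 0);

        for (last = bytes - 1; last != 0; last >>= 1)
                bits++;

        return bits;
}
```

For 192 GiB (192ULL << 30 bytes) this gives 38 bits, which with 4 KiB
pages leaves 26 bits of page frame number.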

> I don't know too much about the low-level VT-d details, but is it
> possible the setup code is choosing a "guest address width" that is
> too small to cover all your memory?
>
> drivers/net/ethernet/mellanox/mlx4/main.c has
>
>        err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
>        if (err) {
>                dev_warn(&pdev->dev,
>                         "Warning: couldn't set 64-bit PCI DMA mask.\n");
>                err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
>                if (err) {
>                        dev_err(&pdev->dev,
>                                "Can't set PCI DMA mask, aborting.\n");
>                        goto err_release_regions;
>                }
>        }
>        err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
>        if (err) {
>                dev_warn(&pdev->dev, "Warning: couldn't set 64-bit "
>                         "consistent PCI DMA mask.\n");
>                err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
>                if (err) {
>                        dev_err(&pdev->dev,
>                                "Can't set consistent PCI DMA mask, "
>                                "aborting.\n");
>                        goto err_release_regions;
>                }
>        }
>
> do you see any warnings in your kernel log about setting the PCI DMA mask?
> (in any case that should only affect bus addresses, not "guest addresses")

I don't see anything like what you mentioned here.

From the SAS controller:

mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (198141716 kB)

Not seeing anything about PCI DMA errors.

I think the BIOS has VT-d enabled. dmesg says:

PCI-DMA: Intel(R) Virtualization Technology for Directed I/O

I can send you a full dmesg if that would help.

Regards

Albert


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 17:56   ` Albert Strasheim
@ 2011-12-08 18:29     ` Roland Dreier
  2011-12-08 18:31       ` Albert Strasheim
  0 siblings, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2011-12-08 18:29 UTC
  To: Albert Strasheim; +Cc: linux-rdma, LKML, David Woodhouse

On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung@gmail.com> wrote:
> I think the BIOS has VT-d enabled. dmesg says:
>
> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O

Yes, you're crashing in the VT-d code.

If you don't care about that, you can boot with the kernel parameter
"intel_iommu=off" as a workaround, but it would be nice to get to the
bottom of this.

 - R.


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 18:29     ` Roland Dreier
@ 2011-12-08 18:31       ` Albert Strasheim
  2011-12-20 10:47         ` Albert Strasheim
  0 siblings, 1 reply; 9+ messages in thread
From: Albert Strasheim @ 2011-12-08 18:31 UTC
  To: Roland Dreier; +Cc: linux-rdma, LKML, David Woodhouse

Hello

On Thu, Dec 8, 2011 at 8:29 PM, Roland Dreier <roland@purestorage.com> wrote:
> On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung@gmail.com> wrote:
>> I think the BIOS has VT-d enabled. dmesg says:
>> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> Yes, you're crashing in the VT-d code.
> If you don't care about that, you can boot with the kernel parameter
> "intel_iommu=off"
> as a workaround, but it would be nice to get to the bottom of this.

I'm happy to test any patches or provide more information.

Regards

Albert


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-08 18:31       ` Albert Strasheim
@ 2011-12-20 10:47         ` Albert Strasheim
  2012-01-19  8:57           ` Albert Strasheim
  0 siblings, 1 reply; 9+ messages in thread
From: Albert Strasheim @ 2011-12-20 10:47 UTC
  To: Roland Dreier; +Cc: linux-rdma, LKML, David Woodhouse

Any news on this one?

Regards

Albert

On Thu, Dec 8, 2011 at 8:31 PM, Albert Strasheim <fullung@gmail.com> wrote:
> On Thu, Dec 8, 2011 at 8:29 PM, Roland Dreier <roland@purestorage.com> wrote:
>> On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung@gmail.com> wrote:
>>> I think the BIOS has VT-d enabled. dmesg says:
>>> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
>> Yes, you're crashing in the VT-d code.
>> If you don't care about that, you can boot with the kernel parameter
>> "intel_iommu=off"
>> as a workaround, but it would be nice to get to the bottom of this.
>
> I'm happy to test any patches or provide more information.
>
> Regards
>
> Albert


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2011-12-20 10:47         ` Albert Strasheim
@ 2012-01-19  8:57           ` Albert Strasheim
  2012-01-20  8:23             ` Roland Dreier
  0 siblings, 1 reply; 9+ messages in thread
From: Albert Strasheim @ 2012-01-19  8:57 UTC
  To: Roland Dreier; +Cc: linux-rdma, LKML, David Woodhouse

Hello again

On Tue, Dec 20, 2011 at 12:47 PM, Albert Strasheim <fullung@gmail.com> wrote:
> Any news on this one?
> Regards
> Albert
> On Thu, Dec 8, 2011 at 8:31 PM, Albert Strasheim <fullung@gmail.com> wrote:
>> On Thu, Dec 8, 2011 at 8:29 PM, Roland Dreier <roland@purestorage.com> wrote:
>>> On Thu, Dec 8, 2011 at 9:56 AM, Albert Strasheim <fullung@gmail.com> wrote:
>>>> I think the BIOS has VT-d enabled. dmesg says:
>>>> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
>>> Yes, you're crashing in the VT-d code.
>>> If you don't care about that, you can boot with the kernel parameter
>>> "intel_iommu=off"
>>> as a workaround, but it would be nice to get to the bottom of this.
>> I'm happy to test any patches or provide more information.

Just checking up on this issue. Is there any further testing or
information we can provide to help make a fix happen?

Regards

Albert


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2012-01-19  8:57           ` Albert Strasheim
@ 2012-01-20  8:23             ` Roland Dreier
  2012-01-20 19:02               ` Albert Strasheim
  0 siblings, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2012-01-20  8:23 UTC
  To: Albert Strasheim; +Cc: linux-rdma, LKML, David Woodhouse

On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim <fullung@gmail.com> wrote:
> Just checking up on this issue. Is there any further testing or
> information we can provide to help make a fix happen?

I'm not likely to be much help on VT-d issues, but maybe it
would be useful to dump all the values in the BUG_ON if it's
going to trigger, i.e. just before

       BUG_ON(addr_width < BITS_PER_LONG &&
              (iov_pfn + nr_pages - 1) >> addr_width);

add

       if (addr_width < BITS_PER_LONG &&
           (iov_pfn + nr_pages - 1) >> addr_width)
               pr_err("VT-d BUG! addr_width %d < %d (iov_pfn 0x%lx nr_pages %lu)\n",
                      addr_width, BITS_PER_LONG, iov_pfn, nr_pages);

and report what that prints.

 - R.


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2012-01-20  8:23             ` Roland Dreier
@ 2012-01-20 19:02               ` Albert Strasheim
  2012-01-24 15:00                 ` Josh Boyer
  0 siblings, 1 reply; 9+ messages in thread
From: Albert Strasheim @ 2012-01-20 19:02 UTC
  To: Roland Dreier; +Cc: linux-rdma, LKML, David Woodhouse

Hello

On Fri, Jan 20, 2012 at 10:23 AM, Roland Dreier <roland@purestorage.com> wrote:
> On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim <fullung@gmail.com> wrote:
>> Just checking up on this issue. Is there any further testing or
>> information we can provide to help make a fix happen?
> I'm not likely to be much help on VT-d issues, but maybe it
> would be useful to dump all the values in the BUG_ON if it's
> going to trigger, i.e. just before

Just retested with 3.2.1-1.fc16.x86_64 and the bug seems to be gone.

I confirmed that my test program triggers the bug on 3.1.1-1.fc16.x86_64.

A bunch of IOMMU fixes appear to have gone in on 9 and 10 January,
which seems to have fixed this problem in 3.2.

Thanks!

Regards

Albert


* Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
  2012-01-20 19:02               ` Albert Strasheim
@ 2012-01-24 15:00                 ` Josh Boyer
  0 siblings, 0 replies; 9+ messages in thread
From: Josh Boyer @ 2012-01-24 15:00 UTC
  To: Albert Strasheim; +Cc: Roland Dreier, linux-rdma, LKML, David Woodhouse

On Fri, Jan 20, 2012 at 2:02 PM, Albert Strasheim <fullung@gmail.com> wrote:
> Hello
>
> On Fri, Jan 20, 2012 at 10:23 AM, Roland Dreier <roland@purestorage.com> wrote:
>> On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim <fullung@gmail.com> wrote:
>>> Just checking up on this issue. Is there any further testing or
>>> information we can provide to help make a fix happen?
>> I'm not likely to be much help on VT-d issues, but maybe it
>> would be useful to dump all the values in the BUG_ON if it's
>> going to trigger, i.e. just before
>
> Just retested with 3.2.1-1.fc16.x86_64 and the bug seems to be gone.
>
> I confirmed that my test program triggers the bug on 3.1.1-1.fc16.x86_64.
>
> It seems a bunch of IOMMU fixes went in on 9 and 10 January, so it
> seems to have fixed this problem in 3.2.

Possibly not.  Fedora disabled the Intel IOMMU by default in 3.1.6 (ish) and
newer, so if your problem was related to the IOMMU being enabled that might
explain why it went away.

You might try booting with intel_iommu=on and see if it really is gone.
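
One way to confirm which setting a given boot was asked to use is to
look for the parameter on the kernel command line, for example in the
contents of /proc/cmdline. The sketch below is a hypothetical userspace
helper, cmdline_param(), that pulls a key=value option out of such a
string. It reports the first occurrence only, a simplification, and it
only sees what was passed explicitly: when the parameter is absent the
build-time default applies, which this code cannot detect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find "key=value" in a whitespace-separated
 * kernel command line and copy the value into out (truncated to fit).
 * Returns 1 if the key was found, 0 otherwise.
 */
static int cmdline_param(const char *cmdline, const char *key,
                         char *out, size_t outlen)
{
        size_t klen = strlen(key);
        const char *p = cmdline;

        assert(out != NULL && outlen > 0);

        while (*p) {
                const char *end;

                /* Skip separators between tokens. */
                while (*p == ' ' || *p == '\n')
                        p++;

                /* The current token is [p, end). */
                end = p;
                while (*end && *end != ' ' && *end != '\n')
                        end++;

                if ((size_t)(end - p) > klen &&
                    strncmp(p, key, klen) == 0 && p[klen] == '=') {
                        size_t vlen = (size_t)(end - p) - klen - 1;

                        if (vlen >= outlen)
                                vlen = outlen - 1;
                        memcpy(out, p + klen + 1, vlen);
                        out[vlen] = '\0';
                        return 1;
                }
                p = end;
        }
        return 0;
}
```

Calling cmdline_param(line, "intel_iommu", buf, sizeof(buf)) on a line
containing "intel_iommu=on" fills buf with "on"; a return of 0 means the
option was not given at all.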

josh

