From: Ranran <ranshalit@gmail.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org
Subject: Re: [Bug 205701] New: Can't access RAM from PCIe
Date: Fri, 6 Dec 2019 18:52:40 +0200
Message-ID: <CAJ2oMhJQOS-2GuXKGAdBQBXN55=af+xpmWW5+MUgkyMLG_0Q0w@mail.gmail.com>
In-Reply-To: <CAJ2oMhJqsSftJtSDt2fsjqhLT0qQDZkdgQUc4pusuy6TvCnSVA@mail.gmail.com>

On Fri, Dec 6, 2019 at 6:48 PM Ranran <ranshalit@gmail.com> wrote:
>
> On Fri, Dec 6, 2019 at 5:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Fri, Dec 06, 2019 at 08:09:48AM +0200, Ranran wrote:
> > > On Fri, Nov 29, 2019 at 8:38 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > On Fri, Nov 29, 2019 at 06:10:51PM +0200, Ranran wrote:
> > > > > On Fri, Nov 29, 2019 at 4:58 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Fri, Nov 29, 2019 at 06:59:48AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=205701
> >
> > > I have tried to upgrade to the latest kernel, 5.4 (elrepo in CentOS), but
> > > with this processor/board (System x3650, Xeon), it hangs during
> > > kernel boot without any error in dmesg; it just keeps waiting for
> > > nothing for a couple of minutes and then drops to dracut.
> >
> > - I don't think you ever said exactly what the original failure mode
> >   was.  You said DMA from an FPGA failed.  What is the specific
> >   device?  How do you know the DMA fails?
> >
>
> Hi,
> The FPGA is an Intel Arria 10 device.
> We know that DMA fails because, when probing the DMA transaction from
> the FPGA to the CPU's RAM with SignalTap, we see that it stalls, i.e. it
> keeps waiting for the access to finish.
> We don't observe any errors in dmesg.
>

Two more notes about this:
1. We know that on the same computer (Intel Xeon, System x3650) the FPGA
can do the transaction without any issues.
2. Using the exact same test module on an older computer/CPU (Intel's
DUO), we observe no issues in the DMA transaction from the FPGA.
The DMA transaction is always from the FPGA to the CPU's RAM.


>
> > - Re your v5.4 kernel testing, dracut is a user-space distro thing, so
> >   it sounds like your hang is some sort of installation problem that I
> >   can't really help you with.  Maybe there are troubleshooting hints
> >   at https://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html.
>
> I know, that's quite frustrating. I tried to disable features using
> the kernel arguments noacpi and noapic, but it still freezes somewhere
> without giving any error.
>
> >   You may also be able to just drop a v5.4 kernel on your v4.18
> >   system, at least for testing purposes.
> >
> What does it mean to drop a 5.4 kernel on a 4.18 kernel?
>
>
> > - Your comment #3 in bugzilla is a link to a Google Doc containing a
> >   test module.  In the future, please attach things as plain text
> >   attachments directly to the bugzilla.  There's an "Add attachment"
> >   link immediately before the "Description" comment in bugzilla.  I
> >   did it for you this time.
> >
> > - It looks like your test_module.c is a kernel module, and frankly
> >   it's a mess.  Global variables that should be per-device, unused
> >   variables (dma_get_mask() called for no reason), confused usage
> >   (e.g., using both pci_dev_s and pPciDev), whitespace that appears
> >   random, etc.  I suggest starting with Documentation/PCI/pci.rst and,
> >   at least for this debugging effort, making it a self-contained
> >   driver instead of splitting things between a kernel module and
> >   user-space.
> >
>
> I've attached the latest kernel module, which I hope will make things
> clearer. I will try to make it a standalone test next time I'm in the lab.
>
> > - Your comment #4 is a link to a Google Doc containing lspci output.
> >   I attached it to bugzilla directly for you.
> >
> > - You apparently didn't run lspci as root ("sudo lspci -vv"), so it
> >   is missing a lot of information.
> >
> > - Your lspci doesn't match either of the dmesg logs.  Please make sure
> >   all your logs are from the same machine in the same configuration.
> >   For example, the first devices found by the kernel (from both
> >   comments #1 and #2) are:
> >
> >     pci 0000:00:00.0: [8086:3c00] type 00 class 0x060000
> >     pci 0000:00:01.0: [8086:3c02] type 01 class 0x060400
> >     pci 0000:00:02.0: [8086:3c04] type 01 class 0x060400
> >     pci 0000:00:02.2: [8086:3c06] type 01 class 0x060400
> >     ...
> >
> >   But the lspci doesn't include 00:01.0, 00:02.0, or 00:02.2.  It
> >   shows:
> >
> >     00:00.0 Host bridge: Intel Corporation Device 2020 (rev 04)
> >     00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
> >     00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
> >     00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
> >     ...
>
> I will do it in the lab tomorrow. Thanks.
