linux-pci.vger.kernel.org archive mirror
* Neophyte questions about PCIe
@ 2017-03-07 22:45 Mason
  2017-03-08 13:39 ` Mason
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Mason @ 2017-03-07 22:45 UTC (permalink / raw)
  To: linux-pci, Linux ARM
  Cc: Rob Herring, Arnd Bergmann, Ard Biesheuvel, Marc Zyngier,
	Thibaud Cornic, David Laight, Phuong Nguyen, Shawn Lin

Hello,

I've been working with the Linux PCIe framework for a few weeks,
and there are still a few things that remain unclear to me.
I thought I'd group them in a single message.

1) If I understand correctly, PCI defines 3 types of (address?) "spaces"
	- configuration
	- memory
	- I/O

I think PCI has its roots in x86, where there are separate
instructions for I/O accesses and memory accesses (with MMIO
sitting somewhere in the middle). I'm on ARMv7 which doesn't
have I/O instructions AFAIK. I'm not sure what the I/O address
space is used for in PCIe, especially since I was told that
one may map I/O-type registers (in my understanding, registers
for which accesses cause side effects) within mem space.


2) On my platform, there are two revisions of the PCIe controller.
Rev1 muxes config and mem inside a 256 MB window, and doesn't support
I/O space.
Rev2 muxes all 3 spaces inside a 256 MB window.

Ard has stated that this model is not supported by Linux.
AFAIU, the reason is that accesses may occur concurrently
(especially on SMP systems). Thus tweaking a bit before
the actual access necessarily creates a race condition.

I wondered if there might be (reasonable) software
work-arounds, in your experience?


3) What happens if a device requires more than 256 MB of
mem space? (Is that common? What kind of device? GPUs?)
Our controller supports a remapping "facility" to add an
offset to the bus address. Is such a feature supported
by Linux at all?  The problem is that this creates
another race condition, as setting the offset register
before an access may occur concurrently on two cores.
Perhaps 256 MB is plenty on a 32-bit embedded device?


4) The HW dev is considering the following fix.
Instead of muxing the address spaces, provide smaller
exclusive spaces. For example
[0x5000_0000, 0x5400_0000] for config (64MB)
[0x5400_0000, 0x5800_0000] for I/O (64MB)
[0x5800_0000, 0x6000_0000] for mem (128MB)

That way, bits 26:27 implicitly select the address space
	00 = config
	01 = I/O
	1x = mem
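For illustration, a minimal sketch of how that decode could look in the
host driver (the 0x5000_0000 base and the bit positions are only the
proposal above, nothing that exists today):

enum pcie_space { SPACE_CFG, SPACE_IO, SPACE_MEM };

static enum pcie_space decode_space(u32 cpu_addr)
{
	switch ((cpu_addr >> 26) & 0x3) {	/* bits 27:26 of the CPU address */
	case 0:
		return SPACE_CFG;	/* 0x5000_0000 - 0x53ff_ffff */
	case 1:
		return SPACE_IO;	/* 0x5400_0000 - 0x57ff_ffff */
	default:
		return SPACE_MEM;	/* 0x5800_0000 - 0x5fff_ffff */
	}
}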

This would be more in line with what Linux expects, right?
Are these sizes acceptable? 64 MB config is probably overkill
(we'll never have 64 devices on this board). 64 MB for I/O
is probably plenty. The issue might be mem space?


Thanks to anyone who can shine some light on any of
these points for me :-)

Regards.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-07 22:45 Neophyte questions about PCIe Mason
@ 2017-03-08 13:39 ` Mason
  2017-03-08 13:54 ` David Laight
  2017-03-08 15:17 ` Bjorn Helgaas
  2 siblings, 0 replies; 33+ messages in thread
From: Mason @ 2017-03-08 13:39 UTC (permalink / raw)
  To: linux-pci, Linux ARM
  Cc: Ard Biesheuvel, Arnd Bergmann, Rob Herring, David Laight,
	Phuong Nguyen, Thibaud Cornic, Marc Zyngier, Shawn Lin,
	Bjorn Helgaas

On 07/03/2017 23:45, Mason wrote:

> 3) What happens if a device requires more than 256 MB of
> mem space? (Is that common? What kind of device? GPUs?)
> Our controller supports a remapping "facility" to add an
> offset to the bus address. Is such a feature supported
> by Linux at all?  The problem is that this creates
> another race condition, as setting the offset register
> before an access may occur concurrently on two cores.
> Perhaps 256 MB is plenty on a 32-bit embedded device?

I was told that Linux does not support this kind of "dynamic remapping",
because access to the PCI memory region is not handled through a callback;
the driver just calls readl/writel directly on the pointer.

"We expect that any device we map into the address space is reachable
through static page table entries set up by ioremap() or pci_iomap()."
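A minimal sketch of that pattern, to make the point concrete (the BAR
number and register offset below are made up):

	void __iomem *regs;
	u32 val;

	regs = pci_ioremap_bar(pdev, 0);	/* mapped once, typically at probe time */
	if (!regs)
		return -ENOMEM;

	val = readl(regs + 0x10);	/* later accesses are plain loads/stores through
					 * the static mapping; there is no PCI core hook
					 * where a mux or offset register could be flipped */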


On a related subject, I asked about the max size of I/O space.

/**
 *	pci_remap_iospace - Remap the memory mapped I/O space
 *	@res: Resource describing the I/O space
 *	@phys_addr: physical address of range to be mapped
 *
 *	Remap the memory mapped I/O space described by the @res
 *	and the CPU physical address @phys_addr into virtual address space.
 *	Only architectures that have memory mapped IO functions defined
 *	(and the PCI_IOBASE value defined) should call this function.
 */
int __weak pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
{
#if defined(PCI_IOBASE) && defined(CONFIG_MMU)
	unsigned long vaddr = (unsigned long)PCI_IOBASE + res->start;

	if (!(res->flags & IORESOURCE_IO))
		return -EINVAL;

	if (res->end > IO_SPACE_LIMIT)
		return -EINVAL;

	return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
				  pgprot_device(PAGE_KERNEL));
#else
	/* this architecture does not have memory mapped I/O space,
	   so this function should never be called */
	WARN_ONCE(1, "This architecture does not support memory mapped I/O\n");
	return -ENODEV;
#endif
}

/* PCI fixed i/o mapping */
#define PCI_IO_VIRT_BASE	0xfee00000
#define PCI_IOBASE		((void __iomem *)PCI_IO_VIRT_BASE)

http://lxr.free-electrons.com/source/arch/arm/include/asm/io.h?v=4.9#L188

#ifdef CONFIG_NEED_MACH_IO_H
#include <mach/io.h>
#elif defined(CONFIG_PCI)
#define IO_SPACE_LIMIT	((resource_size_t)0xfffff)
#define __io(a)		__typesafe_io(PCI_IO_VIRT_BASE + ((a) & IO_SPACE_LIMIT))
#else
#define __io(a)		__typesafe_io((a) & IO_SPACE_LIMIT)
#endif

So the default seems to be 1 MB on arm32. But the platform seems allowed
to define a larger or a smaller space.
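For what it's worth, a rough sketch of how a host bridge driver hooks its
I/O window up to PCI_IOBASE with the function above (the 0x54000000 CPU
address and the 64K size are made-up examples):

	struct resource io_res = {
		.name	= "PCIe I/O",
		.start	= 0,			/* I/O resources are in bus-address terms */
		.end	= SZ_64K - 1,
		.flags	= IORESOURCE_IO,
	};
	int ret;

	ret = pci_remap_iospace(&io_res, 0x54000000);
	if (ret)
		return ret;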

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Neophyte questions about PCIe
  2017-03-07 22:45 Neophyte questions about PCIe Mason
  2017-03-08 13:39 ` Mason
@ 2017-03-08 13:54 ` David Laight
  2017-03-08 14:17   ` Mason
  2017-03-09 22:01   ` Jeremy Linton
  2017-03-08 15:17 ` Bjorn Helgaas
  2 siblings, 2 replies; 33+ messages in thread
From: David Laight @ 2017-03-08 13:54 UTC (permalink / raw)
  To: 'Mason', linux-pci, Linux ARM
  Cc: Rob Herring, Arnd Bergmann, Ard Biesheuvel, Marc Zyngier,
	Thibaud Cornic, Phuong Nguyen, Shawn Lin

From: Mason
> Sent: 07 March 2017 22:45
> Hello,
> 
> I've been working with the Linux PCIe framework for a few weeks,
> and there are still a few things that remain unclear to me.
> I thought I'd group them in a single message.
> 
> 1) If I understand correctly, PCI defines 3 types of (address?) "spaces"
> 	- configuration
> 	- memory
> 	- I/O
> 
> I think PCI has its roots in x86, where there are separate
> instructions for I/O accesses and memory accesses (with MMIO
> sitting somewhere in the middle). I'm on ARMv7 which doesn't
> have I/O instructions AFAIK. I'm not sure what the I/O address
> space is used for in PCIe, especially since I was told that
> one may map I/O-type registers (in my understanding, registers
> for which accesses cause side effects) within mem space.

There isn't much difference between a memory BAR and an IO BAR.
Both are used for accesses to device registers.
There are subtle differences in the PCIe TLPs (I think io writes
get a completion TLP).
Memory space (maybe only 64bit address??) can be 'pre-fetchable'
but generally the driver maps everything uncachable.
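From the driver side the difference is mostly hidden anyway: pci_iomap()
returns a cookie that works for either BAR type (the BAR number and
register offset below are made up):

	void __iomem *base = pci_iomap(pdev, 0, 0);	/* memory or I/O BAR, either works */
	u32 status;

	if (!base)
		return -ENOMEM;
	status = ioread32(base + 0x04);	/* ends up as readl() or inl() as appropriate */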


> 2) On my platform, there are two revisions of the PCIe controller.
> Rev1 muxes config and mem inside a 256 MB window, and doesn't support
> I/O space.
> Rev2 muxes all 3 spaces inside a 256 MB window.

Don't think config space fits.
With the 'obvious' mapping the 'bus number' is in the top
8 bits of the address.
IIRC x86 uses two 32bit addresses for config space.
One is used to hold the 'address' for the cycle, the other
to perform the cycle.
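For reference, that legacy x86 mechanism (ports 0xCF8/0xCFC) looks
roughly like this:

static u32 pci_cf8_read32(u8 bus, u8 dev, u8 fn, u8 reg)
{
	u32 addr = 0x80000000 | bus << 16 | dev << 11 | fn << 8 | (reg & 0xfc);

	outl(addr, 0xcf8);	/* latch the config "address" */
	return inl(0xcfc);	/* this access performs the config cycle */
}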

> Ard has stated that this model is not supported by Linux.
> AFAIU, the reason is that accesses may occur concurrently
> (especially on SMP systems). Thus tweaking a bit before
> the actual access necessarily creates a race condition.
> 
> I wondered if there might be (reasonable) software
> work-arounds, in your experience?

Remember some drivers let applications mmap PCIe addresses
directly into the user page tables.
So you have to stop absolutely everything if you change
your mux.

> 3) What happens if a device requires more than 256 MB of
> mem space? (Is that common? What kind of device? GPUs?)
> Our controller supports a remapping "facility" to add an
> offset to the bus address. Is such a feature supported
> by Linux at all?  The problem is that this creates
> another race condition, as setting the offset register
> before an access may occur concurrently on two cores.
> Perhaps 256 MB is plenty on a 32-bit embedded device?

GPUs tend to have their own paging scheme,
so they don't need humongous windows.
I'm not sure how much space is really needed.
32-bit x86 reserves the top 1GB of physical address space for PCI(e).

> 4) The HW dev is considering the following fix.
> Instead of muxing the address spaces, provide smaller
> exclusive spaces. For example
> [0x5000_0000, 0x5400_0000] for config (64MB)
> [0x5400_0000, 0x5800_0000] for I/O (64MB)
> [0x5800_0000, 0x6000_0000] for mem (128MB)

You almost certainly don't need more than 64k of IO.

> That way, bits 26:27 implicitly select the address space
> 	00 = config
> 	01 = I/O
> 	1x = mem
> 
> This would be more in line with what Linux expects, right?
> Are these sizes acceptable? 64 MB config is probably overkill
> (we'll never have 64 devices on this board). 64 MB for I/O
> is probably plenty. The issue might be mem space?

Config space isn't dense, you (probably) need 25 bits to get a 2nd bus number.
Even 256MB constrains you to 16 bus numbers.

Is this an ARM cpu inside an altera (now intel) fpga??
There is a nasty bug in their PCIe to avalon bridge logic (fixed in quartus 16.1).

	David



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-08 13:54 ` David Laight
@ 2017-03-08 14:17   ` Mason
  2017-03-08 14:38     ` David Laight
  2017-03-09 22:01   ` Jeremy Linton
  1 sibling, 1 reply; 33+ messages in thread
From: Mason @ 2017-03-08 14:17 UTC (permalink / raw)
  To: David Laight, linux-pci, Linux ARM
  Cc: Ard Biesheuvel, Arnd Bergmann, Rob Herring, Phuong Nguyen,
	Thibaud Cornic, Marc Zyngier, Shawn Lin, Bjorn Helgaas

Hello David,

On 08/03/2017 14:54, David Laight wrote:

> Mason wrote:
>
>> 2) On my platform, there are two revisions of the PCIe controller.
>> Rev1 muxes config and mem inside a 256 MB window, and doesn't support
>> I/O space.
>> Rev2 muxes all 3 spaces inside a 256 MB window.
> 
> Don't think config space fits.
> With the 'obvious' mapping the 'bus number' is in the top
> 8 bits of the address.

https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt

	cfg_offset(bus, device, function, register) =
		bus << 20 | device << 15 | function << 12 | register

8 bits for bus, 5 bits for device, 3 bits for function, 12 bits for reg offset
1 MB per bus, 256 buses max => 256 MB max
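The same calculation in C, roughly what an ECAM-style accessor computes:

static void __iomem *ecam_cfg_addr(void __iomem *cfg_base, u8 bus,
				   u8 dev, u8 fn, u16 reg)
{
	return cfg_base + ((u32)bus << 20 | dev << 15 | fn << 12 | reg);
}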

Supporting "only" 64 buses is good enough, I believe.


>> 3) What happens if a device requires more than 256 MB of
>> mem space? (Is that common? What kind of device? GPUs?)
>> Our controller supports a remapping "facility" to add an
>> offset to the bus address. Is such a feature supported
>> by Linux at all?  The problem is that this creates
>> another race condition, as setting the offset register
>> before an access may occur concurrently on two cores.
>> Perhaps 256 MB is plenty on a 32-bit embedded device?
> 
> GPUs tend to have their own paging scheme.
> So don't need humongous windows.
> I'm not sure how much space is really needed.
> 32bit x86 reserve the top 1GB of physical address for PCI(e).

I'm hoping 128 MB mem is enough. The two cards I have that are correctly
detected request 8 KB. (I have other cards that are not enumerated at all...
No idea why at the moment.)


>> 4) The HW dev is considering the following fix.
>> Instead of muxing the address spaces, provide smaller
>> exclusive spaces. For example
>> [0x5000_0000, 0x5400_0000] for config (64MB)
>> [0x5400_0000, 0x5800_0000] for I/O (64MB)
>> [0x5800_0000, 0x6000_0000] for mem (128MB)
> 
> You almost certainly don't need more than 64k of IO.

Good to know.


> Config space isn't dense, you (probably) need 25 bits to get a 2nd bus number.
> Even 256MB constrains you to 16 bus numbers.

Unless I got the math wrong, it's 20 bits (1 MB) per bus.
So 64 MB allows 64 buses.

> Is this an ARM cpu inside an altera (now intel) fpga??
> There is a nasty bug in their PCIe to avalon bridge logic (fixed in quartus 16.1).

The PCIe controller is from PLDA, and it's embedded in a SoC
where the CPU is a multi-core ARM Cortex A9 MP.

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Neophyte questions about PCIe
  2017-03-08 14:17   ` Mason
@ 2017-03-08 14:38     ` David Laight
  0 siblings, 0 replies; 33+ messages in thread
From: David Laight @ 2017-03-08 14:38 UTC (permalink / raw)
  To: 'Mason', linux-pci, Linux ARM
  Cc: Rob Herring, Arnd Bergmann, Ard Biesheuvel, Marc Zyngier,
	Thibaud Cornic, Bjorn Helgaas, Phuong Nguyen, Shawn Lin

From: Mason
> Sent: 08 March 2017 14:18
...
> > Don't think config space fits.
> > With the 'obvious' mapping the 'bus number' is in the top
> > 8 bits of the address.
> 
> https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.txt
> 
> 	cfg_offset(bus, device, function, register) =
> 		bus << 20 | device << 15 | function << 12 | register
> 
> 8 bits for bus, 5 bits for device, 3 bits for function, 12 bits for reg offset
> 1 MB per bus, 256 buses max => 256 MB max
> 
> Supporting "only" 64 buses is good enough, I believe.

I was comparing the PCIe TLP layout for config cycles and normal read/write.
There, bus/dev/fn are in the top 16 bits, with 4 reserved bits above the
register address (effectively allowing 64k of config space per function).

Possibly some logic shifting the address across for config cycles.

The TLP type also has a bit for type-0 v type-1 config cycles.
I can't quite remember the difference.

	David



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-07 22:45 Neophyte questions about PCIe Mason
  2017-03-08 13:39 ` Mason
  2017-03-08 13:54 ` David Laight
@ 2017-03-08 15:17 ` Bjorn Helgaas
  2017-03-09 23:43   ` Mason
  2 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2017-03-08 15:17 UTC (permalink / raw)
  To: Mason
  Cc: Rob Herring, Arnd Bergmann, Ard Biesheuvel, Marc Zyngier,
	linux-pci, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Linux ARM

On Tue, Mar 07, 2017 at 11:45:27PM +0100, Mason wrote:
> Hello,
> 
> I've been working with the Linux PCIe framework for a few weeks,
> and there are still a few things that remain unclear to me.
> I thought I'd group them in a single message.
> 
> 1) If I understand correctly, PCI defines 3 types of (address?) "spaces"
> 	- configuration
> 	- memory
> 	- I/O
> 
> I think PCI has its roots in x86, where there are separate
> instructions for I/O accesses and memory accesses (with MMIO
> sitting somewhere in the middle). I'm on ARMv7 which doesn't
> have I/O instructions AFAIK. I'm not sure what the I/O address
> space is used for in PCIe, especially since I was told that
> one may map I/O-type registers (in my understanding, registers
> for which accesses cause side effects) within mem space.

You're right about the three PCI address spaces.  Obviously, these
only apply to the *PCI* hierarchy.  The PCI host bridge, which is the
interface between the PCI hierarchy and the rest of the system (CPUs,
system RAM, etc.), generates these PCI config, memory, or I/O
transactions.

The host bridge may use a variety of mechanisms to translate a CPU
access into the appropriate PCI transaction.

  - PCI memory transactions: Generally the host bridge translates CPU
    memory accesses directly into PCI memory accesses, although it may
    translate the physical address from the CPU to a different PCI bus
    address, e.g., by truncating high-order address bits or adding a
    constant offset.

    As you mentioned, drivers use some flavor of ioremap() to set up
    mappings for PCI memory space, then they perform simple memory
    accesses to it.  There's no required PCI core wrapper and no
    locking in this path.

  - PCI I/O transactions: On x86, where the ISA supports "I/O"
    instructions, a host bridge generally forwards I/O accesses from
    the CPU directly to PCI.  Bridges for use on other arches may
    provide a bridge-specific way to convert a CPU memory access into
    a PCI I/O transaction, e.g., a CPU memory store inside a bridge
    window may be translated to a PCI I/O write transaction, with the
    PCI I/O address determined by the offset into the bridge window.

    Drivers use inb()/outb() to access PCI I/O space.  These are
    arch-specific wrappers that can use the appropriate mechanism for
    the arch and bridge.

    PCIe deprecates I/O space, and many bridges don't support it at
    all, so it's relatively unimportant.  Many PCI devices do make
    registers available in both I/O and memory space, but there's no
    spec requirement to do so.  Drivers for such devices would have to
    know about this as a device-specific detail.

  - PCI config transactions: The simplest mechanism is called ECAM
    ("Enhanced Configuration Access Method") and is required by the
    PCIe spec and also supported by some conventional PCI bridges.  A
    CPU memory access inside a bridge window is converted into a PCI
    configuration transaction.  The PCI bus/device/function
    information is encoded into the CPU physical memory address.

    Another common mechanism is for the host bridge to have an
    "address" register, where the CPU writes the PCI bus/device/
    function information, and a "data" register where the CPU reads or
    writes the configuration data.  This obviously requires locking
    around the address/data accesses.

    The PCI core and drivers use pci_read_config_*() wrappers to
    access config space.  These use the appropriate bridge-specific
    mechanism and do any required locking.
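For illustration, a bare-bones sketch of such an "address"/"data" style
accessor; the MMIO base "regs" and the CFG_ADDR/CFG_DATA offsets are
invented, not any real controller:

static void __iomem *regs;	/* host bridge MMIO base (invented) */
#define CFG_ADDR	0x00	/* invented register offsets */
#define CFG_DATA	0x04

static DEFINE_SPINLOCK(cfg_lock);

static int my_cfg_read(struct pci_bus *bus, unsigned int devfn,
		       int where, int size, u32 *val)
{
	unsigned long flags;
	u32 v;

	spin_lock_irqsave(&cfg_lock, flags);
	/* latch bus/device/function/register into the "address" register */
	writel(bus->number << 20 | devfn << 12 | (where & ~3), regs + CFG_ADDR);
	/* reading the "data" register performs the config cycle */
	v = readl(regs + CFG_DATA);
	spin_unlock_irqrestore(&cfg_lock, flags);

	if (size == 1)
		v = (v >> (8 * (where & 3))) & 0xff;
	else if (size == 2)
		v = (v >> (8 * (where & 3))) & 0xffff;
	*val = v;

	return PCIBIOS_SUCCESSFUL;
}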

> 2) On my platform, there are two revisions of the PCIe controller.
> Rev1 muxes config and mem inside a 256 MB window, and doesn't support
> I/O space.
> Rev2 muxes all 3 spaces inside a 256 MB window.
> 
> Ard has stated that this model is not supported by Linux.
> AFAIU, the reason is that accesses may occur concurrently
> (especially on SMP systems). Thus tweaking a bit before
> the actual access necessarily creates a race condition.

Yes.

> I wondered if there might be (reasonable) software
> work-arounds, in your experience?

Muxing config and I/O space isn't a huge issue because they both use
wrappers that could do locking.  Muxing config and memory space is a
pretty big problem because memory accesses do not use a wrapper.

There's no pretty way of making sure no driver is doing memory
accesses during a config access.  Somebody already pointed out that
you'd have to make sure no other CPU could be executing a driver while
you're doing a config access.  I can't think of any better solution.

> 3) What happens if a device requires more than 256 MB of
> mem space? (Is that common? What kind of device? GPUs?)

It is fairly common to have PCI BARs larger than 256MB.

> Our controller supports a remapping "facility" to add an
> offset to the bus address. Is such a feature supported
> by Linux at all?  The problem is that this creates
> another race condition, as setting the offset register
> before an access may occur concurrently on two cores.
> Perhaps 256 MB is plenty on a 32-bit embedded device?

Linux certainly supports a constant offset between the CPU physical
address and the PCI bus address -- this is the offset described by
pci_add_resource_offset().
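As a sketch, from a host bridge driver that looks something like this
(the addresses and the "pcie" structure are invented for the example):

	/* CPU window 0x58000000-0x5fffffff appears on the bus at 0x00000000 */
	resource_size_t offset = 0x58000000;	/* CPU address minus PCI bus address */

	pci_add_resource_offset(&pcie->resources, &pcie->mem_res, offset);
	/* the core subtracts the offset when converting a resource to a bus
	 * region (e.g. for BAR assignment) and adds it back the other way */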

But it sounds like you're envisioning some sort of dynamic remapping,
and I don't see how that could work.  The PCI core needs to know the
entire host bridge window size up front, because that's how it assigns
BARs.  Since there's no wrapper for memory accesses, there's no
opportunity to change the remapping at the time of access.

> 4) The HW dev is considering the following fix.
> Instead of muxing the address spaces, provide smaller
> exclusive spaces. For example
> [0x5000_0000, 0x5400_0000] for config (64MB)
> [0x5400_0000, 0x5800_0000] for I/O (64MB)
> [0x5800_0000, 0x6000_0000] for mem (128MB)
> 
> That way, bits 26:27 implicitly select the address space
> 	00 = config
> 	01 = I/O
> 	1x = mem
> 
> This would be more in line with what Linux expects, right?
> Are these sizes acceptable? 64 MB config is probably overkill
> (we'll never have 64 devices on this board). 64 MB for I/O
> is probably plenty. The issue might be mem space?

Having exclusive spaces like that would be a typical approach.  The
I/O space seems like way more than you probably need, if you need it
at all.  There might be a few ancient devices that require I/O space,
but only you can tell whether you need to support those.

Same with memory space: if you restrict the set of devices you want to
support, you can restrict the amount of address space you need.  The
Sky Lake GPU on my laptop has a 256MB BAR, so even a single device
like that can require more than the 128MB you'd have with this map.

Bjorn


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-08 13:54 ` David Laight
  2017-03-08 14:17   ` Mason
@ 2017-03-09 22:01   ` Jeremy Linton
  1 sibling, 0 replies; 33+ messages in thread
From: Jeremy Linton @ 2017-03-09 22:01 UTC (permalink / raw)
  To: David Laight, 'Mason', linux-pci, Linux ARM
  Cc: Rob Herring, Arnd Bergmann, Ard Biesheuvel, Marc Zyngier,
	Thibaud Cornic, Phuong Nguyen, Shawn Lin

Hi,

On 03/08/2017 07:54 AM, David Laight wrote:
> From: Mason
>> Sent: 07 March 2017 22:45
>> Hello,
>>
>> I've been working with the Linux PCIe framework for a few weeks,
>> and there are still a few things that remain unclear to me.
>> I thought I'd group them in a single message.
>>
>> 1) If I understand correctly, PCI defines 3 types of (address?) "spaces"
>> 	- configuration
>> 	- memory
>> 	- I/O
>>
>> I think PCI has its roots in x86, where there are separate
>> instructions for I/O accesses and memory accesses (with MMIO
>> sitting somewhere in the middle). I'm on ARMv7 which doesn't
>> have I/O instructions AFAIK. I'm not sure what the I/O address
>> space is used for in PCIe, especially since I was told that
>> one may map I/O-type registers (in my understanding, registers
>> for which accesses cause side effects) within mem space.
>
> There isn't much difference between a memory BAR and an IO BAR.
> Both are used for accesses to device registers.
> There are subtle differences in the PCIe TLPs (I think io writes
> get a completion TLP).
> Memory space (maybe only 64bit address??) can be 'pre-fetchable'
> but generally the driver maps everything uncachable.
>
>
>> 2) On my platform, there are two revisions of the PCIe controller.
>> Rev1 muxes config and mem inside a 256 MB window, and doesn't support
>> I/O space.
>> Rev2 muxes all 3 spaces inside a 256 MB window.
>
> Don't think config space fits.
> With the 'obvious' mapping the 'bus number' is in the top
> 8 bits of the address.
> IIRC x86 uses two 32bit addresses for config space.
> One is used to hold the 'address' for the cycle, the other
> to perform the cycle.
>
>> Ard has stated that this model is not supported by Linux.
>> AFAIU, the reason is that accesses may occur concurrently
>> (especially on SMP systems). Thus tweaking a bit before
>> the actual access necessarily creates a race condition.
>>
>> I wondered if there might be (reasonable) software
>> work-arounds, in your experience?
>
> Remember some drivers let applications mmap PCIe addresses
> directly into the user page tables.
> So you have to stop absolutely everything if you change
> your mux.
>
>> 3) What happens if a device requires more than 256 MB of
>> mem space? (Is that common? What kind of device? GPUs?)
>> Our controller supports a remapping "facility" to add an
>> offset to the bus address. Is such a feature supported
>> by Linux at all?  The problem is that this creates
>> another race condition, as setting the offset register
>> before an access may occur concurrently on two cores.
>> Perhaps 256 MB is plenty on a 32-bit embedded device?
>
> GPUs tend to have their own paging scheme.
> So don't need humongous windows.
> I'm not sure how much space is really needed.

The server-class (Tesla etc.) GPUs have used 64-bit BARs for a while, and in 
the case of something like the K80 require two 16G mappings per board.

(see the comment at the bottom of this link)

https://devtalk.nvidia.com/default/topic/865872/driver-installing-problem-for-nvidia-tesla-k80-under-linux/


> 32bit x86 reserve the top 1GB of physical address for PCI(e).
>
>> 4) The HW dev is considering the following fix.
>> Instead of muxing the address spaces, provide smaller
>> exclusive spaces. For example
>> [0x5000_0000, 0x5400_0000] for config (64MB)
>> [0x5400_0000, 0x5800_0000] for I/O (64MB)
>> [0x5800_0000, 0x6000_0000] for mem (128MB)
>
> You almost certainly don't need more than 64k of IO.
>
>> That way, bits 26:27 implicitly select the address space
>> 	00 = config
>> 	01 = I/O
>> 	1x = mem
>>
>> This would be more in line with what Linux expects, right?
>> Are these sizes acceptable? 64 MB config is probably overkill
>> (we'll never have 64 devices on this board). 64 MB for I/O
>> is probably plenty. The issue might be mem space?
>
> Config space isn't dense, you (probably) need 25 bits to get a 2nd bus number.
> Even 256MB constrains you to 16 bus numbers.
>
> Is this an ARM cpu inside an altera (now intel) fpga??
> There is a nasty bug in their PCIe to avalon bridge logic (fixed in quartus 16.1).
>
> 	David
>
>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-08 15:17 ` Bjorn Helgaas
@ 2017-03-09 23:43   ` Mason
  2017-03-10 13:15     ` Robin Murphy
  2017-03-10 16:45     ` Mason
  0 siblings, 2 replies; 33+ messages in thread
From: Mason @ 2017-03-09 23:43 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Linux ARM

On 08/03/2017 16:17, Bjorn Helgaas wrote:
[snip excellent in-depth overview]

I think I'm making progress, in that I now have a better
idea of what I don't understand. So I'm able to ask
(hopefully) less vague questions.

Take the USB3 PCIe adapter I've been testing with. At some
point during init, the XHCI driver requests some memory
(via kmalloc?) in order to exchange data with the host, right?

On my SoC, the RAM used by Linux lives at physical range
[0x8000_0000, 0x8800_0000[ => 128 MB

How does the XHCI driver make the adapter aware of where
it can scribble data? The XHCI driver has no notion that
the device is behind a bus, does it?

At some point, the physical addresses must be converted
to PCI bus addresses, right? Is the conversion computed by
subtracting the offset defined in the DT?

Then suppose the USB3 card wants to write to an address
in RAM. It sends a packet on the PCIe bus, targeting
the PCI bus address of that RAM, right? Is this address
supposed to be in BAR0 of the root complex? I guess not,
since Bjorn said that it was unusual for a RC to have
a BAR at all. So I'll hand-wave, and decree that, by some
protocol magic, the packet arrives at the PCIe controller.
And this controller knows to forward this write request
over the memory bus. Does that look about right?

My problem is that, in the current implementation of the
PCIe controller, the USB device that wants to write to
memory is supposed to target BAR0 of the RC.

Since my mem space is limited to 256 MB, BAR0 is
limited to 256 MB (or even 128 MB, since I also need
to map the device's BAR into the same mem space).

So, if I understand correctly (which, at this point,
is quite unlikely) PCIe will work correctly for me
only if Linux manages 128 MB or less...

How does it work on systems where the RC has no BAR?
I suppose devices are able to access all of RAM...
because the controller forwards everything? (This may
be where an IOMMU comes handy?)

Is there a way to know, at run-time, where and how big
Linux's dynamic memory pool is? Perhaps the memory pool
itself remains smaller than 128 MB?

I realize that I've asked a million questions. Feel free
to ignore most of them; if you can help with just one,
it would be a tremendous help already.

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-09 23:43   ` Mason
@ 2017-03-10 13:15     ` Robin Murphy
  2017-03-10 14:06       ` David Laight
  2017-03-10 14:53       ` Mason
  2017-03-10 16:45     ` Mason
  1 sibling, 2 replies; 33+ messages in thread
From: Robin Murphy @ 2017-03-10 13:15 UTC (permalink / raw)
  To: Mason
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb, David Laight,
	Phuong Nguyen, Shawn Lin, Linux ARM

On 09/03/17 23:43, Mason wrote:
> On 08/03/2017 16:17, Bjorn Helgaas wrote:
> [snip excellent in-depth overview]
> 
> I think I'm making progress, in that I now have a better
> idea of what I don't understand. So I'm able to ask
> (hopefully) less vague questions.
> 
> Take the USB3 PCIe adapter I've been testing with. At some
> point during init, the XHCI driver request some memory
> (via kmalloc?) in order to exchange data with the host, right?
> 
> On my SoC, the RAM used by Linux lives at physical range
> [0x8000_0000, 0x8800_0000[ => 128 MB
> 
> How does the XHCI driver make the adapter aware of where
> it can scribble data? The XHCI driver has no notion that
> the device is behind a bus, does it?
> 
> At some point, the physical addresses must be converted
> to PCI bus addresses, right? Is it computed subtracting
> the offset defined in the DT?
> 
> Then suppose the USB3 card wants to write to an address
> in RAM. It sends a packet on the PCIe bus, targeting
> the PCI bus address of that RAM, right? Is this address
> supposed to be in BAR0 of the root complex? I guess not,
> since Bjorn said that it was unusual for a RC to have
> a BAR at all. So I'll hand-wave, and decree that, by some
> protocol magic, the packet arrives at the PCIe controller.
> And this controller knows to forward this write request
> over the memory bus. Does that look about right?

Generally, yes - if an area of memory space *is* claimed by a BAR, then
another PCI device accessing that would be treated as peer-to-peer DMA,
which may or may not be allowed (or supported at all). For mem space
which isn't claimed by BARs, it's up to the RC to decide what to do. As
a concrete example (which might possibly be relevant) the PLDA XR3-AXI
IP which we have in the ARM Juno SoC has the ATR_PCIE_WINx registers in
its root port configuration block that control what ranges of mem space
are mapped to the external AXI master interface and how.

> My problem is that, in the current implementation of the
> PCIe controller, the USB device that wants to write to
> memory is supposed to target BAR0 of the RC.

That doesn't sound right at all. If the RC has a BAR, I'd expect it to
be for poking the guts of the RC device itself (since this prompted me
to go and compare, I see the Juno RC does indeed have its own enigmatic
16KB BAR, which reads as ever-changing random junk; no idea what that's
about).

> Since my mem space is limited to 256 MB, then BAR0 is
> limited to 256 MB (or even 128 MB, since I also need
> to mapthe device's BAR into the same mem space).

Your window into mem space *from the CPU's point of view* is limited to
256MB. The relationship between mem space and the system (AXI) memory
map from the point of view of PCI devices is a separate issue; if it's
configurable at all, it probably makes sense to have the firmware set an
outbound window to at least cover DRAM 1:1, then forget about it (this
is essentially what Juno UEFI does, for example).

Robin.

> So, if I understand correctly (which, at this point,
> is quite unlikely) PCIe will work correctly for me
> only if Linux manages 128 MB or less...
> 
> How does it work on systems where the RC has no BAR?
> I suppose devices are able to access all of RAM...
> because the controller forwards everything? (This may
> be where an IOMMU comes handy?)
> 
> Is there a way to know, at run-time, where and how big
> Linux's dynamic memory pool is? Perhaps the memory pool
> itself remains smaller than 128 MB?
> 
> I realize that I've asked a million questions. Feel free
> to ignore most of them, if you can help with just one,
> it would be a tremendous help already.
> 
> Regards.
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Neophyte questions about PCIe
  2017-03-10 13:15     ` Robin Murphy
@ 2017-03-10 14:06       ` David Laight
  2017-03-10 15:05         ` Mason
  2017-03-10 14:53       ` Mason
  1 sibling, 1 reply; 33+ messages in thread
From: David Laight @ 2017-03-10 14:06 UTC (permalink / raw)
  To: 'Robin Murphy', Mason
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb,
	Phuong Nguyen, Shawn Lin, Linux ARM

From: Robin Murphy
> Sent: 10 March 2017 13:16
> On 09/03/17 23:43, Mason wrote:
> > On 08/03/2017 16:17, Bjorn Helgaas wrote:
> > [snip excellent in-depth overview]
> >
> > I think I'm making progress, in that I now have a better
> > idea of what I don't understand. So I'm able to ask
> > (hopefully) less vague questions.
> >
> > Take the USB3 PCIe adapter I've been testing with. At some
> > point during init, the XHCI driver request some memory
> > (via kmalloc?) in order to exchange data with the host, right?
> >
> > On my SoC, the RAM used by Linux lives at physical range
> > [0x8000_0000, 0x8800_0000[ => 128 MB
> >
> > How does the XHCI driver make the adapter aware of where
> > it can scribble data? The XHCI driver has no notion that
> > the device is behind a bus, does it?
> >
> > At some point, the physical addresses must be converted
> > to PCI bus addresses, right? Is it computed subtracting
> > the offset defined in the DT?

The driver should call dma_alloc_coherent(), which returns both the
kernel virtual address and the address the device (xhci controller) has
to use to access it.
The cpu physical address is irrelevant (although it might be
calculated in the middle somewhere).
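Roughly, a sketch of the pattern (the size is made up, and pdev is the
xHCI PCI device):

	dma_addr_t dma_handle;
	void *vaddr;

	vaddr = dma_alloc_coherent(&pdev->dev, SZ_4K, &dma_handle, GFP_KERNEL);
	if (!vaddr)
		return -ENOMEM;
	/* vaddr is what the CPU dereferences; dma_handle is the address the
	 * xHCI controller is given, e.g. written into its ring pointer
	 * registers. The two may differ by a constant offset. */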


> > Then suppose the USB3 card wants to write to an address
> > in RAM. It sends a packet on the PCIe bus, targeting
> > the PCI bus address of that RAM, right? Is this address
> > supposed to be in BAR0 of the root complex? I guess not,
> > since Bjorn said that it was unusual for a RC to have
> > a BAR at all. So I'll hand-wave, and decree that, by some
> > protocol magic, the packet arrives at the PCIe controller.
> > And this controller knows to forward this write request
> > over the memory bus. Does that look about right?
>
> Generally, yes - if an area of memory space *is* claimed by a BAR, then
> another PCI device accessing that would be treated as peer-to-peer DMA,
> which may or may not be allowed (or supported at all).

So PCIe addresses that refer to host memory addresses are
just forwarded to the memory subsystem.
In practice this is almost everything.

The only other PCIe writes the host will see are likely to be associated
with MSI and MSI-X interrupt support.

Some PCIe root complexes support peer-to-peer writes but not reads.
Writes are normally 'posted' (so are 'fire and forget'); reads need the
completion TLP (containing the data) sent back - all hard and difficult.

> For mem space
> which isn't claimed by BARs, it's up to the RC to decide what to do. As
> a concrete example (which might possibly be relevant) the PLDA XR3-AXI
> IP which we have in the ARM Juno SoC has the ATR_PCIE_WINx registers in
> its root port configuration block that control what ranges of mem space
> are mapped to the external AXI master interface and how.
>
> > My problem is that, in the current implementation of the
> > PCIe controller, the USB device that wants to write to
> > memory is supposed to target BAR0 of the RC.
>
> That doesn't sound right at all. If the RC has a BAR, I'd expect it to
> be for poking the guts of the RC device itself (since this prompted me
> to go and compare, I see the Juno RC does indeed have it own enigmatic
> 16KB BAR, which reads as ever-changing random junk; no idea what that's
> about).
>
> > Since my mem space is limited to 256 MB, then BAR0 is
> > limited to 256 MB (or even 128 MB, since I also need
> > to mapthe device's BAR into the same mem space).
>
> Your window into mem space *from the CPU's point of view* is limited to
> 256MB. The relationship between mem space and the system (AXI) memory
> map from the point of view of PCI devices is a separate issue; if it's
> configurable at all, it probably makes sense to have the firmware set an
> outbound window to at least cover DRAM 1:1, then forget about it (this
> is essentially what Juno UEFI does, for example).

So you have 128MB (max) of system memory that has cpu physical
addresses 0x80000000 upwards.
I'd expect it all to be accessible from any PCIe card at some PCIe
address, it might be at address 0, 0x80000000 or any other offset.

I don't know which DT entry controls that offset.

	David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 13:15     ` Robin Murphy
  2017-03-10 14:06       ` David Laight
@ 2017-03-10 14:53       ` Mason
  1 sibling, 0 replies; 33+ messages in thread
From: Mason @ 2017-03-10 14:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb, David Laight,
	Phuong Nguyen, Shawn Lin, Linux ARM

On 10/03/2017 14:15, Robin Murphy wrote:
> On 09/03/17 23:43, Mason wrote:
>> On 08/03/2017 16:17, Bjorn Helgaas wrote:
>> [snip excellent in-depth overview]
>>
>> I think I'm making progress, in that I now have a better
>> idea of what I don't understand. So I'm able to ask
>> (hopefully) less vague questions.
>>
>> Take the USB3 PCIe adapter I've been testing with. At some
>> point during init, the XHCI driver request some memory
>> (via kmalloc?) in order to exchange data with the host, right?
>>
>> On my SoC, the RAM used by Linux lives at physical range
>> [0x8000_0000, 0x8800_0000[ => 128 MB
>>
>> How does the XHCI driver make the adapter aware of where
>> it can scribble data? The XHCI driver has no notion that
>> the device is behind a bus, does it?
>>
>> At some point, the physical addresses must be converted
>> to PCI bus addresses, right? Is it computed subtracting
>> the offset defined in the DT?
>>
>> Then suppose the USB3 card wants to write to an address
>> in RAM. It sends a packet on the PCIe bus, targeting
>> the PCI bus address of that RAM, right? Is this address
>> supposed to be in BAR0 of the root complex? I guess not,
>> since Bjorn said that it was unusual for a RC to have
>> a BAR at all. So I'll hand-wave, and decree that, by some
>> protocol magic, the packet arrives at the PCIe controller.
>> And this controller knows to forward this write request
>> over the memory bus. Does that look about right?
> 
> Generally, yes - if an area of memory space *is* claimed by a BAR, then
> another PCI device accessing that would be treated as peer-to-peer DMA,
> which may or may not be allowed (or supported at all). For mem space
> which isn't claimed by BARs, it's up to the RC to decide what to do. As
> a concrete example (which might possibly be relevant) the PLDA XR3-AXI
> IP which we have in the ARM Juno SoC has the ATR_PCIE_WINx registers in
> its root port configuration block that control what ranges of mem space
> are mapped to the external AXI master interface and how.

The HW dev told me that the Verilog code for the RC considers
packets not targeted at RC BAR0 an error, and drops them.


>> My problem is that, in the current implementation of the
>> PCIe controller, the USB device that wants to write to
>> memory is supposed to target BAR0 of the RC.
> 
> That doesn't sound right at all. If the RC has a BAR, I'd expect it to
> be for poking the guts of the RC device itself (since this prompted me
> to go and compare, I see the Juno RC does indeed have it own enigmatic
> 16KB BAR, which reads as ever-changing random junk; no idea what that's
> about).

That's not how our RC works. If I want to poke its guts, I have
some MMIO addresses on the global bus. RC BAR0 is strictly used
as a window to the global bus.


>> Since my mem space is limited to 256 MB, then BAR0 is
>> limited to 256 MB (or even 128 MB, since I also need
>> to mapthe device's BAR into the same mem space).
> 
> Your window into mem space *from the CPU's point of view* is limited to
> 256MB. The relationship between mem space and the system (AXI) memory
> map from the point of view of PCI devices is a separate issue; if it's
> configurable at all, it probably makes sense to have the firmware set an
> outbound window to at least cover DRAM 1:1, then forget about it (this
> is essentially what Juno UEFI does, for example).

The size of RC BAR0 is limited to 1 GB, so best case I can map
1 GB back to the system RAM. Well, actually best case is 896 MB
since 1/8 of the window must map the MSI doorbell region.

I'll see what I can come up with.

Thanks a lot for your comments.

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 14:06       ` David Laight
@ 2017-03-10 15:05         ` Mason
  2017-03-10 15:14           ` David Laight
                             ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Mason @ 2017-03-10 15:05 UTC (permalink / raw)
  To: David Laight, Robin Murphy
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb,
	Phuong Nguyen, Shawn Lin, Linux ARM

On 10/03/2017 15:06, David Laight wrote:

> Robin Murphy wrote:
>
>> On 09/03/17 23:43, Mason wrote:
>>
>>> I think I'm making progress, in that I now have a better
>>> idea of what I don't understand. So I'm able to ask
>>> (hopefully) less vague questions.
>>>
>>> Take the USB3 PCIe adapter I've been testing with. At some
>>> point during init, the XHCI driver request some memory
>>> (via kmalloc?) in order to exchange data with the host, right?
>>>
>>> On my SoC, the RAM used by Linux lives at physical range
>>> [0x8000_0000, 0x8800_0000[ => 128 MB
>>>
>>> How does the XHCI driver make the adapter aware of where
>>> it can scribble data? The XHCI driver has no notion that
>>> the device is behind a bus, does it?
>>>
>>> At some point, the physical addresses must be converted
>>> to PCI bus addresses, right? Is it computed subtracting
>>> the offset defined in the DT?
> 
> The driver should call dma_alloc_coherent() which returns both the
> kernel virtual address and the device (xhci controller) has
> to use to access it.
> The cpu physical address is irrelevant (although it might be
> calculated in the middle somewhere).

Thank you for that missing piece of the puzzle.
I see some relevant action in drivers/usb/host/xhci-mem.c

And I now see this log:

[    2.499320] xhci_hcd 0000:01:00.0: // Device context base array address = 0x8e07e000 (DMA), d0855000 (virt)
[    2.509156] xhci_hcd 0000:01:00.0: Allocated command ring at cfb04200
[    2.515640] xhci_hcd 0000:01:00.0: First segment DMA is 0x8e07f000
[    2.521863] xhci_hcd 0000:01:00.0: // Setting command ring address to 0x20
[    2.528786] xhci_hcd 0000:01:00.0: // xHC command ring deq ptr low bits + flags = @00000000
[    2.537188] xhci_hcd 0000:01:00.0: // xHC command ring deq ptr high bits = @00000000
[    2.545002] xhci_hcd 0000:01:00.0: // Doorbell array is located at offset 0x800 from cap regs base addr
[    2.554455] xhci_hcd 0000:01:00.0: // xHCI capability registers at d0852000:
[    2.561550] xhci_hcd 0000:01:00.0: // @d0852000 = 0x1000020 (CAPLENGTH AND HCIVERSION)

I believe 0x8e07e000 is a CPU address, not a PCI bus address.


>>> Then suppose the USB3 card wants to write to an address
>>> in RAM. It sends a packet on the PCIe bus, targeting
>>> the PCI bus address of that RAM, right? Is this address
>>> supposed to be in BAR0 of the root complex? I guess not,
>>> since Bjorn said that it was unusual for a RC to have
>>> a BAR at all. So I'll hand-wave, and decree that, by some
>>> protocol magic, the packet arrives at the PCIe controller.
>>> And this controller knows to forward this write request
>>> over the memory bus. Does that look about right?
>>
>> Generally, yes - if an area of memory space *is* claimed by a BAR, then
>> another PCI device accessing that would be treated as peer-to-peer DMA,
>> which may or may not be allowed (or supported at all).
> 
> So PCIe addresses that refer to the host memory addresses are
> just forwarded to the memory subsystem.
> In practise this is almost everything.

My RC drops packets not targeting its BAR0.

> The only other PCIe writes the host will see are likely to be associated
> with MIS and MSI-X interrupt support.

Rev 1 of the PCIe controller is supposed to forward MSI doorbell
writes over the global bus to the PCIe controller's MMIO register.

> Some PCIe root complex support peer-to-peer writes but not reads.
> Write are normally 'posted' (so are 'fire and forget') reads need the
> completion TLP (containing the data) sent back - all hard and difficult.
> 
>> For mem space
>> which isn't claimed by BARs, it's up to the RC to decide what to do. As
>> a concrete example (which might possibly be relevant) the PLDA XR3-AXI
>> IP which we have in the ARM Juno SoC has the ATR_PCIE_WINx registers in
>> its root port configuration block that control what ranges of mem space
>> are mapped to the external AXI master interface and how.
>>
>>> My problem is that, in the current implementation of the
>>> PCIe controller, the USB device that wants to write to
>>> memory is supposed to target BAR0 of the RC.
>>
>> That doesn't sound right at all. If the RC has a BAR, I'd expect it to
>> be for poking the guts of the RC device itself (since this prompted me
>> to go and compare, I see the Juno RC does indeed have it own enigmatic
>> 16KB BAR, which reads as ever-changing random junk; no idea what that's
>> about).
>>
>>> Since my mem space is limited to 256 MB, then BAR0 is
>>> limited to 256 MB (or even 128 MB, since I also need
>>> to mapthe device's BAR into the same mem space).
>>
>> Your window into mem space *from the CPU's point of view* is limited to
>> 256MB. The relationship between mem space and the system (AXI) memory
>> map from the point of view of PCI devices is a separate issue; if it's
>> configurable at all, it probably makes sense to have the firmware set an
>> outbound window to at least cover DRAM 1:1, then forget about it (this
>> is essentially what Juno UEFI does, for example).
> 
> So you have 128MB (max) of system memory that has cpu physical
> addresses 0x80000000 upwards.
> I'd expect it all to be accessible from any PCIe card at some PCIe
> address, it might be at address 0, 0x80000000 or any other offset.
> 
> I don't know which DT entry controls that offset.

This is a crucial point, I think.

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Neophyte questions about PCIe
  2017-03-10 15:05         ` Mason
@ 2017-03-10 15:14           ` David Laight
  2017-03-10 15:33             ` Mason
  2017-03-10 15:23           ` Robin Murphy
  2017-03-10 18:49           ` Bjorn Helgaas
  2 siblings, 1 reply; 33+ messages in thread
From: David Laight @ 2017-03-10 15:14 UTC (permalink / raw)
  To: 'Mason', Robin Murphy
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb,
	Phuong Nguyen, Shawn Lin, Linux ARM

From: Mason
> Sent: 10 March 2017 15:06
...
> My RC drops packets not targeting its BAR0.

I suspect the fpga/cpld logic supports RC and endpoint modes
and is using much the same names for the registers (and logic
implementation).

If your cpu supports more than 1GB of memory but only part is
PCIe-accessible, you'll have to ensure that all the memory
definitions are set correctly and 'bounce buffers' are used for
some operations.

	David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 15:05         ` Mason
  2017-03-10 15:14           ` David Laight
@ 2017-03-10 15:23           ` Robin Murphy
  2017-03-10 15:35             ` David Laight
  2017-03-10 18:49           ` Bjorn Helgaas
  2 siblings, 1 reply; 33+ messages in thread
From: Robin Murphy @ 2017-03-10 15:23 UTC (permalink / raw)
  To: Mason
  Cc: David Laight, Bjorn Helgaas, Rob Herring, Arnd Bergmann,
	Ard Biesheuvel, Marc Zyngier, linux-pci, Thibaud Cornic,
	linux-usb, Phuong Nguyen, Shawn Lin, Linux ARM

On 10/03/17 15:05, Mason wrote:
> On 10/03/2017 15:06, David Laight wrote:
> 
>> Robin Murphy wrote:
>>
>>> On 09/03/17 23:43, Mason wrote:
>>>
>>>> I think I'm making progress, in that I now have a better
>>>> idea of what I don't understand. So I'm able to ask
>>>> (hopefully) less vague questions.
>>>>
>>>> Take the USB3 PCIe adapter I've been testing with. At some
>>>> point during init, the XHCI driver request some memory
>>>> (via kmalloc?) in order to exchange data with the host, right?
>>>>
>>>> On my SoC, the RAM used by Linux lives at physical range
>>>> [0x8000_0000, 0x8800_0000[ => 128 MB
>>>>
>>>> How does the XHCI driver make the adapter aware of where
>>>> it can scribble data? The XHCI driver has no notion that
>>>> the device is behind a bus, does it?
>>>>
>>>> At some point, the physical addresses must be converted
>>>> to PCI bus addresses, right? Is it computed subtracting
>>>> the offset defined in the DT?
>>
>> The driver should call dma_alloc_coherent() which returns both the
>> kernel virtual address and the device (xhci controller) has
>> to use to access it.
>> The cpu physical address is irrelevant (although it might be
>> calculated in the middle somewhere).
> 
> Thank you for that missing piece of the puzzle.
> I see some relevant action in drivers/usb/host/xhci-mem.c
> 
> And I now see this log:
> 
> [    2.499320] xhci_hcd 0000:01:00.0: // Device context base array address = 0x8e07e000 (DMA), d0855000 (virt)
> [    2.509156] xhci_hcd 0000:01:00.0: Allocated command ring at cfb04200
> [    2.515640] xhci_hcd 0000:01:00.0: First segment DMA is 0x8e07f000
> [    2.521863] xhci_hcd 0000:01:00.0: // Setting command ring address to 0x20
> [    2.528786] xhci_hcd 0000:01:00.0: // xHC command ring deq ptr low bits + flags = @00000000
> [    2.537188] xhci_hcd 0000:01:00.0: // xHC command ring deq ptr high bits = @00000000
> [    2.545002] xhci_hcd 0000:01:00.0: // Doorbell array is located at offset 0x800 from cap regs base addr
> [    2.554455] xhci_hcd 0000:01:00.0: // xHCI capability registers at d0852000:
> [    2.561550] xhci_hcd 0000:01:00.0: // @d0852000 = 0x1000020 (CAPLENGTH AND HCIVERSION)
> 
> I believe 0x8e07e000 is a CPU address, not a PCI bus address.
> 
> 
>>>> Then suppose the USB3 card wants to write to an address
>>>> in RAM. It sends a packet on the PCIe bus, targeting
>>>> the PCI bus address of that RAM, right? Is this address
>>>> supposed to be in BAR0 of the root complex? I guess not,
>>>> since Bjorn said that it was unusual for a RC to have
>>>> a BAR at all. So I'll hand-wave, and decree that, by some
>>>> protocol magic, the packet arrives at the PCIe controller.
>>>> And this controller knows to forward this write request
>>>> over the memory bus. Does that look about right?
>>>
>>> Generally, yes - if an area of memory space *is* claimed by a BAR, then
>>> another PCI device accessing that would be treated as peer-to-peer DMA,
>>> which may or may not be allowed (or supported at all).
>>
>> So PCIe addresses that refer to the host memory addresses are
>> just forwarded to the memory subsystem.
>> In practise this is almost everything.
> 
> My RC drops packets not targeting its BAR0.

OK, so it does sound like you're in a particularly awkward position that
rules out using a sane 1:1 mapping between mem space and the system
address map.

>> The only other PCIe writes the host will see are likely to be associated
>> with MIS and MSI-X interrupt support.
> 
> Rev 1 of the PCIe controller is supposed to forward MSI doorbell
> writes over the global bus to the PCIe controller's MMIO register.
> 
>> Some PCIe root complex support peer-to-peer writes but not reads.
>> Write are normally 'posted' (so are 'fire and forget') reads need the
>> completion TLP (containing the data) sent back - all hard and difficult.
>>
>>> For mem space
>>> which isn't claimed by BARs, it's up to the RC to decide what to do. As
>>> a concrete example (which might possibly be relevant) the PLDA XR3-AXI
>>> IP which we have in the ARM Juno SoC has the ATR_PCIE_WINx registers in
>>> its root port configuration block that control what ranges of mem space
>>> are mapped to the external AXI master interface and how.
>>>
>>>> My problem is that, in the current implementation of the
>>>> PCIe controller, the USB device that wants to write to
>>>> memory is supposed to target BAR0 of the RC.
>>>
>>> That doesn't sound right at all. If the RC has a BAR, I'd expect it to
>>> be for poking the guts of the RC device itself (since this prompted me
>>> to go and compare, I see the Juno RC does indeed have it own enigmatic
>>> 16KB BAR, which reads as ever-changing random junk; no idea what that's
>>> about).
>>>
>>>> Since my mem space is limited to 256 MB, then BAR0 is
>>>> limited to 256 MB (or even 128 MB, since I also need
>>>> to mapthe device's BAR into the same mem space).
>>>
>>> Your window into mem space *from the CPU's point of view* is limited to
>>> 256MB. The relationship between mem space and the system (AXI) memory
>>> map from the point of view of PCI devices is a separate issue; if it's
>>> configurable at all, it probably makes sense to have the firmware set an
>>> outbound window to at least cover DRAM 1:1, then forget about it (this
>>> is essentially what Juno UEFI does, for example).
>>
>> So you have 128MB (max) of system memory that has cpu physical
>> addresses 0x80000000 upwards.
>> I'd expect it all to be accessible from any PCIe card at some PCIe
>> address, it might be at address 0, 0x80000000 or any other offset.
>>
>> I don't know which DT entry controls that offset.
> 
> This is a crucial point, I think.

The appropriate DT property would be "dma-ranges", i.e.

pci@... {
	...
	dma-ranges = <(PCI bus address) (CPU phys address) (size)>;
}

The fun part is that that will only actually match the hardware once the
magic BAR has actually been programmed with (bus address), so you end up
with this part of your DT being more of a prophecy than a property :)

Robin.

> 
> Regards.
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 15:14           ` David Laight
@ 2017-03-10 15:33             ` Mason
  0 siblings, 0 replies; 33+ messages in thread
From: Mason @ 2017-03-10 15:33 UTC (permalink / raw)
  To: David Laight, Robin Murphy
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb,
	Phuong Nguyen, Shawn Lin, Linux ARM

On 10/03/2017 16:14, David Laight wrote:

> Mason wrote:
> 
>> My RC drops packets not targeting its BAR0.
> 
> I suspect the fpga/cpld logic supports RC and endpoint modes
> and is using much the same names for the registers (and logic
> implementation).

Your guess is spot on.

In the controller's MMIO registers, the so-called core_conf_0
register has the following field:

chip_is_root: 1 means tango is root port, 0 means tango is endpoint.

> If your cpu supports more than 1GB of memory but only part is
> PCIe accessible you'll have to ensure that all the memory
> definitions are set correctly and 'bounce buffers' used for
> some operations.

Do you mean I would have to "fix" something in the XHCI driver?

Hopefully, no customer plans to give Linux more than 1 GB.
(Although the latest systems do support 4 GB... A lot of it is
used for video buffers, handled outside Linux.)

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Neophyte questions about PCIe
  2017-03-10 15:23           ` Robin Murphy
@ 2017-03-10 15:35             ` David Laight
  2017-03-10 16:00               ` Robin Murphy
  0 siblings, 1 reply; 33+ messages in thread
From: David Laight @ 2017-03-10 15:35 UTC (permalink / raw)
  To: 'Robin Murphy', Mason
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb,
	Phuong Nguyen, Shawn Lin, Linux ARM

From: Robin Murphy
> Sent: 10 March 2017 15:23
...
> >> So you have 128MB (max) of system memory that has cpu physical
> >> addresses 0x80000000 upwards.
> >> I'd expect it all to be accessible from any PCIe card at some PCIe
> >> address, it might be at address 0, 0x80000000 or any other offset.
> >>
> >> I don't know which DT entry controls that offset.
> >
> > This is a crucial point, I think.
>
> The appropriate DT property would be "dma-ranges", i.e.
>
> pci@... {
> 	...
> 	dma-ranges = <(PCI bus address) (CPU phys address) (size)>;
> }

Isn't that just saying which physical addresses the cpu can assign
for buffers for those devices?
There is also an offset between the 'cpu physical address' and the
'dma address'.
This might be implicit in the 'BAR0' base address register.

> The fun part is that that will only actually match the hardware once the
> magic BAR has actually been programmed with (bus address), so you end up
> with this part of your DT being more of a prophecy than a property :)

The BAR0 values could easily be programmed into the cpld/fpga - so
they would not need writing by the cpu at all.

	David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 15:35             ` David Laight
@ 2017-03-10 16:00               ` Robin Murphy
  2017-03-13 10:59                 ` Mason
  0 siblings, 1 reply; 33+ messages in thread
From: Robin Murphy @ 2017-03-10 16:00 UTC (permalink / raw)
  To: David Laight, Mason
  Cc: Bjorn Helgaas, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, linux-pci, Thibaud Cornic, linux-usb,
	Phuong Nguyen, Shawn Lin, Linux ARM

On 10/03/17 15:35, David Laight wrote:
> From: Robin Murphy 
>> Sent: 10 March 2017 15:23
> ...
>>>> So you have 128MB (max) of system memory that has cpu physical
>>>> addresses 0x80000000 upwards.
>>>> I'd expect it all to be accessible from any PCIe card at some PCIe
>>>> address, it might be at address 0, 0x80000000 or any other offset.
>>>>
>>>> I don't know which DT entry controls that offset.
>>>
>>> This is a crucial point, I think.
>>
>> The appropriate DT property would be "dma-ranges", i.e.
>>
>> pci@... {
>> 	...
>> 	dma-ranges = <(PCI bus address) (CPU phys address) (size)>;
>> }
> 
> Isn't that just saying which physical addresses the cpu can assign
> for buffers for those devices?
> There is also an offset between the 'cpu physical address' and the
> 'dma address'.

That offset is inherent in what "dma-ranges" describes. Say (for ease of
calculation) that BAR0 has been put at a mem space address of 0x20000000
and maps the first 1GB of physical DRAM. That would give us:

	dma-ranges = <0x20000000 0x80000000 0x40000000>;

Then a "virt = dma_alloc_coherent(..., &handle, ...)", borrowing the
numbers from earlier in the thread, would automatically end up with:

	virt == 0xd0855000;
	handle == 0x2e07e000;

(with the physical address of 0x8e07e000 in between being irrelevant to
the consuming driver)
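
For concreteness, here is the arithmetic that triplet implies, spelled out
as a small C sketch (the macro and helper names are made up, this is not
code from any driver):

/*
 * dma-ranges = <0x20000000 0x80000000 0x40000000>;
 * i.e. bus address = CPU phys address - 0x80000000 + 0x20000000,
 * valid for a 1 GB window.
 */
#define PCI_BUS_BASE	0x20000000UL
#define CPU_PHYS_BASE	0x80000000UL
#define RANGE_SIZE	0x40000000UL

static unsigned long phys_to_bus(unsigned long phys)
{
	/* e.g. 0x8e07e000 -> 0x2e07e000, matching the numbers above */
	return phys - CPU_PHYS_BASE + PCI_BUS_BASE;
}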

It is true that the device's DMA mask assignment is also part and parcel
of this, whereby we will limit what physical addresses the kernel
considers valid for DMA involving devices behind this range to the lower
3GB (i.e. 0x80000000 + 0x40000000 - 1). With a bit of luck,
CONFIG_DMABOUNCE should do the rest of the job of working around that
where necessary.

Robin.

> This might be implicit in the 'BAR0' base address register.
>  
>> The fun part is that that will only actually match the hardware once the
>> magic BAR has actually been programmed with (bus address), so you end up
>> with this part of your DT being more of a prophecy than a property :)
> 
> The BAR0 values could easily be programmed into the cpld/fpga - so
> they would not need writing by the cpu at all.
> 
> 	David
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-09 23:43   ` Mason
  2017-03-10 13:15     ` Robin Murphy
@ 2017-03-10 16:45     ` Mason
  2017-03-10 17:49       ` Mason
  1 sibling, 1 reply; 33+ messages in thread
From: Mason @ 2017-03-10 16:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Robin Murphy, Linux ARM

On 10/03/2017 00:43, Mason wrote:

> I think I'm making progress [...]

Yes! I was able to plug a USB3 Flash drive, mount it,
and read its contents. A million thanks, my head was
starting to hurt from too much banging.

Time to clean up a million hacks to be able to discuss
the finer points.

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 16:45     ` Mason
@ 2017-03-10 17:49       ` Mason
  2017-03-11 10:57         ` Mason
                           ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Mason @ 2017-03-10 17:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Robin Murphy, Linux ARM

On 10/03/2017 17:45, Mason wrote:

> Time to clean up a million hacks to be able to discuss the finer points.

Here is my current boot log:

[    1.133895] OF: PCI: host bridge /soc/pcie@50000000 ranges:
[    1.139607] pci_add_resource_offset: res=[bus 00-0f] offset=0x0
[    1.145659] OF: PCI: Parsing ranges property...
[    1.150316] OF: PCI:   MEM 0x54000000..0x5fffffff -> 0x04000000
[    1.156364] pci_add_resource_offset: res=[mem 0x54000000-0x5fffffff] offset=0x50000000
[    1.164628] pci_tango 50000000.pcie: ECAM at [mem 0x50000000-0x50ffffff] for [bus 00-0f]
[    1.173033] pci_tango 50000000.pcie: PCI host bridge to bus 0000:00
[    1.179440] pci_bus 0000:00: root bus resource [bus 00-0f]
[    1.185056] pci_bus 0000:00: root bus resource [mem 0x54000000-0x5fffffff] (bus address [0x04000000-0x0fffffff])
[    1.195386] pci_bus 0000:00: scanning bus
[    1.199539] pci 0000:00:00.0: [1105:0024] type 01 class 0x048000
[    1.205691] pci 0000:00:00.0: calling tango_pcie_fixup_class+0x0/0x10
[    1.212277] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x3fffffff 64bit]
[    1.219220] pci 0000:00:00.0: calling pci_fixup_ide_bases+0x0/0x40
[    1.225570] pci 0000:00:00.0: supports D1 D2
[    1.229957] pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot
[    1.236092] pci 0000:00:00.0: PME# disabled
[    1.240576] pci_bus 0000:00: fixups for bus
[    1.244886] PCI: bus0: Fast back to back transfers disabled
[    1.250587] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[    1.257420] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.265567] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[    1.272517] pci_bus 0000:01: busn_res: can not insert [bus 01-ff] under [bus 00-0f] (conflicts with (null) [bus 00-0f])
[    1.283462] pci_bus 0000:01: scanning bus
[    1.287623] pci 0000:01:00.0: [1912:0014] type 00 class 0x0c0330
[    1.293799] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[    1.300799] pci 0000:01:00.0: calling pci_fixup_ide_bases+0x0/0x40
[    1.307223] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    1.313446] pci 0000:01:00.0: PME# disabled
[    1.318053] pci_bus 0000:01: fixups for bus
[    1.322362] PCI: bus1: Fast back to back transfers disabled
[    1.328060] pci_bus 0000:01: bus scan returning with max=01
[    1.333759] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.340506] pci_bus 0000:00: bus scan returning with max=01
[    1.346205] pci 0000:00:00.0: fixup irq: got 0
[    1.350765] pci 0000:00:00.0: assigning IRQ 00
[    1.355332] pci 0000:01:00.0: fixup irq: got 0
[    1.359892] pci 0000:01:00.0: assigning IRQ 00
[    1.364479] pci 0000:00:00.0: BAR 0: no space for [mem size 0x40000000 64bit]
[    1.371748] pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x40000000 64bit]
[    1.379369] pci 0000:00:00.0: BAR 8: assigned [mem 0x54000000-0x540fffff]
[    1.386291] pci 0000:01:00.0: BAR 0: assigned [mem 0x54000000-0x54001fff 64bit]
[    1.393747] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.398833] pci 0000:00:00.0:   bridge window [mem 0x54000000-0x540fffff]
[    1.405767] pci 0000:00:00.0: calling tango_pcie_bar_quirk+0x0/0x40
[    1.412160] tango_pcie_bar_quirk: bus=0 devfn=0
[    1.416843] pcieport 0000:00:00.0: enabling device (0140 -> 0142)
[    1.423074] pcieport 0000:00:00.0: enabling bus mastering
[    1.428652] altera_irq_domain_alloc: ENTER
[    1.432876] FOO-msi 2e080.msi: msi#0 address_hi 0x0 address_lo 0x9002e07c
[    1.440007] FOO-msi 2e080.msi: msi#0 address_hi 0x0 address_lo 0x9002e07c
[    1.446972] aer 0000:00:00.0:pcie002: service driver aer loaded
[    1.453157] pci 0000:01:00.0: calling quirk_usb_early_handoff+0x0/0x7e0
[    1.459913] pci 0000:01:00.0: enabling device (0140 -> 0142)
[    1.465709] quirk_usb_handoff_xhci: ioremap(0x54000000, 8192)
[    1.471589] xhci_find_next_ext_cap: offset=0x500
[    1.476325] val = 0x1000401
...
[    1.624093] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.630675] ehci-pci: EHCI PCI platform driver
[    1.635338] xhci_hcd 0000:01:00.0: enabling bus mastering
[    1.640789] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    1.646071] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    1.659065] xhci_find_next_ext_cap: offset=0x500
[    1.663714] val = 0x1000401
[    1.666526] xhci_find_next_ext_cap: offset=0x510
[    1.671171] val = 0x3000502
[    1.673984] xhci_find_next_ext_cap: offset=0x510
[    1.678632] val = 0x3000502
[    1.681433] xhci_find_next_ext_cap: offset=0x524
[    1.686079] val = 0x2000702
[    1.688888] xhci_find_next_ext_cap: offset=0x524
[    1.693533] val = 0x2000702
[    1.696343] xhci_find_next_ext_cap: offset=0x540
[    1.700987] val = 0x4c0
[    1.703446] xhci_find_next_ext_cap: offset=0x550
[    1.708091] val = 0xa
[    1.710382] xhci_find_next_ext_cap: offset=0x510
[    1.715028] val = 0x3000502
[    1.717837] xhci_find_next_ext_cap: offset=0x524
[    1.722482] val = 0x2000702
[    1.725304] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[    1.733801] xhci_hcd 0000:01:00.0: enabling Mem-Wr-Inval
[    1.739222] altera_irq_domain_alloc: ENTER
[    1.743393] altera_irq_domain_alloc: ENTER
[    1.747543] altera_irq_domain_alloc: ENTER
[    1.751674] FOO-msi 2e080.msi: msi#1 address_hi 0x0 address_lo 0x9002e07c
[    1.758514] FOO-msi 2e080.msi: msi#2 address_hi 0x0 address_lo 0x9002e07c
[    1.765347] FOO-msi 2e080.msi: msi#3 address_hi 0x0 address_lo 0x9002e07c
[    1.772229] FOO-msi 2e080.msi: msi#1 address_hi 0x0 address_lo 0x9002e07c
[    1.779098] FOO-msi 2e080.msi: msi#2 address_hi 0x0 address_lo 0x9002e07c
[    1.785957] FOO-msi 2e080.msi: msi#3 address_hi 0x0 address_lo 0x9002e07c
[    1.792979] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    1.799817] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.807085] usb usb1: Product: xHCI Host Controller
[    1.811994] usb usb1: Manufacturer: Linux 4.9.7-1-rc2 xhci-hcd
[    1.817863] usb usb1: SerialNumber: 0000:01:00.0
[    1.823072] hub 1-0:1.0: USB hub found
[    1.826890] hub 1-0:1.0: 4 ports detected
[    1.831199] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    1.836473] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    1.843925] cmd=c8852020 status=c8852024
[    1.847946] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.856168] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[    1.863004] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.870275] usb usb2: Product: xHCI Host Controller
[    1.875185] usb usb2: Manufacturer: Linux 4.9.7-1-rc2 xhci-hcd
[    1.881055] usb usb2: SerialNumber: 0000:01:00.0
[    1.886151] hub 2-0:1.0: USB hub found
[    1.889964] hub 2-0:1.0: 4 ports detected
[    1.894660] usbcore: registered new interface driver usb-storage


And when I insert/remove my Flash drive:

[  216.216744] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[  216.250189] usb 2-1: New USB device found, idVendor=0951, idProduct=1666
[  216.256945] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  216.264130] usb 2-1: Product: DataTraveler 3.0
[  216.268607] usb 2-1: Manufacturer: Kingston
[  216.272821] usb 2-1: SerialNumber: 002618887865F0C0F8646BFA
[  216.283005] usb-storage 2-1:1.0: USB Mass Storage device detected
[  216.289492] scsi host0: usb-storage 2-1:1.0
[  217.299474] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[  217.309981] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[  217.320354] sd 0:0:0:0: [sda] Write Protect is off
[  217.325722] sd 0:0:0:0: [sda] Mode Sense: 4f 00 00 00
[  217.331333] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  217.343691]  sda: sda1
[  217.347819] sd 0:0:0:0: [sda] Attached SCSI removable disk
[  217.371940] random: fast init done
[  217.547108] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.

[  244.509391] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[  244.517478] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[  244.529525] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[  244.538033] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[  244.544940] pcieport 0000:00:00.0: broadcast error_detected message
[  244.551301] pcieport 0000:00:00.0: AER: Device recovery failed
[  244.828674] xhci_hcd 0000:01:00.0: Cannot set link state.
[  244.834177] usb usb2-port1: cannot disable (err = -32)
[  244.839359] usb 2-1: USB disconnect, device number 2

Hmmm, that sounds fishy.


# cat /proc/interrupts 
           CPU0       CPU1       
 19:       1958       2212     GIC-0  29 Edge      twd
 20:        107          0      irq0   1 Level     serial
 25:          1          0  PCIe MSI   0 Edge      aerdrv
 27:        471          0  PCIe MSI 524288 Edge      xhci_hcd
 28:          0          0  PCIe MSI 524289 Edge      xhci_hcd
 29:          0          0  PCIe MSI 524290 Edge      xhci_hcd
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:       1094       2376  Rescheduling interrupts
IPI3:          0        132  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:          1          0  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0

The MSI indices look fishy.
524288 = 0x80000
Not sure where that comes from.


# /usr/sbin/lspci -v
00:00.0 PCI bridge: Sigma Designs, Inc. Device 0024 (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Memory at <ignored> (64-bit, non-prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-00000fff
        Memory behind bridge: 04000000-040fffff
        Prefetchable memory behind bridge: 00000000-000fffff
        Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Root Port (Slot-), MSI 03
        Capabilities: [100] Virtual Channel
        Capabilities: [800] Advanced Error Reporting
        Kernel driver in use: pcieport

01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
        Flags: bus master, fast devsel, latency 0
        Memory at 54000000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Latency Tolerance Reporting
        Kernel driver in use: xhci_hcd


Still some weirdness here.
I might be using too old a version: lspci version 3.2.1


And my current code, to work-around the silicon bugs:

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/ioport.h>
#include <linux/of_pci.h>
#include <linux/of.h>
#include <linux/pci-ecam.h>
#include <linux/platform_device.h>

//#define DEBUG_CONFIG

static int tango_config_read(struct pci_bus *bus, unsigned int devfn,
				    int where, int size, u32 *val)
{
	int ret;
	void __iomem *pci_conf = (void *)0xf002e048;

#ifdef DEBUG_CONFIG
	if (where == PCI_BASE_ADDRESS_0)
		dump_stack();
#endif

	writel(1, pci_conf);

	if (devfn != 0) {
		*val = ~0;
		writel(0, pci_conf); /* restore the mux to mem space before bailing out */
		return PCIBIOS_DEVICE_NOT_FOUND;
	}

	ret = pci_generic_config_read(bus, devfn, where, size, val);

	writel(0, pci_conf);

#ifdef DEBUG_CONFIG
	printk("%s: bus=%d where=%d size=%d val=0x%x\n",
			__func__, bus->number, where, size, *val);
#endif

	return ret;
}

static int tango_config_write(struct pci_bus *bus, unsigned int devfn,
				     int where, int size, u32 val)
{
	int ret;
	void __iomem *pci_conf = (void *)0xf002e048;

#ifdef DEBUG_CONFIG
	if (where == PCI_BASE_ADDRESS_0)
		dump_stack();
#endif

#ifdef DEBUG_CONFIG
	printk("%s: bus=%d where=%d size=%d val=0x%x\n",
			__func__, bus->number, where, size, val);
#endif

	writel(1, pci_conf);

	ret = pci_generic_config_write(bus, devfn, where, size, val);

	writel(0, pci_conf);

	return ret;
}

static struct pci_ecam_ops tango_pci_ops = {
	.bus_shift	= 20,
	.pci_ops	= {
		.map_bus        = pci_ecam_map_bus,
		.read           = tango_config_read,
		.write          = tango_config_write,
	}
};

static const struct of_device_id tango_pci_ids[] = {
	{ .compatible = "sigma,smp8759-pcie" },
	{ /* sentinel */ },
};

static int tango_pci_probe(struct platform_device *pdev)
{
	return pci_host_common_probe(pdev, &tango_pci_ops);
}

static struct platform_driver tango_pci_driver = {
	.probe = tango_pci_probe,
	.driver = {
		.name = KBUILD_MODNAME,
		.of_match_table = tango_pci_ids,
	},
};

builtin_platform_driver(tango_pci_driver);

#define RIESLING_B 0x24

/* Root complex reports incorrect device class */
static void tango_pcie_fixup_class(struct pci_dev *dev)
{
	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
}
DECLARE_PCI_FIXUP_EARLY(0x1105, RIESLING_B, tango_pcie_fixup_class);

static void tango_pcie_bar_quirk(struct pci_dev *dev)
{
	struct pci_bus *bus = dev->bus;

	printk("%s: bus=%d devfn=%d\n", __func__, bus->number, dev->devfn);

        pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, 0x80000004);
}
DECLARE_PCI_FIXUP_FINAL(0x1105, PCI_ANY_ID, tango_pcie_bar_quirk);


Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 15:05         ` Mason
  2017-03-10 15:14           ` David Laight
  2017-03-10 15:23           ` Robin Murphy
@ 2017-03-10 18:49           ` Bjorn Helgaas
  2 siblings, 0 replies; 33+ messages in thread
From: Bjorn Helgaas @ 2017-03-10 18:49 UTC (permalink / raw)
  To: Mason
  Cc: David Laight, Robin Murphy, Rob Herring, Arnd Bergmann,
	Ard Biesheuvel, Marc Zyngier, linux-pci, Thibaud Cornic,
	linux-usb, Phuong Nguyen, Shawn Lin, Linux ARM

On Fri, Mar 10, 2017 at 04:05:50PM +0100, Mason wrote:
> On 10/03/2017 15:06, David Laight wrote:
> 
> > Robin Murphy wrote:
> >
> >> On 09/03/17 23:43, Mason wrote:
> >>
> >>> I think I'm making progress, in that I now have a better
> >>> idea of what I don't understand. So I'm able to ask
> >>> (hopefully) less vague questions.
> >>>
> >>> Take the USB3 PCIe adapter I've been testing with. At some
> >>> point during init, the XHCI driver request some memory
> >>> (via kmalloc?) in order to exchange data with the host, right?
> >>>
> >>> On my SoC, the RAM used by Linux lives at physical range
> >>> [0x8000_0000, 0x8800_0000[ => 128 MB
> >>>
> >>> How does the XHCI driver make the adapter aware of where
> >>> it can scribble data? The XHCI driver has no notion that
> >>> the device is behind a bus, does it?
> >>>
> >>> At some point, the physical addresses must be converted
> >>> to PCI bus addresses, right? Is it computed subtracting
> >>> the offset defined in the DT?
> > 
> > The driver should call dma_alloc_coherent() which returns both the
> > kernel virtual address and the address the device (xhci controller) has
> > to use to access it.
> > The cpu physical address is irrelevant (although it might be
> > calculated in the middle somewhere).
> 
> Thank you for that missing piece of the puzzle.
> I see some relevant action in drivers/usb/host/xhci-mem.c
> 
> And I now see this log:
> 
> [    2.499320] xhci_hcd 0000:01:00.0: // Device context base array address = 0x8e07e000 (DMA), d0855000 (virt)
> [    2.509156] xhci_hcd 0000:01:00.0: Allocated command ring at cfb04200
> [    2.515640] xhci_hcd 0000:01:00.0: First segment DMA is 0x8e07f000
> [    2.521863] xhci_hcd 0000:01:00.0: // Setting command ring address to 0x20
> [    2.528786] xhci_hcd 0000:01:00.0: // xHC command ring deq ptr low bits + flags = @00000000
> [    2.537188] xhci_hcd 0000:01:00.0: // xHC command ring deq ptr high bits = @00000000
> [    2.545002] xhci_hcd 0000:01:00.0: // Doorbell array is located at offset 0x800 from cap regs base addr
> [    2.554455] xhci_hcd 0000:01:00.0: // xHCI capability registers at d0852000:
> [    2.561550] xhci_hcd 0000:01:00.0: // @d0852000 = 0x1000020 (CAPLENGTH AND HCIVERSION)
> 
> I believe 0x8e07e000 is a CPU address, not a PCI bus address.

Sounds like you've made good progress since this email, but I think
0x8e07e000 is a PCI bus address, not a CPU address.  The code in
xhci_mem_init() is this:

  xhci->dcbaa = dma_alloc_coherent(dev, sizeof(*xhci->dcbaa), &dma, flags);
  xhci->dcbaa->dma = dma;
  xhci_dbg_trace("Device context base array address = 0x%llx (DMA), %p (virt)",
                 (unsigned long long)xhci->dcbaa->dma, xhci->dcbaa);

dma_alloc_coherent() allocates a buffer and returns two values: the
CPU virtual address ("xhci->dcbaa" here) for use by the driver and the
corresponding DMA address the device can use to reach the buffer
("dma").

Documentation/DMA-API-HOWTO.txt has more details that might be useful.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 17:49       ` Mason
@ 2017-03-11 10:57         ` Mason
  2017-03-13 21:40           ` Bjorn Helgaas
  2017-03-13 14:25         ` Mason
  2017-03-14 14:00         ` Mason
  2 siblings, 1 reply; 33+ messages in thread
From: Mason @ 2017-03-11 10:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Robin Murphy, Linux ARM, Arnd Bergmann, Kevin Hilman

On 10/03/2017 18:49, Mason wrote:

> And my current code, to work-around the silicon bugs:
> 
> #include <linux/kernel.h>
> #include <linux/init.h>
> #include <linux/ioport.h>
> #include <linux/of_pci.h>
> #include <linux/of.h>
> #include <linux/pci-ecam.h>
> #include <linux/platform_device.h>
> 
> //#define DEBUG_CONFIG
> 
> static int tango_config_read(struct pci_bus *bus, unsigned int devfn,
> 				    int where, int size, u32 *val)
> {
> 	int ret;
> 	void __iomem *pci_conf = (void *)0xf002e048;
> 
> #ifdef DEBUG_CONFIG
> 	if (where == PCI_BASE_ADDRESS_0)
> 		dump_stack();
> #endif
> 
> 	writel(1, pci_conf);

This sets the config/mem mux to CONFIG SPACE.

> 	if (devfn != 0) {
> 		*val = ~0;
> 		return PCIBIOS_DEVICE_NOT_FOUND;
> 	}

This works around a silicon bug, where accesses to devices or
functions not 0 return garbage.

> 	ret = pci_generic_config_read(bus, devfn, where, size, val);
> 
> 	writel(0, pci_conf);

This resets the config/mem mux back to MEM SPACE.

If anything tries to access MEM in that time frame, we're toast.

> #ifdef DEBUG_CONFIG
> 	printk("%s: bus=%d where=%d size=%d val=0x%x\n",
> 			__func__, bus->number, where, size, *val);
> #endif
> 
> 	return ret;
> }
> 
> static int tango_config_write(struct pci_bus *bus, unsigned int devfn,
> 				     int where, int size, u32 val)
> {
> 	int ret;
> 	void __iomem *pci_conf = (void *)0xf002e048;
> 
> #ifdef DEBUG_CONFIG
> 	if (where == PCI_BASE_ADDRESS_0)
> 		dump_stack();
> #endif
> 
> #ifdef DEBUG_CONFIG
> 	printk("%s: bus=%d where=%d size=%d val=0x%x\n",
> 			__func__, bus->number, where, size, val);
> #endif
> 
> 	writel(1, pci_conf);
> 
> 	ret = pci_generic_config_write(bus, devfn, where, size, val);
> 
> 	writel(0, pci_conf);
> 
> 	return ret;
> }
> 
> static struct pci_ecam_ops tango_pci_ops = {
> 	.bus_shift	= 20,
> 	.pci_ops	= {
> 		.map_bus        = pci_ecam_map_bus,
> 		.read           = tango_config_read,
> 		.write          = tango_config_write,
> 	}
> };
> 
> static const struct of_device_id tango_pci_ids[] = {
> 	{ .compatible = "sigma,smp8759-pcie" },
> 	{ /* sentinel */ },
> };
> 
> static int tango_pci_probe(struct platform_device *pdev)
> {
> 	return pci_host_common_probe(pdev, &tango_pci_ops);
> }
> 
> static struct platform_driver tango_pci_driver = {
> 	.probe = tango_pci_probe,
> 	.driver = {
> 		.name = KBUILD_MODNAME,
> 		.of_match_table = tango_pci_ids,
> 	},
> };
> 
> builtin_platform_driver(tango_pci_driver);
> 
> #define RIESLING_B 0x24
> 
> /* Root complex reports incorrect device class */
> static void tango_pcie_fixup_class(struct pci_dev *dev)
> {
> 	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
> }
> DECLARE_PCI_FIXUP_EARLY(0x1105, RIESLING_B, tango_pcie_fixup_class);

This works around another silicon bug.

> static void tango_pcie_bar_quirk(struct pci_dev *dev)
> {
> 	struct pci_bus *bus = dev->bus;
> 
> 	printk("%s: bus=%d devfn=%d\n", __func__, bus->number, dev->devfn);
> 
>         pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, 0x80000004);
> }
> DECLARE_PCI_FIXUP_FINAL(0x1105, PCI_ANY_ID, tango_pcie_bar_quirk);

And this is where the elusive "black magic" happens.

Is it "safe" to configure a BAR behind Linux's back?

Basically, there seems to be an identity map between RAM and PCI space.
(Is that, perhaps, some kind of default? I would think that the default
would have been defined by the "ranges" prop in the pci DT node.)

So PCI address 0x8000_0000 maps to CPU address 0x8000_0000, i.e. the
start of system RAM. And when dev 1 accesses RAM, the RC correctly
forwards the packet to the memory bus.

However, RC BAR0 is limited to 1 GB (split across 8 x 128 MB "region").

Thus, to properly set this up, I need to account for what memory
Linux is managing, i.e. the mem= command line argument.
(I don't know how to access that at run-time.)

For example, if we have 2 x 512 MB of RAM.
DRAM0 is at [0x8000_0000, 0xa000_0000[
DRAM1 is at [0xc000_0000, 0xe000_0000[

But a different situation is 1 x 1 GB of RAM.
DRAM0 is at [0x8000_0000, 0xc000_0000[

I need to program different region targets.
How to do that in a way that is acceptable upstream?
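
For the record, the rough shape of what I have in mind: iterate over the
memory banks Linux actually manages (via memblock) and program one remap
region per bank. The region register layout below is invented for this
sketch; only the memblock iteration is real kernel API.

#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/memblock.h>

/* Hypothetical remap registers - offsets made up for illustration */
#define RC_REGION_BASE(i)	(0x100 + (i) * 8)
#define RC_REGION_SIZE(i)	(0x104 + (i) * 8)

static void tango_setup_regions(void __iomem *regs)
{
	struct memblock_region *reg;
	int i = 0;

	for_each_memblock(memory, reg) {
		/* identity-map each DRAM bank into PCI bus space */
		writel(lower_32_bits(reg->base), regs + RC_REGION_BASE(i));
		writel(lower_32_bits(reg->size), regs + RC_REGION_SIZE(i));
		i++;
	}
}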

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 16:00               ` Robin Murphy
@ 2017-03-13 10:59                 ` Mason
  2017-03-13 11:56                   ` Robin Murphy
  0 siblings, 1 reply; 33+ messages in thread
From: Mason @ 2017-03-13 10:59 UTC (permalink / raw)
  To: Robin Murphy, David Laight
  Cc: Bjorn Helgaas, Rob Herring, Mark Rutland, linux-pci, DT,
	Linux ARM, Thibaud Cornic, Phuong Nguyen

On 10/03/2017 17:00, Robin Murphy wrote:

> On 10/03/17 15:35, David Laight wrote:
>
>> Robin Murphy wrote:
>> 
>>> The appropriate DT property would be "dma-ranges", i.e.
>>>
>>> pci@... {
>>> 	...
>>> 	dma-ranges = <(PCI bus address) (CPU phys address) (size)>;
>>> }
>>
>> Isn't that just saying which physical addresses the cpu can assign
>> for buffers for those devices?
>> There is also an offset between the 'cpu physical address' and the
>> 'dma address'.
> 
> That offset is inherent in what "dma-ranges" describes. Say (for ease of
> calculation) that BAR0 has been put at a mem space address of 0x20000000
> and maps the first 1GB of physical DRAM. That would give us:
> 
> 	dma-ranges = <0x20000000 0x80000000 0x40000000>;
> 
> Then a "virt = dma_alloc_coherent(..., &handle, ...)", borrowing the
> numbers from earlier in the thread, would automatically end up with:
> 
> 	virt == 0xd0855000;
> 	handle == 0x2e07e000;
> 
> (with the physical address of 0x8e07e000 in between being irrelevant to
> the consuming driver)
> 
> It is true that the device's DMA mask assignment is also part and parcel
> of this, whereby we will limit what physical addresses the kernel
> considers valid for DMA involving devices behind this range to the lower
> 3GB (i.e. 0x80000000 + 0x40000000 - 1). With a bit of luck,
> CONFIG_DMABOUNCE should do the rest of the job of working around that
> where necessary.

AFAICT, the parser for the "dma-ranges" property is implemented
in of_dma_get_range() in drivers/of/address.c

http://lxr.free-electrons.com/source/drivers/of/address.c#L808

/**
 * of_dma_get_range - Get DMA range info
 * @np:		device node to get DMA range info
 * @dma_addr:	pointer to store initial DMA address of DMA range
 * @paddr:	pointer to store initial CPU address of DMA range
 * @size:	pointer to store size of DMA range
 *
 * Look in bottom up direction for the first "dma-ranges" property
 * and parse it.
 *  dma-ranges format:
 *	DMA addr (dma_addr)	: naddr cells
 *	CPU addr (phys_addr_t)	: pna cells
 *	size			: nsize cells
 *
 * It returns -ENODEV if "dma-ranges" property was not found
 * for this device in DT.
 */

I didn't find anything relevant in Documentation/devicetree/bindings
except Documentation/devicetree/bindings/iommu/iommu.txt but I'm not
sure this applies to my SoC.

It's not clear to me how ranges and dma-ranges interact...
Is it perhaps: ranges for cpu-to-bus, dma-ranges for bus-to-cpu?


ePAPR (Version 1.1 -- 08 April) provides a formal definition

2.3.9 dma-ranges

Property: dma-ranges

Value type: <empty> or <prop-encoded-array> encoded as arbitrary number
of triplets of (child-bus-address, parent-bus-address, length).

Description:

The dma-ranges property is used to describe the direct memory access (DMA) structure of a
memory-mapped bus whose device tree parent can be accessed from DMA operations
originating from the bus. It provides a means of defining a mapping or translation between the
physical address space of the bus and the physical address space of the parent of the bus.
The format of the value of the dma-ranges property is an arbitrary number of triplets of
(child-bus-address, parent-bus-address, length). Each triplet specified describes a contiguous
DMA address range.

o The child-bus-address is a physical address within the child bus' address space. The
number of cells to represent the address depends on the bus and can be determined
from the #address-cells of this node (the node in which the dma-ranges property
appears).

o The parent-bus-address is a physical address within the parent bus' address space.
The number of cells to represent the parent address is bus dependent and can be
determined from the #address-cells property of the node that defines the parent's
address space.

o The length specifies the size of the range in the child's address space. The number of
cells to represent the size can be determined from the #size-cells of this node (the
node in which the dma-ranges property appears).


I'm still digging :-)

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-13 10:59                 ` Mason
@ 2017-03-13 11:56                   ` Robin Murphy
  0 siblings, 0 replies; 33+ messages in thread
From: Robin Murphy @ 2017-03-13 11:56 UTC (permalink / raw)
  To: Mason
  Cc: David Laight, Bjorn Helgaas, Rob Herring, Mark Rutland,
	linux-pci, DT, Linux ARM, Thibaud Cornic, Phuong Nguyen

On 13/03/17 10:59, Mason wrote:
> On 10/03/2017 17:00, Robin Murphy wrote:
> 
>> On 10/03/17 15:35, David Laight wrote:
>>
>>> Robin Murphy wrote:
>>>
>>>> The appropriate DT property would be "dma-ranges", i.e.
>>>>
>>>> pci@... {
>>>> 	...
>>>> 	dma-ranges = <(PCI bus address) (CPU phys address) (size)>;
>>>> }
>>>
>>> Isn't that just saying which physical addresses the cpu can assign
>>> for buffers for those devices?
>>> There is also an offset between the 'cpu physical address' and the
>>> 'dma address'.
>>
>> That offset is inherent in what "dma-ranges" describes. Say (for ease of
>> calculation) that BAR0 has been put at a mem space address of 0x20000000
>> and maps the first 1GB of physical DRAM. That would give us:
>>
>> 	dma-ranges = <0x20000000 0x80000000 0x40000000>;
>>
>> Then a "virt = dma_alloc_coherent(..., &handle, ...)", borrowing the
>> numbers from earlier in the thread, would automatically end up with:
>>
>> 	virt == 0xd0855000;
>> 	handle == 0x2e07e000;
>>
>> (with the physical address of 0x8e07e000 in between being irrelevant to
>> the consuming driver)
>>
>> It is true that the device's DMA mask assignment is also part and parcel
>> of this, whereby we will limit what physical addresses the kernel
>> considers valid for DMA involving devices behind this range to the lower
>> 3GB (i.e. 0x80000000 + 0x40000000 - 1). With a bit of luck,
>> CONFIG_DMABOUNCE should do the rest of the job of working around that
>> where necessary.
> 
> AFAICT, the parser for the "dma-ranges" property is implemented
> in of_dma_get_range() in drivers/of/address.c
> 
> http://lxr.free-electrons.com/source/drivers/of/address.c#L808
> 
> /**
>  * of_dma_get_range - Get DMA range info
>  * @np:		device node to get DMA range info
>  * @dma_addr:	pointer to store initial DMA address of DMA range
>  * @paddr:	pointer to store initial CPU address of DMA range
>  * @size:	pointer to store size of DMA range
>  *
>  * Look in bottom up direction for the first "dma-ranges" property
>  * and parse it.
>  *  dma-ranges format:
>  *	DMA addr (dma_addr)	: naddr cells
>  *	CPU addr (phys_addr_t)	: pna cells
>  *	size			: nsize cells
>  *
>  * It returns -ENODEV if "dma-ranges" property was not found
>  * for this device in DT.
>  */
> 
> I didn't find anything relevant in Documentation/devicetree/bindings
> except Documentation/devicetree/bindings/iommu/iommu.txt but I'm not
> sure this applies to my SoC.
> 
> It's not clear to me how ranges and dma-ranges interact...
> Is it perhaps: ranges for cpu-to-bus, dma-ranges for bus-to-cpu?

That's it exactly. FWIW the in-kernel description can be found in
Documentation/devicetree/booting-without-of.txt, but the
www.devicetree.org spec (which succeeds ePAPR) should be the
authoritative reference now. Note that the property itself is fully
capable of describing multiple discontiguous ranges which might look
like the perfect solution to your multiple-DRAM-banks problem, but Linux
can't actually cope with that at all :(

Robin.

> ePAPR (Version 1.1 -- 08 April) provides a formal definition
> 
> 2.3.9 dma-ranges
> 
> Property: dma-ranges
> 
> Value type: <empty> or <prop-encoded-array> encoded as arbitrary number
> of triplets of (child-bus-address, parent-bus-address, length).
> 
> Description:
> 
> The dma-ranges property is used to describe the direct memory access (DMA) structure of a
> memory-mapped bus whose device tree parent can be accessed from DMA operations
> originating from the bus. It provides a means of defining a mapping or translation between the
> physical address space of the bus and the physical address space of the parent of the bus.
> The format of the value of the dma-ranges property is an arbitrary number of triplets of
> (child-bus-address, parent-bus-address, length). Each triplet specified describes a contiguous
> DMA address range.
> 
> o The child-bus-address is a physical address within the child bus' address space. The
> number of cells to represent the address depends on the bus and can be determined
> from the #address-cells of this node (the node in which the dma-ranges property
> appears).
> 
> o The parent-bus-address is a physical address within the parent bus' address space.
> The number of cells to represent the parent address is bus dependent and can be
> determined from the #address-cells property of the node that defines the parent's
> address space.
> 
> o The length specifies the size of the range in the child's address space. The number of
> cells to represent the size can be determined from the #size-cells of this node (the
> node in which the dma-ranges property appears).
> 
> 
> I'm still digging :-)
> 
> Regards.
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 17:49       ` Mason
  2017-03-11 10:57         ` Mason
@ 2017-03-13 14:25         ` Mason
  2017-03-14 14:00         ` Mason
  2 siblings, 0 replies; 33+ messages in thread
From: Mason @ 2017-03-13 14:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Thibaud Cornic, David Laight, Phuong Nguyen,
	Robin Murphy, Linux ARM

On 10/03/2017 18:49, Mason wrote:

> # /usr/sbin/lspci -v
> 00:00.0 PCI bridge: Sigma Designs, Inc. Device 0024 (rev 01) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0
>         Memory at <ignored> (64-bit, non-prefetchable)
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 00000000-00000fff
>         Memory behind bridge: 04000000-040fffff
>         Prefetchable memory behind bridge: 00000000-000fffff
>         Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
>         Capabilities: [78] Power Management version 3
>         Capabilities: [80] Express Root Port (Slot-), MSI 03
>         Capabilities: [100] Virtual Channel
>         Capabilities: [800] Advanced Error Reporting
>         Kernel driver in use: pcieport
> 
> Still some weirdness here.
> I might be using too old a version: lspci version 3.2.1

Focusing on the unexpected reports of stuff "behind bridge".
The output from pciutils 3.5.4 is mostly identical.
Is this likely to cause issues down the road?

# /usr/sbin/lspci -v
00:00.0 PCI bridge: Sigma Designs, Inc. Device 0024 (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 25
        Memory at <ignored> (64-bit, non-prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-00000fff [size=4K]
        Memory behind bridge: 04000000-040fffff [size=1M]
        Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
        Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Root Port (Slot-), MSI 03
        Capabilities: [100] Virtual Channel
        Capabilities: [800] Advanced Error Reporting
        Kernel driver in use: pcieport

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-11 10:57         ` Mason
@ 2017-03-13 21:40           ` Bjorn Helgaas
  2017-03-13 21:57             ` Mason
  0 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2017-03-13 21:40 UTC (permalink / raw)
  To: Mason
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Robin Murphy, Linux ARM, Kevin Hilman

On Sat, Mar 11, 2017 at 11:57:56AM +0100, Mason wrote:
> On 10/03/2017 18:49, Mason wrote:

> > static void tango_pcie_bar_quirk(struct pci_dev *dev)
> > {
> > 	struct pci_bus *bus = dev->bus;
> > 
> > 	printk("%s: bus=%d devfn=%d\n", __func__, bus->number, dev->devfn);
> > 
> >         pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, 0x80000004);
> > }
> > DECLARE_PCI_FIXUP_FINAL(0x1105, PCI_ANY_ID, tango_pcie_bar_quirk);
> 
> And this is where the elusive "black magic" happens.
> 
> Is it "safe" to configure a BAR behind Linux's back?

No.  Linux maintains a struct resource for every BAR.  This quirk
makes the BAR out of sync with the resource, so Linux no longer has an
accurate idea of what bus address space is consumed and what is
available.

Normally a BAR is for mapping device registers into PCI bus address
space.  If this BAR controls how the RC forwards PCI DMA transactions
to RAM, then it's not really a BAR and you should prevent Linux from
seeing it as a BAR.  You could do this by special-casing it in the
config accessor so reads return 0 and writes are dropped.  Then you
could write the register in your host bridge driver safely because the
PCI core would think the BAR is not implemented.
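
Roughly, slotting into the accessors you posted (untested sketch, covering
both halves of the 64-bit BAR):

	/* at the top of tango_config_read() */
	if (bus->number == 0 && devfn == 0 &&
	    (where == PCI_BASE_ADDRESS_0 || where == PCI_BASE_ADDRESS_1)) {
		*val = 0;	/* pretend the BAR is not implemented */
		return PCIBIOS_SUCCESSFUL;
	}

	/* at the top of tango_config_write() */
	if (bus->number == 0 && devfn == 0 &&
	    (where == PCI_BASE_ADDRESS_0 || where == PCI_BASE_ADDRESS_1))
		return PCIBIOS_SUCCESSFUL;	/* drop the write */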

Bjorn

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-13 21:40           ` Bjorn Helgaas
@ 2017-03-13 21:57             ` Mason
  2017-03-13 22:46               ` Bjorn Helgaas
  2017-03-14 10:23               ` David Laight
  0 siblings, 2 replies; 33+ messages in thread
From: Mason @ 2017-03-13 21:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Robin Murphy, Linux ARM, Kevin Hilman

On 13/03/2017 22:40, Bjorn Helgaas wrote:

> On Sat, Mar 11, 2017 at 11:57:56AM +0100, Mason wrote:
>
>> On 10/03/2017 18:49, Mason wrote:
>> 
>>> static void tango_pcie_bar_quirk(struct pci_dev *dev)
>>> {
>>> 	struct pci_bus *bus = dev->bus;
>>>
>>> 	printk("%s: bus=%d devfn=%d\n", __func__, bus->number, dev->devfn);
>>>
>>>         pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, 0x80000004);
>>> }
>>> DECLARE_PCI_FIXUP_FINAL(0x1105, PCI_ANY_ID, tango_pcie_bar_quirk);
>>
>> And this is where the elusive "black magic" happens.
>>
>> Is it "safe" to configure a BAR behind Linux's back?
> 
> No.  Linux maintains a struct resource for every BAR.  This quirk
> makes the BAR out of sync with the resource, so Linux no longer has an
> accurate idea of what bus address space is consumed and what is
> available.

Even when Linux is not able to map the BAR, since it's too
large to fit in the mem window?

> Normally a BAR is for mapping device registers into PCI bus address
> space.  If this BAR controls how the RC forwards PCI DMA transactions
> to RAM, then it's not really a BAR and you should prevent Linux from
> seeing it as a BAR.  You could do this by special-casing it in the
> config accessor so reads return 0 and writes are dropped.  Then you
> could write the register in your host bridge driver safely because the
> PCI core would think the BAR is not implemented.

In fact, that's what I used to do in a previous version :-)

I'd like to push support for this PCIe controller upstream.

Is the code I posted on the right track?
Maybe I can post a RFC patch tomorrow?

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-13 21:57             ` Mason
@ 2017-03-13 22:46               ` Bjorn Helgaas
  2017-03-14 10:23               ` David Laight
  1 sibling, 0 replies; 33+ messages in thread
From: Bjorn Helgaas @ 2017-03-13 22:46 UTC (permalink / raw)
  To: Mason
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, David Laight, Phuong Nguyen,
	Shawn Lin, Robin Murphy, Linux ARM, Kevin Hilman

On Mon, Mar 13, 2017 at 10:57:48PM +0100, Mason wrote:
> On 13/03/2017 22:40, Bjorn Helgaas wrote:
> 
> > On Sat, Mar 11, 2017 at 11:57:56AM +0100, Mason wrote:
> >
> >> On 10/03/2017 18:49, Mason wrote:
> >> 
> >>> static void tango_pcie_bar_quirk(struct pci_dev *dev)
> >>> {
> >>> 	struct pci_bus *bus = dev->bus;
> >>>
> >>> 	printk("%s: bus=%d devfn=%d\n", __func__, bus->number, dev->devfn);
> >>>
> >>>         pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, 0x80000004);
> >>> }
> >>> DECLARE_PCI_FIXUP_FINAL(0x1105, PCI_ANY_ID, tango_pcie_bar_quirk);
> >>
> >> And this is where the elusive "black magic" happens.
> >>
> >> Is it "safe" to configure a BAR behind Linux's back?
> > 
> > No.  Linux maintains a struct resource for every BAR.  This quirk
> > makes the BAR out of sync with the resource, so Linux no longer has an
> > accurate idea of what bus address space is consumed and what is
> > available.
> 
> Even when Linux is not able to map the BAR, since it's too
> large to fit in the mem window?

I don't think there's much point in advertising a BAR that isn't
really a BAR and making assumptions about how Linux will handle it.
So my answer remains "No, I don't think it's a good idea to change a
BAR behind the back of the PCI core.  It might work now, but there's
no guarantee it will keep working."

> > Normally a BAR is for mapping device registers into PCI bus address
> > space.  If this BAR controls how the RC forwards PCI DMA transactions
> > to RAM, then it's not really a BAR and you should prevent Linux from
> > seeing it as a BAR.  You could do this by special-casing it in the
> > config accessor so reads return 0 and writes are dropped.  Then you
> > could write the register in your host bridge driver safely because the
> > PCI core would think the BAR is not implemented.
> 
> In fact, that's what I used to do in a previous version :-)
> 
> I'd like to push support for this PCIe controller upstream.
> 
> Is the code I posted on the right track?
> Maybe I can post a RFC patch tomorrow?

No need to ask before posting a patch :)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Neophyte questions about PCIe
  2017-03-13 21:57             ` Mason
  2017-03-13 22:46               ` Bjorn Helgaas
@ 2017-03-14 10:23               ` David Laight
  2017-03-14 12:05                 ` Mason
  1 sibling, 1 reply; 33+ messages in thread
From: David Laight @ 2017-03-14 10:23 UTC (permalink / raw)
  To: 'Mason', Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, Phuong Nguyen, Shawn Lin,
	Robin Murphy, Linux ARM, Kevin Hilman

From: Mason
> Sent: 13 March 2017 21:58
...
> I'd like to push support for this PCIe controller upstream.
>
> Is the code I posted on the right track?
> Maybe I can post a RFC patch tomorrow?

I think you need to resolve the problem of config space (and IO) cycles
before the driver can be deemed usable.

	David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-14 10:23               ` David Laight
@ 2017-03-14 12:05                 ` Mason
  2017-03-14 12:24                   ` David Laight
  0 siblings, 1 reply; 33+ messages in thread
From: Mason @ 2017-03-14 12:05 UTC (permalink / raw)
  To: David Laight, Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, Phuong Nguyen, Shawn Lin,
	Robin Murphy, Linux ARM, Kevin Hilman

On 14/03/2017 11:23, David Laight wrote:

> Mason wrote:
> 
>> I'd like to push support for this PCIe controller upstream.
>>
>> Is the code I posted on the right track?
>> Maybe I can post a RFC patch tomorrow?
> 
> I think you need to resolve the problem of config space (and IO) cycles
> before the driver can be deemed usable.

You're alluding to the (unfortunate) muxing of config and mem spaces
on my controller, where concurrent accesses by two different threads
would blow the system up.

You've suggested sending IPIs in the config space accessor, in order
to prevent other CPUs from starting a mem access. But this doesn't
help if a mem access is already in flight, AFAIU.

I fear there is nothing that can be done in SW, short of rewriting
drivers such that mem space accesses are handled by a driver-specific
call-back which could take care of all required locking.

AFAICT, my only (reasonable) option is putting a big fat warning
in the code, and pray that concurrent accesses never happen.
(I'll test with a storage stress test on a USB3 drive.)

In parallel, I'm trying to convince management that the HW needs
fixing ASAP.

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: Neophyte questions about PCIe
  2017-03-14 12:05                 ` Mason
@ 2017-03-14 12:24                   ` David Laight
  0 siblings, 0 replies; 33+ messages in thread
From: David Laight @ 2017-03-14 12:24 UTC (permalink / raw)
  To: 'Mason', Bjorn Helgaas
  Cc: linux-pci, linux-usb, Rob Herring, Arnd Bergmann, Ard Biesheuvel,
	Marc Zyngier, Thibaud Cornic, Phuong Nguyen, Shawn Lin,
	Robin Murphy, Linux ARM, Kevin Hilman

From: Mason
> Sent: 14 March 2017 12:06
> On 14/03/2017 11:23, David Laight wrote:
>
> > Mason wrote:
> >
> >> I'd like to push support for this PCIe controller upstream.
> >>
> >> Is the code I posted on the right track?
> >> Maybe I can post a RFC patch tomorrow?
> >
> > I think you need to resolve the problem of config space (and IO) cycles
> > before the driver can be deemed usable.
>
> You're alluding to the (unfortunate) muxing of config and mem spaces
> on my controller, where concurrent accesses by two different threads
> would blow the system up.
>
> You've suggested sending IPIs in the config space accessor, in order
> to prevent other CPUs from starting a mem access. But this doesn't
> help if a mem access is already in flight, AFAIU.
>
> I fear there is nothing that can be done in SW, short of rewriting
> drivers such that mem space accesses are handled by a driver-specific
> call-back which could take care of all required locking.
>
> AFAICT, my only (reasonable) option is putting a big fat warning
> in the code, and pray that concurrent accesses never happen.
> (I'll test with a storage stress test on a USB3 drive.)

Since this is a host controller and you want to be able to use standard
PCIe cards (with their standard drivers) you need to fix the hardware
or do some kind of software workaround that is only in the config space
code.
> You cannot assume that config space cycles won't happen at the same
time as other accesses.

While most driver accesses should use either the iowrite32() or writel()
> families of functions, I'm pretty sure they don't have inter-cpu locking.
(They might have annoying, excessive barriers.)
But it is also legal for drivers to allow pcie memory space be mmapped
directly into a processes address space.
You have no control over the bus cycles that generates.
This isn't common, but is done. We use it for bit-banging serial eeproms.
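
For the avoidance of doubt, this is the sort of thing I mean - a minimal
userspace sketch mapping the xHCI card's 8K BAR0 through sysfs (no error
handling, paths as seen earlier in this thread):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* BAR0 of the xHCI card, as exposed by the PCI core via sysfs */
	int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
		      O_RDWR | O_SYNC);
	void *bar = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* plain loads/stores through 'bar' become PCIe mem cycles
	   that the kernel is never aware of */
	munmap(bar, 8192);
	close(fd);
	return 0;
}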

	David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-10 17:49       ` Mason
  2017-03-11 10:57         ` Mason
  2017-03-13 14:25         ` Mason
@ 2017-03-14 14:00         ` Mason
  2017-03-14 15:54           ` Mason
  2 siblings, 1 reply; 33+ messages in thread
From: Mason @ 2017-03-14 14:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Rob Herring, Ard Biesheuvel, Thibaud Cornic,
	David Laight, Phuong Nguyen, Shawn Lin, Robin Murphy, Linux ARM

On 10/03/2017 18:49, Mason wrote:

> /* Root complex reports incorrect device class */
> static void tango_pcie_fixup_class(struct pci_dev *dev)
> {
> 	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
> }

Gen1 controller reports class/rev = 0x04800001
Gen2 controller reports class/rev = 0x06000001

#define PCI_CLASS_BRIDGE_HOST		0x0600
#define PCI_CLASS_BRIDGE_ISA		0x0601
#define PCI_CLASS_BRIDGE_EISA		0x0602
#define PCI_CLASS_BRIDGE_MC		0x0603
#define PCI_CLASS_BRIDGE_PCI		0x0604
#define PCI_CLASS_BRIDGE_PCMCIA		0x0605
#define PCI_CLASS_BRIDGE_NUBUS		0x0606
#define PCI_CLASS_BRIDGE_CARDBUS	0x0607
#define PCI_CLASS_BRIDGE_RACEWAY	0x0608
#define PCI_CLASS_BRIDGE_OTHER		0x0680

My fixup replaces 0x048000 with 0x060400.

	0x060400 != 0x060000

Which is correct:
PCI_CLASS_BRIDGE_HOST or PCI_CLASS_BRIDGE_PCI?

Naively, I would expect Host/PCI bridge to be more correct
for a root complex.

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-14 14:00         ` Mason
@ 2017-03-14 15:54           ` Mason
  2017-03-14 21:46             ` Bjorn Helgaas
  0 siblings, 1 reply; 33+ messages in thread
From: Mason @ 2017-03-14 15:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Rob Herring, Ard Biesheuvel, Thibaud Cornic,
	David Laight, Phuong Nguyen, Shawn Lin, Robin Murphy, Linux ARM

On 14/03/2017 15:00, Mason wrote:

> On 10/03/2017 18:49, Mason wrote:
> 
>> /* Root complex reports incorrect device class */
>> static void tango_pcie_fixup_class(struct pci_dev *dev)
>> {
>> 	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
>> }
> 
> Gen1 controller reports class/rev = 0x04800001
> Gen2 controller reports class/rev = 0x06000001
> 
> #define PCI_CLASS_BRIDGE_HOST		0x0600
> #define PCI_CLASS_BRIDGE_ISA		0x0601
> #define PCI_CLASS_BRIDGE_EISA		0x0602
> #define PCI_CLASS_BRIDGE_MC		0x0603
> #define PCI_CLASS_BRIDGE_PCI		0x0604
> #define PCI_CLASS_BRIDGE_PCMCIA		0x0605
> #define PCI_CLASS_BRIDGE_NUBUS		0x0606
> #define PCI_CLASS_BRIDGE_CARDBUS	0x0607
> #define PCI_CLASS_BRIDGE_RACEWAY	0x0608
> #define PCI_CLASS_BRIDGE_OTHER		0x0680
> 
> My fixup replaces 0x048000 with 0x060400.
> 
> 	0x060400 != 0x060000
> 
> Which is correct:
> PCI_CLASS_BRIDGE_HOST or PCI_CLASS_BRIDGE_PCI?
> 
> Naively, I would expect Host/PCI bridge to be more correct
> for a root complex.

But that's very likely wrong, since the code in Linux does:

	switch (dev->hdr_type) {		    /* header type */
	case PCI_HEADER_TYPE_BRIDGE:		    /* bridge header */
		if (class != PCI_CLASS_BRIDGE_PCI)
			goto bad;
		/* The PCI-to-PCI bridge spec requires that subtractive
		   decoding (i.e. transparent) bridge must have programming
		   interface code of 0x01. */

So a class of PCI_CLASS_BRIDGE_HOST would error out, I think.
Does this mean I need to fixup Gen2 as well?
(Since it reports 0x060000)

Regards.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Neophyte questions about PCIe
  2017-03-14 15:54           ` Mason
@ 2017-03-14 21:46             ` Bjorn Helgaas
  0 siblings, 0 replies; 33+ messages in thread
From: Bjorn Helgaas @ 2017-03-14 21:46 UTC (permalink / raw)
  To: Mason
  Cc: Rob Herring, Ard Biesheuvel, linux-pci, Thibaud Cornic,
	David Laight, Phuong Nguyen, Robin Murphy, Shawn Lin, Linux ARM

On Tue, Mar 14, 2017 at 04:54:00PM +0100, Mason wrote:
> On 14/03/2017 15:00, Mason wrote:
> 
> > On 10/03/2017 18:49, Mason wrote:
> > 
> >> /* Root complex reports incorrect device class */
> >> static void tango_pcie_fixup_class(struct pci_dev *dev)
> >> {
> >> 	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
> >> }
> > 
> > Gen1 controller reports class/rev = 0x04800001
> > Gen2 controller reports class/rev = 0x06000001
> > 
> > #define PCI_CLASS_BRIDGE_HOST		0x0600
> > #define PCI_CLASS_BRIDGE_ISA		0x0601
> > #define PCI_CLASS_BRIDGE_EISA		0x0602
> > #define PCI_CLASS_BRIDGE_MC		0x0603
> > #define PCI_CLASS_BRIDGE_PCI		0x0604
> > #define PCI_CLASS_BRIDGE_PCMCIA		0x0605
> > #define PCI_CLASS_BRIDGE_NUBUS		0x0606
> > #define PCI_CLASS_BRIDGE_CARDBUS	0x0607
> > #define PCI_CLASS_BRIDGE_RACEWAY	0x0608
> > #define PCI_CLASS_BRIDGE_OTHER		0x0680
> > 
> > My fixup replaces 0x048000 with 0x060400.
> > 
> > 	0x060400 != 0x060000
> > 
> > Which is correct:
> > PCI_CLASS_BRIDGE_HOST or PCI_CLASS_BRIDGE_PCI?
> > 
> > Naively, I would expect Host/PCI bridge to be more correct
> > for a root complex.
> 
> But that's very likely wrong, since the code in Linux does:
> 
> 	switch (dev->hdr_type) {		    /* header type */
> 	case PCI_HEADER_TYPE_BRIDGE:		    /* bridge header */
> 		if (class != PCI_CLASS_BRIDGE_PCI)
> 			goto bad;
> 		/* The PCI-to-PCI bridge spec requires that subtractive
> 		   decoding (i.e. transparent) bridge must have programming
> 		   interface code of 0x01. */
> 
> So a class of PCI_CLASS_BRIDGE_HOST would error out, I think.
> Does this mean I need to fixup Gen2 as well?
> (Since it reports 0x060000)

Header Type 1 (PCI_HEADER_TYPE_BRIDGE) identifies PCI-to-PCI bridges
(see PCI spec r3.0, sec 6.1).  These devices use the type 1
configuration header (PCI-to-PCI Bridge spec r1.2, sec 3.2) and they
exist on a primary PCI bus and forward transactions between that bus
and a secondary PCI bus.  The PCI core controls the memory, I/O, and
prefetchable memory windows between the primary and secondary buses
using the window registers in the type 1 header.
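
(For reference, those windows live in the PCI_IO_*, PCI_MEMORY_* and
PCI_PREF_MEMORY_* registers of the type 1 header; here's an untested
sketch, with a made-up helper name, that just dumps the raw values:)

#include <linux/pci.h>

/* Untested sketch: dump the type 1 bridge window registers. */
static void dump_bridge_windows(struct pci_dev *bridge)
{
	u8 io_base, io_limit;
	u16 mem_base, mem_limit, pref_base, pref_limit;

	pci_read_config_byte(bridge, PCI_IO_BASE, &io_base);
	pci_read_config_byte(bridge, PCI_IO_LIMIT, &io_limit);
	pci_read_config_word(bridge, PCI_MEMORY_BASE, &mem_base);
	pci_read_config_word(bridge, PCI_MEMORY_LIMIT, &mem_limit);
	pci_read_config_word(bridge, PCI_PREF_MEMORY_BASE, &pref_base);
	pci_read_config_word(bridge, PCI_PREF_MEMORY_LIMIT, &pref_limit);

	dev_info(&bridge->dev,
		 "io %02x-%02x mem %04x-%04x pref %04x-%04x\n",
		 io_base, io_limit, mem_base, mem_limit,
		 pref_base, pref_limit);
}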

So I guess if this device has a type 1 header, it should use
PCI_CLASS_BRIDGE_PCI; otherwise use PCI_CLASS_BRIDGE_HOST.
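
Untested sketch of that rule applied in your fixup (just restating
the logic above; adjust as needed):

/* Untested sketch: pick the class the config header type implies. */
static void tango_pcie_fixup_class(struct pci_dev *dev)
{
	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)	/* type 1 header */
		dev->class = PCI_CLASS_BRIDGE_PCI << 8;
	else						/* type 0 header */
		dev->class = PCI_CLASS_BRIDGE_HOST << 8;
}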

I don't know the topology of your hardware, but normally a Root
Complex itself is not a PCI-to-PCI bridge.  It usually *contains*
several Root Ports, which appear to be on the root PCI bus and act as
bridges to a secondary bus.  The RC might appear as a type 0 device
but the programming model is not specified.  The Root Ports would be
type 1 devices like standard PCI-to-PCI bridges (PCIe spec r3.0, sec
1.3.1 and 1.4).

Bjorn

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2017-03-14 21:46 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-07 22:45 Neophyte questions about PCIe Mason
2017-03-08 13:39 ` Mason
2017-03-08 13:54 ` David Laight
2017-03-08 14:17   ` Mason
2017-03-08 14:38     ` David Laight
2017-03-09 22:01   ` Jeremy Linton
2017-03-08 15:17 ` Bjorn Helgaas
2017-03-09 23:43   ` Mason
2017-03-10 13:15     ` Robin Murphy
2017-03-10 14:06       ` David Laight
2017-03-10 15:05         ` Mason
2017-03-10 15:14           ` David Laight
2017-03-10 15:33             ` Mason
2017-03-10 15:23           ` Robin Murphy
2017-03-10 15:35             ` David Laight
2017-03-10 16:00               ` Robin Murphy
2017-03-13 10:59                 ` Mason
2017-03-13 11:56                   ` Robin Murphy
2017-03-10 18:49           ` Bjorn Helgaas
2017-03-10 14:53       ` Mason
2017-03-10 16:45     ` Mason
2017-03-10 17:49       ` Mason
2017-03-11 10:57         ` Mason
2017-03-13 21:40           ` Bjorn Helgaas
2017-03-13 21:57             ` Mason
2017-03-13 22:46               ` Bjorn Helgaas
2017-03-14 10:23               ` David Laight
2017-03-14 12:05                 ` Mason
2017-03-14 12:24                   ` David Laight
2017-03-13 14:25         ` Mason
2017-03-14 14:00         ` Mason
2017-03-14 15:54           ` Mason
2017-03-14 21:46             ` Bjorn Helgaas
