All of lore.kernel.org
 help / color / mirror / Atom feed
* pci-mvebu driver on km_kirkwood
@ 2013-07-10 16:15 Gerlando Falauto
  2013-07-10 16:57 ` Thomas Petazzoni
  2013-07-31  8:03 ` Thomas Petazzoni
  0 siblings, 2 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-07-10 16:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

I am trying to use the pci-mvebu driver on one of our km_kirkwood 
boards. The board is based on Marvell's 98dx4122, which should 
essentially be 6281 compatible.

So I copied the following block from kirkwood-6281.dtsi into
kirkwood-98dx4122.dtsi:

		pcie-controller {
			compatible = "marvell,kirkwood-pcie";
			status = "disabled";
			device_type = "pci";

			#address-cells = <3>;
			#size-cells = <2>;

			bus-range = <0x00 0xff>;

			ranges = <0x82000000 0 0x00040000 0x00040000 0 0x00002000   /* Port 
0.0 registers */
				  0x82000000 0 0xe0000000 0xe0000000 0 0x08000000   /* 
non-prefetchable memory */
			          0x81000000 0 0          0xe8000000 0 0x00100000>; /* 
downstream I/O */

			pcie at 1,0 {
				device_type = "pci";
				assigned-addresses = <0x82000800 0 0x00040000 0 0x2000>;
				reg = <0x0800 0 0 0 0>;
				#address-cells = <3>;
				#size-cells = <2>;
				#interrupt-cells = <1>;
				ranges;
				interrupt-map-mask = <0 0 0 0>;
				interrupt-map = <0 0 0 0 &intc 9>;
				marvell,pcie-port = <0>;
				marvell,pcie-lane = <0>;
				clocks = <&gate_clk 2>;
				status = "disabled";
			};
		};

And added the following block to kirkwood-km_kirkwood.dts:

		pcie-controller {
			status = "okay";

			pcie at 1,0 {
				status = "okay";
			};
		};

The code I took from jcooper's repo:

   http://git.infradead.org/users/jcooper/linux.git

I took the tag

   dt-3.11-6

on top of which I merged:

   mvebu/pcie
   mvebu/pcie_bridge
   mvebu/pcie_kirkwood

Only with the latest merge did I get some conflict on
kirkwood.dtsi:

<<<<<<< HEAD
		ranges = <0x00000000 0xf1000000 0x0100000		
		          0xf4000000 0xf4000000 0x0000400
=======
		ranges = <0x00000000 0xf1000000 0x4000000
		          0xe0000000 0xe0000000 0x8100000
 >>>>>>> jcooper/mvebu/pcie_kirkwood

tried both variants, (almost) the same result:

<<<<<<< HEAD
Kirkwood: MV88F6281-A0, TCLK=200000000.
Feroceon L2: Cache support initialised, in WT override mode.
mvebu-pcie pcie-controller.1: PCIe0.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
pci 0000:00:01.0: PCI bridge to [bus 01]
=======
Kirkwood: MV88F6281-A0, TCLK=200000000.
Feroceon L2: Cache support initialised, in WT override mode.
mvebu-pcie pcie-controller.2: PCIe0.0: link up
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
pci 0000:00:01.0: PCI bridge to [bus 01]

 >>>>>>> jcooper/mvebu/pcie_kirkwood

Compared to a working configuration, here I see a spurious

   pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)

which I don't understand, plus all others which are failing.

It's weird how with the second configuration:

   mvebu-pcie pcie-controller.2: PCIe0.0: link up
   mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
   pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
   pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]

I get a second mvebu-pcie pcie-controller.2, although with a more 
reasonable memory range.

Needless to say, I did try several other combinations of your recent and 
not-so-recent patches (from May 23rd onwards), with essentially the same 
results.

It *must* be something trivial. Any hints?

Thanks a lot!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-10 16:15 pci-mvebu driver on km_kirkwood Gerlando Falauto
@ 2013-07-10 16:57 ` Thomas Petazzoni
  2013-07-10 17:31   ` Gerlando Falauto
  2013-07-31  8:03 ` Thomas Petazzoni
  1 sibling, 1 reply; 90+ messages in thread
From: Thomas Petazzoni @ 2013-07-10 16:57 UTC (permalink / raw)
  To: linux-arm-kernel

Gerlando,

On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:

> I am trying to use the pci-mvebu driver on one of our km_kirkwood 
> boards. The board is based on Marvell's 98dx4122, which should 
> essentially be 6281 compatible.

Was this platform working with the old PCIe driver in mach-kirkwood/ ?


> The code I took from jcooper's repo:
> 
>    http://git.infradead.org/users/jcooper/linux.git
> 
> I took the tag
> 
>    dt-3.11-6
> 
> on top of which I merged:
> 
>    mvebu/pcie
>    mvebu/pcie_bridge
>    mvebu/pcie_kirkwood

Could you instead use the latest master from Linus tree? That would
avoid merge conflicts, and ensure you have all the necessary pieces.

> Only with the latest merge did I get some conflict on
> kirkwood.dtsi:
> 
> <<<<<<< HEAD
> 		ranges = <0x00000000 0xf1000000 0x0100000		
> 		          0xf4000000 0xf4000000 0x0000400
> =======
> 		ranges = <0x00000000 0xf1000000 0x4000000
> 		          0xe0000000 0xe0000000 0x8100000

The first cannot work, because it lacks the range for the PCIe. The
second should work. The correct merge should be:

 		ranges = <0x00000000 0xf1000000 0x0100000		
 		          0xf4000000 0xf4000000 0x0000400
 		          0xe0000000 0xe0000000 0x8100000>;

i.e, we've added the PCIe range (last line) and splitted the SRAM into
its own range (or something like that, don't remember the details, but
Ezequiel can confirm).

> <<<<<<< HEAD
> Kirkwood: MV88F6281-A0, TCLK=200000000.
> Feroceon L2: Cache support initialised, in WT override mode.
> mvebu-pcie pcie-controller.1: PCIe0.0: link up
> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
> pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
> pci_bus 0000:00: root bus resource [bus 00-ff]
> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
> PCI: bus0: Fast back to back transfers disabled
> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: supports D1 D2
> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
> PCI: bus1: Fast back to back transfers disabled
> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
> pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
> pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
> pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
> pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
> pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
> pci 0000:00:01.0: PCI bridge to [bus 01]

The first test you did cannot work at all, due to the incorrect ranges.

If you have the PCIe working with the old driver, can you pastebin
somewhere the complete boot log, as well as the output of "lspci
-vvv" ?

> Compared to a working configuration, here I see a spurious
> 
>    pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
> 
> which I don't understand, plus all others which are failing.
> 
> It's weird how with the second configuration:
> 
>    mvebu-pcie pcie-controller.2: PCIe0.0: link up
>    mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
>    pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
>    pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
> 
> I get a second mvebu-pcie pcie-controller.2, although with a more 
> reasonable memory range.

A second mvebu-pcie controller? Is your Device Tree correct?

I'm not really sure to understand what's going on here. Can you post
the complete boot log, and test with the latest Linus git tree, where
all the PCIe support got merged?

Thanks!

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-10 16:57 ` Thomas Petazzoni
@ 2013-07-10 17:31   ` Gerlando Falauto
  2013-07-10 19:56     ` Gerlando Falauto
                       ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-07-10 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

first of all thanks for your quick feedback.

On 07/10/2013 06:57 PM, Thomas Petazzoni wrote:
> Gerlando,
>
> On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:
>
>> I am trying to use the pci-mvebu driver on one of our km_kirkwood
>> boards. The board is based on Marvell's 98dx4122, which should
>> essentially be 6281 compatible.
>
> Was this platform working with the old PCIe driver in mach-kirkwood/ ?

Yes, though we had to trick it a little bit to get both the internal 
switch and this PCIe device working:

- this PCIe device requires to map 256M of memory as opposed to just 128
- we need a virtual PCIe device to connect to the internal switch, which 
must be mapped at 0xf4000000 (normally used for the NAND which must then 
move to 0xff000000)

But apart from the huge BAR (0x07ffffff aka 128M) for the PCIe device 
not being mappable, the rest was normally working just fine even without 
the above changes (i.e. the other BARs were mapped fine).

>
>> The code I took from jcooper's repo:
>>
>>     http://git.infradead.org/users/jcooper/linux.git
>>
>> I took the tag
>>
>>     dt-3.11-6
>>
>> on top of which I merged:
>>
>>     mvebu/pcie
>>     mvebu/pcie_bridge
>>     mvebu/pcie_kirkwood
>
> Could you instead use the latest master from Linus tree? That would
> avoid merge conflicts, and ensure you have all the necessary pieces.

Oops, I had no idea all this had gotten merged already.
Quite honestly, I have no idea how to track this kind of stuff (i.e. did 
a given patch ever got merged and where?) but that's a different topic.

>> Only with the latest merge did I get some conflict on
>> kirkwood.dtsi:
>>
>> <<<<<<< HEAD
>> 		ranges = <0x00000000 0xf1000000 0x0100000		
>> 		          0xf4000000 0xf4000000 0x0000400
>> =======
>> 		ranges = <0x00000000 0xf1000000 0x4000000
>> 		          0xe0000000 0xe0000000 0x8100000
>
> The first cannot work, because it lacks the range for the PCIe. The
> second should work. The correct merge should be:
>
>   		ranges = <0x00000000 0xf1000000 0x0100000		
>   		          0xf4000000 0xf4000000 0x0000400
>   		          0xe0000000 0xe0000000 0x8100000>;
>
> i.e, we've added the PCIe range (last line) and splitted the SRAM into
> its own range (or something like that, don't remember the details, but
> Ezequiel can confirm).

OK that's a good starting point.

>> <<<<<<< HEAD
>> Kirkwood: MV88F6281-A0, TCLK=200000000.
>> Feroceon L2: Cache support initialised, in WT override mode.
>> mvebu-pcie pcie-controller.1: PCIe0.0: link up
>> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
>> pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
>> pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
>> pci_bus 0000:00: root bus resource [bus 00-ff]
>> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
>> PCI: bus0: Fast back to back transfers disabled
>> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
>> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
>> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
>> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
>> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: supports D1 D2
>> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
>> PCI: bus1: Fast back to back transfers disabled
>> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
>> pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
>> pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
>> pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
>> pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
>> pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
>> pci 0000:00:01.0: PCI bridge to [bus 01]
>
> The first test you did cannot work at all, due to the incorrect ranges.
>
> If you have the PCIe working with the old driver, can you pastebin
> somewhere the complete boot log, as well as the output of "lspci
> -vvv" ?

OK, I will.
In the meantime, what I got to establish is that by manually disabling 
the two biggest resources

 >> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
 >> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)

i.e. something like:

-281,6 +282,10 @@ static void assign_requested_resources_sorted(struct 
list_head *head,
         list_for_each_entry(dev_res, head, list) {
                 res = dev_res->res;
                 idx = res - &dev_res->dev->resource[0];
+
+               if (resource_size(res) < 0x8000000)
+               {
+

at least I can get the following ones to be assigned correctly:

mvebu-pcie pcie-controller.2: PCIe0.0: link up
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:01:00.0: BAR 3: assigned [mem 0x04000000-0x047fffff]
pci 0000:01:00.0: BAR 3: set to [mem 0x04000000-0x047fffff] (PCI address 
[0x4000000-0x47fffff])
pci 0000:01:00.0: BAR 4: assigned [mem 0x04800000-0x04801fff]
pci 0000:01:00.0: BAR 4: set to [mem 0x04800000-0x04801fff] (PCI address 
[0x4800000-0x4801fff])
pci 0000:01:00.0: BAR 0: assigned [mem 0x04802000-0x04802fff]
pci 0000:01:00.0: BAR 0: set to [mem 0x04802000-0x04802fff] (PCI address 
[0x4802000-0x4802fff])
pci 0000:01:00.0: BAR 2: assigned [mem 0x04803000-0x04803fff]
pci 0000:01:00.0: BAR 2: set to [mem 0x04803000-0x04803fff] (PCI address 
[0x4803000-0x4803fff])
pci 0000:01:00.0: BAR 5: assigned [mem 0x04804000-0x04804fff]
pci 0000:01:00.0: BAR 5: set to [mem 0x04804000-0x04804fff] (PCI address 
[0x4804000-0x4804fff])
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0:   bridge window [mem 0x04000000-0x0fffffff]
PCI: enabling device 0000:00:01.0 (0140 -> 0143)


Which is a bit weird because in the past these huge assignments would 
just fail but the following ones would work just fine.

>
>> Compared to a working configuration, here I see a spurious
>>

I assume the

>>     pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>>

comes from the switch but I have no idea how to find it out.
I'm quite sure this is the first time I'm seeing BAR 8.

>> which I don't understand, plus all others which are failing.
>>
>> It's weird how with the second configuration:
>>
>>     mvebu-pcie pcie-controller.2: PCIe0.0: link up
>>     mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
>>     pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
>>     pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>>
>> I get a second mvebu-pcie pcie-controller.2, although with a more
>> reasonable memory range.
>
> A second mvebu-pcie controller? Is your Device Tree correct?

Whoops, my fault. There's just one pcie-controller.2, it's just that 
with the correct ranges the nand.1 node gets created as well, and these 
(platform?) devices are numbered sequentially, regardless of their type.

>
> I'm not really sure to understand what's going on here. Can you post
> the complete boot log, and test with the latest Linus git tree, where
> all the PCIe support got merged?

I sure will.
Thanks for the heads-up.

Thanks a lot!
Gerlando

>
> Thanks!
>
> Thomas
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-10 17:31   ` Gerlando Falauto
@ 2013-07-10 19:56     ` Gerlando Falauto
  2013-07-11  7:03     ` Valentin Longchamp
  2013-07-11 14:32     ` Thomas Petazzoni
  2 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-07-10 19:56 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

I guess I understand now....

 >>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)

this is the BAR for the bridge, your virtual PCI host device, whose size 
is calculated dynamically depending on what is found on the underlying 
hardware.
So compared to the legacy driver which was relying on the real hardware 
BARs (where I could get /some/ BARs to work, namely the biggest one 
which was taking up the whole 128M), here it's an all-or-nothing approach.
As a matter of fact, everything works fine if I explicitly disable the 
biggest BAR with a trick:

mvebu-pcie pcie-controller.1: PCIe0.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:01:00.0: disabling BAR 1: [mem 0x00000000-0x07ffffff] TOO BIG 
(alignment 0x1000)
pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe0bfffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe0000000-0xe07fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe0800000-0xe0801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe0802000-0xe0802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe0803000-0xe0803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe0804000-0xe0804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe0bfffff]

So this seems to be the final solution (without the hack above):

--- a/arch/arm/boot/dts/kirkwood-98dx4122.dtsi
+++ b/arch/arm/boot/dts/kirkwood-98dx4122.dtsi
                         bus-range = <0x00 0xff>;

                         ranges = <0x82000000 0 0x00040000 0x00040000 0 
0x00002000   /* Port 0.0 registers */
-                                 0x82000000 0 0xe0000000 0xe0000000 0 
0x08000000   /* non-prefetchable memory */
+                                 0x82000000 0 0xe0000000 0xe0000000 0 
0x0c000000   /* non-prefetchable memory */
                                   0x81000000 0 0          0xe8000000 0 
0x00100000>; /* downstream I/O */

                         pcie at 1,0 {

Does the above make sense? Am I setting up overlapping ranges this way?
Could I make it 0x10000000 so to have 256M?

Thanks a lot!
Gerlando

On 07/10/2013 07:31 PM, Gerlando Falauto wrote:
> Hi Thomas,
>
> first of all thanks for your quick feedback.
>
> On 07/10/2013 06:57 PM, Thomas Petazzoni wrote:
>> Gerlando,
>>
>> On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:
>>
>>> I am trying to use the pci-mvebu driver on one of our km_kirkwood
>>> boards. The board is based on Marvell's 98dx4122, which should
>>> essentially be 6281 compatible.
>>
>> Was this platform working with the old PCIe driver in mach-kirkwood/ ?
>
> Yes, though we had to trick it a little bit to get both the internal
> switch and this PCIe device working:
>
> - this PCIe device requires to map 256M of memory as opposed to just 128
> - we need a virtual PCIe device to connect to the internal switch, which
> must be mapped at 0xf4000000 (normally used for the NAND which must then
> move to 0xff000000)
>
> But apart from the huge BAR (0x07ffffff aka 128M) for the PCIe device
> not being mappable, the rest was normally working just fine even without
> the above changes (i.e. the other BARs were mapped fine).
>
>>
>>> The code I took from jcooper's repo:
>>>
>>>     http://git.infradead.org/users/jcooper/linux.git
>>>
>>> I took the tag
>>>
>>>     dt-3.11-6
>>>
>>> on top of which I merged:
>>>
>>>     mvebu/pcie
>>>     mvebu/pcie_bridge
>>>     mvebu/pcie_kirkwood
>>
>> Could you instead use the latest master from Linus tree? That would
>> avoid merge conflicts, and ensure you have all the necessary pieces.
>
> Oops, I had no idea all this had gotten merged already.
> Quite honestly, I have no idea how to track this kind of stuff (i.e. did
> a given patch ever got merged and where?) but that's a different topic.
>
>>> Only with the latest merge did I get some conflict on
>>> kirkwood.dtsi:
>>>
>>> <<<<<<< HEAD
>>>         ranges = <0x00000000 0xf1000000 0x0100000
>>>                   0xf4000000 0xf4000000 0x0000400
>>> =======
>>>         ranges = <0x00000000 0xf1000000 0x4000000
>>>                   0xe0000000 0xe0000000 0x8100000
>>
>> The first cannot work, because it lacks the range for the PCIe. The
>> second should work. The correct merge should be:
>>
>>           ranges = <0x00000000 0xf1000000 0x0100000
>>                     0xf4000000 0xf4000000 0x0000400
>>                     0xe0000000 0xe0000000 0x8100000>;
>>
>> i.e, we've added the PCIe range (last line) and splitted the SRAM into
>> its own range (or something like that, don't remember the details, but
>> Ezequiel can confirm).
>
> OK that's a good starting point.
>
>>> <<<<<<< HEAD
>>> Kirkwood: MV88F6281-A0, TCLK=200000000.
>>> Feroceon L2: Cache support initialised, in WT override mode.
>>> mvebu-pcie pcie-controller.1: PCIe0.0: link up
>>> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
>>> pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
>>> pci_bus 0000:00: root bus resource [mem 0xffffffff-0x07fffffe]
>>> pci_bus 0000:00: root bus resource [bus 00-ff]
>>> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
>>> PCI: bus0: Fast back to back transfers disabled
>>> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]),
>>> reconfiguring
>>> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
>>> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
>>> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
>>> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
>>> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
>>> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
>>> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
>>> pci 0000:01:00.0: supports D1 D2
>>> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
>>> PCI: bus1: Fast back to back transfers disabled
>>> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>>> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>>> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
>>> pci 0000:01:00.0: BAR 3: can't assign mem (size 0x800000)
>>> pci 0000:01:00.0: BAR 4: can't assign mem (size 0x2000)
>>> pci 0000:01:00.0: BAR 0: can't assign mem (size 0x1000)
>>> pci 0000:01:00.0: BAR 2: can't assign mem (size 0x1000)
>>> pci 0000:01:00.0: BAR 5: can't assign mem (size 0x1000)
>>> pci 0000:00:01.0: PCI bridge to [bus 01]
>>
>> The first test you did cannot work at all, due to the incorrect ranges.
>>
>> If you have the PCIe working with the old driver, can you pastebin
>> somewhere the complete boot log, as well as the output of "lspci
>> -vvv" ?
>
> OK, I will.
> In the meantime, what I got to establish is that by manually disabling
> the two biggest resources
>
>  >> pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>  >> pci 0000:01:00.0: BAR 1: can't assign mem (size 0x8000000)
>
> i.e. something like:
>
> -281,6 +282,10 @@ static void assign_requested_resources_sorted(struct
> list_head *head,
>          list_for_each_entry(dev_res, head, list) {
>                  res = dev_res->res;
>                  idx = res - &dev_res->dev->resource[0];
> +
> +               if (resource_size(res) < 0x8000000)
> +               {
> +
>
> at least I can get the following ones to be assigned correctly:
>
> mvebu-pcie pcie-controller.2: PCIe0.0: link up
> mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
> pci_bus 0000:00: root bus resource [bus 00-ff]
> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
> PCI: bus0: Fast back to back transfers disabled
> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: supports D1 D2
> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
> PCI: bus1: Fast back to back transfers disabled
> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> pci 0000:01:00.0: BAR 3: assigned [mem 0x04000000-0x047fffff]
> pci 0000:01:00.0: BAR 3: set to [mem 0x04000000-0x047fffff] (PCI address
> [0x4000000-0x47fffff])
> pci 0000:01:00.0: BAR 4: assigned [mem 0x04800000-0x04801fff]
> pci 0000:01:00.0: BAR 4: set to [mem 0x04800000-0x04801fff] (PCI address
> [0x4800000-0x4801fff])
> pci 0000:01:00.0: BAR 0: assigned [mem 0x04802000-0x04802fff]
> pci 0000:01:00.0: BAR 0: set to [mem 0x04802000-0x04802fff] (PCI address
> [0x4802000-0x4802fff])
> pci 0000:01:00.0: BAR 2: assigned [mem 0x04803000-0x04803fff]
> pci 0000:01:00.0: BAR 2: set to [mem 0x04803000-0x04803fff] (PCI address
> [0x4803000-0x4803fff])
> pci 0000:01:00.0: BAR 5: assigned [mem 0x04804000-0x04804fff]
> pci 0000:01:00.0: BAR 5: set to [mem 0x04804000-0x04804fff] (PCI address
> [0x4804000-0x4804fff])
> pci 0000:00:01.0: PCI bridge to [bus 01]
> pci 0000:00:01.0:   bridge window [mem 0x04000000-0x0fffffff]
> PCI: enabling device 0000:00:01.0 (0140 -> 0143)
>
>
> Which is a bit weird because in the past these huge assignments would
> just fail but the following ones would work just fine.
>
>>
>>> Compared to a working configuration, here I see a spurious
>>>
>
> I assume the
>
>>>     pci 0000:00:01.0: BAR 8: can't assign mem (size 0xc000000)
>>>
>
> comes from the switch but I have no idea how to find it out.
> I'm quite sure this is the first time I'm seeing BAR 8.
>
>>> which I don't understand, plus all others which are failing.
>>>
>>> It's weird how with the second configuration:
>>>
>>>     mvebu-pcie pcie-controller.2: PCIe0.0: link up
>>>     mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
>>>     pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
>>>     pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>>>
>>> I get a second mvebu-pcie pcie-controller.2, although with a more
>>> reasonable memory range.
>>
>> A second mvebu-pcie controller? Is your Device Tree correct?
>
> Whoops, my fault. There's just one pcie-controller.2, it's just that
> with the correct ranges the nand.1 node gets created as well, and these
> (platform?) devices are numbered sequentially, regardless of their type.
>
>>
>> I'm not really sure to understand what's going on here. Can you post
>> the complete boot log, and test with the latest Linus git tree, where
>> all the PCIe support got merged?
>
> I sure will.
> Thanks for the heads-up.
>
> Thanks a lot!
> Gerlando
>
>>
>> Thanks!
>>
>> Thomas
>>
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-10 17:31   ` Gerlando Falauto
  2013-07-10 19:56     ` Gerlando Falauto
@ 2013-07-11  7:03     ` Valentin Longchamp
  2013-07-12  8:59       ` Thomas Petazzoni
  2013-07-11 14:32     ` Thomas Petazzoni
  2 siblings, 1 reply; 90+ messages in thread
From: Valentin Longchamp @ 2013-07-11  7:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Gerlando,

I just want to give further information about  the hardware and memory mapping
we have with these kirkwood variants in our system.

As I told you yesterday, I think it's very important to have a clear view of the
actual memory map when setting these ranges.

On 07/10/2013 07:31 PM, Falauto, Gerlando wrote:
> Hi Thomas,
> 
> first of all thanks for your quick feedback.
> 
> On 07/10/2013 06:57 PM, Thomas Petazzoni wrote:
>> Gerlando,
>>
>> On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:
>>
>>> I am trying to use the pci-mvebu driver on one of our km_kirkwood
>>> boards. The board is based on Marvell's 98dx4122, which should
>>> essentially be 6281 compatible.
>>
>> Was this platform working with the old PCIe driver in mach-kirkwood/ ?
> 
> Yes, though we had to trick it a little bit to get both the internal 
> switch and this PCIe device working:
> 
> - this PCIe device requires to map 256M of memory as opposed to just 128

On the board you are currently using for your tests, it is the case (the whole
map is not used ... things are scattered over the 256 MB, with one 128MB BAR).
If you want to get rid of this problem, we have another board that does not
require these (256 MB .. and I have one of them on my desk that you can use for
your tests).

On the kirkwood variant we use, there is only _one_ real PCIe controller. In
order to map 256MB for the MEM space of this controller without having further
memory map conflicts, what was done was to not enable the CPU windows for the
usual 2nd PCIe controller and set a wider CPU window for the MEM space of the
only PCIe controller.

> - we need a virtual PCIe device to connect to the internal switch, which 
> must be mapped at 0xf4000000 (normally used for the NAND which must then 
> move to 0xff000000)
> 

I think you can forget this for the time being. This is called (also in
Marvell's doc) a virtual PCIe controller, but apart that it is then memory
mapped, this has nothing to do with PCIe (although I don't know what is done
internally in the SoC). It is a problem because the physical address chosen for
this CPU window conflicts with the one that is used for the NAND controller in
the current kirkwood Linux memory map. But this is another topic and it should
not play any role in this PCIe topic.

I hope this helps also the other better understand what's going on here.

Valentin

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-10 17:31   ` Gerlando Falauto
  2013-07-10 19:56     ` Gerlando Falauto
  2013-07-11  7:03     ` Valentin Longchamp
@ 2013-07-11 14:32     ` Thomas Petazzoni
  2014-02-18 17:29       ` Gerlando Falauto
  2 siblings, 1 reply; 90+ messages in thread
From: Thomas Petazzoni @ 2013-07-11 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Wed, 10 Jul 2013 19:31:56 +0200, Gerlando Falauto wrote:

> Yes, though we had to trick it a little bit to get both the internal 
> switch and this PCIe device working:
> 
> - this PCIe device requires to map 256M of memory as opposed to just 128
> - we need a virtual PCIe device to connect to the internal switch, which 
> must be mapped at 0xf4000000 (normally used for the NAND which must then 
> move to 0xff000000)

Aah, if you need 256 MB, then you need to adjust the ranges, because
by default there is only 128 MB for PCIe memory. So, you would need
something like:

So, within the pcie-controller node, you should do something like:

			ranges = <0x82000000 0 0x00040000 0x00040000 0 0x00002000   /* Port 0.0 registers */
				  0x82000000 0 0xe0000000 0xe0000000 0 0x10000000   /* non-prefetchable memory */
			          0x81000000 0 0          0xf0000000 0 0x00100000>; /* downstream I/O */

and in the ranges property at the ocp { } level, you should do something like:


		ranges = <0x00000000 0xf1000000 0x0100000
		          0xe0000000 0xe0000000 0x10100000 /* PCIE */
		          0xf4000000 0xf4000000 0x0000400
		          0xf5000000 0xf5000000 0x0000400>;

Basically, before the change the configuration was:

 * 128 MB of PCIe memory at 0xe0000000
 * 1 MB of PCIe I/O at 0xe8000000

After the change, you have:

 * 256 MB of PCIe memory at 0xe0000000
 * 1 MB of PCIe I/O at 0xf0000000

Best regards,

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-11  7:03     ` Valentin Longchamp
@ 2013-07-12  8:59       ` Thomas Petazzoni
  2013-07-15 15:46         ` Valentin Longchamp
  0 siblings, 1 reply; 90+ messages in thread
From: Thomas Petazzoni @ 2013-07-12  8:59 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Valentin Longchamp,

On Thu, 11 Jul 2013 09:03:59 +0200, Valentin Longchamp wrote:

> On the board you are currently using for your tests, it is the case (the whole
> map is not used ... things are scattered over the 256 MB, with one 128MB BAR).
> If you want to get rid of this problem, we have another board that does not
> require these (256 MB .. and I have one of them on my desk that you can use for
> your tests).
> 
> On the kirkwood variant we use, there is only _one_ real PCIe controller. In
> order to map 256MB for the MEM space of this controller without having further
> memory map conflicts, what was done was to not enable the CPU windows for the
> usual 2nd PCIe controller and set a wider CPU window for the MEM space of the
> only PCIe controller.

Such tricks are no longer needed with the new PCIe driver. Instead of
assigning address ranges per PCIe controller, the new PCIe driver
(together with the mvebu-mbus driver) allows to specify one global
range of addresses for PCIe mem, and the PCIe driver will automatically
figure out which devices are available on which PCIe bus, how much PCIe
mem then need, and create the MBus windows accordingly.

However of course, as I pointed out in an earlier e-mail, this global
range must be suitably sized to allow the mapping of all PCIe devices.
By default, we've made it 128 MB large, but in this case, it looks like
you would need 256 MB.

But there's no need to disable the second PCIe controller anymore. If
there's nothing connected to it, not PCIe window will be created for it.

> > - we need a virtual PCIe device to connect to the internal switch, which 
> > must be mapped at 0xf4000000 (normally used for the NAND which must then 
> > move to 0xff000000)
> > 
> 
> I think you can forget this for the time being. This is called (also in
> Marvell's doc) a virtual PCIe controller, but apart that it is then memory
> mapped, this has nothing to do with PCIe (although I don't know what is done
> internally in the SoC). It is a problem because the physical address chosen for
> this CPU window conflicts with the one that is used for the NAND controller in
> the current kirkwood Linux memory map. But this is another topic and it should
> not play any role in this PCIe topic.

I'm not sure to follow this story of a virtual PCIe controller sitting
at 0xf4000000. Can you give a few more details?

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-12  8:59       ` Thomas Petazzoni
@ 2013-07-15 15:46         ` Valentin Longchamp
  2013-07-15 19:51           ` Thomas Petazzoni
  0 siblings, 1 reply; 90+ messages in thread
From: Valentin Longchamp @ 2013-07-15 15:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Thomas,

On 07/12/2013 10:59 AM, Thomas Petazzoni wrote:
> Dear Valentin Longchamp,
> 
> On Thu, 11 Jul 2013 09:03:59 +0200, Valentin Longchamp wrote:
> 
>> On the board you are currently using for your tests, it is the case (the whole
>> map is not used ... things are scattered over the 256 MB, with one 128MB BAR).
>> If you want to get rid of this problem, we have another board that does not
>> require these (256 MB .. and I have one of them on my desk that you can use for
>> your tests).
>>
>> On the kirkwood variant we use, there is only _one_ real PCIe controller. In
>> order to map 256MB for the MEM space of this controller without having further
>> memory map conflicts, what was done was to not enable the CPU windows for the
>> usual 2nd PCIe controller and set a wider CPU window for the MEM space of the
>> only PCIe controller.
> 
> Such tricks are no longer needed with the new PCIe driver. Instead of
> assigning address ranges per PCIe controller, the new PCIe driver
> (together with the mvebu-mbus driver) allows to specify one global
> range of addresses for PCIe mem, and the PCIe driver will automatically
> figure out which devices are available on which PCIe bus, how much PCIe
> mem then need, and create the MBus windows accordingly.
> 
> However of course, as I pointed out in an earlier e-mail, this global
> range must be suitably sized to allow the mapping of all PCIe devices.
> By default, we've made it 128 MB large, but in this case, it looks like
> you would need 256 MB.
> 
> But there's no need to disable the second PCIe controller anymore. If
> there's nothing connected to it, not PCIe window will be created for it.

Thank you for this precision. That's a nice feature of the new PCIe driver.

> 
>>> - we need a virtual PCIe device to connect to the internal switch, which 
>>> must be mapped at 0xf4000000 (normally used for the NAND which must then 
>>> move to 0xff000000)
>>>
>>
>> I think you can forget this for the time being. This is called (also in
>> Marvell's doc) a virtual PCIe controller, but apart that it is then memory
>> mapped, this has nothing to do with PCIe (although I don't know what is done
>> internally in the SoC). It is a problem because the physical address chosen for
>> this CPU window conflicts with the one that is used for the NAND controller in
>> the current kirkwood Linux memory map. But this is another topic and it should
>> not play any role in this PCIe topic.
> 
> I'm not sure to follow this story of a virtual PCIe controller sitting
> at 0xf4000000. Can you give a few more details?
> 

I'm not sure I can give all the details here as the documentation that we have
for this is subject to an NDA. To keep it short, in the kirkwood SoC we use,
there is an Ethernet Switch that is accessed by the kirkwood through an internal
virtual PCIe controller. The switch management SW has some hard expectations
about the physical address for this "PCIe" memory mapped window which conflicts
with the ones defined in the current device trees.

Valentin

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-15 15:46         ` Valentin Longchamp
@ 2013-07-15 19:51           ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2013-07-15 19:51 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Valentin Longchamp,

On Mon, 15 Jul 2013 17:46:12 +0200, Valentin Longchamp wrote:

> > I'm not sure to follow this story of a virtual PCIe controller sitting
> > at 0xf4000000. Can you give a few more details?
> > 
> 
> I'm not sure I can give all the details here as the documentation that we have
> for this is subject to an NDA. To keep it short, in the kirkwood SoC we use,
> there is an Ethernet Switch that is accessed by the kirkwood through an internal
> virtual PCIe controller. The switch management SW has some hard expectations
> about the physical address for this "PCIe" memory mapped window which conflicts
> with the ones defined in the current device trees.

Ok. Note that the addresses chosen in the Device Tree can easily be
changed on a per-SoC or per-board basis, if needed.

Best regards,

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-10 16:15 pci-mvebu driver on km_kirkwood Gerlando Falauto
  2013-07-10 16:57 ` Thomas Petazzoni
@ 2013-07-31  8:03 ` Thomas Petazzoni
  2013-07-31  8:26   ` Gerlando Falauto
  1 sibling, 1 reply; 90+ messages in thread
From: Thomas Petazzoni @ 2013-07-31  8:03 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:

> I am trying to use the pci-mvebu driver on one of our km_kirkwood 
> boards. The board is based on Marvell's 98dx4122, which should 
> essentially be 6281 compatible.

In the end, did you manage to get the pci-mvebu driver to work on your
platform?

Thanks,

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-31  8:03 ` Thomas Petazzoni
@ 2013-07-31  8:26   ` Gerlando Falauto
  2013-07-31  9:00     ` Thomas Petazzoni
  0 siblings, 1 reply; 90+ messages in thread
From: Gerlando Falauto @ 2013-07-31  8:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 07/31/2013 10:03 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Wed, 10 Jul 2013 18:15:32 +0200, Gerlando Falauto wrote:
>
>> I am trying to use the pci-mvebu driver on one of our km_kirkwood
>> boards. The board is based on Marvell's 98dx4122, which should
>> essentially be 6281 compatible.
>
> In the end, did you manage to get the pci-mvebu driver to work on your
> platform?

Yes, I did -- though I didn't go much beyond simple device probing (i.e. 
no real, intense usage of devices). AND I'm not using the DT-based mbus 
driver (i.e. addresses are still hardcoded within the source code).

Actually, the main reason for trying to use this driver was because I 
wanted to model a PCIe *device* within the device tree, so to expose its 
GPIOs and IRQs to be referenced (through phandles) from other device 
tree nodes. The way I understand it, turns out this is not the way to 
go, as PCI/PCIe are essentially enumerated busses, so you're not 
supposed to -and it's not a trivial task to- put any information about 
real devices within the device tree.
Do you have any suggestion about that?

On the other hand, for our use case I'm afraid there might be some 
hardcoded values within drivers or userspace code, where a certain PCIe 
device is expected to be connected within a given bus number with a 
given device number (bleah!).
If I understand correctly, your driver creates a virtual PCI-to-PCI 
bridge, so our devices would be connected to BUS #1 as opposed to #0 -- 
which might break existing (cr*ee*ppy) code.
But that's not your fault of course.

If you're interested, I can keep you posted as soon as we proceed 
further with this (most likely in September or so).

Next step would be to test Ezequiel's MBus DT binding [PATCH v8], but 
I'm afraid that'll have to wait too, until end of August or so, as I am 
about to leave for vacation.

Thank you!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-31  8:26   ` Gerlando Falauto
@ 2013-07-31  9:00     ` Thomas Petazzoni
  2013-07-31 20:50       ` Jason Gunthorpe
  0 siblings, 1 reply; 90+ messages in thread
From: Thomas Petazzoni @ 2013-07-31  9:00 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

[ Device Tree mailing list readers: there is a question for you below. ]

On Wed, 31 Jul 2013 10:26:44 +0200, Gerlando Falauto wrote:

> >> I am trying to use the pci-mvebu driver on one of our km_kirkwood
> >> boards. The board is based on Marvell's 98dx4122, which should
> >> essentially be 6281 compatible.
> >
> > In the end, did you manage to get the pci-mvebu driver to work on your
> > platform?
> 
> Yes, I did -- though I didn't go much beyond simple device probing (i.e. 
> no real, intense usage of devices).

Ok, good.

> AND I'm not using the DT-based mbus 
> driver (i.e. addresses are still hardcoded within the source code).

Ok, that will be the next step, but I don't expect you to face many
issues. The DT-based mbus doesn't change much the internal logic, it's
really just the DT representation that's different. On the other hand,
the new PCIe driver was completely changing the internal logic, by
adding the emulated PCI-to-PCI bridge.

> Actually, the main reason for trying to use this driver was because I 
> wanted to model a PCIe *device* within the device tree, so to expose its 
> GPIOs and IRQs to be referenced (through phandles) from other device 
> tree nodes. The way I understand it, turns out this is not the way to 
> go, as PCI/PCIe are essentially enumerated busses, so you're not 
> supposed to -and it's not a trivial task to- put any information about 
> real devices within the device tree.
> Do you have any suggestion about that?

Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
listed in the Device Tree, so there's no way to "attach" more
information to them.

Device Tree people, any suggestion about the above question?

> On the other hand, for our use case I'm afraid there might be some 
> hardcoded values within drivers or userspace code, where a certain PCIe 
> device is expected to be connected within a given bus number with a 
> given device number (bleah!).
> If I understand correctly, your driver creates a virtual PCI-to-PCI 
> bridge, so our devices would be connected to BUS #1 as opposed to #0 -- 
> which might break existing (cr*ee*ppy) code.
> But that's not your fault of course.

Yeah, I believe normally userspace code shouldn't rely on a particular
PCI bus topology.

> If you're interested, I can keep you posted as soon as we proceed 
> further with this (most likely in September or so).

Sure.

> Next step would be to test Ezequiel's MBus DT binding [PATCH v8], but 
> I'm afraid that'll have to wait too, until end of August or so, as I am 
> about to leave for vacation.

Ok, thanks!

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-31  9:00     ` Thomas Petazzoni
@ 2013-07-31 20:50       ` Jason Gunthorpe
  2013-08-09 14:01         ` Thierry Reding
  0 siblings, 1 reply; 90+ messages in thread
From: Jason Gunthorpe @ 2013-07-31 20:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:

> > Actually, the main reason for trying to use this driver was because I 
> > wanted to model a PCIe *device* within the device tree, so to expose its 
> > GPIOs and IRQs to be referenced (through phandles) from other device 
> > tree nodes. The way I understand it, turns out this is not the way to 
> > go, as PCI/PCIe are essentially enumerated busses, so you're not 
> > supposed to -and it's not a trivial task to- put any information about 
> > real devices within the device tree.
> > Do you have any suggestion about that?
> 
> Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
> listed in the Device Tree, so there's no way to "attach" more
> information to them.
> 
> Device Tree people, any suggestion about the above question?

No, that isn't true.

Device tree can include the discovered PCI devices, you have to use
the special reg encoding and all that weirdness, but it does work. The
of_node will be attached to the struct pci device automatically.

On server/etc DT platforms the firmware will do PCI discovery and
resource assignment then dump all those results into DT for the OS to
reference.

This is a major reason why we wanted to see the standard PCI DT be
used for Marvell/etc, the existing infrastructure for this is
valuable.

AFAIK, Thierry has tested this on tegra, and I am doing it on Kirkwood
(though not yet with the new driver).

It is useful for exactly the reason stated - you can describe GPIOs,
I2C busses, etc, etc in DT and then upon load of the PCI driver engage
the DT code to populate and connect all that downstream
infrastructure.

I understand someday DT overlays might be a better alternative for
this, but AFAIK today in mainline this is what we have..

That said, the guideline to not include discoverable information in DT
is a good guideline for upstream DTs..

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-31 20:50       ` Jason Gunthorpe
@ 2013-08-09 14:01         ` Thierry Reding
  2013-08-26  9:27             ` Gerlando Falauto
  0 siblings, 1 reply; 90+ messages in thread
From: Thierry Reding @ 2013-08-09 14:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote:
> On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:
> 
> > > Actually, the main reason for trying to use this driver was because I 
> > > wanted to model a PCIe *device* within the device tree, so to expose its 
> > > GPIOs and IRQs to be referenced (through phandles) from other device 
> > > tree nodes. The way I understand it, turns out this is not the way to 
> > > go, as PCI/PCIe are essentially enumerated busses, so you're not 
> > > supposed to -and it's not a trivial task to- put any information about 
> > > real devices within the device tree.
> > > Do you have any suggestion about that?
> > 
> > Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
> > listed in the Device Tree, so there's no way to "attach" more
> > information to them.
> > 
> > Device Tree people, any suggestion about the above question?
> 
> No, that isn't true.
> 
> Device tree can include the discovered PCI devices, you have to use
> the special reg encoding and all that weirdness, but it does work. The
> of_node will be attached to the struct pci device automatically.
> 
> On server/etc DT platforms the firmware will do PCI discovery and
> resource assignment then dump all those results into DT for the OS to
> reference.
> 
> This is a major reason why we wanted to see the standard PCI DT be
> used for Marvell/etc, the existing infrastructure for this is
> valuable.
> 
> AFAIK, Thierry has tested this on tegra, and I am doing it on Kirkwood
> (though not yet with the new driver).
> 
> It is useful for exactly the reason stated - you can describe GPIOs,
> I2C busses, etc, etc in DT and then upon load of the PCI driver engage
> the DT code to populate and connect all that downstream
> infrastructure.

Obviously this doesn't work in general purpose systems because the PCI
hierarchy needs to be hardcoded in the DT. If you start adding and
removing PCI devices that will likely change the hierarchy and break
this matching of PCI device to DT node.

It's quite unlikely to have a need to hook up GPIOs or IRQs via DT in a
general purpose system, though, so I don't really see that being a big
problem.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130809/52a28f3e/attachment.sig>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2013-08-09 14:01         ` Thierry Reding
@ 2013-08-26  9:27             ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-08-26  9:27 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Thomas Petazzoni, Longchamp, Valentin, Jason Cooper, devicetree,
	Jason Gunthorpe, Andrew Lunn, Ezequiel Garcia, linux-arm-kernel,
	Sebastian Hesselbarth

Hi guys [particularly Jason and Thierry],

sorry for the prolonged silence, here I am back again...

On 08/09/2013 04:01 PM, Thierry Reding wrote:
> On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote:
>> On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:
>>
>>>> Actually, the main reason for trying to use this driver was because I
>>>> wanted to model a PCIe *device* within the device tree, so to expose its
>>>> GPIOs and IRQs to be referenced (through phandles) from other device
>>>> tree nodes. The way I understand it, turns out this is not the way to
>>>> go, as PCI/PCIe are essentially enumerated busses, so you're not
>>>> supposed to -and it's not a trivial task to- put any information about
>>>> real devices within the device tree.
>>>> Do you have any suggestion about that?
>>>
>>> Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
>>> listed in the Device Tree, so there's no way to "attach" more
>>> information to them.
>>>
>>> Device Tree people, any suggestion about the above question?
>>
>> No, that isn't true.
>>
>> Device tree can include the discovered PCI devices, you have to use
>> the special reg encoding and all that weirdness, but it does work. The
>> of_node will be attached to the struct pci device automatically.

So you mean that, assuming I knew the topology, I could populate the 
device tree in advance (e.g. statically), so that it already includes 
*devices* which will be further discovered during probing?
Or else you mean the {firmware,u-boot} can do that prior to starting the OS?
If either of the above is true, could you please suggest some example 
(or some way to get one)?
I assume the "reg" property (and the after-"@" node name) will need to 
encode (at least) the device number, is that right?

I tried reading the "PCI Bus Binding to Open Firmware" but I could not 
make complete sense out of it...

>> On server/etc DT platforms the firmware will do PCI discovery and
>> resource assignment then dump all those results into DT for the OS to
>> reference.
>>
>> This is a major reason why we wanted to see the standard PCI DT be
>> used for Marvell/etc, the existing infrastructure for this is
>> valuable.
>>
>> AFAIK, Thierry has tested this on tegra, and I am doing it on Kirkwood
>> (though not yet with the new driver).

Could you please give a pointer to some example of this? I'm not quite 
sure I understand what you guys are talking about.

>>
>> It is useful for exactly the reason stated - you can describe GPIOs,
>> I2C busses, etc, etc in DT and then upon load of the PCI driver engage
>> the DT code to populate and connect all that downstream
>> infrastructure.

I'm not 100% sure I made myself clear though.
What I would like to do is to have *other* parts of the device tree be 
able to reference (i.e., connect to, through phandles) a PCI device 
(because it provides a GPIO, for instance).
Is that also what you mean?

> Obviously this doesn't work in general purpose systems because the PCI
> hierarchy needs to be hardcoded in the DT. If you start adding and
> removing PCI devices that will likely change the hierarchy and break
> this matching of PCI device to DT node.

Yes, I guess in that case (if ever) we would need some other way that 
the device number (is that the same as the physical slot?) to specify a 
particular "hotplug" device (i.e. maybe a serial number or so)?
But that's definitely out of scope here.

>
> It's quite unlikely to have a need to hook up GPIOs or IRQs via DT in a
> general purpose system, though, so I don't really see that being a big
> problem.

Agreed.

>
> Thierry

Thanks again for your patience...

Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2013-08-26  9:27             ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-08-26  9:27 UTC (permalink / raw)
  To: linux-arm-kernel

Hi guys [particularly Jason and Thierry],

sorry for the prolonged silence, here I am back again...

On 08/09/2013 04:01 PM, Thierry Reding wrote:
> On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote:
>> On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:
>>
>>>> Actually, the main reason for trying to use this driver was because I
>>>> wanted to model a PCIe *device* within the device tree, so to expose its
>>>> GPIOs and IRQs to be referenced (through phandles) from other device
>>>> tree nodes. The way I understand it, turns out this is not the way to
>>>> go, as PCI/PCIe are essentially enumerated busses, so you're not
>>>> supposed to -and it's not a trivial task to- put any information about
>>>> real devices within the device tree.
>>>> Do you have any suggestion about that?
>>>
>>> Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
>>> listed in the Device Tree, so there's no way to "attach" more
>>> information to them.
>>>
>>> Device Tree people, any suggestion about the above question?
>>
>> No, that isn't true.
>>
>> Device tree can include the discovered PCI devices, you have to use
>> the special reg encoding and all that weirdness, but it does work. The
>> of_node will be attached to the struct pci device automatically.

So you mean that, assuming I knew the topology, I could populate the 
device tree in advance (e.g. statically), so that it already includes 
*devices* which will be further discovered during probing?
Or else you mean the {firmware,u-boot} can do that prior to starting the OS?
If either of the above is true, could you please suggest some example 
(or some way to get one)?
I assume the "reg" property (and the after-"@" node name) will need to 
encode (at least) the device number, is that right?

I tried reading the "PCI Bus Binding to Open Firmware" but I could not 
make complete sense out of it...

>> On server/etc DT platforms the firmware will do PCI discovery and
>> resource assignment then dump all those results into DT for the OS to
>> reference.
>>
>> This is a major reason why we wanted to see the standard PCI DT be
>> used for Marvell/etc, the existing infrastructure for this is
>> valuable.
>>
>> AFAIK, Thierry has tested this on tegra, and I am doing it on Kirkwood
>> (though not yet with the new driver).

Could you please give a pointer to some example of this? I'm not quite 
sure I understand what you guys are talking about.

>>
>> It is useful for exactly the reason stated - you can describe GPIOs,
>> I2C busses, etc, etc in DT and then upon load of the PCI driver engage
>> the DT code to populate and connect all that downstream
>> infrastructure.

I'm not 100% sure I made myself clear though.
What I would like to do is to have *other* parts of the device tree be 
able to reference (i.e., connect to, through phandles) a PCI device 
(because it provides a GPIO, for instance).
Is that also what you mean?

> Obviously this doesn't work in general purpose systems because the PCI
> hierarchy needs to be hardcoded in the DT. If you start adding and
> removing PCI devices that will likely change the hierarchy and break
> this matching of PCI device to DT node.

Yes, I guess in that case (if ever) we would need some other way that 
the device number (is that the same as the physical slot?) to specify a 
particular "hotplug" device (i.e. maybe a serial number or so)?
But that's definitely out of scope here.

>
> It's quite unlikely to have a need to hook up GPIOs or IRQs via DT in a
> general purpose system, though, so I don't really see that being a big
> problem.

Agreed.

>
> Thierry

Thanks again for your patience...

Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2013-08-26  9:27             ` Gerlando Falauto
@ 2013-08-26 12:02               ` Thierry Reding
  -1 siblings, 0 replies; 90+ messages in thread
From: Thierry Reding @ 2013-08-26 12:02 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Thomas Petazzoni, Longchamp, Valentin, Jason Cooper, devicetree,
	Jason Gunthorpe, Andrew Lunn, Ezequiel Garcia, linux-arm-kernel,
	Sebastian Hesselbarth


[-- Attachment #1.1: Type: text/plain, Size: 4118 bytes --]

On Mon, Aug 26, 2013 at 11:27:06AM +0200, Gerlando Falauto wrote:
> Hi guys [particularly Jason and Thierry],
> 
> sorry for the prolonged silence, here I am back again...
> 
> On 08/09/2013 04:01 PM, Thierry Reding wrote:
> >On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote:
> >>On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:
> >>
> >>>>Actually, the main reason for trying to use this driver was because I
> >>>>wanted to model a PCIe *device* within the device tree, so to expose its
> >>>>GPIOs and IRQs to be referenced (through phandles) from other device
> >>>>tree nodes. The way I understand it, turns out this is not the way to
> >>>>go, as PCI/PCIe are essentially enumerated busses, so you're not
> >>>>supposed to -and it's not a trivial task to- put any information about
> >>>>real devices within the device tree.
> >>>>Do you have any suggestion about that?
> >>>
> >>>Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
> >>>listed in the Device Tree, so there's no way to "attach" more
> >>>information to them.
> >>>
> >>>Device Tree people, any suggestion about the above question?
> >>
> >>No, that isn't true.
> >>
> >>Device tree can include the discovered PCI devices, you have to use
> >>the special reg encoding and all that weirdness, but it does work. The
> >>of_node will be attached to the struct pci device automatically.
> 
> So you mean that, assuming I knew the topology, I could populate the
> device tree in advance (e.g. statically), so that it already
> includes *devices* which will be further discovered during probing?
> Or else you mean the {firmware,u-boot} can do that prior to starting the OS?
> If either of the above is true, could you please suggest some
> example (or some way to get one)?
> I assume the "reg" property (and the after-"@" node name) will need
> to encode (at least) the device number, is that right?
> 
> I tried reading the "PCI Bus Binding to Open Firmware" but I could
> not make complete sense out of it...

You can find an example of this here:

	https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192

It's been quite some time that I've actually tested that, but it used to
work properly. What you basically need to do is represent the whole bus
hierarchy within the DT. In the above example there's the top-level root
port (pci@1,0), which provides a bus (1) on which there's a switch named
pci@0,0. That switch provides another bus (2) on which more devices are
listed (pci@[012345],0). Those are all downstream ports providing
separate busses again and have a single device attached to them.

You can pretty much arbitrarily nest nodes that way to represent any
hierarchy you want. The tricky part is to get the node numbering right
but `lspci -t' helps quite a bit with that.

> >>It is useful for exactly the reason stated - you can describe GPIOs,
> >>I2C busses, etc, etc in DT and then upon load of the PCI driver engage
> >>the DT code to populate and connect all that downstream
> >>infrastructure.
> 
> I'm not 100% sure I made myself clear though.
> What I would like to do is to have *other* parts of the device tree
> be able to reference (i.e., connect to, through phandles) a PCI
> device (because it provides a GPIO, for instance).
> Is that also what you mean?

Yes. In the example above you'll see that there's actually a GPIO
controller (pci@1,0/pci@0,0/pci@2,0/pci@0,0), so you could simply
associate a phandle with it, as in:

	gpioext: pci@0,0 {
		...
	};

And then hook up other devices to it using the regular notation:

	foo {
		...
		enable-gpios = <&gpioext 0 0>;
		...
	};

That's not done in this example but I've actually used something very
similar to that on an x86 platform to hook up the reset pin of an I2C
touchscreen controller to a GPIO controller, where both the I2C and
GPIO controllers were on the PCI bus.

I can't find that snippet right now, but I can look more thoroughly if
the above doesn't help you at all.

Thierry

[-- Attachment #1.2: Type: application/pgp-signature, Size: 836 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2013-08-26 12:02               ` Thierry Reding
  0 siblings, 0 replies; 90+ messages in thread
From: Thierry Reding @ 2013-08-26 12:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 26, 2013 at 11:27:06AM +0200, Gerlando Falauto wrote:
> Hi guys [particularly Jason and Thierry],
> 
> sorry for the prolonged silence, here I am back again...
> 
> On 08/09/2013 04:01 PM, Thierry Reding wrote:
> >On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote:
> >>On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:
> >>
> >>>>Actually, the main reason for trying to use this driver was because I
> >>>>wanted to model a PCIe *device* within the device tree, so to expose its
> >>>>GPIOs and IRQs to be referenced (through phandles) from other device
> >>>>tree nodes. The way I understand it, turns out this is not the way to
> >>>>go, as PCI/PCIe are essentially enumerated busses, so you're not
> >>>>supposed to -and it's not a trivial task to- put any information about
> >>>>real devices within the device tree.
> >>>>Do you have any suggestion about that?
> >>>
> >>>Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
> >>>listed in the Device Tree, so there's no way to "attach" more
> >>>information to them.
> >>>
> >>>Device Tree people, any suggestion about the above question?
> >>
> >>No, that isn't true.
> >>
> >>Device tree can include the discovered PCI devices, you have to use
> >>the special reg encoding and all that weirdness, but it does work. The
> >>of_node will be attached to the struct pci device automatically.
> 
> So you mean that, assuming I knew the topology, I could populate the
> device tree in advance (e.g. statically), so that it already
> includes *devices* which will be further discovered during probing?
> Or else you mean the {firmware,u-boot} can do that prior to starting the OS?
> If either of the above is true, could you please suggest some
> example (or some way to get one)?
> I assume the "reg" property (and the after-"@" node name) will need
> to encode (at least) the device number, is that right?
> 
> I tried reading the "PCI Bus Binding to Open Firmware" but I could
> not make complete sense out of it...

You can find an example of this here:

	https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192

It's been quite some time that I've actually tested that, but it used to
work properly. What you basically need to do is represent the whole bus
hierarchy within the DT. In the above example there's the top-level root
port (pci at 1,0), which provides a bus (1) on which there's a switch named
pci at 0,0. That switch provides another bus (2) on which more devices are
listed (pci@[012345],0). Those are all downstream ports providing
separate busses again and have a single device attached to them.

You can pretty much arbitrarily nest nodes that way to represent any
hierarchy you want. The tricky part is to get the node numbering right
but `lspci -t' helps quite a bit with that.

> >>It is useful for exactly the reason stated - you can describe GPIOs,
> >>I2C busses, etc, etc in DT and then upon load of the PCI driver engage
> >>the DT code to populate and connect all that downstream
> >>infrastructure.
> 
> I'm not 100% sure I made myself clear though.
> What I would like to do is to have *other* parts of the device tree
> be able to reference (i.e., connect to, through phandles) a PCI
> device (because it provides a GPIO, for instance).
> Is that also what you mean?

Yes. In the example above you'll see that there's actually a GPIO
controller (pci at 1,0/pci at 0,0/pci at 2,0/pci at 0,0), so you could simply
associate a phandle with it, as in:

	gpioext: pci at 0,0 {
		...
	};

And then hook up other devices to it using the regular notation:

	foo {
		...
		enable-gpios = <&gpioext 0 0>;
		...
	};

That's not done in this example but I've actually used something very
similar to that on an x86 platform to hook up the reset pin of an I2C
touchscreen controller to a GPIO controller, where both the I2C and
GPIO controllers were on the PCI bus.

I can't find that snippet right now, but I can look more thoroughly if
the above doesn't help you at all.

Thierry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130826/16513cf2/attachment-0001.sig>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2013-08-26 12:02               ` Thierry Reding
@ 2013-08-26 14:49                 ` Gerlando Falauto
  -1 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-08-26 14:49 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Thomas Petazzoni, Longchamp, Valentin, Jason Cooper, devicetree,
	Jason Gunthorpe, Andrew Lunn, Ezequiel Garcia, linux-arm-kernel,
	Sebastian Hesselbarth

Hi Thierry,

On 08/26/2013 02:02 PM, Thierry Reding wrote:
> On Mon, Aug 26, 2013 at 11:27:06AM +0200, Gerlando Falauto wrote:
>> Hi guys [particularly Jason and Thierry],
>>
>> sorry for the prolonged silence, here I am back again...
>>
>> On 08/09/2013 04:01 PM, Thierry Reding wrote:
>>> On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote:
>>>> On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:
>>>>
>>>>>> Actually, the main reason for trying to use this driver was because I
>>>>>> wanted to model a PCIe *device* within the device tree, so to expose its
>>>>>> GPIOs and IRQs to be referenced (through phandles) from other device
>>>>>> tree nodes. The way I understand it, turns out this is not the way to
>>>>>> go, as PCI/PCIe are essentially enumerated busses, so you're not
>>>>>> supposed to -and it's not a trivial task to- put any information about
>>>>>> real devices within the device tree.
>>>>>> Do you have any suggestion about that?
>>>>>
>>>>> Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
>>>>> listed in the Device Tree, so there's no way to "attach" more
>>>>> information to them.
>>>>>
>>>>> Device Tree people, any suggestion about the above question?
>>>>
>>>> No, that isn't true.
>>>>
>>>> Device tree can include the discovered PCI devices, you have to use
>>>> the special reg encoding and all that weirdness, but it does work. The
>>>> of_node will be attached to the struct pci device automatically.
>>
>> So you mean that, assuming I knew the topology, I could populate the
>> device tree in advance (e.g. statically), so that it already
>> includes *devices* which will be further discovered during probing?
>> Or else you mean the {firmware,u-boot} can do that prior to starting the OS?
>> If either of the above is true, could you please suggest some
>> example (or some way to get one)?
>> I assume the "reg" property (and the after-"@" node name) will need
>> to encode (at least) the device number, is that right?
>>
>> I tried reading the "PCI Bus Binding to Open Firmware" but I could
>> not make complete sense out of it...
>
> You can find an example of this here:
>
> 	https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192
>

Thanks for your precious feedback. Unfortunately gitorious' servers are 
offline right now... :-(

> It's been quite some time that I've actually tested that, but it used to
> work properly. What you basically need to do is represent the whole bus
> hierarchy within the DT. In the above example there's the top-level root
> port (pci@1,0), which provides a bus (1) on which there's a switch named
> pci@0,0. That switch provides another bus (2) on which more devices are
> listed (pci@[012345],0). Those are all downstream ports providing
> separate busses again and have a single device attached to them.
>
> You can pretty much arbitrarily nest nodes that way to represent any
> hierarchy you want. The tricky part is to get the node numbering right
> but `lspci -t' helps quite a bit with that.

One last question though... what does then the numbering ("@a,b") stand 
for? I assume if the output of a plain (i.e. no params) 'lspci' is

bb:dd.f (bus:device.function)

I should only have a "pci@dd,f" node, with the bus numbering being 
imposed by the hierarchy after an actual probing, right?
So the actual bus number is never listed in the device tree (whereas the 
"@device,function" is). Is that right?

>>>> It is useful for exactly the reason stated - you can describe GPIOs,
>>>> I2C busses, etc, etc in DT and then upon load of the PCI driver engage
>>>> the DT code to populate and connect all that downstream
>>>> infrastructure.
>>
>> I'm not 100% sure I made myself clear though.
>> What I would like to do is to have *other* parts of the device tree
>> be able to reference (i.e., connect to, through phandles) a PCI
>> device (because it provides a GPIO, for instance).
>> Is that also what you mean?
>
> Yes. In the example above you'll see that there's actually a GPIO
> controller (pci@1,0/pci@0,0/pci@2,0/pci@0,0), so you could simply
> associate a phandle with it, as in:
>
> 	gpioext: pci@0,0 {
> 		...
> 	};
>
> And then hook up other devices to it using the regular notation:
>
> 	foo {
> 		...
> 		enable-gpios = <&gpioext 0 0>;
> 		...
> 	};
>
> That's not done in this example but I've actually used something very
> similar to that on an x86 platform to hook up the reset pin of an I2C
> touchscreen controller to a GPIO controller, where both the I2C and
> GPIO controllers were on the PCI bus.
>
> I can't find that snippet right now, but I can look more thoroughly if
> the above doesn't help you at all.
>
> Thierry
>

I guess I'll have to wait until gitorious.org actually does come back 
up... then you'll definitely hear from me again. :-)

Thanks!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2013-08-26 14:49                 ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-08-26 14:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thierry,

On 08/26/2013 02:02 PM, Thierry Reding wrote:
> On Mon, Aug 26, 2013 at 11:27:06AM +0200, Gerlando Falauto wrote:
>> Hi guys [particularly Jason and Thierry],
>>
>> sorry for the prolonged silence, here I am back again...
>>
>> On 08/09/2013 04:01 PM, Thierry Reding wrote:
>>> On Wed, Jul 31, 2013 at 02:50:34PM -0600, Jason Gunthorpe wrote:
>>>> On Wed, Jul 31, 2013 at 11:00:45AM +0200, Thomas Petazzoni wrote:
>>>>
>>>>>> Actually, the main reason for trying to use this driver was because I
>>>>>> wanted to model a PCIe *device* within the device tree, so to expose its
>>>>>> GPIOs and IRQs to be referenced (through phandles) from other device
>>>>>> tree nodes. The way I understand it, turns out this is not the way to
>>>>>> go, as PCI/PCIe are essentially enumerated busses, so you're not
>>>>>> supposed to -and it's not a trivial task to- put any information about
>>>>>> real devices within the device tree.
>>>>>> Do you have any suggestion about that?
>>>>>
>>>>> Indeed, PCI/PCIe devices are enumerated dynamically, so they are not
>>>>> listed in the Device Tree, so there's no way to "attach" more
>>>>> information to them.
>>>>>
>>>>> Device Tree people, any suggestion about the above question?
>>>>
>>>> No, that isn't true.
>>>>
>>>> Device tree can include the discovered PCI devices, you have to use
>>>> the special reg encoding and all that weirdness, but it does work. The
>>>> of_node will be attached to the struct pci device automatically.
>>
>> So you mean that, assuming I knew the topology, I could populate the
>> device tree in advance (e.g. statically), so that it already
>> includes *devices* which will be further discovered during probing?
>> Or else you mean the {firmware,u-boot} can do that prior to starting the OS?
>> If either of the above is true, could you please suggest some
>> example (or some way to get one)?
>> I assume the "reg" property (and the after-"@" node name) will need
>> to encode (at least) the device number, is that right?
>>
>> I tried reading the "PCI Bus Binding to Open Firmware" but I could
>> not make complete sense out of it...
>
> You can find an example of this here:
>
> 	https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192
>

Thanks for your precious feedback. Unfortunately gitorious' servers are 
offline right now... :-(

> It's been quite some time that I've actually tested that, but it used to
> work properly. What you basically need to do is represent the whole bus
> hierarchy within the DT. In the above example there's the top-level root
> port (pci at 1,0), which provides a bus (1) on which there's a switch named
> pci at 0,0. That switch provides another bus (2) on which more devices are
> listed (pci@[012345],0). Those are all downstream ports providing
> separate busses again and have a single device attached to them.
>
> You can pretty much arbitrarily nest nodes that way to represent any
> hierarchy you want. The tricky part is to get the node numbering right
> but `lspci -t' helps quite a bit with that.

One last question though... what does then the numbering ("@a,b") stand 
for? I assume if the output of a plain (i.e. no params) 'lspci' is

bb:dd.f (bus:device.function)

I should only have a "pci at dd,f" node, with the bus numbering being 
imposed by the hierarchy after an actual probing, right?
So the actual bus number is never listed in the device tree (whereas the 
"@device,function" is). Is that right?

>>>> It is useful for exactly the reason stated - you can describe GPIOs,
>>>> I2C busses, etc, etc in DT and then upon load of the PCI driver engage
>>>> the DT code to populate and connect all that downstream
>>>> infrastructure.
>>
>> I'm not 100% sure I made myself clear though.
>> What I would like to do is to have *other* parts of the device tree
>> be able to reference (i.e., connect to, through phandles) a PCI
>> device (because it provides a GPIO, for instance).
>> Is that also what you mean?
>
> Yes. In the example above you'll see that there's actually a GPIO
> controller (pci at 1,0/pci at 0,0/pci at 2,0/pci at 0,0), so you could simply
> associate a phandle with it, as in:
>
> 	gpioext: pci at 0,0 {
> 		...
> 	};
>
> And then hook up other devices to it using the regular notation:
>
> 	foo {
> 		...
> 		enable-gpios = <&gpioext 0 0>;
> 		...
> 	};
>
> That's not done in this example but I've actually used something very
> similar to that on an x86 platform to hook up the reset pin of an I2C
> touchscreen controller to a GPIO controller, where both the I2C and
> GPIO controllers were on the PCI bus.
>
> I can't find that snippet right now, but I can look more thoroughly if
> the above doesn't help you at all.
>
> Thierry
>

I guess I'll have to wait until gitorious.org actually does come back 
up... then you'll definitely hear from me again. :-)

Thanks!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2013-08-26 14:49                 ` Gerlando Falauto
@ 2013-08-26 19:16                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2013-08-26 19:16 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, devicetree,
	Thierry Reding, Ezequiel Garcia, Longchamp, Valentin,
	linux-arm-kernel, Sebastian Hesselbarth

On Mon, Aug 26, 2013 at 04:49:23PM +0200, Gerlando Falauto wrote:

> One last question though... what does then the numbering ("@a,b")
> stand for? I assume if the output of a plain (i.e. no params)
> 'lspci' is

It is device,function, but it is only descriptive and not used by
Linux.
 
> I should only have a "pci@dd,f" node, with the bus numbering being
> imposed by the hierarchy after an actual probing, right?
> So the actual bus number is never listed in the device tree (whereas
> the "@device,function" is). Is that right?

The reg must encode the bus number according to the OF format:

               33222222 22221111 11111100 00000000
               10987654 32109876 54321098 76543210
 phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr
phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
 phys.lo cell: llllllll llllllll llllllll llllllll
 
bbbbbbbb is the 8-bit Bus Number
ddddd is the 5-bit Device Number
fff is the 3-bit Function Number

Others are 0.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2013-08-26 19:16                   ` Jason Gunthorpe
  0 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2013-08-26 19:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 26, 2013 at 04:49:23PM +0200, Gerlando Falauto wrote:

> One last question though... what does then the numbering ("@a,b")
> stand for? I assume if the output of a plain (i.e. no params)
> 'lspci' is

It is device,function, but it is only descriptive and not used by
Linux.
 
> I should only have a "pci at dd,f" node, with the bus numbering being
> imposed by the hierarchy after an actual probing, right?
> So the actual bus number is never listed in the device tree (whereas
> the "@device,function" is). Is that right?

The reg must encode the bus number according to the OF format:

               33222222 22221111 11111100 00000000
               10987654 32109876 54321098 76543210
 phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr
phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
 phys.lo cell: llllllll llllllll llllllll llllllll
 
bbbbbbbb is the 8-bit Bus Number
ddddd is the 5-bit Device Number
fff is the 3-bit Function Number

Others are 0.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2013-08-26 19:16                   ` Jason Gunthorpe
@ 2013-11-04 14:49                       ` Gerlando Falauto
  -1 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-11-04 14:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Thierry Reding
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, Longchamp, Valentin,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Ezequiel Garcia,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Sebastian Hesselbarth

Hi folks,

thank you for your patience...

So, thanks to Thierry's example:

 > 
https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192

and Jason's explanation:

 > The reg must encode the bus number according to the OF format:
 >
 >                 33222222 22221111 11111100 00000000
 >                 10987654 32109876 54321098 76543210
 >   phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr
 > phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
 >   phys.lo cell: llllllll llllllll llllllll llllllll
 >
 > bbbbbbbb is the 8-bit Bus Number
 > ddddd is the 5-bit Device Number
 > fff is the 3-bit Function Number
 >
 > Others are 0.

I'm finally starting to make some sense out of this, and I checked that 
Jason's statement is indeed true, at least on 3.10:

 > Device tree can include the discovered PCI devices, you have to use
 > the special reg encoding and all that weirdness, but it does work. The
 > of_node will be attached to the struct pci device automatically.

[Hi latency was also due to other activities, not just the low 
throughput of my brain cells] ;-)

I have one last question for Thierry though: what's the point of things 
such as

+					pci@0,0 {
+						compatible = "opencores,spi";

(apart from clarity, of course)?
I mean, wouldn't the driver be bound to the device through its PCI 
vendor ID / device ID?
Are we also supposed to register a platform driver based on a compatible 
string instead?

Thanks again guys!
Gerlando
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2013-11-04 14:49                       ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2013-11-04 14:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi folks,

thank you for your patience...

So, thanks to Thierry's example:

 > 
https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192

and Jason's explanation:

 > The reg must encode the bus number according to the OF format:
 >
 >                 33222222 22221111 11111100 00000000
 >                 10987654 32109876 54321098 76543210
 >   phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr
 > phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
 >   phys.lo cell: llllllll llllllll llllllll llllllll
 >
 > bbbbbbbb is the 8-bit Bus Number
 > ddddd is the 5-bit Device Number
 > fff is the 3-bit Function Number
 >
 > Others are 0.

I'm finally starting to make some sense out of this, and I checked that 
Jason's statement is indeed true, at least on 3.10:

 > Device tree can include the discovered PCI devices, you have to use
 > the special reg encoding and all that weirdness, but it does work. The
 > of_node will be attached to the struct pci device automatically.

[Hi latency was also due to other activities, not just the low 
throughput of my brain cells] ;-)

I have one last question for Thierry though: what's the point of things 
such as

+					pci at 0,0 {
+						compatible = "opencores,spi";

(apart from clarity, of course)?
I mean, wouldn't the driver be bound to the device through its PCI 
vendor ID / device ID?
Are we also supposed to register a platform driver based on a compatible 
string instead?

Thanks again guys!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2013-11-04 14:49                       ` Gerlando Falauto
@ 2013-11-05  8:13                           ` Thierry Reding
  -1 siblings, 0 replies; 90+ messages in thread
From: Thierry Reding @ 2013-11-05  8:13 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Jason Gunthorpe, Thomas Petazzoni, Andrew Lunn, Jason Cooper,
	Longchamp, Valentin, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Ezequiel Garcia,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Sebastian Hesselbarth

[-- Attachment #1: Type: text/plain, Size: 2383 bytes --]

On Mon, Nov 04, 2013 at 03:49:59PM +0100, Gerlando Falauto wrote:
> Hi folks,
> 
> thank you for your patience...
> 
> So, thanks to Thierry's example:
> 
> > https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192
> 
> and Jason's explanation:
> 
> > The reg must encode the bus number according to the OF format:
> >
> >                 33222222 22221111 11111100 00000000
> >                 10987654 32109876 54321098 76543210
> >   phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr
> > phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
> >   phys.lo cell: llllllll llllllll llllllll llllllll
> >
> > bbbbbbbb is the 8-bit Bus Number
> > ddddd is the 5-bit Device Number
> > fff is the 3-bit Function Number
> >
> > Others are 0.
> 
> I'm finally starting to make some sense out of this, and I checked
> that Jason's statement is indeed true, at least on 3.10:
> 
> > Device tree can include the discovered PCI devices, you have to use
> > the special reg encoding and all that weirdness, but it does work. The
> > of_node will be attached to the struct pci device automatically.
> 
> [Hi latency was also due to other activities, not just the low
> throughput of my brain cells] ;-)
> 
> I have one last question for Thierry though: what's the point of
> things such as
> 
> +					pci@0,0 {
> +						compatible = "opencores,spi";
> 
> (apart from clarity, of course)?
> I mean, wouldn't the driver be bound to the device through its PCI
> vendor ID / device ID?
> Are we also supposed to register a platform driver based on a
> compatible string instead?

I think that compatible property is completely bogus. Or at least the
value is. The primary reason why I included them was for descriptive
purposes.

According to section 2.5 of the PCI Bus Binding to Open Firmware[0] this
should be something like:

	compatible = "pciVVVV,DDDD";

where VVVV is the vendor ID and DDDD is the device ID, both in
hexadecimal. Section 2.5 lists a few more, but I'm not sure exactly
which would really be required.

I'm not even sure that they really are required at all. The drivers will
certainly be able to bind to them via the standard vendor and device ID
matching as you say. And no, no platform driver required.

Thierry

[0]: http://www.openfirmware.org/1275/bindings/pci/pci2_1.pdf

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2013-11-05  8:13                           ` Thierry Reding
  0 siblings, 0 replies; 90+ messages in thread
From: Thierry Reding @ 2013-11-05  8:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 04, 2013 at 03:49:59PM +0100, Gerlando Falauto wrote:
> Hi folks,
> 
> thank you for your patience...
> 
> So, thanks to Thierry's example:
> 
> > https://gitorious.org/thierryreding/linux/commit/b85c03d73288f6e376fc158ceac30f29680b4192
> 
> and Jason's explanation:
> 
> > The reg must encode the bus number according to the OF format:
> >
> >                 33222222 22221111 11111100 00000000
> >                 10987654 32109876 54321098 76543210
> >   phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr
> > phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
> >   phys.lo cell: llllllll llllllll llllllll llllllll
> >
> > bbbbbbbb is the 8-bit Bus Number
> > ddddd is the 5-bit Device Number
> > fff is the 3-bit Function Number
> >
> > Others are 0.
> 
> I'm finally starting to make some sense out of this, and I checked
> that Jason's statement is indeed true, at least on 3.10:
> 
> > Device tree can include the discovered PCI devices, you have to use
> > the special reg encoding and all that weirdness, but it does work. The
> > of_node will be attached to the struct pci device automatically.
> 
> [Hi latency was also due to other activities, not just the low
> throughput of my brain cells] ;-)
> 
> I have one last question for Thierry though: what's the point of
> things such as
> 
> +					pci at 0,0 {
> +						compatible = "opencores,spi";
> 
> (apart from clarity, of course)?
> I mean, wouldn't the driver be bound to the device through its PCI
> vendor ID / device ID?
> Are we also supposed to register a platform driver based on a
> compatible string instead?

I think that compatible property is completely bogus. Or at least the
value is. The primary reason why I included them was for descriptive
purposes.

According to section 2.5 of the PCI Bus Binding to Open Firmware[0] this
should be something like:

	compatible = "pciVVVV,DDDD";

where VVVV is the vendor ID and DDDD is the device ID, both in
hexadecimal. Section 2.5 lists a few more, but I'm not sure exactly
which would really be required.

I'm not even sure that they really are required at all. The drivers will
certainly be able to bind to them via the standard vendor and device ID
matching as you say. And no, no platform driver required.

Thierry

[0]: http://www.openfirmware.org/1275/bindings/pci/pci2_1.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20131105/3402da29/attachment.sig>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2013-07-11 14:32     ` Thomas Petazzoni
@ 2014-02-18 17:29       ` Gerlando Falauto
  2014-02-18 20:27         ` Thomas Petazzoni
  0 siblings, 1 reply; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-18 17:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

sorry for bringing up an old topic again...

On 07/11/2013 04:32 PM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Wed, 10 Jul 2013 19:31:56 +0200, Gerlando Falauto wrote:
>
>> Yes, though we had to trick it a little bit to get both the internal
>> switch and this PCIe device working:
>>
>> - this PCIe device requires to map 256M of memory as opposed to just 128
>> - we need a virtual PCIe device to connect to the internal switch, which
>> must be mapped at 0xf4000000 (normally used for the NAND which must then
>> move to 0xff000000)
>
> Aah, if you need 256 MB, then you need to adjust the ranges, because
> by default there is only 128 MB for PCIe memory. So, you would need
> something like:
>
> So, within the pcie-controller node, you should do something like:
>
> 			ranges = <0x82000000 0 0x00040000 0x00040000 0 0x00002000   /* Port 0.0 registers */
> 				  0x82000000 0 0xe0000000 0xe0000000 0 0x10000000   /* non-prefetchable memory */
> 			          0x81000000 0 0          0xf0000000 0 0x00100000>; /* downstream I/O */
>
> and in the ranges property at the ocp { } level, you should do something like:
>
>
> 		ranges = <0x00000000 0xf1000000 0x0100000
> 		          0xe0000000 0xe0000000 0x10100000 /* PCIE */
> 		          0xf4000000 0xf4000000 0x0000400
> 		          0xf5000000 0xf5000000 0x0000400>;
>
> Basically, before the change the configuration was:
>
>   * 128 MB of PCIe memory at 0xe0000000
>   * 1 MB of PCIe I/O at 0xe8000000
>
> After the change, you have:
>
>   * 256 MB of PCIe memory at 0xe0000000
>   * 1 MB of PCIe I/O at 0xf0000000
>

I tried these settings (a long time ago) and everything seemed to work 
fine. Except, we now have a different problem.
Essentially, this device requires 128MB for a given BAR to provide a 
PCI-to-localbus bridge. (another BAR provides the configuration space to 
configure chip select regions and so on).
Apparently, only the first 64MB of this BAR seem to work correctly with 
the new driver. As soon as you exceed that, reads (always?) return 0.
Other BARs (which are then of course assigned a higher region) seem to 
work just fine, so it looks like a per-BAR limitation.

This was not a problem with a 3.0 kernel. Do you have any idea what 
could be wrong here?
I'm currently using a 3.10 kernel, where your patches for the pci-mvebu 
driver were forcibly brought in (without full support for the MBUS 
description at device tree level though).

Thank you very much in advance,

Gerlando

P.S. Here's the relevant portion of the startup log so to give you an 
idea of the layout:

mvebu-pcie pcie-controller.1: PCIe0.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
pci 0000:01:00.0: supports D1 D2
pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
PCI: bus1: Fast back to back transfers disabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
PCI: enabling device 0000:00:01.0 (0140 -> 0143)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2014-02-18 17:29       ` Gerlando Falauto
@ 2014-02-18 20:27         ` Thomas Petazzoni
  2014-02-19  8:38           ` Gerlando Falauto
  0 siblings, 1 reply; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-18 20:27 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Tue, 18 Feb 2014 18:29:56 +0100, Gerlando Falauto wrote:

> I tried these settings (a long time ago) and everything seemed to work 
> fine. Except, we now have a different problem.
> Essentially, this device requires 128MB for a given BAR to provide a 
> PCI-to-localbus bridge. (another BAR provides the configuration space to 
> configure chip select regions and so on).
> Apparently, only the first 64MB of this BAR seem to work correctly with 
> the new driver. As soon as you exceed that, reads (always?) return 0.
> Other BARs (which are then of course assigned a higher region) seem to 
> work just fine, so it looks like a per-BAR limitation.
> 
> This was not a problem with a 3.0 kernel. Do you have any idea what 
> could be wrong here?
> I'm currently using a 3.10 kernel, where your patches for the pci-mvebu 
> driver were forcibly brought in (without full support for the MBUS 
> description at device tree level though).

[...]

> mvebu-pcie pcie-controller.1: PCIe0.0: link up
> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff]
> pci_bus 0000:00: root bus resource [bus 00-ff]
> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
> PCI: bus0: Fast back to back transfers disabled
> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
> pci 0000:01:00.0: supports D1 D2
> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
> PCI: bus1: Fast back to back transfers disabled
> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]

So I guess this one is the 128 MB BAR, right?

> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]

So in total, for the device 0000:01:00, the memory region should go
from 0xe0000000 to 0xe8804fff. This means that a 256 MB window is
needed for this device, because only power of two sizes are possible
for MBus windows.

Can you give me the output of /sys/kernel/debug/mvebu-mbus/devices ? It
will tell us how the MBus windows are configured, as I suspect the
problem might be here.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2014-02-18 20:27         ` Thomas Petazzoni
@ 2014-02-19  8:38           ` Gerlando Falauto
  2014-02-19  9:26             ` Thomas Petazzoni
  0 siblings, 1 reply; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-19  8:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

first of all thank you for your invaluable help!

On 02/18/2014 09:27 PM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Tue, 18 Feb 2014 18:29:56 +0100, Gerlando Falauto wrote:
>
>> I tried these settings (a long time ago) and everything seemed to work
>> fine. Except, we now have a different problem.
>> Essentially, this device requires 128MB for a given BAR to provide a
>> PCI-to-localbus bridge. (another BAR provides the configuration space to
>> configure chip select regions and so on).
>> Apparently, only the first 64MB of this BAR seem to work correctly with
>> the new driver. As soon as you exceed that, reads (always?) return 0.
>> Other BARs (which are then of course assigned a higher region) seem to
>> work just fine, so it looks like a per-BAR limitation.
>>
>> This was not a problem with a 3.0 kernel. Do you have any idea what
>> could be wrong here?
>> I'm currently using a 3.10 kernel, where your patches for the pci-mvebu
>> driver were forcibly brought in (without full support for the MBUS
>> description at device tree level though).
>
> [...]
>
>> mvebu-pcie pcie-controller.1: PCIe0.0: link up
>> mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
>> pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
>> pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff]
>> pci_bus 0000:00: root bus resource [bus 00-ff]
>> pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
>> PCI: bus0: Fast back to back transfers disabled
>> pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>> pci 0000:01:00.0: [10ee:0008] type 00 class 0x050000
>> pci 0000:01:00.0: reg 10: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: reg 14: [mem 0x00000000-0x07ffffff]
>> pci 0000:01:00.0: reg 18: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: reg 1c: [mem 0x00000000-0x007fffff]
>> pci 0000:01:00.0: reg 20: [mem 0x00000000-0x00001fff]
>> pci 0000:01:00.0: reg 24: [mem 0x00000000-0x00000fff]
>> pci 0000:01:00.0: supports D1 D2
>> pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
>> PCI: bus1: Fast back to back transfers disabled
>> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
>> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
>
> So I guess this one is the 128 MB BAR, right?

That's correct.

>
>> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
>> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
>> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
>> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
>> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>
> So in total, for the device 0000:01:00, the memory region should go
> from 0xe0000000 to 0xe8804fff. This means that a 256 MB window is
> needed for this device, because only power of two sizes are possible
> for MBus windows.
>
> Can you give me the output of /sys/kernel/debug/mvebu-mbus/devices ? It
> will tell us how the MBus windows are configured, as I suspect the
> problem might be here.

Here it goes:

[00] disabled
[01] disabled
[02] disabled
[03] disabled
[04] 00000000ff000000 - 00000000ff010000 : nand
[05] 00000000f4000000 - 00000000f8000000 : vpcie
[06] 00000000fe000000 - 00000000fe010000 : dragonite
[07] 00000000e0000000 - 00000000ec000000 : pcie0.0

So there's something wrong: a 256MB window should go all the way up to 
0xf0000000, and we have 192MB instead and I don't know how that would be 
interpreted.
I couldn't figure out where this range comes from though, as in the 
device tree I now have a size of 256MB (I stupidly set it to 192MB at 
some point, but I now changed it):

# hexdump -C /proc/device-tree/ocp at f1000000/pcie-controller/ranges
  | cut -c1-58
00000000  82 00 00 00 00 00 00 00  00 04 00 00 00 04 00 00
00000010  00 00 00 00 00 00 20 00  82 00 00 00 00 00 00 00
00000020  e0 00 00 00 e0 00 00 00  00 00 00 00 10 00 00 00
                                                ^^^^^^^^^^^
00000030  81 00 00 00 00 00 00 00  00 00 00 00 f0 00 00 00
00000040  00 00 00 00 00 10 00 00
00000048

But apart from that, what I still don't understand is how that could 
have anything to do with my problem. The memory area I'm not able to 
access starts at 0xe4000000.
BAR0, on the other hand, spawns 0xe8802000-0xe8802fff and seems to work 
fine.

Any ideas?

Thanks a lot!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2014-02-19  8:38           ` Gerlando Falauto
@ 2014-02-19  9:26             ` Thomas Petazzoni
  2014-02-19  9:39               ` Gerlando Falauto
  0 siblings, 1 reply; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-19  9:26 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Wed, 19 Feb 2014 09:38:48 +0100, Gerlando Falauto wrote:

> >> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> >> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> >> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> >> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> >> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> >
> > So in total, for the device 0000:01:00, the memory region should go
> > from 0xe0000000 to 0xe8804fff. This means that a 256 MB window is
> > needed for this device, because only power of two sizes are possible
> > for MBus windows.
> >
> > Can you give me the output of /sys/kernel/debug/mvebu-mbus/devices ? It
> > will tell us how the MBus windows are configured, as I suspect the
> > problem might be here.
> 
> Here it goes:
> 
> [00] disabled
> [01] disabled
> [02] disabled
> [03] disabled
> [04] 00000000ff000000 - 00000000ff010000 : nand
> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> [07] 00000000e0000000 - 00000000ec000000 : pcie0.0
> 
> So there's something wrong: a 256MB window should go all the way up to 
> 0xf0000000, and we have 192MB instead and I don't know how that would be 
> interpreted.

My understanding is that a 192 MB window is illegal, because the window
size should be encoded as a sequence of 1s followed by a sequence of 0s
from the LSB to the MSB. To me, this means that only power of two
window sizes are possible.

> I couldn't figure out where this range comes from though, as in the 
> device tree I now have a size of 256MB (I stupidly set it to 192MB at 
> some point, but I now changed it):
> 
> # hexdump -C /proc/device-tree/ocp at f1000000/pcie-controller/ranges
>   | cut -c1-58
> 00000000  82 00 00 00 00 00 00 00  00 04 00 00 00 04 00 00
> 00000010  00 00 00 00 00 00 20 00  82 00 00 00 00 00 00 00
> 00000020  e0 00 00 00 e0 00 00 00  00 00 00 00 10 00 00 00
>                                                 ^^^^^^^^^^^

Wow, that's an old DT representation that you have here :)

But ok, let me try to explain. The 256 MB value that you define in the
DT is the global PCIe memory aperture: it is the maximum amount of
memory that we allow the PCIe driver to allocate for PCIe windows. But
depending on which PCIe devices you have plugged in, and how large
their BARs are, not necessarily all of these 256 MB will be used.

So, you can very well have a 256 MB global PCIe memory aperture, and
still have only one 1 MB PCIe memory window for PCIe 0.0 and a 256 KB
PCIe memory window for PCIe 1.0, and that's it.

Now, the 192 MB comes from the enumeration of your device. Linux
enumerates the BAR of your device:

pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]

and then concludes that at the emulated bridge level, the memory region
to be created is:

pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]

which corresponds to the 192 MB window that we see created.

But I believe a 192 MB memory window cannot work with MBus, it should
be rounded up to the next power of 2. Can you try the below patch (not
tested, not even compiled, might need some tweaks to apply to your 3.10
kernel) :

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 13478ec..002229a 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -372,6 +372,11 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
                (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
                port->memwin_base;
 
+       pr_info("PCIE %d.%d: creating window at 0x%x, size 0x%x rounded up to 0x%x\n",
+               port->port, port->lane, port->memwin_base,
+               port->memwin_size, roundup_pow_of_two(port->memwin_size));
+       port->memwin_size = roundup_pow_of_two(port->memwin_size);
+
        mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
                                    port->memwin_base, port->memwin_size);
 }

I'm obviously interested in seeing the message that gets shown, as well
as the new mvebu-mbus debugfs output.

For good measure, if you could also dump the registers of the PCIe
window. In your case, it was window 7, so dumping 0xf1020070 and
0xf1020074 would be useful.

> But apart from that, what I still don't understand is how that could 
> have anything to do with my problem. The memory area I'm not able to 
> access starts at 0xe4000000.
> BAR0, on the other hand, spawns 0xe8802000-0xe8802fff and seems to work 
> fine.

I am not sure, but since we are configuring an invalid memory size,
maybe the MBus behavior is undefined, and we get some completely funky
behavior, where parts of the 192 MB window are actually work, but parts
of it are not.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2014-02-19  9:26             ` Thomas Petazzoni
@ 2014-02-19  9:39               ` Gerlando Falauto
  2014-02-19 13:37                   ` Thomas Petazzoni
  0 siblings, 1 reply; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-19  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

spoiler first: SUCCESS!!!!

On 02/19/2014 10:26 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
[...]

>>
>> # hexdump -C /proc/device-tree/ocp at f1000000/pcie-controller/ranges
>>    | cut -c1-58
>> 00000000  82 00 00 00 00 00 00 00  00 04 00 00 00 04 00 00
>> 00000010  00 00 00 00 00 00 20 00  82 00 00 00 00 00 00 00
>> 00000020  e0 00 00 00 e0 00 00 00  00 00 00 00 10 00 00 00
>>                                                  ^^^^^^^^^^^
>
> Wow, that's an old DT representation that you have here :)

Indeed... ;-)

> But ok, let me try to explain. The 256 MB value that you define in the
> DT is the global PCIe memory aperture: it is the maximum amount of
> memory that we allow the PCIe driver to allocate for PCIe windows. But
> depending on which PCIe devices you have plugged in, and how large
> their BARs are, not necessarily all of these 256 MB will be used.
>
> So, you can very well have a 256 MB global PCIe memory aperture, and
> still have only one 1 MB PCIe memory window for PCIe 0.0 and a 256 KB
> PCIe memory window for PCIe 1.0, and that's it.
>
> Now, the 192 MB comes from the enumeration of your device. Linux
> enumerates the BAR of your device:
>
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>
> and then concludes that at the emulated bridge level, the memory region
> to be created is:
>
> pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
>
> which corresponds to the 192 MB window that we see created.
>
> But I believe a 192 MB memory window cannot work with MBus, it should
> be rounded up to the next power of 2. Can you try the below patch (not
> tested, not even compiled, might need some tweaks to apply to your 3.10
> kernel) :
>
> diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
> index 13478ec..002229a 100644
> --- a/drivers/pci/host/pci-mvebu.c
> +++ b/drivers/pci/host/pci-mvebu.c
> @@ -372,6 +372,11 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
>                  (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
>                  port->memwin_base;
>
> +       pr_info("PCIE %d.%d: creating window at 0x%x, size 0x%x rounded up to 0x%x\n",
> +               port->port, port->lane, port->memwin_base,
> +               port->memwin_size, roundup_pow_of_two(port->memwin_size));
> +       port->memwin_size = roundup_pow_of_two(port->memwin_size);
> +
>          mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
>                                      port->memwin_base, port->memwin_size);
>   }
>
> I'm obviously interested in seeing the message that gets shown, as well
> as the new mvebu-mbus debugfs output.

----------
pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to 
0x10000000
----------

  cat /sys/kernel/debug/mvebu-mbus/devices
[00] disabled
[01] disabled
[02] disabled
[03] disabled
[04] 00000000ff000000 - 00000000ff010000 : nand
[05] 00000000f4000000 - 00000000f8000000 : vpcie
[06] 00000000fe000000 - 00000000fe010000 : dragonite
[07] 00000000e0000000 - 00000000f0000000 : pcie0.0

> For good measure, if you could also dump the registers of the PCIe
> window. In your case, it was window 7, so dumping 0xf1020070 and
> 0xf1020074 would be useful.

Isn't that where the output of debugfs comes from?

>> But apart from that, what I still don't understand is how that could
>> have anything to do with my problem. The memory area I'm not able to
>> access starts at 0xe4000000.
>> BAR0, on the other hand, spawns 0xe8802000-0xe8802fff and seems to work
>> fine.
>
> I am not sure, but since we are configuring an invalid memory size,
> maybe the MBus behavior is undefined, and we get some completely funky
> behavior, where parts of the 192 MB window are actually work, but parts
> of it are not.

And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!!
With your patch now everything works fine!!!

No words (or quads, for that matter) can express how grateful I am! ;-)

Thank you so much!!!
Gerlando

>
> Thomas
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-19  9:39               ` Gerlando Falauto
@ 2014-02-19 13:37                   ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-19 13:37 UTC (permalink / raw)
  To: Gerlando Falauto, Bjorn Helgaas, linux-pci
  Cc: linux-arm-kernel, Andrew Lunn, Sebastian Hesselbarth,
	Jason Cooper, Longchamp, Valentin, Ezequiel Garcia, Lior Amsalem

Gerlando, Bjorn,

Bjorn, I added you as the To: because there is a PCI related question
for you below :)

On Wed, 19 Feb 2014 10:39:07 +0100, Gerlando Falauto wrote:

> spoiler first: SUCCESS!!!!

Awesome :)

> > I'm obviously interested in seeing the message that gets shown, as well
> > as the new mvebu-mbus debugfs output.
> 
> ----------
> pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
> PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to 
> 0x10000000

Right, rounding from 192 MB to 265 MB.

>   cat /sys/kernel/debug/mvebu-mbus/devices
> [00] disabled
> [01] disabled
> [02] disabled
> [03] disabled
> [04] 00000000ff000000 - 00000000ff010000 : nand
> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> [07] 00000000e0000000 - 00000000f0000000 : pcie0.0
> 
> > For good measure, if you could also dump the registers of the PCIe
> > window. In your case, it was window 7, so dumping 0xf1020070 and
> > 0xf1020074 would be useful.
> 
> Isn't that where the output of debugfs comes from?

It is, but the mvebu-mbus is interpreting the sequence of 1s and 0s to
give the real size, and this involves a little bit of magic of bit
manipulation, which I wanted to check by having a look at the raw
values of the registers.

> > I am not sure, but since we are configuring an invalid memory size,
> > maybe the MBus behavior is undefined, and we get some completely funky
> > behavior, where parts of the 192 MB window are actually work, but parts
> > of it are not.
> 
> And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!!
> With your patch now everything works fine!!!
> 
> No words (or quads, for that matter) can express how grateful I am! ;-)

Cool. However, I am not sure my fix is really correct, because is you
had another PCIe device that needed 64 MB of memory space, the PCIe
core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
which would have conflicted with the forced "power of 2 up-rounding"
we've applied on the memory space of the first device.

Therefore, I believe this constraint should be taken into account by
the PCIe core when allocating the different memory regions for each
device.

Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
memory regions associated to each PCIe device of the emulated bridge
have a size that is a power of 2.

I am currently using the ->align_resource() hook to ensure that the
start address of the resource matches certain other constraints, but I
don't see a way of telling the PCI core that I need the resource to
have its size rounded up to the next power of 2 size. Is there a way of
doing this?

In the case described by Gerlando, the PCI core has assigned a 192 MB
region, but the Marvell hardware can only create windows that have a
power of two size, i.e 256 MB. Therefore, the PCI core should be told
this constraint, so that it doesn't allocate the next resource right
after the 192 MB one.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-19 13:37                   ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-19 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

Gerlando, Bjorn,

Bjorn, I added you as the To: because there is a PCI related question
for you below :)

On Wed, 19 Feb 2014 10:39:07 +0100, Gerlando Falauto wrote:

> spoiler first: SUCCESS!!!!

Awesome :)

> > I'm obviously interested in seeing the message that gets shown, as well
> > as the new mvebu-mbus debugfs output.
> 
> ----------
> pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
> PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to 
> 0x10000000

Right, rounding from 192 MB to 265 MB.

>   cat /sys/kernel/debug/mvebu-mbus/devices
> [00] disabled
> [01] disabled
> [02] disabled
> [03] disabled
> [04] 00000000ff000000 - 00000000ff010000 : nand
> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> [07] 00000000e0000000 - 00000000f0000000 : pcie0.0
> 
> > For good measure, if you could also dump the registers of the PCIe
> > window. In your case, it was window 7, so dumping 0xf1020070 and
> > 0xf1020074 would be useful.
> 
> Isn't that where the output of debugfs comes from?

It is, but the mvebu-mbus is interpreting the sequence of 1s and 0s to
give the real size, and this involves a little bit of magic of bit
manipulation, which I wanted to check by having a look at the raw
values of the registers.

> > I am not sure, but since we are configuring an invalid memory size,
> > maybe the MBus behavior is undefined, and we get some completely funky
> > behavior, where parts of the 192 MB window are actually work, but parts
> > of it are not.
> 
> And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!!
> With your patch now everything works fine!!!
> 
> No words (or quads, for that matter) can express how grateful I am! ;-)

Cool. However, I am not sure my fix is really correct, because is you
had another PCIe device that needed 64 MB of memory space, the PCIe
core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
which would have conflicted with the forced "power of 2 up-rounding"
we've applied on the memory space of the first device.

Therefore, I believe this constraint should be taken into account by
the PCIe core when allocating the different memory regions for each
device.

Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
memory regions associated to each PCIe device of the emulated bridge
have a size that is a power of 2.

I am currently using the ->align_resource() hook to ensure that the
start address of the resource matches certain other constraints, but I
don't see a way of telling the PCI core that I need the resource to
have its size rounded up to the next power of 2 size. Is there a way of
doing this?

In the case described by Gerlando, the PCI core has assigned a 192 MB
region, but the Marvell hardware can only create windows that have a
power of two size, i.e 256 MB. Therefore, the PCI core should be told
this constraint, so that it doesn't allocate the next resource right
after the 192 MB one.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-19 13:37                   ` Thomas Petazzoni
@ 2014-02-19 21:45                     ` Bjorn Helgaas
  -1 siblings, 0 replies; 90+ messages in thread
From: Bjorn Helgaas @ 2014-02-19 21:45 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Gerlando Falauto, linux-pci, linux-arm-kernel, Andrew Lunn,
	Sebastian Hesselbarth, Jason Cooper, Longchamp, Valentin,
	Ezequiel Garcia, Lior Amsalem

On Wed, Feb 19, 2014 at 6:37 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Gerlando, Bjorn,
>
> Bjorn, I added you as the To: because there is a PCI related question
> for you below :)
>
> On Wed, 19 Feb 2014 10:39:07 +0100, Gerlando Falauto wrote:
>
>> spoiler first: SUCCESS!!!!
>
> Awesome :)
>
>> > I'm obviously interested in seeing the message that gets shown, as well
>> > as the new mvebu-mbus debugfs output.
>>
>> ----------
>> pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
>> PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to
>> 0x10000000
>
> Right, rounding from 192 MB to 265 MB.
>
>>   cat /sys/kernel/debug/mvebu-mbus/devices
>> [00] disabled
>> [01] disabled
>> [02] disabled
>> [03] disabled
>> [04] 00000000ff000000 - 00000000ff010000 : nand
>> [05] 00000000f4000000 - 00000000f8000000 : vpcie
>> [06] 00000000fe000000 - 00000000fe010000 : dragonite
>> [07] 00000000e0000000 - 00000000f0000000 : pcie0.0
>>
>> > For good measure, if you could also dump the registers of the PCIe
>> > window. In your case, it was window 7, so dumping 0xf1020070 and
>> > 0xf1020074 would be useful.
>>
>> Isn't that where the output of debugfs comes from?
>
> It is, but the mvebu-mbus is interpreting the sequence of 1s and 0s to
> give the real size, and this involves a little bit of magic of bit
> manipulation, which I wanted to check by having a look at the raw
> values of the registers.
>
>> > I am not sure, but since we are configuring an invalid memory size,
>> > maybe the MBus behavior is undefined, and we get some completely funky
>> > behavior, where parts of the 192 MB window are actually work, but parts
>> > of it are not.
>>
>> And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!!
>> With your patch now everything works fine!!!
>>
>> No words (or quads, for that matter) can express how grateful I am! ;-)
>
> Cool. However, I am not sure my fix is really correct, because is you
> had another PCIe device that needed 64 MB of memory space, the PCIe
> core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
> which would have conflicted with the forced "power of 2 up-rounding"
> we've applied on the memory space of the first device.
>
> Therefore, I believe this constraint should be taken into account by
> the PCIe core when allocating the different memory regions for each
> device.
>
> Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
> memory regions associated to each PCIe device of the emulated bridge
> have a size that is a power of 2.
>
> I am currently using the ->align_resource() hook to ensure that the
> start address of the resource matches certain other constraints, but I
> don't see a way of telling the PCI core that I need the resource to
> have its size rounded up to the next power of 2 size. Is there a way of
> doing this?
>
> In the case described by Gerlando, the PCI core has assigned a 192 MB
> region, but the Marvell hardware can only create windows that have a
> power of two size, i.e 256 MB. Therefore, the PCI core should be told
> this constraint, so that it doesn't allocate the next resource right
> after the 192 MB one.

I'm not sure I understand this correctly, but I *think* this 192 MB
region that gets rounded up to 256 MB because of the Marvell
constraint is a host bridge aperture.  If that's the case, it's
entirely up to you (the host bridge driver author) to round it as
needed before passing it to pci_add_resource_offset().

The PCI core will never allocate any space that is outside the host
bridge apertures.

But maybe I don't understand your situation well enough.

Bjorn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-19 21:45                     ` Bjorn Helgaas
  0 siblings, 0 replies; 90+ messages in thread
From: Bjorn Helgaas @ 2014-02-19 21:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 19, 2014 at 6:37 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Gerlando, Bjorn,
>
> Bjorn, I added you as the To: because there is a PCI related question
> for you below :)
>
> On Wed, 19 Feb 2014 10:39:07 +0100, Gerlando Falauto wrote:
>
>> spoiler first: SUCCESS!!!!
>
> Awesome :)
>
>> > I'm obviously interested in seeing the message that gets shown, as well
>> > as the new mvebu-mbus debugfs output.
>>
>> ----------
>> pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
>> PCIE 0.0: creating window at 0xe0000000, size 0xbffffff rounded up to
>> 0x10000000
>
> Right, rounding from 192 MB to 265 MB.
>
>>   cat /sys/kernel/debug/mvebu-mbus/devices
>> [00] disabled
>> [01] disabled
>> [02] disabled
>> [03] disabled
>> [04] 00000000ff000000 - 00000000ff010000 : nand
>> [05] 00000000f4000000 - 00000000f8000000 : vpcie
>> [06] 00000000fe000000 - 00000000fe010000 : dragonite
>> [07] 00000000e0000000 - 00000000f0000000 : pcie0.0
>>
>> > For good measure, if you could also dump the registers of the PCIe
>> > window. In your case, it was window 7, so dumping 0xf1020070 and
>> > 0xf1020074 would be useful.
>>
>> Isn't that where the output of debugfs comes from?
>
> It is, but the mvebu-mbus is interpreting the sequence of 1s and 0s to
> give the real size, and this involves a little bit of magic of bit
> manipulation, which I wanted to check by having a look at the raw
> values of the registers.
>
>> > I am not sure, but since we are configuring an invalid memory size,
>> > maybe the MBus behavior is undefined, and we get some completely funky
>> > behavior, where parts of the 192 MB window are actually work, but parts
>> > of it are not.
>>
>> And... Ladies and gentlemen... it turns out YOU'RE RIGHT!!!
>> With your patch now everything works fine!!!
>>
>> No words (or quads, for that matter) can express how grateful I am! ;-)
>
> Cool. However, I am not sure my fix is really correct, because is you
> had another PCIe device that needed 64 MB of memory space, the PCIe
> core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
> which would have conflicted with the forced "power of 2 up-rounding"
> we've applied on the memory space of the first device.
>
> Therefore, I believe this constraint should be taken into account by
> the PCIe core when allocating the different memory regions for each
> device.
>
> Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
> memory regions associated to each PCIe device of the emulated bridge
> have a size that is a power of 2.
>
> I am currently using the ->align_resource() hook to ensure that the
> start address of the resource matches certain other constraints, but I
> don't see a way of telling the PCI core that I need the resource to
> have its size rounded up to the next power of 2 size. Is there a way of
> doing this?
>
> In the case described by Gerlando, the PCI core has assigned a 192 MB
> region, but the Marvell hardware can only create windows that have a
> power of two size, i.e 256 MB. Therefore, the PCI core should be told
> this constraint, so that it doesn't allocate the next resource right
> after the 192 MB one.

I'm not sure I understand this correctly, but I *think* this 192 MB
region that gets rounded up to 256 MB because of the Marvell
constraint is a host bridge aperture.  If that's the case, it's
entirely up to you (the host bridge driver author) to round it as
needed before passing it to pci_add_resource_offset().

The PCI core will never allocate any space that is outside the host
bridge apertures.

But maybe I don't understand your situation well enough.

Bjorn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-19 21:45                     ` Bjorn Helgaas
@ 2014-02-20  8:55                       ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-20  8:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Gerlando Falauto, linux-pci, linux-arm-kernel, Andrew Lunn,
	Sebastian Hesselbarth, Jason Cooper, Longchamp, Valentin,
	Ezequiel Garcia, Lior Amsalem, Jason Gunthorpe

Dear Bjorn Helgaas,

+ Jason Gunthorpe.

On Wed, 19 Feb 2014 14:45:48 -0700, Bjorn Helgaas wrote:

> > Cool. However, I am not sure my fix is really correct, because is you
> > had another PCIe device that needed 64 MB of memory space, the PCIe
> > core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
> > which would have conflicted with the forced "power of 2 up-rounding"
> > we've applied on the memory space of the first device.
> >
> > Therefore, I believe this constraint should be taken into account by
> > the PCIe core when allocating the different memory regions for each
> > device.
> >
> > Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
> > memory regions associated to each PCIe device of the emulated bridge
> > have a size that is a power of 2.
> >
> > I am currently using the ->align_resource() hook to ensure that the
> > start address of the resource matches certain other constraints, but I
> > don't see a way of telling the PCI core that I need the resource to
> > have its size rounded up to the next power of 2 size. Is there a way of
> > doing this?
> >
> > In the case described by Gerlando, the PCI core has assigned a 192 MB
> > region, but the Marvell hardware can only create windows that have a
> > power of two size, i.e 256 MB. Therefore, the PCI core should be told
> > this constraint, so that it doesn't allocate the next resource right
> > after the 192 MB one.
> 
> I'm not sure I understand this correctly, but I *think* this 192 MB
> region that gets rounded up to 256 MB because of the Marvell
> constraint is a host bridge aperture.  If that's the case, it's
> entirely up to you (the host bridge driver author) to round it as
> needed before passing it to pci_add_resource_offset().
> 
> The PCI core will never allocate any space that is outside the host
> bridge apertures.

Hum, I believe there is a misunderstanding here. We are already using
pci_add_resource_offset() to define the global aperture for the entire
PCI bridge. This is not causing any problem.

Let me give a little bit of background first.

On Marvell hardware, the physical address space layout is configurable,
through the use of "MBus windows". A "MBus window" is defined by a base
address, a size, and a target device. So if the CPU needs to access a
given device (such as PCIe 0.0 for example), then we need to create a
"MBus window" whose size and target device match PCIe 0.0.

Since Armada XP has 10 PCIe interfaces, we cannot just statically
create as many MBus windows as there are PCIe interfaces: it would both
exhaust the number of MBus windows available, and also exhaust the
physical address space, because we would have to create very large
windows, just in case the PCIe device plugged behind this interface
needs large BARs.

So, what the pci-mvebu.c driver does is that it creates an emulated PCI
bridge. This emulated bridge is used to let the Linux PCI core
enumerate the real physical PCI devices behind the bridge, allocate a
range of physical addresses that is available for each of these
devices, and write them to the bridge registers. Since the bridge is
not a real one, but emulated, but trap those writes, and use them to
create the MBus windows that will allow the CPU to actually access the
device, at the base address chosen by the Linux PCI core during the
enumeration process.

However, MBus windows have a certain constraint that they must have a
power of two size, so the Linux PCI core should not write to one of the
bridge PCI_MEMORY_BASE / PCI_MEMORY_LIMIT registers any range of
address whose size is not a power of 2.

Let me take the example of Gerlando:

pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]

So, pci 0000:01:00 is the real device, which has a number of BARs of a
certain size. Taking into account all those BARs, the Linux PCI core
decides to assign [mem 0xe0000000-0xebffffff] to the bridge (last line
of the log above). The problem is that [mem 0xe0000000-0xebffffff] is
192 MB, but we would like the Linux PCI core to extend that to 256 MB.

As you can see it is not about the global aperture associated to the
bridge, but about the size of the window associated to each "port" of
the bridge.

Does that make sense? Keep in mind that I'm still not completely
familiar with the PCI terminology, so maybe the above explanation does
not use the right terms.

Thanks for your feedback,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-20  8:55                       ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-20  8:55 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Bjorn Helgaas,

+ Jason Gunthorpe.

On Wed, 19 Feb 2014 14:45:48 -0700, Bjorn Helgaas wrote:

> > Cool. However, I am not sure my fix is really correct, because is you
> > had another PCIe device that needed 64 MB of memory space, the PCIe
> > core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
> > which would have conflicted with the forced "power of 2 up-rounding"
> > we've applied on the memory space of the first device.
> >
> > Therefore, I believe this constraint should be taken into account by
> > the PCIe core when allocating the different memory regions for each
> > device.
> >
> > Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
> > memory regions associated to each PCIe device of the emulated bridge
> > have a size that is a power of 2.
> >
> > I am currently using the ->align_resource() hook to ensure that the
> > start address of the resource matches certain other constraints, but I
> > don't see a way of telling the PCI core that I need the resource to
> > have its size rounded up to the next power of 2 size. Is there a way of
> > doing this?
> >
> > In the case described by Gerlando, the PCI core has assigned a 192 MB
> > region, but the Marvell hardware can only create windows that have a
> > power of two size, i.e 256 MB. Therefore, the PCI core should be told
> > this constraint, so that it doesn't allocate the next resource right
> > after the 192 MB one.
> 
> I'm not sure I understand this correctly, but I *think* this 192 MB
> region that gets rounded up to 256 MB because of the Marvell
> constraint is a host bridge aperture.  If that's the case, it's
> entirely up to you (the host bridge driver author) to round it as
> needed before passing it to pci_add_resource_offset().
> 
> The PCI core will never allocate any space that is outside the host
> bridge apertures.

Hum, I believe there is a misunderstanding here. We are already using
pci_add_resource_offset() to define the global aperture for the entire
PCI bridge. This is not causing any problem.

Let me give a little bit of background first.

On Marvell hardware, the physical address space layout is configurable,
through the use of "MBus windows". A "MBus window" is defined by a base
address, a size, and a target device. So if the CPU needs to access a
given device (such as PCIe 0.0 for example), then we need to create a
"MBus window" whose size and target device match PCIe 0.0.

Since Armada XP has 10 PCIe interfaces, we cannot just statically
create as many MBus windows as there are PCIe interfaces: it would both
exhaust the number of MBus windows available, and also exhaust the
physical address space, because we would have to create very large
windows, just in case the PCIe device plugged behind this interface
needs large BARs.

So, what the pci-mvebu.c driver does is that it creates an emulated PCI
bridge. This emulated bridge is used to let the Linux PCI core
enumerate the real physical PCI devices behind the bridge, allocate a
range of physical addresses that is available for each of these
devices, and write them to the bridge registers. Since the bridge is
not a real one, but emulated, but trap those writes, and use them to
create the MBus windows that will allow the CPU to actually access the
device, at the base address chosen by the Linux PCI core during the
enumeration process.

However, MBus windows have a certain constraint that they must have a
power of two size, so the Linux PCI core should not write to one of the
bridge PCI_MEMORY_BASE / PCI_MEMORY_LIMIT registers any range of
address whose size is not a power of 2.

Let me take the example of Gerlando:

pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]

So, pci 0000:01:00 is the real device, which has a number of BARs of a
certain size. Taking into account all those BARs, the Linux PCI core
decides to assign [mem 0xe0000000-0xebffffff] to the bridge (last line
of the log above). The problem is that [mem 0xe0000000-0xebffffff] is
192 MB, but we would like the Linux PCI core to extend that to 256 MB.

As you can see it is not about the global aperture associated to the
bridge, but about the size of the window associated to each "port" of
the bridge.

Does that make sense? Keep in mind that I'm still not completely
familiar with the PCI terminology, so maybe the above explanation does
not use the right terms.

Thanks for your feedback,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-20  8:55                       ` Thomas Petazzoni
@ 2014-02-20 17:35                         ` Jason Gunthorpe
  -1 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-20 17:35 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Bjorn Helgaas, Gerlando Falauto, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem

On Thu, Feb 20, 2014 at 09:55:18AM +0100, Thomas Petazzoni wrote:

> Does that make sense? Keep in mind that I'm still not completely
> familiar with the PCI terminology, so maybe the above explanation does
> not use the right terms.

Stated another way, the Marvel PCI-E to PCI-E bridge config space has
a quirk that requires the window BARs to be aligned on their size and
sized to a power of 2.

The first requirement is already being handled by hooking through
ARM's 'align_resource' callback.

One avenue would be to have mvebu_pcie_align_resource return a struct
resource and manipulate the size as well. Assuming the PCI core will
accommodate that.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-20 17:35                         ` Jason Gunthorpe
  0 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-20 17:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 20, 2014 at 09:55:18AM +0100, Thomas Petazzoni wrote:

> Does that make sense? Keep in mind that I'm still not completely
> familiar with the PCI terminology, so maybe the above explanation does
> not use the right terms.

Stated another way, the Marvel PCI-E to PCI-E bridge config space has
a quirk that requires the window BARs to be aligned on their size and
sized to a power of 2.

The first requirement is already being handled by hooking through
ARM's 'align_resource' callback.

One avenue would be to have mvebu_pcie_align_resource return a struct
resource and manipulate the size as well. Assuming the PCI core will
accommodate that.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-20  8:55                       ` Thomas Petazzoni
@ 2014-02-20 19:18                         ` Bjorn Helgaas
  -1 siblings, 0 replies; 90+ messages in thread
From: Bjorn Helgaas @ 2014-02-20 19:18 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Gerlando Falauto, linux-pci, linux-arm-kernel, Andrew Lunn,
	Sebastian Hesselbarth, Jason Cooper, Longchamp, Valentin,
	Ezequiel Garcia, Lior Amsalem, Jason Gunthorpe

On Thu, Feb 20, 2014 at 1:55 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Dear Bjorn Helgaas,
>
> + Jason Gunthorpe.
>
> On Wed, 19 Feb 2014 14:45:48 -0700, Bjorn Helgaas wrote:
>
>> > Cool. However, I am not sure my fix is really correct, because is you
>> > had another PCIe device that needed 64 MB of memory space, the PCIe
>> > core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
>> > which would have conflicted with the forced "power of 2 up-rounding"
>> > we've applied on the memory space of the first device.
>> >
>> > Therefore, I believe this constraint should be taken into account by
>> > the PCIe core when allocating the different memory regions for each
>> > device.
>> >
>> > Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
>> > memory regions associated to each PCIe device of the emulated bridge
>> > have a size that is a power of 2.
>> >
>> > I am currently using the ->align_resource() hook to ensure that the
>> > start address of the resource matches certain other constraints, but I
>> > don't see a way of telling the PCI core that I need the resource to
>> > have its size rounded up to the next power of 2 size. Is there a way of
>> > doing this?
>> >
>> > In the case described by Gerlando, the PCI core has assigned a 192 MB
>> > region, but the Marvell hardware can only create windows that have a
>> > power of two size, i.e 256 MB. Therefore, the PCI core should be told
>> > this constraint, so that it doesn't allocate the next resource right
>> > after the 192 MB one.
>>
>> I'm not sure I understand this correctly, but I *think* this 192 MB
>> region that gets rounded up to 256 MB because of the Marvell
>> constraint is a host bridge aperture.  If that's the case, it's
>> entirely up to you (the host bridge driver author) to round it as
>> needed before passing it to pci_add_resource_offset().
>>
>> The PCI core will never allocate any space that is outside the host
>> bridge apertures.
>
> Hum, I believe there is a misunderstanding here. We are already using
> pci_add_resource_offset() to define the global aperture for the entire
> PCI bridge. This is not causing any problem.
>
> Let me give a little bit of background first.
>
> On Marvell hardware, the physical address space layout is configurable,
> through the use of "MBus windows". A "MBus window" is defined by a base
> address, a size, and a target device. So if the CPU needs to access a
> given device (such as PCIe 0.0 for example), then we need to create a
> "MBus window" whose size and target device match PCIe 0.0.

I was assuming "PCIe 0.0" was a host bridge, but it sounds like maybe
that's not true.  Is it really a PCIe root port?  That would mean the
MBus windows are some non-PCIe-compliant thing between the root
complex and the root ports, I guess.

> Since Armada XP has 10 PCIe interfaces, we cannot just statically
> create as many MBus windows as there are PCIe interfaces: it would both
> exhaust the number of MBus windows available, and also exhaust the
> physical address space, because we would have to create very large
> windows, just in case the PCIe device plugged behind this interface
> needs large BARs.

Everybody else in the world *does* statically configure host bridge
apertures before enumerating the devices below the bridge.  I see why
you want to know what devices are there before deciding whether and
how large to make an MBus window.  But that is new functionality that
we don't have today, and the general idea is not Marvell-specific, so
other systems might want something like this, too.  So I'm not sure if
using quirks to try to wedge it into the current PCI core is the right
approach.  I don't have another proposal, but we should at least think
about what direction we want to take.

> So, what the pci-mvebu.c driver does is that it creates an emulated PCI
> bridge. This emulated bridge is used to let the Linux PCI core
> enumerate the real physical PCI devices behind the bridge, allocate a
> range of physical addresses that is available for each of these
> devices, and write them to the bridge registers. Since the bridge is
> not a real one, but emulated, but trap those writes, and use them to
> create the MBus windows that will allow the CPU to actually access the
> device, at the base address chosen by the Linux PCI core during the
> enumeration process.
>
> However, MBus windows have a certain constraint that they must have a
> power of two size, so the Linux PCI core should not write to one of the
> bridge PCI_MEMORY_BASE / PCI_MEMORY_LIMIT registers any range of
> address whose size is not a power of 2.

I'm still not sure I understand what's going on here.  It sounds like
your emulated bridge basically wraps the host bridge and makes it look
like a PCI-PCI bridge.  But I assume the host bridge itself is also
visible, and has apertures (I guess these are the MBus windows?)  So
when you first discover the host bridge, before enumerating anything
below it, what apertures does it have?  Do you leave them disabled
until after we enumerate the devices, figure out how much space they
need, and configure the emulated PCI-PCI bridge to enable the MBus
windows?

It'd be nice if dmesg mentioned the host bridge explicitly as we do on
other architectures; maybe that would help understand what's going on
under the covers.  Maybe a longer excerpt would already have this; you
already use pci_add_resource_offset(), which is used when creating the
root bus, so you must have some sort of aperture before enumerating.

> Let me take the example of Gerlando:
>
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> pci 0000:00:01.0: PCI bridge to [bus 01]
> pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
>
> So, pci 0000:01:00 is the real device, which has a number of BARs of a
> certain size. Taking into account all those BARs, the Linux PCI core
> decides to assign [mem 0xe0000000-0xebffffff] to the bridge (last line
> of the log above). The problem is that [mem 0xe0000000-0xebffffff] is
> 192 MB, but we would like the Linux PCI core to extend that to 256 MB.

If 01:00.0 is a PCIe endpoint, it must have a root port above it, so
that means 00:01.0 must be the root port.  But I think you're saying
that 00:01.0 is actually *emulated* and isn't PCIe-compliant, e.g., it
has extra window alignment restrictions.  I'm scared about what other
non-PCIe-compliant things there might be.  What happens when the PCI
core configures MPS, ASPM, etc.,

> As you can see it is not about the global aperture associated to the
> bridge, but about the size of the window associated to each "port" of
> the bridge.

Bjorn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-20 19:18                         ` Bjorn Helgaas
  0 siblings, 0 replies; 90+ messages in thread
From: Bjorn Helgaas @ 2014-02-20 19:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 20, 2014 at 1:55 AM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
> Dear Bjorn Helgaas,
>
> + Jason Gunthorpe.
>
> On Wed, 19 Feb 2014 14:45:48 -0700, Bjorn Helgaas wrote:
>
>> > Cool. However, I am not sure my fix is really correct, because is you
>> > had another PCIe device that needed 64 MB of memory space, the PCIe
>> > core would have allocated addresses 0xec000000 -> 0xf0000000 to it,
>> > which would have conflicted with the forced "power of 2 up-rounding"
>> > we've applied on the memory space of the first device.
>> >
>> > Therefore, I believe this constraint should be taken into account by
>> > the PCIe core when allocating the different memory regions for each
>> > device.
>> >
>> > Bjorn, the mvebu PCIe host driver has the constraint that the I/O and
>> > memory regions associated to each PCIe device of the emulated bridge
>> > have a size that is a power of 2.
>> >
>> > I am currently using the ->align_resource() hook to ensure that the
>> > start address of the resource matches certain other constraints, but I
>> > don't see a way of telling the PCI core that I need the resource to
>> > have its size rounded up to the next power of 2 size. Is there a way of
>> > doing this?
>> >
>> > In the case described by Gerlando, the PCI core has assigned a 192 MB
>> > region, but the Marvell hardware can only create windows that have a
>> > power of two size, i.e 256 MB. Therefore, the PCI core should be told
>> > this constraint, so that it doesn't allocate the next resource right
>> > after the 192 MB one.
>>
>> I'm not sure I understand this correctly, but I *think* this 192 MB
>> region that gets rounded up to 256 MB because of the Marvell
>> constraint is a host bridge aperture.  If that's the case, it's
>> entirely up to you (the host bridge driver author) to round it as
>> needed before passing it to pci_add_resource_offset().
>>
>> The PCI core will never allocate any space that is outside the host
>> bridge apertures.
>
> Hum, I believe there is a misunderstanding here. We are already using
> pci_add_resource_offset() to define the global aperture for the entire
> PCI bridge. This is not causing any problem.
>
> Let me give a little bit of background first.
>
> On Marvell hardware, the physical address space layout is configurable,
> through the use of "MBus windows". A "MBus window" is defined by a base
> address, a size, and a target device. So if the CPU needs to access a
> given device (such as PCIe 0.0 for example), then we need to create a
> "MBus window" whose size and target device match PCIe 0.0.

I was assuming "PCIe 0.0" was a host bridge, but it sounds like maybe
that's not true.  Is it really a PCIe root port?  That would mean the
MBus windows are some non-PCIe-compliant thing between the root
complex and the root ports, I guess.

> Since Armada XP has 10 PCIe interfaces, we cannot just statically
> create as many MBus windows as there are PCIe interfaces: it would both
> exhaust the number of MBus windows available, and also exhaust the
> physical address space, because we would have to create very large
> windows, just in case the PCIe device plugged behind this interface
> needs large BARs.

Everybody else in the world *does* statically configure host bridge
apertures before enumerating the devices below the bridge.  I see why
you want to know what devices are there before deciding whether and
how large to make an MBus window.  But that is new functionality that
we don't have today, and the general idea is not Marvell-specific, so
other systems might want something like this, too.  So I'm not sure if
using quirks to try to wedge it into the current PCI core is the right
approach.  I don't have another proposal, but we should at least think
about what direction we want to take.

> So, what the pci-mvebu.c driver does is that it creates an emulated PCI
> bridge. This emulated bridge is used to let the Linux PCI core
> enumerate the real physical PCI devices behind the bridge, allocate a
> range of physical addresses that is available for each of these
> devices, and write them to the bridge registers. Since the bridge is
> not a real one, but emulated, but trap those writes, and use them to
> create the MBus windows that will allow the CPU to actually access the
> device, at the base address chosen by the Linux PCI core during the
> enumeration process.
>
> However, MBus windows have a certain constraint that they must have a
> power of two size, so the Linux PCI core should not write to one of the
> bridge PCI_MEMORY_BASE / PCI_MEMORY_LIMIT registers any range of
> address whose size is not a power of 2.

I'm still not sure I understand what's going on here.  It sounds like
your emulated bridge basically wraps the host bridge and makes it look
like a PCI-PCI bridge.  But I assume the host bridge itself is also
visible, and has apertures (I guess these are the MBus windows?)  So
when you first discover the host bridge, before enumerating anything
below it, what apertures does it have?  Do you leave them disabled
until after we enumerate the devices, figure out how much space they
need, and configure the emulated PCI-PCI bridge to enable the MBus
windows?

It'd be nice if dmesg mentioned the host bridge explicitly as we do on
other architectures; maybe that would help understand what's going on
under the covers.  Maybe a longer excerpt would already have this; you
already use pci_add_resource_offset(), which is used when creating the
root bus, so you must have some sort of aperture before enumerating.

> Let me take the example of Gerlando:
>
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> pci 0000:00:01.0: PCI bridge to [bus 01]
> pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xebffffff]
>
> So, pci 0000:01:00 is the real device, which has a number of BARs of a
> certain size. Taking into account all those BARs, the Linux PCI core
> decides to assign [mem 0xe0000000-0xebffffff] to the bridge (last line
> of the log above). The problem is that [mem 0xe0000000-0xebffffff] is
> 192 MB, but we would like the Linux PCI core to extend that to 256 MB.

If 01:00.0 is a PCIe endpoint, it must have a root port above it, so
that means 00:01.0 must be the root port.  But I think you're saying
that 00:01.0 is actually *emulated* and isn't PCIe-compliant, e.g., it
has extra window alignment restrictions.  I'm scared about what other
non-PCIe-compliant things there might be.  What happens when the PCI
core configures MPS, ASPM, etc.,

> As you can see it is not about the global aperture associated to the
> bridge, but about the size of the window associated to each "port" of
> the bridge.

Bjorn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-20 17:35                         ` Jason Gunthorpe
@ 2014-02-20 20:29                           ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-20 20:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Gerlando Falauto, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem

Dear Jason Gunthorpe,

On Thu, 20 Feb 2014 10:35:18 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 20, 2014 at 09:55:18AM +0100, Thomas Petazzoni wrote:
> 
> > Does that make sense? Keep in mind that I'm still not completely
> > familiar with the PCI terminology, so maybe the above explanation does
> > not use the right terms.
> 
> Stated another way, the Marvel PCI-E to PCI-E bridge config space has
> a quirk that requires the window BARs to be aligned on their size and
> sized to a power of 2.

Correct.

> The first requirement is already being handled by hooking through
> ARM's 'align_resource' callback.

Absolutely.

> One avenue would be to have mvebu_pcie_align_resource return a struct
> resource and manipulate the size as well. Assuming the PCI core will
> accommodate that.

That would effectively be the easiest solution from the point of view
of the PCIe driver.

In practice, the story is a little bit more subtle than that: the PCIe
driver may want to decide to either tell the PCI core to enlarge the
window BAR up to the next power of two size, or to dedicate two windows
to it.

For example:

 * If the PCI core allocates a 96 KB BAR, we clearly want it to be
   enlarged to 128 KB, so that we have to create a single window for it.

 * However, if the PCI core allocates a 192 MB BAR, we may want to
   instead create two windows: a first one of 128 MB and a second one
   of 64 MB. This consumes two windows, but saves 64 MB of physical
   address space.

(Note that I haven't tested myself the creation of two windows for the
same target device, but I was told by Lior that it should work).

As you can see from the two examples above, we may not necessarily want
to enforce this power-of-two constraint in all cases. We may want to
accept a non-power-of-2 size in the case of the 192 MB BAR, and let the
mvebu-mbus driver figure out that it should allocate several
consecutive windows to cover these 192 MB.

But to begin with, rounding up all window BARs up to the next power of
two size would be perfectly OK.

Jason, would you mind maybe replying to Bjorn Helgaas email (Thu, 20
Feb 2014 12:18:42 -0700) ? I believe that a lot of the misunderstanding
between Bjorn and me is due to the fact that I don't use the correct
PCI terminology to describe how the Marvell hardware works, and how the
Marvell PCIe driver copes with it. I'm sure you would explain it in a
way that would be more easily understood by someone very familiar with
the PCI terminology such as Bjorn. Thanks a lot!

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-20 20:29                           ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-20 20:29 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Thu, 20 Feb 2014 10:35:18 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 20, 2014 at 09:55:18AM +0100, Thomas Petazzoni wrote:
> 
> > Does that make sense? Keep in mind that I'm still not completely
> > familiar with the PCI terminology, so maybe the above explanation does
> > not use the right terms.
> 
> Stated another way, the Marvel PCI-E to PCI-E bridge config space has
> a quirk that requires the window BARs to be aligned on their size and
> sized to a power of 2.

Correct.

> The first requirement is already being handled by hooking through
> ARM's 'align_resource' callback.

Absolutely.

> One avenue would be to have mvebu_pcie_align_resource return a struct
> resource and manipulate the size as well. Assuming the PCI core will
> accommodate that.

That would effectively be the easiest solution from the point of view
of the PCIe driver.

In practice, the story is a little bit more subtle than that: the PCIe
driver may want to decide to either tell the PCI core to enlarge the
window BAR up to the next power of two size, or to dedicate two windows
to it.

For example:

 * If the PCI core allocates a 96 KB BAR, we clearly want it to be
   enlarged to 128 KB, so that we have to create a single window for it.

 * However, if the PCI core allocates a 192 MB BAR, we may want to
   instead create two windows: a first one of 128 MB and a second one
   of 64 MB. This consumes two windows, but saves 64 MB of physical
   address space.

(Note that I haven't tested myself the creation of two windows for the
same target device, but I was told by Lior that it should work).

As you can see from the two examples above, we may not necessarily want
to enforce this power-of-two constraint in all cases. We may want to
accept a non-power-of-2 size in the case of the 192 MB BAR, and let the
mvebu-mbus driver figure out that it should allocate several
consecutive windows to cover these 192 MB.

But to begin with, rounding up all window BARs up to the next power of
two size would be perfectly OK.

Jason, would you mind maybe replying to Bjorn Helgaas email (Thu, 20
Feb 2014 12:18:42 -0700) ? I believe that a lot of the misunderstanding
between Bjorn and me is due to the fact that I don't use the correct
PCI terminology to describe how the Marvell hardware works, and how the
Marvell PCIe driver copes with it. I'm sure you would explain it in a
way that would be more easily understood by someone very familiar with
the PCI terminology such as Bjorn. Thanks a lot!

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-20 19:18                         ` Bjorn Helgaas
@ 2014-02-21  0:24                           ` Jason Gunthorpe
  -1 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21  0:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Petazzoni, Gerlando Falauto, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem

On Thu, Feb 20, 2014 at 12:18:42PM -0700, Bjorn Helgaas wrote:

> > On Marvell hardware, the physical address space layout is configurable,
> > through the use of "MBus windows". A "MBus window" is defined by a base
> > address, a size, and a target device. So if the CPU needs to access a
> > given device (such as PCIe 0.0 for example), then we need to create a
> > "MBus window" whose size and target device match PCIe 0.0.
> 
> I was assuming "PCIe 0.0" was a host bridge, but it sounds like maybe
> that's not true.  Is it really a PCIe root port?  That would mean the
> MBus windows are some non-PCIe-compliant thing between the root
> complex and the root ports, I guess.

It really is a root port. The hardware acts like a root port at the
TLP level. It has all the root port specific stuff in some format but
critically, completely lacks a compliant config space for a root
port bridge.

So the driver creates a 'compliant' config space for the root
port. Building the config space requires harmonizing registers related
to the PCI-E and registers related to internal routing and dealing
with the mismatch between what the hardware can actualy provide and
what the PCI spec requires it provide.

The only mismatch that gets exposed to the PCI core we know about is
the bridge window address alignment restrictions.

This is what Thomas has been asking about.

> > Since Armada XP has 10 PCIe interfaces, we cannot just statically
> > create as many MBus windows as there are PCIe interfaces: it would both
> > exhaust the number of MBus windows available, and also exhaust the
> > physical address space, because we would have to create very large
> > windows, just in case the PCIe device plugged behind this interface
> > needs large BARs.
> 
> Everybody else in the world *does* statically configure host bridge
> apertures before enumerating the devices below the bridge.  

The original PCI-E driver for this hardware did use a 1 root port per
host bridge model, with static host bridge aperture allocation and so
forth.

It works fine, just like everyone else in the world, as long as you
have only 1 or 2 ports. The XP hardware had *10* ports on a single
32 bit machine. You run out of address space, you run out of
HW routing resources, it just doesn't work acceptably.

> I see why you want to know what devices are there before deciding
> whether and how large to make an MBus window.  But that is new
> functionality that we don't have today, and the general idea is not

Well, in general, it isn't new core functionality, it is functionality
that already exists to support PCI bridges.

Choosing to use a one host bridge to N root port bridge model lets the
driver use all that functionality and the only wrinkle that becomes
visible to the PCI core as a whole is the non-compliant alignment
restriction on the bridge window BAR.

This also puts the driver in alignment with the PCI-E specs for root
complexes, which means user space can actually see things like the
PCI-E root port link capability block and makes it hot plug work
properly (I am actively using hot plug with this driver)

I personaly think this is a reasonable way to support this highly
flexible HW.

> I'm still not sure I understand what's going on here.  It sounds like
> your emulated bridge basically wraps the host bridge and makes it look
> like a PCI-PCI bridge.  But I assume the host bridge itself is also
> visible, and has apertures (I guess these are the MBus windows?)  

No, there is only one bridge, it is a per-physical-port MBUS / PCI-E
bridge. It performs an identical function to the root port bridge
described in PCI-E. MBUS serves as the root-complex internal bus 0.

There isn't 2 levels of bridging, so the MBUS / PCI-E bridge can
claim any system address and there is no such thing as a 'host
bridge'.

What Linux calls 'the host bridge aperture' is simply a wack of
otherwise unused physical address space, it has no special properties.

> It'd be nice if dmesg mentioned the host bridge explicitly as we do on
> other architectures; maybe that would help understand what's going on
> under the covers.  Maybe a longer excerpt would already have this; you
> already use pci_add_resource_offset(), which is used when creating the
> root bus, so you must have some sort of aperture before enumerating.

Well, /proc/iomem looks like this:

e0000000-efffffff : PCI MEM 0000
  e0000000-e00fffff : PCI Bus 0000:01
    e0000000-e001ffff : 0000:01:00.0

'PCI MEM 0000' is the 'host bridge aperture' it is an arbitary
range of address space that doesn't overlap anything.

'PCI Bus 0000:01' is the MBUS / PCI-E root port bridge for physical
port 0

'0000:01:00.0' is BAR 0 of an an off-chip device.

> If 01:00.0 is a PCIe endpoint, it must have a root port above it, so
> that means 00:01.0 must be the root port.  But I think you're saying
> that 00:01.0 is actually *emulated* and isn't PCIe-compliant, e.g., it
> has extra window alignment restrictions.  

It is important to understand that the emulation is only of the root
port bridge configuration space. The underlying TLP processing is done
in HW and is compliant.

> I'm scared about what other non-PCIe-compliant things there might
> be.  What happens when the PCI core configures MPS, ASPM, etc.,

As the TLP processing and the underlying PHY are all compliant these
things are all supported in HW.

MPS is supported directly by the HW

ASPM is supported by the HW, as is the entire link capability and
status block.

AER is supported directly by the HW

But here is the thing, without the software emulated config space
there would be no sane way for the Linux PCI core to access these
features. The HW simply does not present them in a way that the core
code can understand without a SW intervention of some kind.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21  0:24                           ` Jason Gunthorpe
  0 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21  0:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 20, 2014 at 12:18:42PM -0700, Bjorn Helgaas wrote:

> > On Marvell hardware, the physical address space layout is configurable,
> > through the use of "MBus windows". A "MBus window" is defined by a base
> > address, a size, and a target device. So if the CPU needs to access a
> > given device (such as PCIe 0.0 for example), then we need to create a
> > "MBus window" whose size and target device match PCIe 0.0.
> 
> I was assuming "PCIe 0.0" was a host bridge, but it sounds like maybe
> that's not true.  Is it really a PCIe root port?  That would mean the
> MBus windows are some non-PCIe-compliant thing between the root
> complex and the root ports, I guess.

It really is a root port. The hardware acts like a root port at the
TLP level. It has all the root port specific stuff in some format but
critically, completely lacks a compliant config space for a root
port bridge.

So the driver creates a 'compliant' config space for the root
port. Building the config space requires harmonizing registers related
to the PCI-E and registers related to internal routing and dealing
with the mismatch between what the hardware can actualy provide and
what the PCI spec requires it provide.

The only mismatch that gets exposed to the PCI core we know about is
the bridge window address alignment restrictions.

This is what Thomas has been asking about.

> > Since Armada XP has 10 PCIe interfaces, we cannot just statically
> > create as many MBus windows as there are PCIe interfaces: it would both
> > exhaust the number of MBus windows available, and also exhaust the
> > physical address space, because we would have to create very large
> > windows, just in case the PCIe device plugged behind this interface
> > needs large BARs.
> 
> Everybody else in the world *does* statically configure host bridge
> apertures before enumerating the devices below the bridge.  

The original PCI-E driver for this hardware did use a 1 root port per
host bridge model, with static host bridge aperture allocation and so
forth.

It works fine, just like everyone else in the world, as long as you
have only 1 or 2 ports. The XP hardware had *10* ports on a single
32 bit machine. You run out of address space, you run out of
HW routing resources, it just doesn't work acceptably.

> I see why you want to know what devices are there before deciding
> whether and how large to make an MBus window.  But that is new
> functionality that we don't have today, and the general idea is not

Well, in general, it isn't new core functionality, it is functionality
that already exists to support PCI bridges.

Choosing to use a one host bridge to N root port bridge model lets the
driver use all that functionality and the only wrinkle that becomes
visible to the PCI core as a whole is the non-compliant alignment
restriction on the bridge window BAR.

This also puts the driver in alignment with the PCI-E specs for root
complexes, which means user space can actually see things like the
PCI-E root port link capability block and makes it hot plug work
properly (I am actively using hot plug with this driver)

I personaly think this is a reasonable way to support this highly
flexible HW.

> I'm still not sure I understand what's going on here.  It sounds like
> your emulated bridge basically wraps the host bridge and makes it look
> like a PCI-PCI bridge.  But I assume the host bridge itself is also
> visible, and has apertures (I guess these are the MBus windows?)  

No, there is only one bridge, it is a per-physical-port MBUS / PCI-E
bridge. It performs an identical function to the root port bridge
described in PCI-E. MBUS serves as the root-complex internal bus 0.

There isn't 2 levels of bridging, so the MBUS / PCI-E bridge can
claim any system address and there is no such thing as a 'host
bridge'.

What Linux calls 'the host bridge aperture' is simply a wack of
otherwise unused physical address space, it has no special properties.

> It'd be nice if dmesg mentioned the host bridge explicitly as we do on
> other architectures; maybe that would help understand what's going on
> under the covers.  Maybe a longer excerpt would already have this; you
> already use pci_add_resource_offset(), which is used when creating the
> root bus, so you must have some sort of aperture before enumerating.

Well, /proc/iomem looks like this:

e0000000-efffffff : PCI MEM 0000
  e0000000-e00fffff : PCI Bus 0000:01
    e0000000-e001ffff : 0000:01:00.0

'PCI MEM 0000' is the 'host bridge aperture' it is an arbitary
range of address space that doesn't overlap anything.

'PCI Bus 0000:01' is the MBUS / PCI-E root port bridge for physical
port 0

'0000:01:00.0' is BAR 0 of an an off-chip device.

> If 01:00.0 is a PCIe endpoint, it must have a root port above it, so
> that means 00:01.0 must be the root port.  But I think you're saying
> that 00:01.0 is actually *emulated* and isn't PCIe-compliant, e.g., it
> has extra window alignment restrictions.  

It is important to understand that the emulation is only of the root
port bridge configuration space. The underlying TLP processing is done
in HW and is compliant.

> I'm scared about what other non-PCIe-compliant things there might
> be.  What happens when the PCI core configures MPS, ASPM, etc.,

As the TLP processing and the underlying PHY are all compliant these
things are all supported in HW.

MPS is supported directly by the HW

ASPM is supported by the HW, as is the entire link capability and
status block.

AER is supported directly by the HW

But here is the thing, without the software emulated config space
there would be no sane way for the Linux PCI core to access these
features. The HW simply does not present them in a way that the core
code can understand without a SW intervention of some kind.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-20 20:29                           ` Thomas Petazzoni
@ 2014-02-21  0:32                             ` Jason Gunthorpe
  -1 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21  0:32 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Bjorn Helgaas, Gerlando Falauto, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem

On Thu, Feb 20, 2014 at 09:29:14PM +0100, Thomas Petazzoni wrote:

> In practice, the story is a little bit more subtle than that: the PCIe
> driver may want to decide to either tell the PCI core to enlarge the
> window BAR up to the next power of two size, or to dedicate two windows
> to it.

That is a smart, easy solution! Maybe that is the least invasive way
to proceed for now?

I have no idea how you decide when to round up and when to allocate
more windows, that feels like a fairly complex optimization problem!

Alternatively, I suspect you can use the PCI quirk mechanism to alter
the resource sizing on a bridge?

> Jason, would you mind maybe replying to Bjorn Helgaas email (Thu, 20
> Feb 2014 12:18:42 -0700) ? I believe that a lot of the misunderstanding
> between Bjorn and me is due to the fact that I don't use the correct
> PCI terminology to describe how the Marvell hardware works, and how the
> Marvell PCIe driver copes with it. I'm sure you would explain it in a
> way that would be more easily understood by someone very familiar with
> the PCI terminology such as Bjorn. Thanks a lot!

Done!

Hope it helps,
Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21  0:32                             ` Jason Gunthorpe
  0 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21  0:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 20, 2014 at 09:29:14PM +0100, Thomas Petazzoni wrote:

> In practice, the story is a little bit more subtle than that: the PCIe
> driver may want to decide to either tell the PCI core to enlarge the
> window BAR up to the next power of two size, or to dedicate two windows
> to it.

That is a smart, easy solution! Maybe that is the least invasive way
to proceed for now?

I have no idea how you decide when to round up and when to allocate
more windows, that feels like a fairly complex optimization problem!

Alternatively, I suspect you can use the PCI quirk mechanism to alter
the resource sizing on a bridge?

> Jason, would you mind maybe replying to Bjorn Helgaas email (Thu, 20
> Feb 2014 12:18:42 -0700) ? I believe that a lot of the misunderstanding
> between Bjorn and me is due to the fact that I don't use the correct
> PCI terminology to describe how the Marvell hardware works, and how the
> Marvell PCIe driver copes with it. I'm sure you would explain it in a
> way that would be more easily understood by someone very familiar with
> the PCI terminology such as Bjorn. Thanks a lot!

Done!

Hope it helps,
Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21  0:32                             ` Jason Gunthorpe
@ 2014-02-21  8:34                               ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  8:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Gerlando Falauto, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem

Dear Jason Gunthorpe,

On Thu, 20 Feb 2014 17:32:27 -0700, Jason Gunthorpe wrote:

> > In practice, the story is a little bit more subtle than that: the PCIe
> > driver may want to decide to either tell the PCI core to enlarge the
> > window BAR up to the next power of two size, or to dedicate two windows
> > to it.
> 
> That is a smart, easy solution! Maybe that is the least invasive way
> to proceed for now?

So you suggest that the mvebu-mbus driver should accept a
non power-of-two window size, and do internally the job of cutting that
into several power-of-two sized areas and creating the corresponding
windows?

> I have no idea how you decide when to round up and when to allocate
> more windows, that feels like a fairly complex optimization problem!

Yes, it is a fairly complex problem. I was thinking of a threshold of
"lost space". Below this threshold, it's better to enlarge the window,
above the threshold it's better to create two windows. But not easy.

> Alternatively, I suspect you can use the PCI quirk mechanism to alter
> the resource sizing on a bridge?

Can you give more details about this mechanism, and how it could be
used to alter the size of resources on a bridge?

> > Jason, would you mind maybe replying to Bjorn Helgaas email (Thu, 20
> > Feb 2014 12:18:42 -0700) ? I believe that a lot of the misunderstanding
> > between Bjorn and me is due to the fact that I don't use the correct
> > PCI terminology to describe how the Marvell hardware works, and how the
> > Marvell PCIe driver copes with it. I'm sure you would explain it in a
> > way that would be more easily understood by someone very familiar with
> > the PCI terminology such as Bjorn. Thanks a lot!
> 
> Done!

Thanks a lot! Really appreciated.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21  8:34                               ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  8:34 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Thu, 20 Feb 2014 17:32:27 -0700, Jason Gunthorpe wrote:

> > In practice, the story is a little bit more subtle than that: the PCIe
> > driver may want to decide to either tell the PCI core to enlarge the
> > window BAR up to the next power of two size, or to dedicate two windows
> > to it.
> 
> That is a smart, easy solution! Maybe that is the least invasive way
> to proceed for now?

So you suggest that the mvebu-mbus driver should accept a
non power-of-two window size, and do internally the job of cutting that
into several power-of-two sized areas and creating the corresponding
windows?

> I have no idea how you decide when to round up and when to allocate
> more windows, that feels like a fairly complex optimization problem!

Yes, it is a fairly complex problem. I was thinking of a threshold of
"lost space". Below this threshold, it's better to enlarge the window,
above the threshold it's better to create two windows. But not easy.

> Alternatively, I suspect you can use the PCI quirk mechanism to alter
> the resource sizing on a bridge?

Can you give more details about this mechanism, and how it could be
used to alter the size of resources on a bridge?

> > Jason, would you mind maybe replying to Bjorn Helgaas email (Thu, 20
> > Feb 2014 12:18:42 -0700) ? I believe that a lot of the misunderstanding
> > between Bjorn and me is due to the fact that I don't use the correct
> > PCI terminology to describe how the Marvell hardware works, and how the
> > Marvell PCIe driver copes with it. I'm sure you would explain it in a
> > way that would be more easily understood by someone very familiar with
> > the PCI terminology such as Bjorn. Thanks a lot!
> 
> Done!

Thanks a lot! Really appreciated.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21  8:34                               ` Thomas Petazzoni
@ 2014-02-21  8:58                                 ` Gerlando Falauto
  -1 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21  8:58 UTC (permalink / raw)
  To: Thomas Petazzoni, Jason Gunthorpe
  Cc: Bjorn Helgaas, linux-pci, linux-arm-kernel, Andrew Lunn,
	Sebastian Hesselbarth, Jason Cooper, Longchamp, Valentin,
	Ezequiel Garcia, Lior Amsalem

Hi guys,

first of all thank you for your support and the explanations.
I'm slowly starting to understand something more about this kind of stuff.

On 02/21/2014 09:34 AM, Thomas Petazzoni wrote:
> Dear Jason Gunthorpe,
>
> On Thu, 20 Feb 2014 17:32:27 -0700, Jason Gunthorpe wrote:
>
>>> In practice, the story is a little bit more subtle than that: the PCIe
>>> driver may want to decide to either tell the PCI core to enlarge the
>>> window BAR up to the next power of two size, or to dedicate two windows
>>> to it.
>>
>> That is a smart, easy solution! Maybe that is the least invasive way
>> to proceed for now?
>
> So you suggest that the mvebu-mbus driver should accept a
> non power-of-two window size, and do internally the job of cutting that
> into several power-of-two sized areas and creating the corresponding
> windows?
>
>> I have no idea how you decide when to round up and when to allocate
>> more windows, that feels like a fairly complex optimization problem!
>
> Yes, it is a fairly complex problem. I was thinking of a threshold of
> "lost space". Below this threshold, it's better to enlarge the window,
> above the threshold it's better to create two windows. But not easy.
>
>> Alternatively, I suspect you can use the PCI quirk mechanism to alter
>> the resource sizing on a bridge?
>
> Can you give more details about this mechanism, and how it could be
> used to alter the size of resources on a bridge?

I'm not sure I understand all the details... but I guess some sort of 
rounding mechanism is indeed already in place somewhere:

pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]

If you look at the numbers, the total size required by BAR0-5 is 
0x8805000, so around 136MB, that is 128MB+8MB+2K+1K+1K.
This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know 
where or why), which is 1.5x a power of two (i.e. two consecutive bits 
followed by all zeroes).

If that's not just a coincidence, finding a coverage subset becomes a 
trivial matter (128MB+64MB).

In any case, even if we have an odd number like the above (0x8805000), I 
believe we could easily find a suboptimal coverage by just taking the 
most significant one and the second most significant one (possibly left 
shifted by 1 if there's a third one somewhere else).
In the above case, that would be 0x8000000 + 0x1000000. That's 
128MB+16MB, which is even smaller than the rounding above (192MB).

What do you think?

Thanks again!
Gerlando


^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21  8:58                                 ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21  8:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi guys,

first of all thank you for your support and the explanations.
I'm slowly starting to understand something more about this kind of stuff.

On 02/21/2014 09:34 AM, Thomas Petazzoni wrote:
> Dear Jason Gunthorpe,
>
> On Thu, 20 Feb 2014 17:32:27 -0700, Jason Gunthorpe wrote:
>
>>> In practice, the story is a little bit more subtle than that: the PCIe
>>> driver may want to decide to either tell the PCI core to enlarge the
>>> window BAR up to the next power of two size, or to dedicate two windows
>>> to it.
>>
>> That is a smart, easy solution! Maybe that is the least invasive way
>> to proceed for now?
>
> So you suggest that the mvebu-mbus driver should accept a
> non power-of-two window size, and do internally the job of cutting that
> into several power-of-two sized areas and creating the corresponding
> windows?
>
>> I have no idea how you decide when to round up and when to allocate
>> more windows, that feels like a fairly complex optimization problem!
>
> Yes, it is a fairly complex problem. I was thinking of a threshold of
> "lost space". Below this threshold, it's better to enlarge the window,
> above the threshold it's better to create two windows. But not easy.
>
>> Alternatively, I suspect you can use the PCI quirk mechanism to alter
>> the resource sizing on a bridge?
>
> Can you give more details about this mechanism, and how it could be
> used to alter the size of resources on a bridge?

I'm not sure I understand all the details... but I guess some sort of 
rounding mechanism is indeed already in place somewhere:

pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
pci 0000:00:01.0: PCI bridge to [bus 01]

If you look at the numbers, the total size required by BAR0-5 is 
0x8805000, so around 136MB, that is 128MB+8MB+2K+1K+1K.
This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know 
where or why), which is 1.5x a power of two (i.e. two consecutive bits 
followed by all zeroes).

If that's not just a coincidence, finding a coverage subset becomes a 
trivial matter (128MB+64MB).

In any case, even if we have an odd number like the above (0x8805000), I 
believe we could easily find a suboptimal coverage by just taking the 
most significant one and the second most significant one (possibly left 
shifted by 1 if there's a third one somewhere else).
In the above case, that would be 0x8000000 + 0x1000000. That's 
128MB+16MB, which is even smaller than the rounding above (192MB).

What do you think?

Thanks again!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21  8:58                                 ` Gerlando Falauto
@ 2014-02-21  9:12                                   ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  9:12 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 09:58:21 +0100, Gerlando Falauto wrote:

> > Can you give more details about this mechanism, and how it could be
> > used to alter the size of resources on a bridge?
> 
> I'm not sure I understand all the details... but I guess some sort of 
> rounding mechanism is indeed already in place somewhere:
> 
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> pci 0000:00:01.0: PCI bridge to [bus 01]
> 
> If you look at the numbers, the total size required by BAR0-5 is 
> 0x8805000, so around 136MB, that is 128MB+8MB+2K+1K+1K.
> This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know 
> where or why), which is 1.5x a power of two (i.e. two consecutive bits 
> followed by all zeroes).

Would indeed be interesting to know who does this rounding, and why,
and according to what rules.

> If that's not just a coincidence, finding a coverage subset becomes a 
> trivial matter (128MB+64MB).
> 
> In any case, even if we have an odd number like the above (0x8805000), I 
> believe we could easily find a suboptimal coverage by just taking the 
> most significant one and the second most significant one (possibly left 
> shifted by 1 if there's a third one somewhere else).
> In the above case, that would be 0x8000000 + 0x1000000. That's 
> 128MB+16MB, which is even smaller than the rounding above (192MB).
> 
> What do you think?

Sure, but whichever choice we make, the Linux PCI core must know by how
much we've enlarge the bridge window BAR, otherwise the Linux PCI core
may allocate for the next bridge window BAR a range of addresses that
doesn't overlap with what it has allocate for the previous bridge
window BAR, but that ends up overlapping due to us "extending" the
previous bridge window BAR to match the MBus requirements.

Gerlando, would you be able to test a quick hack that creates 2 windows
to cover exactly 128 MB + 64 MB ? This would at least allow us to
confirm that the strategy of splitting in multiple windows is usable.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21  9:12                                   ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  9:12 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 09:58:21 +0100, Gerlando Falauto wrote:

> > Can you give more details about this mechanism, and how it could be
> > used to alter the size of resources on a bridge?
> 
> I'm not sure I understand all the details... but I guess some sort of 
> rounding mechanism is indeed already in place somewhere:
> 
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> pci 0000:00:01.0: PCI bridge to [bus 01]
> 
> If you look at the numbers, the total size required by BAR0-5 is 
> 0x8805000, so around 136MB, that is 128MB+8MB+2K+1K+1K.
> This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know 
> where or why), which is 1.5x a power of two (i.e. two consecutive bits 
> followed by all zeroes).

Would indeed be interesting to know who does this rounding, and why,
and according to what rules.

> If that's not just a coincidence, finding a coverage subset becomes a 
> trivial matter (128MB+64MB).
> 
> In any case, even if we have an odd number like the above (0x8805000), I 
> believe we could easily find a suboptimal coverage by just taking the 
> most significant one and the second most significant one (possibly left 
> shifted by 1 if there's a third one somewhere else).
> In the above case, that would be 0x8000000 + 0x1000000. That's 
> 128MB+16MB, which is even smaller than the rounding above (192MB).
> 
> What do you think?

Sure, but whichever choice we make, the Linux PCI core must know by how
much we've enlarge the bridge window BAR, otherwise the Linux PCI core
may allocate for the next bridge window BAR a range of addresses that
doesn't overlap with what it has allocate for the previous bridge
window BAR, but that ends up overlapping due to us "extending" the
previous bridge window BAR to match the MBus requirements.

Gerlando, would you be able to test a quick hack that creates 2 windows
to cover exactly 128 MB + 64 MB ? This would at least allow us to
confirm that the strategy of splitting in multiple windows is usable.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21  9:12                                   ` Thomas Petazzoni
@ 2014-02-21  9:16                                     ` Gerlando Falauto
  -1 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21  9:16 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem

Hi Thomas,

On 02/21/2014 10:12 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Fri, 21 Feb 2014 09:58:21 +0100, Gerlando Falauto wrote:
>
>>> Can you give more details about this mechanism, and how it could be
>>> used to alter the size of resources on a bridge?
>>
>> I'm not sure I understand all the details... but I guess some sort of
>> rounding mechanism is indeed already in place somewhere:
>>
>> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
>> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
>> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
>> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
>> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
>> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
>> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>> pci 0000:00:01.0: PCI bridge to [bus 01]
>>
>> If you look at the numbers, the total size required by BAR0-5 is
>> 0x8805000, so around 136MB, that is 128MB+8MB+2K+1K+1K.
>> This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know
>> where or why), which is 1.5x a power of two (i.e. two consecutive bits
>> followed by all zeroes).
>
> Would indeed be interesting to know who does this rounding, and why,
> and according to what rules.
>
>> If that's not just a coincidence, finding a coverage subset becomes a
>> trivial matter (128MB+64MB).
>>
>> In any case, even if we have an odd number like the above (0x8805000), I
>> believe we could easily find a suboptimal coverage by just taking the
>> most significant one and the second most significant one (possibly left
>> shifted by 1 if there's a third one somewhere else).
>> In the above case, that would be 0x8000000 + 0x1000000. That's
>> 128MB+16MB, which is even smaller than the rounding above (192MB).
>>
>> What do you think?
>
> Sure, but whichever choice we make, the Linux PCI core must know by how
> much we've enlarge the bridge window BAR, otherwise the Linux PCI core
> may allocate for the next bridge window BAR a range of addresses that
> doesn't overlap with what it has allocate for the previous bridge
> window BAR, but that ends up overlapping due to us "extending" the
> previous bridge window BAR to match the MBus requirements.
>
> Gerlando, would you be able to test a quick hack that creates 2 windows
> to cover exactly 128 MB + 64 MB ? This would at least allow us to
> confirm that the strategy of splitting in multiple windows is usable.

Sure, though probably not until next week.
I guess it would then also be useful to restore my previous setup, where 
the total PCIe aperture is 192MB, right?

Thank you guys!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21  9:16                                     ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21  9:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 02/21/2014 10:12 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Fri, 21 Feb 2014 09:58:21 +0100, Gerlando Falauto wrote:
>
>>> Can you give more details about this mechanism, and how it could be
>>> used to alter the size of resources on a bridge?
>>
>> I'm not sure I understand all the details... but I guess some sort of
>> rounding mechanism is indeed already in place somewhere:
>>
>> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
>> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
>> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
>> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
>> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
>> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
>> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>> pci 0000:00:01.0: PCI bridge to [bus 01]
>>
>> If you look at the numbers, the total size required by BAR0-5 is
>> 0x8805000, so around 136MB, that is 128MB+8MB+2K+1K+1K.
>> This gets rounded up (on this 'virtual' BAR 8) to 192MB (I don't know
>> where or why), which is 1.5x a power of two (i.e. two consecutive bits
>> followed by all zeroes).
>
> Would indeed be interesting to know who does this rounding, and why,
> and according to what rules.
>
>> If that's not just a coincidence, finding a coverage subset becomes a
>> trivial matter (128MB+64MB).
>>
>> In any case, even if we have an odd number like the above (0x8805000), I
>> believe we could easily find a suboptimal coverage by just taking the
>> most significant one and the second most significant one (possibly left
>> shifted by 1 if there's a third one somewhere else).
>> In the above case, that would be 0x8000000 + 0x1000000. That's
>> 128MB+16MB, which is even smaller than the rounding above (192MB).
>>
>> What do you think?
>
> Sure, but whichever choice we make, the Linux PCI core must know by how
> much we've enlarge the bridge window BAR, otherwise the Linux PCI core
> may allocate for the next bridge window BAR a range of addresses that
> doesn't overlap with what it has allocate for the previous bridge
> window BAR, but that ends up overlapping due to us "extending" the
> previous bridge window BAR to match the MBus requirements.
>
> Gerlando, would you be able to test a quick hack that creates 2 windows
> to cover exactly 128 MB + 64 MB ? This would at least allow us to
> confirm that the strategy of splitting in multiple windows is usable.

Sure, though probably not until next week.
I guess it would then also be useful to restore my previous setup, where 
the total PCIe aperture is 192MB, right?

Thank you guys!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21  9:16                                     ` Gerlando Falauto
@ 2014-02-21  9:39                                       ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  9:39 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Clément

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 10:16:32 +0100, Gerlando Falauto wrote:

> > Sure, but whichever choice we make, the Linux PCI core must know by how
> > much we've enlarge the bridge window BAR, otherwise the Linux PCI core
> > may allocate for the next bridge window BAR a range of addresses that
> > doesn't overlap with what it has allocate for the previous bridge
> > window BAR, but that ends up overlapping due to us "extending" the
> > previous bridge window BAR to match the MBus requirements.
> >
> > Gerlando, would you be able to test a quick hack that creates 2 windows
> > to cover exactly 128 MB + 64 MB ? This would at least allow us to
> > confirm that the strategy of splitting in multiple windows is usable.
> 
> Sure, though probably not until next week.

No problem at all.

> I guess it would then also be useful to restore my previous setup, where 
> the total PCIe aperture is 192MB, right?

Yes, that's the case I'm interested in at the moment. If you could try
the above (ugly) patch, and see if you can access all your device BARs,
it would be interesting. It would tell us if two separate windows
having the same target/attribute and consecutive placement in the
physical address space can actually work to address a given PCIe
device. As you will see, the patch makes a very ugly special case for
192 MB :-)

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 2394e97..f763ecc 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -223,11 +223,13 @@ static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus,
                if ((u64)base < wend && end > wbase)
                        return 0;
 
+#if 0
                /*
                 * Check if target/attribute conflicts
                 */
                if (target == wtarget && attr == wattr)
                        return 0;
+#endif
        }
 
        return 1;
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 2aa7b77c..67fe6df 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -361,8 +361,15 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
                (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
                port->memwin_base;
 
-       mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
-                                   port->memwin_base, port->memwin_size);
+       if (port->memwin_size == (SZ_128M + SZ_64M)) {
+               mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+                                           port->memwin_base, SZ_128M);
+               mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+                                           port->memwin_base + SZ_128M, SZ_64M);
+       } else {
+               mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+                                           port->memwin_base, port->memwin_size);
+       }
 }
 
 /*



-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21  9:39                                       ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21  9:39 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 10:16:32 +0100, Gerlando Falauto wrote:

> > Sure, but whichever choice we make, the Linux PCI core must know by how
> > much we've enlarge the bridge window BAR, otherwise the Linux PCI core
> > may allocate for the next bridge window BAR a range of addresses that
> > doesn't overlap with what it has allocate for the previous bridge
> > window BAR, but that ends up overlapping due to us "extending" the
> > previous bridge window BAR to match the MBus requirements.
> >
> > Gerlando, would you be able to test a quick hack that creates 2 windows
> > to cover exactly 128 MB + 64 MB ? This would at least allow us to
> > confirm that the strategy of splitting in multiple windows is usable.
> 
> Sure, though probably not until next week.

No problem at all.

> I guess it would then also be useful to restore my previous setup, where 
> the total PCIe aperture is 192MB, right?

Yes, that's the case I'm interested in at the moment. If you could try
the above (ugly) patch, and see if you can access all your device BARs,
it would be interesting. It would tell us if two separate windows
having the same target/attribute and consecutive placement in the
physical address space can actually work to address a given PCIe
device. As you will see, the patch makes a very ugly special case for
192 MB :-)

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 2394e97..f763ecc 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -223,11 +223,13 @@ static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus,
                if ((u64)base < wend && end > wbase)
                        return 0;
 
+#if 0
                /*
                 * Check if target/attribute conflicts
                 */
                if (target == wtarget && attr == wattr)
                        return 0;
+#endif
        }
 
        return 1;
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 2aa7b77c..67fe6df 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -361,8 +361,15 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
                (((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
                port->memwin_base;
 
-       mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
-                                   port->memwin_base, port->memwin_size);
+       if (port->memwin_size == (SZ_128M + SZ_64M)) {
+               mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+                                           port->memwin_base, SZ_128M);
+               mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+                                           port->memwin_base + SZ_128M, SZ_64M);
+       } else {
+               mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
+                                           port->memwin_base, port->memwin_size);
+       }
 }
 
 /*



-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21  9:39                                       ` Thomas Petazzoni
@ 2014-02-21 12:24                                         ` Gerlando Falauto
  -1 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21 12:24 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Clément

Hi Thomas,

On 02/21/2014 10:39 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,

[...]

>> I guess it would then also be useful to restore my previous setup, where
>> the total PCIe aperture is 192MB, right?
>
> Yes, that's the case I'm interested in at the moment. If you could try
> the above (ugly) patch, and see if you can access all your device BARs,
> it would be interesting. It would tell us if two separate windows
> having the same target/attribute and consecutive placement in the
> physical address space can actually work to address a given PCIe
> device. As you will see, the patch makes a very ugly special case for
> 192 MB :-)
>

So I restored the total aperture size to 192MB.
I had to rework your patch a bit because:

a) I'm running an older kernel and driver
b) sizes are actually 1-byte offset

So here it is:

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index dd4445f..27fe162 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -251,11 +251,13 @@ static int mvebu_mbus_window_conflicts(struct 
mvebu_mbus_state *mbus,
  		if ((u64)base < wend && end > wbase)
  			return 0;

+#if 0
  		/*
  		 * Check if target/attribute conflicts
  		 */
  		if (target == wtarget && attr == wattr)
  			return 0;
+#endif
  	}

  	return 1;
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index c8397c4..120a822 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -332,10 +332,21 @@ static void 
mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
  		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
  		port->memwin_base;

-	mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
-					  port->memwin_size,
-					  MVEBU_MBUS_NO_REMAP,
-					  MVEBU_MBUS_PCI_MEM);
+	if (port->memwin_size + 1 == (SZ_128M + SZ_64M)) {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  SZ_128M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base + 
SZ_128M,
+						  SZ_64M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	} else {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  port->memwin_size,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	}
  }

  /*


Here's the assignment (same as before):

pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]

And here's the output I get from:

# cat /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
[01] disabled
[02] disabled
[03] disabled
[04] 00000000ff000000 - 00000000ff010000 : nand
[05] 00000000f4000000 - 00000000f8000000 : vpcie
[06] 00000000fe000000 - 00000000fe010000 : dragonite
[07] 00000000e0000000 - 00000000e8000000 : pcie0.0

I did not get to test the whole address space thoroughly, but all the 
BARs are still accessible (mainly BAR0 which contains the control space 
and is mapped on the "new" MBUS window, and BAR1 which is the "big" 
one). So at least, the issues we had before are now gone.
So I'd say this looks like a very promising approach. :-)

Thank you,
Gerlando

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 12:24                                         ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 02/21/2014 10:39 AM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,

[...]

>> I guess it would then also be useful to restore my previous setup, where
>> the total PCIe aperture is 192MB, right?
>
> Yes, that's the case I'm interested in at the moment. If you could try
> the above (ugly) patch, and see if you can access all your device BARs,
> it would be interesting. It would tell us if two separate windows
> having the same target/attribute and consecutive placement in the
> physical address space can actually work to address a given PCIe
> device. As you will see, the patch makes a very ugly special case for
> 192 MB :-)
>

So I restored the total aperture size to 192MB.
I had to rework your patch a bit because:

a) I'm running an older kernel and driver
b) sizes are actually 1-byte offset

So here it is:

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index dd4445f..27fe162 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -251,11 +251,13 @@ static int mvebu_mbus_window_conflicts(struct 
mvebu_mbus_state *mbus,
  		if ((u64)base < wend && end > wbase)
  			return 0;

+#if 0
  		/*
  		 * Check if target/attribute conflicts
  		 */
  		if (target == wtarget && attr == wattr)
  			return 0;
+#endif
  	}

  	return 1;
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index c8397c4..120a822 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -332,10 +332,21 @@ static void 
mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
  		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
  		port->memwin_base;

-	mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
-					  port->memwin_size,
-					  MVEBU_MBUS_NO_REMAP,
-					  MVEBU_MBUS_PCI_MEM);
+	if (port->memwin_size + 1 == (SZ_128M + SZ_64M)) {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  SZ_128M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base + 
SZ_128M,
+						  SZ_64M - 1,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	} else {
+		mvebu_mbus_add_window_remap_flags(port->name, port->memwin_base,
+						  port->memwin_size,
+						  MVEBU_MBUS_NO_REMAP,
+						  MVEBU_MBUS_PCI_MEM);
+	}
  }

  /*


Here's the assignment (same as before):

pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]

And here's the output I get from:

# cat /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
[01] disabled
[02] disabled
[03] disabled
[04] 00000000ff000000 - 00000000ff010000 : nand
[05] 00000000f4000000 - 00000000f8000000 : vpcie
[06] 00000000fe000000 - 00000000fe010000 : dragonite
[07] 00000000e0000000 - 00000000e8000000 : pcie0.0

I did not get to test the whole address space thoroughly, but all the 
BARs are still accessible (mainly BAR0 which contains the control space 
and is mapped on the "new" MBUS window, and BAR1 which is the "big" 
one). So at least, the issues we had before are now gone.
So I'd say this looks like a very promising approach. :-)

Thank you,
Gerlando

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 12:24                                         ` Gerlando Falauto
@ 2014-02-21 13:47                                           ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 13:47 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Clément

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote:

> So I restored the total aperture size to 192MB.
> I had to rework your patch a bit because:
> 
> a) I'm running an older kernel and driver
> b) sizes are actually 1-byte offset

Hum, right. This is a bit weird, maybe I should change that, I don't
think the mvebu-mbus driver should accept 1-byte offset sizes.

> Here's the assignment (same as before):
> 
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> 
> And here's the output I get from:
> 
> # cat /sys/kernel/debug/mvebu-mbus/devices
> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
> [01] disabled
> [02] disabled
> [03] disabled
> [04] 00000000ff000000 - 00000000ff010000 : nand
> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0

This seems correct: we have two windows pointing to the same device,
and they have consecutive addresses.

> I did not get to test the whole address space thoroughly, but all the 
> BARs are still accessible (mainly BAR0 which contains the control space 
> and is mapped on the "new" MBUS window, and BAR1 which is the "big" 
> one). So at least, the issues we had before are now gone.

Did you check that what you read from BAR0 (which is mapped on the new
MBUS window) is really what you expect, and not just the same thing as
BAR1 accessible for the big window? I just want to make sure that the
hardware indeed properly handles two windows for the same device.

> So I'd say this looks like a very promising approach. :-)

Indeed. However, I don't think this approach solves the entire problem,
for two reasons:

 *) For small BARs that are not power-of-two sized, we may not want to
    consume two windows, but instead consume a little bit more address
    space. Using two windows to map a 96 KB BAR would be a waste of
    windows: using a single 128 KB window is much more efficient.

 *) I don't know if the algorithm to split the BAR into multiple
    windows is going to be trivial.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 13:47                                           ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 13:47 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote:

> So I restored the total aperture size to 192MB.
> I had to rework your patch a bit because:
> 
> a) I'm running an older kernel and driver
> b) sizes are actually 1-byte offset

Hum, right. This is a bit weird, maybe I should change that, I don't
think the mvebu-mbus driver should accept 1-byte offset sizes.

> Here's the assignment (same as before):
> 
> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
> 
> And here's the output I get from:
> 
> # cat /sys/kernel/debug/mvebu-mbus/devices
> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
> [01] disabled
> [02] disabled
> [03] disabled
> [04] 00000000ff000000 - 00000000ff010000 : nand
> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0

This seems correct: we have two windows pointing to the same device,
and they have consecutive addresses.

> I did not get to test the whole address space thoroughly, but all the 
> BARs are still accessible (mainly BAR0 which contains the control space 
> and is mapped on the "new" MBUS window, and BAR1 which is the "big" 
> one). So at least, the issues we had before are now gone.

Did you check that what you read from BAR0 (which is mapped on the new
MBUS window) is really what you expect, and not just the same thing as
BAR1 accessible for the big window? I just want to make sure that the
hardware indeed properly handles two windows for the same device.

> So I'd say this looks like a very promising approach. :-)

Indeed. However, I don't think this approach solves the entire problem,
for two reasons:

 *) For small BARs that are not power-of-two sized, we may not want to
    consume two windows, but instead consume a little bit more address
    space. Using two windows to map a 96 KB BAR would be a waste of
    windows: using a single 128 KB window is much more efficient.

 *) I don't know if the algorithm to split the BAR into multiple
    windows is going to be trivial.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 13:47                                           ` Thomas Petazzoni
@ 2014-02-21 15:05                                             ` Arnd Bergmann
  -1 siblings, 0 replies; 90+ messages in thread
From: Arnd Bergmann @ 2014-02-21 15:05 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Thomas Petazzoni, Gerlando Falauto, Lior Amsalem, Andrew Lunn,
	Jason Cooper, Longchamp, Valentin, linux-pci, Jason Gunthorpe,
	Gregory Clément, Ezequiel Garcia, Bjorn Helgaas,
	Sebastian Hesselbarth

On Friday 21 February 2014 14:47:08 Thomas Petazzoni wrote:
> 
> > So I'd say this looks like a very promising approach. 
> 
> Indeed. However, I don't think this approach solves the entire problem,
> for two reasons:
> 
>  *) For small BARs that are not power-of-two sized, we may not want to
>     consume two windows, but instead consume a little bit more address
>     space. Using two windows to map a 96 KB BAR would be a waste of
>     windows: using a single 128 KB window is much more efficient.

definitely.
 
>  *) I don't know if the algorithm to split the BAR into multiple
>     windows is going to be trivial.

The easiest solution would be to special case 'size is between
128MB+1 and 192MB' if that turns out to be the most interesting
case. It's easy enough to make the second window smaller than 64MB
if we want.

If we want things to be a little fancier, we could use:

	switch (size) {
		case (SZ_32M+1) ... (SZ_32M+SZ_16M):
			size2 = size - SZ_32M;
			size -= SZ_32M;
			break;
		case (SZ_64M+1) ... (SZ_64M+SZ_32M):
			size2 = size - SZ_64M;
			size -= SZ_64M;
			break;
		case (SZ_128M+1) ... (SZ_128M+SZ_64M):
			size2 = size - SZ_128M;
			size -= SZ_128M;
			break;
	};


	Arnd

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 15:05                                             ` Arnd Bergmann
  0 siblings, 0 replies; 90+ messages in thread
From: Arnd Bergmann @ 2014-02-21 15:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 21 February 2014 14:47:08 Thomas Petazzoni wrote:
> 
> > So I'd say this looks like a very promising approach. 
> 
> Indeed. However, I don't think this approach solves the entire problem,
> for two reasons:
> 
>  *) For small BARs that are not power-of-two sized, we may not want to
>     consume two windows, but instead consume a little bit more address
>     space. Using two windows to map a 96 KB BAR would be a waste of
>     windows: using a single 128 KB window is much more efficient.

definitely.
 
>  *) I don't know if the algorithm to split the BAR into multiple
>     windows is going to be trivial.

The easiest solution would be to special case 'size is between
128MB+1 and 192MB' if that turns out to be the most interesting
case. It's easy enough to make the second window smaller than 64MB
if we want.

If we want things to be a little fancier, we could use:

	switch (size) {
		case (SZ_32M+1) ... (SZ_32M+SZ_16M):
			size2 = size - SZ_32M;
			size -= SZ_32M;
			break;
		case (SZ_64M+1) ... (SZ_64M+SZ_32M):
			size2 = size - SZ_64M;
			size -= SZ_64M;
			break;
		case (SZ_128M+1) ... (SZ_128M+SZ_64M):
			size2 = size - SZ_128M;
			size -= SZ_128M;
			break;
	};


	Arnd

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 15:05                                             ` Arnd Bergmann
@ 2014-02-21 15:11                                               ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 15:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Gerlando Falauto, Lior Amsalem, Andrew Lunn,
	Jason Cooper, Longchamp, Valentin, linux-pci, Jason Gunthorpe,
	Gregory Clément, Ezequiel Garcia, Bjorn Helgaas,
	Sebastian Hesselbarth

Dear Arnd Bergmann,

On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote:

> >  *) I don't know if the algorithm to split the BAR into multiple
> >     windows is going to be trivial.
> 
> The easiest solution would be to special case 'size is between
> 128MB+1 and 192MB' if that turns out to be the most interesting
> case. It's easy enough to make the second window smaller than 64MB
> if we want.
> 
> If we want things to be a little fancier, we could use:
> 
> 	switch (size) {
> 		case (SZ_32M+1) ... (SZ_32M+SZ_16M):
> 			size2 = size - SZ_32M;
> 			size -= SZ_32M;
> 			break;
> 		case (SZ_64M+1) ... (SZ_64M+SZ_32M):
> 			size2 = size - SZ_64M;
> 			size -= SZ_64M;
> 			break;
> 		case (SZ_128M+1) ... (SZ_128M+SZ_64M):
> 			size2 = size - SZ_128M;
> 			size -= SZ_128M;
> 			break;
> 	};

What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
three windows, and your algorithm doesn't work :-)

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 15:11                                               ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 15:11 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnd Bergmann,

On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote:

> >  *) I don't know if the algorithm to split the BAR into multiple
> >     windows is going to be trivial.
> 
> The easiest solution would be to special case 'size is between
> 128MB+1 and 192MB' if that turns out to be the most interesting
> case. It's easy enough to make the second window smaller than 64MB
> if we want.
> 
> If we want things to be a little fancier, we could use:
> 
> 	switch (size) {
> 		case (SZ_32M+1) ... (SZ_32M+SZ_16M):
> 			size2 = size - SZ_32M;
> 			size -= SZ_32M;
> 			break;
> 		case (SZ_64M+1) ... (SZ_64M+SZ_32M):
> 			size2 = size - SZ_64M;
> 			size -= SZ_64M;
> 			break;
> 		case (SZ_128M+1) ... (SZ_128M+SZ_64M):
> 			size2 = size - SZ_128M;
> 			size -= SZ_128M;
> 			break;
> 	};

What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
three windows, and your algorithm doesn't work :-)

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 15:11                                               ` Thomas Petazzoni
@ 2014-02-21 15:20                                                 ` Arnd Bergmann
  -1 siblings, 0 replies; 90+ messages in thread
From: Arnd Bergmann @ 2014-02-21 15:20 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: linux-arm-kernel, Gerlando Falauto, Lior Amsalem, Andrew Lunn,
	Jason Cooper, Longchamp, Valentin, linux-pci, Jason Gunthorpe,
	Gregory Clément, Ezequiel Garcia, Bjorn Helgaas,
	Sebastian Hesselbarth

On Friday 21 February 2014 16:11:08 Thomas Petazzoni wrote:
> On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote:
> 
> > >  *) I don't know if the algorithm to split the BAR into multiple
> > >     windows is going to be trivial.
> > 
> > The easiest solution would be to special case 'size is between
> > 128MB+1 and 192MB' if that turns out to be the most interesting
> > case. It's easy enough to make the second window smaller than 64MB
> > if we want.
> > 
> > If we want things to be a little fancier, we could use:
> > 
> >       switch (size) {
> >               case (SZ_32M+1) ... (SZ_32M+SZ_16M):
> >                       size2 = size - SZ_32M;
> >                       size -= SZ_32M;
> >                       break;
> >               case (SZ_64M+1) ... (SZ_64M+SZ_32M):
> >                       size2 = size - SZ_64M;
> >                       size -= SZ_64M;
> >                       break;
> >               case (SZ_128M+1) ... (SZ_128M+SZ_64M):
> >                       size2 = size - SZ_128M;
> >                       size -= SZ_128M;
> >                       break;
> >       };
> 
> What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
> three windows, and your algorithm doesn't work 

I was hoping we could avoid using more than two windows.
With the algorithm above we would round up to 256MB and
fail if that doesn't fit, which is the same thing that
happens when you run out of space.

	Arnd

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 15:20                                                 ` Arnd Bergmann
  0 siblings, 0 replies; 90+ messages in thread
From: Arnd Bergmann @ 2014-02-21 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 21 February 2014 16:11:08 Thomas Petazzoni wrote:
> On Fri, 21 Feb 2014 16:05:16 +0100, Arnd Bergmann wrote:
> 
> > >  *) I don't know if the algorithm to split the BAR into multiple
> > >     windows is going to be trivial.
> > 
> > The easiest solution would be to special case 'size is between
> > 128MB+1 and 192MB' if that turns out to be the most interesting
> > case. It's easy enough to make the second window smaller than 64MB
> > if we want.
> > 
> > If we want things to be a little fancier, we could use:
> > 
> >       switch (size) {
> >               case (SZ_32M+1) ... (SZ_32M+SZ_16M):
> >                       size2 = size - SZ_32M;
> >                       size -= SZ_32M;
> >                       break;
> >               case (SZ_64M+1) ... (SZ_64M+SZ_32M):
> >                       size2 = size - SZ_64M;
> >                       size -= SZ_64M;
> >                       break;
> >               case (SZ_128M+1) ... (SZ_128M+SZ_64M):
> >                       size2 = size - SZ_128M;
> >                       size -= SZ_128M;
> >                       break;
> >       };
> 
> What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
> three windows, and your algorithm doesn't work 

I was hoping we could avoid using more than two windows.
With the algorithm above we would round up to 256MB and
fail if that doesn't fit, which is the same thing that
happens when you run out of space.

	Arnd

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 15:20                                                 ` Arnd Bergmann
@ 2014-02-21 15:37                                                   ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 15:37 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Gerlando Falauto, Lior Amsalem, Andrew Lunn,
	Jason Cooper, Longchamp, Valentin, linux-pci, Jason Gunthorpe,
	Gregory Clément, Ezequiel Garcia, Bjorn Helgaas,
	Sebastian Hesselbarth

Dear Arnd Bergmann,

On Fri, 21 Feb 2014 16:20:45 +0100, Arnd Bergmann wrote:

> > What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
> > three windows, and your algorithm doesn't work 
> 
> I was hoping we could avoid using more than two windows.
> With the algorithm above we would round up to 256MB and
> fail if that doesn't fit, which is the same thing that
> happens when you run out of space.

The problem is precisely that we currently don't have any well to tell
the Linux PCI core that we need to round up the size of a BAR. That's
the whole starting point of the discussion :-)

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 15:37                                                   ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 15:37 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnd Bergmann,

On Fri, 21 Feb 2014 16:20:45 +0100, Arnd Bergmann wrote:

> > What if the size of your BAR is 128 MB + 64 MB + 32 MB ? Then you need
> > three windows, and your algorithm doesn't work 
> 
> I was hoping we could avoid using more than two windows.
> With the algorithm above we would round up to 256MB and
> fail if that doesn't fit, which is the same thing that
> happens when you run out of space.

The problem is precisely that we currently don't have any well to tell
the Linux PCI core that we need to round up the size of a BAR. That's
the whole starting point of the discussion :-)

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 13:47                                           ` Thomas Petazzoni
@ 2014-02-21 16:39                                             ` Jason Gunthorpe
  -1 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21 16:39 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Gerlando Falauto, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Cl??ment

On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote:
>  *) I don't know if the algorithm to split the BAR into multiple
>     windows is going to be trivial.

physaddr_t base,size;

while (size != 0) {
   physaddr_t window_size = 1 << log2_round_down(size);
   create_window(base,window_size);
   base += window_size;
   size -= window_size;
}

At the very worst log2_round_down is approxmiately

unsigned int log2_round_down(unsigned int val)
{
	unsigned int res = 0;
	while ((1<<res) <= val)
		res++;
	return res - 1;
}

Minimum PCI required alignment for windows is 1MB so it will always
work out into some number of mbus windows..

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 16:39                                             ` Jason Gunthorpe
  0 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote:
>  *) I don't know if the algorithm to split the BAR into multiple
>     windows is going to be trivial.

physaddr_t base,size;

while (size != 0) {
   physaddr_t window_size = 1 << log2_round_down(size);
   create_window(base,window_size);
   base += window_size;
   size -= window_size;
}

At the very worst log2_round_down is approxmiately

unsigned int log2_round_down(unsigned int val)
{
	unsigned int res = 0;
	while ((1<<res) <= val)
		res++;
	return res - 1;
}

Minimum PCI required alignment for windows is 1MB so it will always
work out into some number of mbus windows..

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 16:39                                             ` Jason Gunthorpe
@ 2014-02-21 17:05                                               ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 17:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gerlando Falauto, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Cl??ment

Dear Jason Gunthorpe,

On Fri, 21 Feb 2014 09:39:02 -0700, Jason Gunthorpe wrote:
> On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote:
> >  *) I don't know if the algorithm to split the BAR into multiple
> >     windows is going to be trivial.
> 
> physaddr_t base,size;
> 
> while (size != 0) {
>    physaddr_t window_size = 1 << log2_round_down(size);
>    create_window(base,window_size);
>    base += window_size;
>    size -= window_size;
> }
> 
> At the very worst log2_round_down is approxmiately
> 
> unsigned int log2_round_down(unsigned int val)
> {
> 	unsigned int res = 0;
> 	while ((1<<res) <= val)
> 		res++;
> 	return res - 1;
> }
> 
> Minimum PCI required alignment for windows is 1MB so it will always
> work out into some number of mbus windows..

Interesting! Thanks!

Now I have another question: our mvebu_pcie_align_resource() function
makes sure that the base address of the BAR is aligned on its size,
because it is a requirement of MBus windows. However, if you later
split the BAR into multiple windows, will this continue to work out?

Let's take an example: a 96 MB BAR. If it gets put at 0xe0000000, then
no problem: we create one 64 MB window at 0xe0000000 and a 32 MB window
at 0xe4000000. Both base addresses are aligned on the size of the
window.

However, if the 96 MB BAR gets put at 0xea000000 (which is aligned on a
96 MB boundary, as required by our mvebu_pcie_align_resource). We
create one 64 MB window at 0xea000000, and one 32 MB window at
0xee000000. Unfortunately, while 0xea000000 is aligned on a 96 MB
boundary, it is not aligned on a 64 MB boundary, so the 64 MB window we
have created is wrong.

Which also makes me think that our mvebu_pcie_align_resource()
function uses round_up(start, size), which most likely doesn't work with
non power-of-two sizes.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 17:05                                               ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 17:05 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Fri, 21 Feb 2014 09:39:02 -0700, Jason Gunthorpe wrote:
> On Fri, Feb 21, 2014 at 02:47:08PM +0100, Thomas Petazzoni wrote:
> >  *) I don't know if the algorithm to split the BAR into multiple
> >     windows is going to be trivial.
> 
> physaddr_t base,size;
> 
> while (size != 0) {
>    physaddr_t window_size = 1 << log2_round_down(size);
>    create_window(base,window_size);
>    base += window_size;
>    size -= window_size;
> }
> 
> At the very worst log2_round_down is approxmiately
> 
> unsigned int log2_round_down(unsigned int val)
> {
> 	unsigned int res = 0;
> 	while ((1<<res) <= val)
> 		res++;
> 	return res - 1;
> }
> 
> Minimum PCI required alignment for windows is 1MB so it will always
> work out into some number of mbus windows..

Interesting! Thanks!

Now I have another question: our mvebu_pcie_align_resource() function
makes sure that the base address of the BAR is aligned on its size,
because it is a requirement of MBus windows. However, if you later
split the BAR into multiple windows, will this continue to work out?

Let's take an example: a 96 MB BAR. If it gets put at 0xe0000000, then
no problem: we create one 64 MB window at 0xe0000000 and a 32 MB window
at 0xe4000000. Both base addresses are aligned on the size of the
window.

However, if the 96 MB BAR gets put at 0xea000000 (which is aligned on a
96 MB boundary, as required by our mvebu_pcie_align_resource). We
create one 64 MB window at 0xea000000, and one 32 MB window at
0xee000000. Unfortunately, while 0xea000000 is aligned on a 96 MB
boundary, it is not aligned on a 64 MB boundary, so the 64 MB window we
have created is wrong.

Which also makes me think that our mvebu_pcie_align_resource()
function uses round_up(start, size), which most likely doesn't work with
non power-of-two sizes.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 17:05                                               ` Thomas Petazzoni
@ 2014-02-21 17:31                                                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21 17:31 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Gerlando Falauto, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Cl??ment

On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
 
> Now I have another question: our mvebu_pcie_align_resource() function
> makes sure that the base address of the BAR is aligned on its size,
> because it is a requirement of MBus windows. However, if you later
> split the BAR into multiple windows, will this continue to work out?

No, you must align to (1 << log2_round_down(size)) - that will always
be the largest mbus window created and thus the highest starting
alignment requirement.

I looked for a bit to see if I could find why the core is rounding up
to 196MB and it wasn't clear to me either.

Gerlando, if you instrument the code in setup-bus.c, particularly
pbus_size_mem, you will probably find out.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 17:31                                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 90+ messages in thread
From: Jason Gunthorpe @ 2014-02-21 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
 
> Now I have another question: our mvebu_pcie_align_resource() function
> makes sure that the base address of the BAR is aligned on its size,
> because it is a requirement of MBus windows. However, if you later
> split the BAR into multiple windows, will this continue to work out?

No, you must align to (1 << log2_round_down(size)) - that will always
be the largest mbus window created and thus the highest starting
alignment requirement.

I looked for a bit to see if I could find why the core is rounding up
to 196MB and it wasn't clear to me either.

Gerlando, if you instrument the code in setup-bus.c, particularly
pbus_size_mem, you will probably find out.

Jason

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 17:31                                                 ` Jason Gunthorpe
@ 2014-02-21 18:05                                                   ` Arnd Bergmann
  -1 siblings, 0 replies; 90+ messages in thread
From: Arnd Bergmann @ 2014-02-21 18:05 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Jason Gunthorpe, Thomas Petazzoni, Lior Amsalem, Andrew Lunn,
	Jason Cooper, Longchamp, Valentin, linux-pci, Gregory Cl??ment,
	Gerlando Falauto, Ezequiel Garcia, Bjorn Helgaas,
	Sebastian Hesselbarth

On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote:
> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
>  
> > Now I have another question: our mvebu_pcie_align_resource() function
> > makes sure that the base address of the BAR is aligned on its size,
> > because it is a requirement of MBus windows. However, if you later
> > split the BAR into multiple windows, will this continue to work out?
> 
> No, you must align to (1 << log2_round_down(size)) - that will always
> be the largest mbus window created and thus the highest starting
> alignment requirement.

Unless you allow reordering the two windows. If you want a 96MB
window, you only need 32MB alignment because you can either put
the actual 64MB window first if you have 64MB alignment, or you
put the 32MB window first if you don't and then the following
64MB will be aligned.

It gets more complicated if you want to allow a 72MB window
(16MB+64MB), as that could either be 64MB aligned or start 16MB
before the next multiple of 64MB.

I don't think there is any reason why code anywhere should align
the window to a multiple of the size though if the size is not
power-of-two, such as aligning to multiples of 96MB. That wouldn't
help anyone.

	Arnd

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 18:05                                                   ` Arnd Bergmann
  0 siblings, 0 replies; 90+ messages in thread
From: Arnd Bergmann @ 2014-02-21 18:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote:
> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
>  
> > Now I have another question: our mvebu_pcie_align_resource() function
> > makes sure that the base address of the BAR is aligned on its size,
> > because it is a requirement of MBus windows. However, if you later
> > split the BAR into multiple windows, will this continue to work out?
> 
> No, you must align to (1 << log2_round_down(size)) - that will always
> be the largest mbus window created and thus the highest starting
> alignment requirement.

Unless you allow reordering the two windows. If you want a 96MB
window, you only need 32MB alignment because you can either put
the actual 64MB window first if you have 64MB alignment, or you
put the 32MB window first if you don't and then the following
64MB will be aligned.

It gets more complicated if you want to allow a 72MB window
(16MB+64MB), as that could either be 64MB aligned or start 16MB
before the next multiple of 64MB.

I don't think there is any reason why code anywhere should align
the window to a multiple of the size though if the size is not
power-of-two, such as aligning to multiples of 96MB. That wouldn't
help anyone.

	Arnd

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 13:47                                           ` Thomas Petazzoni
@ 2014-02-21 18:18                                             ` Gerlando Falauto
  -1 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21 18:18 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Clément

Dear Thomas,

On 02/21/2014 02:47 PM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote:
>
>> So I restored the total aperture size to 192MB.
>> I had to rework your patch a bit because:
>>
>> a) I'm running an older kernel and driver
>> b) sizes are actually 1-byte offset
>
> Hum, right. This is a bit weird, maybe I should change that, I don't
> think the mvebu-mbus driver should accept 1-byte offset sizes.

I don't know anything about this, I only know the size dumped is of the 
form 0x...ffff, that's all.

>> Here's the assignment (same as before):
>>
>> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
>> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
>> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
>> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
>> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
>> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
>> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>>
>> And here's the output I get from:
>>
>> # cat /sys/kernel/debug/mvebu-mbus/devices
>> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
>> [01] disabled
>> [02] disabled
>> [03] disabled
>> [04] 00000000ff000000 - 00000000ff010000 : nand
>> [05] 00000000f4000000 - 00000000f8000000 : vpcie
>> [06] 00000000fe000000 - 00000000fe010000 : dragonite
>> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0
>
> This seems correct: we have two windows pointing to the same device,
> and they have consecutive addresses.

I don't know how to interpret the (remap ... ) bit, but yes, this looks 
right to me as well. I just don't know why mbus window 7 gets picked 
before 0, but apart from that, it looks nice.

>> I did not get to test the whole address space thoroughly, but all the
>> BARs are still accessible (mainly BAR0 which contains the control space
>> and is mapped on the "new" MBUS window, and BAR1 which is the "big"
>> one). So at least, the issues we had before are now gone.
>
> Did you check that what you read from BAR0 (which is mapped on the new
> MBUS window) is really what you expect, and not just the same thing as
> BAR1 accessible for the big window? I just want to make sure that the
> hardware indeed properly handles two windows for the same device.

Yes, there's no way the two BARs could be aliased. It's a fairly complex 
FPGA design, where BAR1 is the huge address space for a PCI-to-localbus 
bridge (whose connected devices are recognized correctly) and BAR0 is 
the control BAR (and its registers are read and written without a problem).


>> So I'd say this looks like a very promising approach. :-)
>
> Indeed. However, I don't think this approach solves the entire problem,
> for two reasons:
>
>   *) For small BARs that are not power-of-two sized, we may not want to
>      consume two windows, but instead consume a little bit more address
>      space. Using two windows to map a 96 KB BAR would be a waste of
>      windows: using a single 128 KB window is much more efficient.
>
>   *) I don't know if the algorithm to split the BAR into multiple
>      windows is going to be trivial.

I see others have already replied and I pretty much agree with them.

Thanks,
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 18:18                                             ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21 18:18 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Thomas,

On 02/21/2014 02:47 PM, Thomas Petazzoni wrote:
> Dear Gerlando Falauto,
>
> On Fri, 21 Feb 2014 13:24:36 +0100, Gerlando Falauto wrote:
>
>> So I restored the total aperture size to 192MB.
>> I had to rework your patch a bit because:
>>
>> a) I'm running an older kernel and driver
>> b) sizes are actually 1-byte offset
>
> Hum, right. This is a bit weird, maybe I should change that, I don't
> think the mvebu-mbus driver should accept 1-byte offset sizes.

I don't know anything about this, I only know the size dumped is of the 
form 0x...ffff, that's all.

>> Here's the assignment (same as before):
>>
>> pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xebffffff]
>> pci 0000:01:00.0: BAR 1: assigned [mem 0xe0000000-0xe7ffffff]
>> pci 0000:01:00.0: BAR 3: assigned [mem 0xe8000000-0xe87fffff]
>> pci 0000:01:00.0: BAR 4: assigned [mem 0xe8800000-0xe8801fff]
>> pci 0000:01:00.0: BAR 0: assigned [mem 0xe8802000-0xe8802fff]
>> pci 0000:01:00.0: BAR 2: assigned [mem 0xe8803000-0xe8803fff]
>> pci 0000:01:00.0: BAR 5: assigned [mem 0xe8804000-0xe8804fff]
>>
>> And here's the output I get from:
>>
>> # cat /sys/kernel/debug/mvebu-mbus/devices
>> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
>> [01] disabled
>> [02] disabled
>> [03] disabled
>> [04] 00000000ff000000 - 00000000ff010000 : nand
>> [05] 00000000f4000000 - 00000000f8000000 : vpcie
>> [06] 00000000fe000000 - 00000000fe010000 : dragonite
>> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0
>
> This seems correct: we have two windows pointing to the same device,
> and they have consecutive addresses.

I don't know how to interpret the (remap ... ) bit, but yes, this looks 
right to me as well. I just don't know why mbus window 7 gets picked 
before 0, but apart from that, it looks nice.

>> I did not get to test the whole address space thoroughly, but all the
>> BARs are still accessible (mainly BAR0 which contains the control space
>> and is mapped on the "new" MBUS window, and BAR1 which is the "big"
>> one). So at least, the issues we had before are now gone.
>
> Did you check that what you read from BAR0 (which is mapped on the new
> MBUS window) is really what you expect, and not just the same thing as
> BAR1 accessible for the big window? I just want to make sure that the
> hardware indeed properly handles two windows for the same device.

Yes, there's no way the two BARs could be aliased. It's a fairly complex 
FPGA design, where BAR1 is the huge address space for a PCI-to-localbus 
bridge (whose connected devices are recognized correctly) and BAR0 is 
the control BAR (and its registers are read and written without a problem).


>> So I'd say this looks like a very promising approach. :-)
>
> Indeed. However, I don't think this approach solves the entire problem,
> for two reasons:
>
>   *) For small BARs that are not power-of-two sized, we may not want to
>      consume two windows, but instead consume a little bit more address
>      space. Using two windows to map a 96 KB BAR would be a waste of
>      windows: using a single 128 KB window is much more efficient.
>
>   *) I don't know if the algorithm to split the BAR into multiple
>      windows is going to be trivial.

I see others have already replied and I pretty much agree with them.

Thanks,
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 18:05                                                   ` Arnd Bergmann
@ 2014-02-21 18:29                                                     ` Gerlando Falauto
  -1 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21 18:29 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arm-kernel
  Cc: Jason Gunthorpe, Thomas Petazzoni, Lior Amsalem, Andrew Lunn,
	Jason Cooper, Longchamp, Valentin, linux-pci, Gregory Cl??ment,
	Ezequiel Garcia, Bjorn Helgaas, Sebastian Hesselbarth

On 02/21/2014 07:05 PM, Arnd Bergmann wrote:
> On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote:
>> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
>>
>>> Now I have another question: our mvebu_pcie_align_resource() function
>>> makes sure that the base address of the BAR is aligned on its size,
>>> because it is a requirement of MBus windows. However, if you later
>>> split the BAR into multiple windows, will this continue to work out?
>>
>> No, you must align to (1 << log2_round_down(size)) - that will always
>> be the largest mbus window created and thus the highest starting
>> alignment requirement.
>
> Unless you allow reordering the two windows. If you want a 96MB
> window, you only need 32MB alignment because you can either put
> the actual 64MB window first if you have 64MB alignment, or you
> put the 32MB window first if you don't and then the following
> 64MB will be aligned.
>
> It gets more complicated if you want to allow a 72MB window
> (16MB+64MB), as that could either be 64MB aligned or start 16MB
> before the next multiple of 64MB.
>
> I don't think there is any reason why code anywhere should align
> the window to a multiple of the size though if the size is not
> power-of-two, such as aligning to multiples of 96MB. That wouldn't
> help anyone.

I also don't see why in the world there would be a requirement of having 
a given "oddly-sized" range (e.g. 96MB) aligned to a multiple of its 
size. In the end, AFAIK aligment requirements' only purpose is to make 
hardware simpler. Cant'see how aligning to an "odd" number would help 
achieving this purpose. But that's just me, of course.

Thanks guys!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 18:29                                                     ` Gerlando Falauto
  0 siblings, 0 replies; 90+ messages in thread
From: Gerlando Falauto @ 2014-02-21 18:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/21/2014 07:05 PM, Arnd Bergmann wrote:
> On Friday 21 February 2014 10:31:05 Jason Gunthorpe wrote:
>> On Fri, Feb 21, 2014 at 06:05:08PM +0100, Thomas Petazzoni wrote:
>>
>>> Now I have another question: our mvebu_pcie_align_resource() function
>>> makes sure that the base address of the BAR is aligned on its size,
>>> because it is a requirement of MBus windows. However, if you later
>>> split the BAR into multiple windows, will this continue to work out?
>>
>> No, you must align to (1 << log2_round_down(size)) - that will always
>> be the largest mbus window created and thus the highest starting
>> alignment requirement.
>
> Unless you allow reordering the two windows. If you want a 96MB
> window, you only need 32MB alignment because you can either put
> the actual 64MB window first if you have 64MB alignment, or you
> put the 32MB window first if you don't and then the following
> 64MB will be aligned.
>
> It gets more complicated if you want to allow a 72MB window
> (16MB+64MB), as that could either be 64MB aligned or start 16MB
> before the next multiple of 64MB.
>
> I don't think there is any reason why code anywhere should align
> the window to a multiple of the size though if the size is not
> power-of-two, such as aligning to multiples of 96MB. That wouldn't
> help anyone.

I also don't see why in the world there would be a requirement of having 
a given "oddly-sized" range (e.g. 96MB) aligned to a multiple of its 
size. In the end, AFAIK aligment requirements' only purpose is to make 
hardware simpler. Cant'see how aligning to an "odd" number would help 
achieving this purpose. But that's just me, of course.

Thanks guys!
Gerlando

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 18:18                                             ` Gerlando Falauto
@ 2014-02-21 18:45                                               ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 18:45 UTC (permalink / raw)
  To: Gerlando Falauto
  Cc: Jason Gunthorpe, Bjorn Helgaas, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gregory Clément

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 19:18:25 +0100, Gerlando Falauto wrote:

> > Hum, right. This is a bit weird, maybe I should change that, I don't
> > think the mvebu-mbus driver should accept 1-byte offset sizes.
> 
> I don't know anything about this, I only know the size dumped is of the 
> form 0x...ffff, that's all.

I'll have to look into this.

> >> # cat /sys/kernel/debug/mvebu-mbus/devices
> >> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
> >> [01] disabled
> >> [02] disabled
> >> [03] disabled
> >> [04] 00000000ff000000 - 00000000ff010000 : nand
> >> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> >> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> >> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0
> >
> > This seems correct: we have two windows pointing to the same device,
> > and they have consecutive addresses.
> 
> I don't know how to interpret the (remap ... ) bit, but yes, this looks 
> right to me as well. I just don't know why mbus window 7 gets picked 
> before 0, but apart from that, it looks nice.

Basically, some windows have an additional capability: they are
"remappable". On Kirkwood, the first 4 windows are remappable, and the
last 4 are not. Therefore, unless you request a remappable window, we
allocate a non-remappable one, which is why window 4 to 7 get used
first. And then, even though we don't need the remappable feature for
the last window, there are no more non-remappable windows available, so
window 0 gets allocated for our second PCIe window.

It matches fine with the expected behavior of the mvebu-mbus driver.

> > Did you check that what you read from BAR0 (which is mapped on the new
> > MBUS window) is really what you expect, and not just the same thing as
> > BAR1 accessible for the big window? I just want to make sure that the
> > hardware indeed properly handles two windows for the same device.
> 
> Yes, there's no way the two BARs could be aliased. It's a fairly complex 
> FPGA design, where BAR1 is the huge address space for a PCI-to-localbus 
> bridge (whose connected devices are recognized correctly) and BAR0 is 
> the control BAR (and its registers are read and written without a problem).

Great, so it means that it really works!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 18:45                                               ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 18:45 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Gerlando Falauto,

On Fri, 21 Feb 2014 19:18:25 +0100, Gerlando Falauto wrote:

> > Hum, right. This is a bit weird, maybe I should change that, I don't
> > think the mvebu-mbus driver should accept 1-byte offset sizes.
> 
> I don't know anything about this, I only know the size dumped is of the 
> form 0x...ffff, that's all.

I'll have to look into this.

> >> # cat /sys/kernel/debug/mvebu-mbus/devices
> >> [00] 00000000e8000000 - 00000000ec000000 : pcie0.0 (remap 00000000e8000000)
> >> [01] disabled
> >> [02] disabled
> >> [03] disabled
> >> [04] 00000000ff000000 - 00000000ff010000 : nand
> >> [05] 00000000f4000000 - 00000000f8000000 : vpcie
> >> [06] 00000000fe000000 - 00000000fe010000 : dragonite
> >> [07] 00000000e0000000 - 00000000e8000000 : pcie0.0
> >
> > This seems correct: we have two windows pointing to the same device,
> > and they have consecutive addresses.
> 
> I don't know how to interpret the (remap ... ) bit, but yes, this looks 
> right to me as well. I just don't know why mbus window 7 gets picked 
> before 0, but apart from that, it looks nice.

Basically, some windows have an additional capability: they are
"remappable". On Kirkwood, the first 4 windows are remappable, and the
last 4 are not. Therefore, unless you request a remappable window, we
allocate a non-remappable one, which is why window 4 to 7 get used
first. And then, even though we don't need the remappable feature for
the last window, there are no more non-remappable windows available, so
window 0 gets allocated for our second PCIe window.

It matches fine with the expected behavior of the mvebu-mbus driver.

> > Did you check that what you read from BAR0 (which is mapped on the new
> > MBUS window) is really what you expect, and not just the same thing as
> > BAR1 accessible for the big window? I just want to make sure that the
> > hardware indeed properly handles two windows for the same device.
> 
> Yes, there's no way the two BARs could be aliased. It's a fairly complex 
> FPGA design, where BAR1 is the huge address space for a PCI-to-localbus 
> bridge (whose connected devices are recognized correctly) and BAR0 is 
> the control BAR (and its registers are read and written without a problem).

Great, so it means that it really works!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21  0:24                           ` Jason Gunthorpe
@ 2014-02-21 19:05                             ` Bjorn Helgaas
  -1 siblings, 0 replies; 90+ messages in thread
From: Bjorn Helgaas @ 2014-02-21 19:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Petazzoni, Gerlando Falauto, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gavin Shan,
	Benjamin Herrenschmidt

[+cc Gavin, Ben for EEH alignment question below]

On Thu, Feb 20, 2014 at 5:24 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Thu, Feb 20, 2014 at 12:18:42PM -0700, Bjorn Helgaas wrote:
>
>> > On Marvell hardware, the physical address space layout is configurable,
>> > through the use of "MBus windows". A "MBus window" is defined by a base
>> > address, a size, and a target device. So if the CPU needs to access a
>> > given device (such as PCIe 0.0 for example), then we need to create a
>> > "MBus window" whose size and target device match PCIe 0.0.
> ...

> So the driver creates a 'compliant' config space for the root
> port. Building the config space requires harmonizing registers related
> to the PCI-E and registers related to internal routing and dealing
> with the mismatch between what the hardware can actualy provide and
> what the PCI spec requires it provide.
> ...

>> > Since Armada XP has 10 PCIe interfaces, we cannot just statically
>> > create as many MBus windows as there are PCIe interfaces: it would both
>> > exhaust the number of MBus windows available, and also exhaust the
>> > physical address space, because we would have to create very large
>> > windows, just in case the PCIe device plugged behind this interface
>> > needs large BARs.
> ...

>> I'm still not sure I understand what's going on here.  It sounds like
>> your emulated bridge basically wraps the host bridge and makes it look
>> like a PCI-PCI bridge.  But I assume the host bridge itself is also
>> visible, and has apertures (I guess these are the MBus windows?)
>
> No, there is only one bridge, it is a per-physical-port MBUS / PCI-E
> bridge. It performs an identical function to the root port bridge
> described in PCI-E. MBUS serves as the root-complex internal bus 0.
>
> There isn't 2 levels of bridging, so the MBUS / PCI-E bridge can
> claim any system address and there is no such thing as a 'host
> bridge'.
>
> What Linux calls 'the host bridge aperture' is simply a wack of
> otherwise unused physical address space, it has no special properties.
>
>> It'd be nice if dmesg mentioned the host bridge explicitly as we do on
>> other architectures; maybe that would help understand what's going on
>> under the covers.  Maybe a longer excerpt would already have this; you
>> already use pci_add_resource_offset(), which is used when creating the
>> root bus, so you must have some sort of aperture before enumerating.
>
> Well, /proc/iomem looks like this:
>
> e0000000-efffffff : PCI MEM 0000
>   e0000000-e00fffff : PCI Bus 0000:01
>     e0000000-e001ffff : 0000:01:00.0
>
> 'PCI MEM 0000' is the 'host bridge aperture' it is an arbitary
> range of address space that doesn't overlap anything.
>
> 'PCI Bus 0000:01' is the MBUS / PCI-E root port bridge for physical
> port 0

Thanks for making this more concrete.  Let me see if I understand any better:

- e0000000-efffffff is the "host bridge aperture" but it doesn't
correspond to an actual aperture in hardware (there are no registers
where you set this range).  The only real use for this range is to be
the arena within which the PCI core can assign space to the Root
Ports.  This is static and you don't need to change it based on what
devices we discover.

- There may be several MBus/PCIe Root Ports, and you want to configure
their apertures at enumeration-time based on what devices are below
them.  As you say, the PCI core supports this except that MBus
apertures must be a power-of-two in size and aligned on their size,
while ordinary PCI bridge windows only need to start and end on 1MB
boundaries.

- e0000000-e00fffff is an example of one MBus/PCIe aperture, and this
space is available on PCI bus 01.  This one happens to be 1MB in size,
but it could be 2MB, 4MB, etc., but not 3MB like a normal bridge
window could be.

- You're currently using the ARM ->align_resource() hook (part of
pcibios_align_resource()), which is used in the bowels of the
allocator (__find_resource()) and affects the starting address of the
region we allocate, but not the size.  So you can force the start of
an MBus aperture to be power-of-two aligned, but not the end.

The allocate_resource() alignf argument is only used by PCI and
PCMCIA, so it doesn't seem like it would be too terrible to extend the
alignf interface so it could control the size, too.  Would something
like that solve this problem?

I first wondered if you could use pcibios_window_alignment(), but it
doesn't know the amount of space we need below the bridge, and it also
can't affect the size of the window or the ending address, so I don't
think it will help.

But I wonder if powerpc has a similar issue here: I think EEH might
need, for example 16MB bridge window alignment.  Since
pcibios_window_alignment() only affects the *starting* address, could
the core assign a 9MB window whose starting address is 16MB-aligned?
Could EEH deal with that?  What if the PCI core assigned the space
right after the 9MB window to another device?

Bjorn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 19:05                             ` Bjorn Helgaas
  0 siblings, 0 replies; 90+ messages in thread
From: Bjorn Helgaas @ 2014-02-21 19:05 UTC (permalink / raw)
  To: linux-arm-kernel

[+cc Gavin, Ben for EEH alignment question below]

On Thu, Feb 20, 2014 at 5:24 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Thu, Feb 20, 2014 at 12:18:42PM -0700, Bjorn Helgaas wrote:
>
>> > On Marvell hardware, the physical address space layout is configurable,
>> > through the use of "MBus windows". A "MBus window" is defined by a base
>> > address, a size, and a target device. So if the CPU needs to access a
>> > given device (such as PCIe 0.0 for example), then we need to create a
>> > "MBus window" whose size and target device match PCIe 0.0.
> ...

> So the driver creates a 'compliant' config space for the root
> port. Building the config space requires harmonizing registers related
> to the PCI-E and registers related to internal routing and dealing
> with the mismatch between what the hardware can actualy provide and
> what the PCI spec requires it provide.
> ...

>> > Since Armada XP has 10 PCIe interfaces, we cannot just statically
>> > create as many MBus windows as there are PCIe interfaces: it would both
>> > exhaust the number of MBus windows available, and also exhaust the
>> > physical address space, because we would have to create very large
>> > windows, just in case the PCIe device plugged behind this interface
>> > needs large BARs.
> ...

>> I'm still not sure I understand what's going on here.  It sounds like
>> your emulated bridge basically wraps the host bridge and makes it look
>> like a PCI-PCI bridge.  But I assume the host bridge itself is also
>> visible, and has apertures (I guess these are the MBus windows?)
>
> No, there is only one bridge, it is a per-physical-port MBUS / PCI-E
> bridge. It performs an identical function to the root port bridge
> described in PCI-E. MBUS serves as the root-complex internal bus 0.
>
> There isn't 2 levels of bridging, so the MBUS / PCI-E bridge can
> claim any system address and there is no such thing as a 'host
> bridge'.
>
> What Linux calls 'the host bridge aperture' is simply a wack of
> otherwise unused physical address space, it has no special properties.
>
>> It'd be nice if dmesg mentioned the host bridge explicitly as we do on
>> other architectures; maybe that would help understand what's going on
>> under the covers.  Maybe a longer excerpt would already have this; you
>> already use pci_add_resource_offset(), which is used when creating the
>> root bus, so you must have some sort of aperture before enumerating.
>
> Well, /proc/iomem looks like this:
>
> e0000000-efffffff : PCI MEM 0000
>   e0000000-e00fffff : PCI Bus 0000:01
>     e0000000-e001ffff : 0000:01:00.0
>
> 'PCI MEM 0000' is the 'host bridge aperture' it is an arbitary
> range of address space that doesn't overlap anything.
>
> 'PCI Bus 0000:01' is the MBUS / PCI-E root port bridge for physical
> port 0

Thanks for making this more concrete.  Let me see if I understand any better:

- e0000000-efffffff is the "host bridge aperture" but it doesn't
correspond to an actual aperture in hardware (there are no registers
where you set this range).  The only real use for this range is to be
the arena within which the PCI core can assign space to the Root
Ports.  This is static and you don't need to change it based on what
devices we discover.

- There may be several MBus/PCIe Root Ports, and you want to configure
their apertures at enumeration-time based on what devices are below
them.  As you say, the PCI core supports this except that MBus
apertures must be a power-of-two in size and aligned on their size,
while ordinary PCI bridge windows only need to start and end on 1MB
boundaries.

- e0000000-e00fffff is an example of one MBus/PCIe aperture, and this
space is available on PCI bus 01.  This one happens to be 1MB in size,
but it could be 2MB, 4MB, etc., but not 3MB like a normal bridge
window could be.

- You're currently using the ARM ->align_resource() hook (part of
pcibios_align_resource()), which is used in the bowels of the
allocator (__find_resource()) and affects the starting address of the
region we allocate, but not the size.  So you can force the start of
an MBus aperture to be power-of-two aligned, but not the end.

The allocate_resource() alignf argument is only used by PCI and
PCMCIA, so it doesn't seem like it would be too terrible to extend the
alignf interface so it could control the size, too.  Would something
like that solve this problem?

I first wondered if you could use pcibios_window_alignment(), but it
doesn't know the amount of space we need below the bridge, and it also
can't affect the size of the window or the ending address, so I don't
think it will help.

But I wonder if powerpc has a similar issue here: I think EEH might
need, for example 16MB bridge window alignment.  Since
pcibios_window_alignment() only affects the *starting* address, could
the core assign a 9MB window whose starting address is 16MB-aligned?
Could EEH deal with that?  What if the PCI core assigned the space
right after the 9MB window to another device?

Bjorn

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 19:05                             ` Bjorn Helgaas
@ 2014-02-21 19:21                               ` Thomas Petazzoni
  -1 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 19:21 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jason Gunthorpe, Gerlando Falauto, linux-pci, linux-arm-kernel,
	Andrew Lunn, Sebastian Hesselbarth, Jason Cooper, Longchamp,
	Valentin, Ezequiel Garcia, Lior Amsalem, Gavin Shan,
	Benjamin Herrenschmidt

Dear Bjorn Helgaas,

On Fri, 21 Feb 2014 12:05:49 -0700, Bjorn Helgaas wrote:

> Thanks for making this more concrete.  Let me see if I understand any better:

Good to see that Jason Gunthorpe could explain this in better words.
I'll try to answer to your questions below, I'm sure Jason will correct
me if I say incorrect things, or things that are imprecise.

> - e0000000-efffffff is the "host bridge aperture" but it doesn't
> correspond to an actual aperture in hardware (there are no registers
> where you set this range).  The only real use for this range is to be
> the arena within which the PCI core can assign space to the Root
> Ports.  This is static and you don't need to change it based on what
> devices we discover.

Correct. We don't configure this in any hardware register. We just give
this aperture to the Linux PCI core to tell it "please allocate all
BAR physical ranges from this global aperture".

> - There may be several MBus/PCIe Root Ports, and you want to configure
> their apertures at enumeration-time based on what devices are below
> them.  As you say, the PCI core supports this except that MBus
> apertures must be a power-of-two in size and aligned on their size,
> while ordinary PCI bridge windows only need to start and end on 1MB
> boundaries.

Exactly.

> - e0000000-e00fffff is an example of one MBus/PCIe aperture, and this
> space is available on PCI bus 01.  This one happens to be 1MB in size,
> but it could be 2MB, 4MB, etc., but not 3MB like a normal bridge
> window could be.

Absolutely.

Note that we have the possibility of mapping a 3 MB BAR, by using a 2
MB window followed by a 1 MB window. However, since the number of
windows is limited (8 on Kirkwood, 20 on Armada 370/XP), we will prefer
to enlarge the BAR size if it's size is fairly small, and only resort
to using multiple windows if the amount of lost physical space is big.

So, for a 3 MB BAR, we will definitely prefer to extend it to a single 4
MB window, because losing 1 MB of physical address space is preferable
over losing one window.

For a 192 MB BAR, we may prefer to use one 128 MB window followed by
one 64 MB window.

But as long as the pci-mvebu driver can control the size of the BAR, it
can decide on its own whether its prefers enlarging the BAR, or using
multiple windows.

> - You're currently using the ARM ->align_resource() hook (part of
> pcibios_align_resource()), which is used in the bowels of the
> allocator (__find_resource()) and affects the starting address of the
> region we allocate, but not the size.  So you can force the start of
> an MBus aperture to be power-of-two aligned, but not the end.

Correct.

Happy to see that we've managed to get an understanding on what the
problem.

> The allocate_resource() alignf argument is only used by PCI and
> PCMCIA, so it doesn't seem like it would be too terrible to extend the
> alignf interface so it could control the size, too.  Would something
> like that solve this problem?

I don't know, I would have to look more precisely into this alignf
argument, and see how it could be extended to solve our constraints.

> I first wondered if you could use pcibios_window_alignment(), but it
> doesn't know the amount of space we need below the bridge, and it also
> can't affect the size of the window or the ending address, so I don't
> think it will help.
> 
> But I wonder if powerpc has a similar issue here: I think EEH might
> need, for example 16MB bridge window alignment.  Since
> pcibios_window_alignment() only affects the *starting* address, could
> the core assign a 9MB window whose starting address is 16MB-aligned?
> Could EEH deal with that?  What if the PCI core assigned the space
> right after the 9MB window to another device?

I'll let the other PCI people answer this :-)

Thanks a lot for your feedback!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 19:21                               ` Thomas Petazzoni
  0 siblings, 0 replies; 90+ messages in thread
From: Thomas Petazzoni @ 2014-02-21 19:21 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Bjorn Helgaas,

On Fri, 21 Feb 2014 12:05:49 -0700, Bjorn Helgaas wrote:

> Thanks for making this more concrete.  Let me see if I understand any better:

Good to see that Jason Gunthorpe could explain this in better words.
I'll try to answer to your questions below, I'm sure Jason will correct
me if I say incorrect things, or things that are imprecise.

> - e0000000-efffffff is the "host bridge aperture" but it doesn't
> correspond to an actual aperture in hardware (there are no registers
> where you set this range).  The only real use for this range is to be
> the arena within which the PCI core can assign space to the Root
> Ports.  This is static and you don't need to change it based on what
> devices we discover.

Correct. We don't configure this in any hardware register. We just give
this aperture to the Linux PCI core to tell it "please allocate all
BAR physical ranges from this global aperture".

> - There may be several MBus/PCIe Root Ports, and you want to configure
> their apertures at enumeration-time based on what devices are below
> them.  As you say, the PCI core supports this except that MBus
> apertures must be a power-of-two in size and aligned on their size,
> while ordinary PCI bridge windows only need to start and end on 1MB
> boundaries.

Exactly.

> - e0000000-e00fffff is an example of one MBus/PCIe aperture, and this
> space is available on PCI bus 01.  This one happens to be 1MB in size,
> but it could be 2MB, 4MB, etc., but not 3MB like a normal bridge
> window could be.

Absolutely.

Note that we have the possibility of mapping a 3 MB BAR, by using a 2
MB window followed by a 1 MB window. However, since the number of
windows is limited (8 on Kirkwood, 20 on Armada 370/XP), we will prefer
to enlarge the BAR size if it's size is fairly small, and only resort
to using multiple windows if the amount of lost physical space is big.

So, for a 3 MB BAR, we will definitely prefer to extend it to a single 4
MB window, because losing 1 MB of physical address space is preferable
over losing one window.

For a 192 MB BAR, we may prefer to use one 128 MB window followed by
one 64 MB window.

But as long as the pci-mvebu driver can control the size of the BAR, it
can decide on its own whether its prefers enlarging the BAR, or using
multiple windows.

> - You're currently using the ARM ->align_resource() hook (part of
> pcibios_align_resource()), which is used in the bowels of the
> allocator (__find_resource()) and affects the starting address of the
> region we allocate, but not the size.  So you can force the start of
> an MBus aperture to be power-of-two aligned, but not the end.

Correct.

Happy to see that we've managed to get an understanding on what the
problem.

> The allocate_resource() alignf argument is only used by PCI and
> PCMCIA, so it doesn't seem like it would be too terrible to extend the
> alignf interface so it could control the size, too.  Would something
> like that solve this problem?

I don't know, I would have to look more precisely into this alignf
argument, and see how it could be extended to solve our constraints.

> I first wondered if you could use pcibios_window_alignment(), but it
> doesn't know the amount of space we need below the bridge, and it also
> can't affect the size of the window or the ending address, so I don't
> think it will help.
> 
> But I wonder if powerpc has a similar issue here: I think EEH might
> need, for example 16MB bridge window alignment.  Since
> pcibios_window_alignment() only affects the *starting* address, could
> the core assign a 9MB window whose starting address is 16MB-aligned?
> Could EEH deal with that?  What if the PCI core assigned the space
> right after the 9MB window to another device?

I'll let the other PCI people answer this :-)

Thanks a lot for your feedback!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: pci-mvebu driver on km_kirkwood
  2014-02-21 19:05                             ` Bjorn Helgaas
@ 2014-02-21 19:53                               ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2014-02-21 19:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jason Gunthorpe, Thomas Petazzoni, Gerlando Falauto, linux-pci,
	linux-arm-kernel, Andrew Lunn, Sebastian Hesselbarth,
	Jason Cooper, Longchamp, Valentin, Ezequiel Garcia, Lior Amsalem,
	Gavin Shan

On Fri, 2014-02-21 at 12:05 -0700, Bjorn Helgaas wrote:
> But I wonder if powerpc has a similar issue here: I think EEH might
> need, for example 16MB bridge window alignment.  Since
> pcibios_window_alignment() only affects the *starting* address, could
> the core assign a 9MB window whose starting address is 16MB-aligned?
> Could EEH deal with that?  What if the PCI core assigned the space
> right after the 9MB window to another device?

Gavin, did you guys deal with that at all ? Are we aligning the size as
well somewhat ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
@ 2014-02-21 19:53                               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 90+ messages in thread
From: Benjamin Herrenschmidt @ 2014-02-21 19:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2014-02-21 at 12:05 -0700, Bjorn Helgaas wrote:
> But I wonder if powerpc has a similar issue here: I think EEH might
> need, for example 16MB bridge window alignment.  Since
> pcibios_window_alignment() only affects the *starting* address, could
> the core assign a 9MB window whose starting address is 16MB-aligned?
> Could EEH deal with that?  What if the PCI core assigned the space
> right after the 9MB window to another device?

Gavin, did you guys deal with that at all ? Are we aligning the size as
well somewhat ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* pci-mvebu driver on km_kirkwood
  2014-02-21 19:53                               ` Benjamin Herrenschmidt
  (?)
@ 2014-02-23  3:43                               ` Gavin Shan
  -1 siblings, 0 replies; 90+ messages in thread
From: Gavin Shan @ 2014-02-23  3:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Feb 22, 2014 at 06:53:23AM +1100, Benjamin Herrenschmidt wrote:
>On Fri, 2014-02-21 at 12:05 -0700, Bjorn Helgaas wrote:
>> But I wonder if powerpc has a similar issue here: I think EEH might
>> need, for example 16MB bridge window alignment.  Since
>> pcibios_window_alignment() only affects the *starting* address, could
>> the core assign a 9MB window whose starting address is 16MB-aligned?
>> Could EEH deal with that?  What if the PCI core assigned the space
>> right after the 9MB window to another device?
>
>Gavin, did you guys deal with that at all ? Are we aligning the size as
>well somewhat ?
>

Yeah, we can handle it well because pcibios_window_alignment() affects
both starting address and size of PCI bridge window. More details could
be found in pci/drivers/setup-bus.c::pbus_size_mem(): starting address,
"size0", "size1", "size1-size0" are aligned to "min_align", which is
coming from pcibios_window_alignment() (16MB as mentioned).

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 90+ messages in thread

end of thread, other threads:[~2014-02-23  3:43 UTC | newest]

Thread overview: 90+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-10 16:15 pci-mvebu driver on km_kirkwood Gerlando Falauto
2013-07-10 16:57 ` Thomas Petazzoni
2013-07-10 17:31   ` Gerlando Falauto
2013-07-10 19:56     ` Gerlando Falauto
2013-07-11  7:03     ` Valentin Longchamp
2013-07-12  8:59       ` Thomas Petazzoni
2013-07-15 15:46         ` Valentin Longchamp
2013-07-15 19:51           ` Thomas Petazzoni
2013-07-11 14:32     ` Thomas Petazzoni
2014-02-18 17:29       ` Gerlando Falauto
2014-02-18 20:27         ` Thomas Petazzoni
2014-02-19  8:38           ` Gerlando Falauto
2014-02-19  9:26             ` Thomas Petazzoni
2014-02-19  9:39               ` Gerlando Falauto
2014-02-19 13:37                 ` Thomas Petazzoni
2014-02-19 13:37                   ` Thomas Petazzoni
2014-02-19 21:45                   ` Bjorn Helgaas
2014-02-19 21:45                     ` Bjorn Helgaas
2014-02-20  8:55                     ` Thomas Petazzoni
2014-02-20  8:55                       ` Thomas Petazzoni
2014-02-20 17:35                       ` Jason Gunthorpe
2014-02-20 17:35                         ` Jason Gunthorpe
2014-02-20 20:29                         ` Thomas Petazzoni
2014-02-20 20:29                           ` Thomas Petazzoni
2014-02-21  0:32                           ` Jason Gunthorpe
2014-02-21  0:32                             ` Jason Gunthorpe
2014-02-21  8:34                             ` Thomas Petazzoni
2014-02-21  8:34                               ` Thomas Petazzoni
2014-02-21  8:58                               ` Gerlando Falauto
2014-02-21  8:58                                 ` Gerlando Falauto
2014-02-21  9:12                                 ` Thomas Petazzoni
2014-02-21  9:12                                   ` Thomas Petazzoni
2014-02-21  9:16                                   ` Gerlando Falauto
2014-02-21  9:16                                     ` Gerlando Falauto
2014-02-21  9:39                                     ` Thomas Petazzoni
2014-02-21  9:39                                       ` Thomas Petazzoni
2014-02-21 12:24                                       ` Gerlando Falauto
2014-02-21 12:24                                         ` Gerlando Falauto
2014-02-21 13:47                                         ` Thomas Petazzoni
2014-02-21 13:47                                           ` Thomas Petazzoni
2014-02-21 15:05                                           ` Arnd Bergmann
2014-02-21 15:05                                             ` Arnd Bergmann
2014-02-21 15:11                                             ` Thomas Petazzoni
2014-02-21 15:11                                               ` Thomas Petazzoni
2014-02-21 15:20                                               ` Arnd Bergmann
2014-02-21 15:20                                                 ` Arnd Bergmann
2014-02-21 15:37                                                 ` Thomas Petazzoni
2014-02-21 15:37                                                   ` Thomas Petazzoni
2014-02-21 16:39                                           ` Jason Gunthorpe
2014-02-21 16:39                                             ` Jason Gunthorpe
2014-02-21 17:05                                             ` Thomas Petazzoni
2014-02-21 17:05                                               ` Thomas Petazzoni
2014-02-21 17:31                                               ` Jason Gunthorpe
2014-02-21 17:31                                                 ` Jason Gunthorpe
2014-02-21 18:05                                                 ` Arnd Bergmann
2014-02-21 18:05                                                   ` Arnd Bergmann
2014-02-21 18:29                                                   ` Gerlando Falauto
2014-02-21 18:29                                                     ` Gerlando Falauto
2014-02-21 18:18                                           ` Gerlando Falauto
2014-02-21 18:18                                             ` Gerlando Falauto
2014-02-21 18:45                                             ` Thomas Petazzoni
2014-02-21 18:45                                               ` Thomas Petazzoni
2014-02-20 19:18                       ` Bjorn Helgaas
2014-02-20 19:18                         ` Bjorn Helgaas
2014-02-21  0:24                         ` Jason Gunthorpe
2014-02-21  0:24                           ` Jason Gunthorpe
2014-02-21 19:05                           ` Bjorn Helgaas
2014-02-21 19:05                             ` Bjorn Helgaas
2014-02-21 19:21                             ` Thomas Petazzoni
2014-02-21 19:21                               ` Thomas Petazzoni
2014-02-21 19:53                             ` Benjamin Herrenschmidt
2014-02-21 19:53                               ` Benjamin Herrenschmidt
2014-02-23  3:43                               ` Gavin Shan
2013-07-31  8:03 ` Thomas Petazzoni
2013-07-31  8:26   ` Gerlando Falauto
2013-07-31  9:00     ` Thomas Petazzoni
2013-07-31 20:50       ` Jason Gunthorpe
2013-08-09 14:01         ` Thierry Reding
2013-08-26  9:27           ` Gerlando Falauto
2013-08-26  9:27             ` Gerlando Falauto
2013-08-26 12:02             ` Thierry Reding
2013-08-26 12:02               ` Thierry Reding
2013-08-26 14:49               ` Gerlando Falauto
2013-08-26 14:49                 ` Gerlando Falauto
2013-08-26 19:16                 ` Jason Gunthorpe
2013-08-26 19:16                   ` Jason Gunthorpe
     [not found]                   ` <20130826191615.GA20192-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-11-04 14:49                     ` Gerlando Falauto
2013-11-04 14:49                       ` Gerlando Falauto
     [not found]                       ` <5277B417.2030506-SkAbAL50j+5BDgjK7y7TUQ@public.gmane.org>
2013-11-05  8:13                         ` Thierry Reding
2013-11-05  8:13                           ` Thierry Reding

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.