linux-pci.vger.kernel.org archive mirror
* PCIe root bridge and memory ranges.
@ 2014-09-04 14:57 Robert
  2014-09-04 20:07 ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Robert @ 2014-09-04 14:57 UTC (permalink / raw)
  To: linux-pci

Hello All,

I am having trouble understanding what memory ranges go through the PCIe 
root bridge on a Haswell CPU (what I have in my system) and similarly on 
other modern CPUs. From what I can gather from sources online (including 
many datasheets), the PCIe root complex contains a PCI host bridge, 
which produces one PCI root bridge (ACPI\PNP0A08). This root bridge then 
forwards certain memory ranges onto the PCI/PCIe bus.

First of all, if I take something like the PAM registers: when something is 
written to an address they control (e.g. 0xD0000), the PAM register forwards 
it to DMI (if set to do so), so that transaction never goes through the PCI 
root bridge? What's confusing is that if I look at the DSDT ACPI table and 
look at the ACPI\PNP0A08 device, it says that the PAM register ranges go 
through it. I guess this is just for the OS's purposes, as it doesn't need to 
know exactly which ranges go through the root bridge? I'm not entirely sure 
on that, and if anyone could clarify it would be appreciated.

As well as the PAM register ranges, the root bridge also has the PCIe 
device memory range, which in my case is 0xC0000000 – 0xFEAFFFFF. Does 
that mean that anything above that range isn't going through the PCI root 
bridge, or is it just like that so an OS doesn't try to map a device in that 
region? If I look at the Haswell datasheet, it has small regions in that 
area, between things like the APIC and BIOS ranges, that reach the DMI.

It seems as if the PCI root bridge is using some sort of subtractive 
decoding that picks up whatever isn't sent to DRAM etc. and to make it easy 
for an OS the BIOS gives it a block of address space.
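
To make sure I'm describing what I mean, here's a toy model of subtractive 
decoding (purely illustrative C; the ranges and targets are made up and this 
isn't how any real chipset is implemented):

#include <stdio.h>
#include <stdint.h>

/* Positively decoded windows claimed by specific targets (made-up values). */
struct window { uint64_t start, end; const char *target; };

static const struct window positive[] = {
    { 0x00000000, 0xBFFFFFFF, "DRAM (below TOLUD)" },
    { 0xFEC00000, 0xFEC00FFF, "I/O APIC" },
    { 0xFEE00000, 0xFEEFFFFF, "Local APIC" },
};

/* Whatever nobody claims positively falls through to the subtractive target. */
static const char *route(uint64_t addr)
{
    for (unsigned i = 0; i < sizeof(positive) / sizeof(positive[0]); i++)
        if (addr >= positive[i].start && addr <= positive[i].end)
            return positive[i].target;
    return "PCI/DMI (subtractive decode)";
}

int main(void)
{
    printf("0x12345678 -> %s\n", route(0x12345678));   /* DRAM          */
    printf("0xD0000000 -> %s\n", route(0xD0000000));   /* falls through */
    printf("0xFEC00000 -> %s\n", route(0xFEC00000));   /* I/O APIC      */
    return 0;
}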

Finally, I was on a forum related to external GPUs, and some Windows users 
didn't have enough space to map the device below 4GB. To resolve this they 
manually edited the DSDT table and added another entry above the 4GB 
barrier, and Windows then mapped the GPU in the 64-bit space. I presume 
changing the entry in the DSDT table didn't make any difference to how the 
hardware was set up; it just told the OS that the root bridge will in fact 
pick up this address range, and therefore it knew it could map it there.
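
(Incidentally, a way to see where a BAR actually ended up is to read the 
device's sysfs 'resource' file, which lists each BAR as start/end/flags, one 
per line. A quick sketch; the device address is just an example, substitute 
the GPU's:)

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    /* Example device address; replace with the device you care about. */
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/resource";
    FILE *f = fopen(path, "r");
    uint64_t start, end, flags;
    int bar = 0;

    if (!f) {
        perror(path);
        return 1;
    }
    while (fscanf(f, "%" SCNx64 " %" SCNx64 " %" SCNx64,
                  &start, &end, &flags) == 3) {
        if (end)    /* unused entries read back as all zeroes */
            printf("BAR/window %d: 0x%" PRIx64 "-0x%" PRIx64
                   " (above 4GB: %s)\n", bar, start, end,
                   start >= 0x100000000ULL ? "yes" : "no");
        bar++;
    }
    fclose(f);
    return 0;
}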

So am I right in thinking the ranges in the ACPI table are for the OS's 
purposes, and don't actually have to accurately represent what the hardware 
does?

..and does anyone know what ranges do actually go through a single PCIe root 
bridge on a modern system?

If anyone could help it would be greatly appreciated :)

Kind Regards,
Robert 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PCIe root bridge and memory ranges.
  2014-09-04 14:57 PCIe root bridge and memory ranges Robert
@ 2014-09-04 20:07 ` Bjorn Helgaas
  2014-09-04 21:41   ` Robert
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2014-09-04 20:07 UTC (permalink / raw)
  To: Robert; +Cc: linux-pci

On Thu, Sep 4, 2014 at 8:57 AM, Robert <RJSmith92@live.com> wrote:
> Hello All,
>
> I am having trouble understanding what memory ranges go through the PCIe
> root bridge on a Haswell CPU (what I have in my system) and similarly on
> other modern CPUs. From what I can gather from sources online (including
> many datasheets), the PCIe root complex contains a PCI host bridge,
> which produces one PCI root bridge (ACPI\PNP0A08). This root bridge then
> forwards certain memory ranges onto the PCI/PCIe bus.

Right so far.

> First of all, if I take something like the PAM registers: when something is
> written to an address they control (e.g. 0xD0000), the PAM register forwards
> it to DMI (if set to do so), so that transaction never goes through the PCI
> root bridge? What's confusing is that if I look at the DSDT ACPI table and
> look at the ACPI\PNP0A08 device, it says that the PAM register ranges go
> through it. I guess this is just for the OS's purposes, as it doesn't need to
> know exactly which ranges go through the root bridge? I'm not entirely sure
> on that, and if anyone could clarify it would be appreciated.

I don't really know anything about PAM registers.  Conceptually, the
PNP0A08 _CRS tells the OS that "if the host bridge sees a transaction
to an address in _CRS, it will forward it to PCI."  That allows the OS
to manage BAR assignments for PCI devices.  If we hot-add a PCI device,
the OS can assign space for it from anything in _CRS.

It sounds like PAM is an arch-specific way to control transaction
routing.  That would probably be outside the purview of ACPI, and if
the OS uses PAM to re-route things mentioned in a PNP0A08 _CRS, that
would be some sort of arch-specific code.

> As well as the PAM register ranges, the root bridge also has the PCIe
> device memory range, which in my case is 0xC0000000 – 0xFEAFFFFF. Does
> that mean that anything above that range isn't going through the PCI root
> bridge, or is it just like that so an OS doesn't try to map a device in that
> region? If I look at the Haswell datasheet, it has small regions in that
> area, between things like the APIC and BIOS ranges, that reach the DMI.

Theoretically, addresses not mentioned in _CRS should not be passed
down to PCI.  This is not always true in practice, of course.
Sometimes BIOSes leave PCI BARs assigned with addresses outside the
_CRS ranges.  As far as the kernel is concerned, that is illegal, and
both Windows and Linux will try to move those BARs so they are inside
a _CRS range.  But often those devices actually do work even when they
are outside the _CRS ranges, so obviously the bridge is forwarding
more than what _CRS describes.
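
Conceptually the check the OS does is just range containment against the
host bridge windows from _CRS. A rough sketch of the idea (example windows,
not the actual kernel code):

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* Host bridge windows as reported by a PNP0A08 _CRS (example values). */
struct range { uint64_t start, end; };

static const struct range host_windows[] = {
    { 0x000A0000, 0x000BFFFF },
    { 0xC0000000, 0xFEAFFFFF },
};

/* A BAR is acceptable, from the OS's point of view, only if it fits
 * entirely inside one of the windows firmware said the bridge forwards. */
static bool bar_inside_crs(uint64_t bar_start, uint64_t bar_end)
{
    for (unsigned i = 0; i < sizeof(host_windows) / sizeof(host_windows[0]); i++)
        if (bar_start >= host_windows[i].start &&
            bar_end <= host_windows[i].end)
            return true;
    return false;   /* outside _CRS: the OS will try to move the BAR */
}

int main(void)
{
    printf("%d\n", bar_inside_crs(0xC0000000, 0xC0FFFFFF));  /* 1: inside  */
    printf("%d\n", bar_inside_crs(0xFEB00000, 0xFEB0FFFF));  /* 0: outside */
    return 0;
}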

> It seems as if the PCI root bridge is using some sort of subtractive
> decoding that picks up whatever isn't sent to DRAM etc. and to make it easy
> for an OS the BIOS gives it a block of address space.

That's possible, and I think many older systems used to work that way.
But it is not allowed by the ACPI spec, at least partly because you
can only have one subtractive decode bridge, and modern systems
typically have several PCI host bridges.

> Finally, I was on a forum related to external GPUs, and some Windows users
> didn't have enough space to map the device below 4GB. To resolve this they
> manually edited the DSDT table and added another entry above the 4GB
> barrier, and Windows then mapped the GPU in the 64-bit space. I presume
> changing the entry in the DSDT table didn't make any difference to how the
> hardware was set up; it just told the OS that the root bridge will in fact
> pick up this address range, and therefore it knew it could map it there.
>
> So am I right in thinking the ranges in the ACPI table are for the OS's
> purposes, and don't actually have to accurately represent what the hardware
> does?

Well, Linux relies completely on the host bridge _CRS.  We don't have
any native host bridge drivers (except some amd_bus and broadcom_bus
stuff that is deprecated and only kept for backward compatibility), so
the PNP0A08 device is really all we have to operate the host bridge
and manage PCI device BARs.

> ..and does anyone know what ranges do actually go through a single PCIe root
> bridge on a modern system?
>
> If anyone could help it would be greatly appreciated :)
>
> Kind Regards,
> Robert

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PCIe root bridge and memory ranges.
  2014-09-04 20:07 ` Bjorn Helgaas
@ 2014-09-04 21:41   ` Robert
  2014-09-09 15:50     ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Robert @ 2014-09-04 21:41 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

Thanks for the reply Bjorn, really appreciate it.

> Right so far.

Thanks.

> I don't really know anything about PAM registers.  Conceptually, the
> PNP0A08 _CRS tells the OS that "if the host bridge sees a transaction
> to an address in _CRS, it will forward it to PCI."  That allows the OS
> to manage BAR assignments for PCI devices.  If we hot-add a PCI device,
> the OS can assign space for it from anything in _CRS.

The PAM registers are used for the legacy DOS memory ranges (0xA0000 - 
0xFFFFF) and either send reads/writes into DRAM or to the DMI. I was a 
little confused because they show up in the _CRS for the PCI root bridge, 
but the Haswell datasheet never mentions that they go through the 
PCI root bridge, just that they are sent to DMI. I would think that they 
don't go through the root bridge and are there to let an OS know if it needs 
to map a legacy device or something (not sure on that)?
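
For what it's worth, the PAM registers themselves are just config-space bytes 
on the host bridge at 00:00.0, so they can be dumped from userspace. A quick 
sketch; the 0x80-0x86 offsets are my reading of the desktop datasheet, so 
treat them as an assumption and check the datasheet for your part (reading 
past the first 64 bytes of config space needs root):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define PAM_OFFSET 0x80   /* assumed PAM0 offset; verify in the datasheet */
#define PAM_COUNT  7

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:00:00.0/config";
    unsigned char pam[PAM_COUNT];
    int fd = open(path, O_RDONLY);

    if (fd < 0 || pread(fd, pam, sizeof(pam), PAM_OFFSET) != sizeof(pam)) {
        perror(path);
        return 1;
    }
    for (int i = 0; i < PAM_COUNT; i++)
        printf("PAM%d = 0x%02x\n", i, pam[i]);
    close(fd);
    return 0;
}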

> Theoretically, addresses not mentioned in _CRS should not be passed
> down to PCI.  This is not always true in practice, of course.
> Sometimes BIOSes leave PCI BARs assigned with addresses outside the
> _CRS ranges.  As far as the kernel is concerned, that is illegal, and
> both Windows and Linux will try to move those BARs so they are inside
> a _CRS range.  But often those devices actually do work even when they
> are outside the _CRS ranges, so obviously the bridge is forwarding
> more than what _CRS describes.

Thanks, that's what I'm thinking as well. For example, the Haswell datasheet 
says that up to the 512GB address mark can be used for MMIO, but the _CRS 
for the root bridge only mentions the '0xC0000000 – 0xFEAFFFFF' range, and 
nothing above the 4GB mark. I'd be interested to see what happens if you 
filled up that space with devices; would the BIOS then create a new _CRS 
entry to tell the OS it can map devices in regions above 4GB?

> That's possible, and I think many older systems used to work that way.
> But it is not allowed by the ACPI spec, at least partly because you
> can only have one subtractive decode bridge, and modern systems
> typically have several PCI host bridges.

Looking at the datasheet again, it says for the PCI regions "PCI MemoryAdd. 
Range (subtractively decoded to DMI)". I presume this means that the root 
bridge is using subtractive decoding; as the system only has one root 
bridge, would that be possible? And if you have a system with multiple root 
bridges, then I'd guess that the firmware would need to program each bridge 
with a specific range?

> Well, Linux relies completely on the host bridge _CRS.  We don't have
> any native host bridge drivers (except some amd_bus and broadcom_bus
> stuff that is deprecated and only kept for backwards compability), so
> the PNP0A08 device is really all we have to operate the host bridge
> and manage PCI device BARs.

Thanks.

I was looking at the Intel PCI root bridge spec, which can be found at 
(http://www.intel.co.uk/content/dam/doc/reference-guide/efi-pci-host-bridge-allocation-protocol-specification.pdf), 
and it mentions that each root bridge has to request resources from the host 
bridge, which will then allocate resources to it. It's from 2002, so I'm not 
sure whether it is still used; does anyone know? In my system, which has one 
root bridge and looks to be using subtractive decoding, I don't think it 
would be used. With systems that have 2 or more root bridges, would this 
protocol still be used?

...and finally, regarding PCI, an ancient HP article says "The PCI 2.2 
specification (pages 202-204) dictates that root PCI bus must be allocated 
one block of MMIO addresses. This block of addresses is subdivided into the 
regions needed for each device on that PCI bus. And each of those device 
MMIO regions must be aligned on addresses that are multiples of the size of 
the region". Is the part that says the root PCI bus must be allocated one 
block of addresses true? I have looked at the PCI 2.2 spec pages 202 - 204 
and it says nothing about this. Am I right in thinking the root bridges are 
chipset specific, so it wouldn't be in the PCI 2.2 spec anyway? Would it be 
possible for a root bridge to have 2 blocks of addresses go through it (not 
that you ever would) and then have 2 _CRS entries for that root bridge?
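
(The alignment part of that quote does match my understanding of how BARs 
work: a region's base has to be a multiple of its power-of-two size. Just to 
illustrate, with made-up sizes:)

#include <stdio.h>
#include <stdint.h>

/* Next address >= addr that is naturally aligned to size
 * (size must be a power of two, as PCI BAR sizes are). */
static uint64_t align_up(uint64_t addr, uint64_t size)
{
    return (addr + size - 1) & ~(size - 1);
}

int main(void)
{
    uint64_t next = 0xC0000000;   /* bottom of an example window */
    uint64_t sizes[] = { 0x1000000, 0x20000, 0x10000000 };

    for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        uint64_t base = align_up(next, sizes[i]);
        printf("size 0x%llx -> base 0x%llx\n",
               (unsigned long long)sizes[i], (unsigned long long)base);
        next = base + sizes[i];
    }
    return 0;
}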

Thanks again.

Kind Regards,
Robert



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PCIe root bridge and memory ranges.
  2014-09-04 21:41   ` Robert
@ 2014-09-09 15:50     ` Bjorn Helgaas
  2014-09-11  0:18       ` Robert
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2014-09-09 15:50 UTC (permalink / raw)
  To: Robert; +Cc: linux-pci

On Thu, Sep 4, 2014 at 3:41 PM, Robert <RJSmith92@live.com> wrote:
> Bjorn Helgaas wrote:
>> I don't really know anything about PAM registers.  Conceptually, the
>> PNP0A08 _CRS tells the OS that "if the host bridge sees a transaction
>> to an address in _CRS, it will forward it to PCI."  That allows the OS
>> to manage BAR assignments for PCI devices.  If we hot-add a PCI device,
>> the OS can assign space for it from anything in _CRS.
>
>
> The PAM registers are used for the legacy DOS memory ranges (0xA0000 -
> 0xFFFFF) and either send reads/writes into DRAM or to the DMI. I was a
> little confused because they show up in the _CRS for the PCI root bridge,
> but the Haswell datasheet never mentions that they go through the
> PCI root bridge, just that they are sent to DMI. I would think that they
> don't go through the root bridge and are there to let an OS know if it needs
> to map a legacy device or something (not sure on that)?

I don't know much about DMI, but as far as I know, it is not visible
in the ACPI platform description.  If the range at 0xA0000 can be used
for a PCI device, then it needs to be in the _CRS of the host bridge.

>> Theoretically, addresses not mentioned in _CRS should not be passed
>> down to PCI.  This is not always true in practice, of course.
>> Sometimes BIOSes leave PCI BARs assigned with addresses outside the
>> _CRS ranges.  As far as the kernel is concerned, that is illegal, and
>> both Windows and Linux will try to move those BARs so they are inside
>> a _CRS range.  But often those devices actually do work even when they
>> are outside the _CRS ranges, so obviously the bridge is forwarding
>> more than what _CRS describes.
>
>
> Thanks, that's what I'm thinking as well. For example the Haswell datasheet
> says that up to the 512GB address mark can be used for MMIO, but the _CRS
> for the root bridge only mentions the '0xC0000000 – 0xFEAFFFFF' range, and
> nothing above the 4GB mark. I'd be interested to see what happens if you
> filled up that space with devices, would the BIOS then create a new _CRS
> entry to tell the OS it can map devices at regions above 4GB?

Sounds possible.  It seems like BIOSes often don't really do anything
with the bus address space above 4GB even when the hardware supports
it.  And of course, Linux has no idea what the hardware actually
supports, since we only look at the ACPI PNP0A03/08 descriptions.

>> That's possible, and I think many older systems used to work that way.
>> But it is not allowed by the ACPI spec, at least partly because you
>> can only have one subtractive decode bridge, and modern systems
>> typically have several PCI host bridges.
>
>
> Looking at the datasheet again, it says for the PCI regions "PCI MemoryAdd.
> Range (subtractively decoded to DMI)". I presume this means that the root
> bridge is using subtractive decoding; as the system only has one root
> bridge, would that be possible?

A host bridge definitely *can* use subtractive decoding.  But at least
on ACPI systems, that level of detail is really invisible to Linux.
We only know about the abstract host bridge described by ACPI, which
tells us about the positively decoded regions claimed by the bridge.

There actually is a _DEC bit in the ACPI Extended Address Space
Descriptor (ACPI r5.0, sec 6.4.3.5.4), that means "the bridge
subtractively decodes this address."  But Linux doesn't look at this
bit, and I assume it means that ACPI would have to explicitly describe
all the address space that could be subtractively decoded anyway.

> and if you have a system with multiple root bridges
> then I'd guess that the firmware would need to program each bridge with a
> specific range?

Yes.

> I was looking at the Intel PCI root bridge spec, which can be found at
> (http://www.intel.co.uk/content/dam/doc/reference-guide/efi-pci-host-bridge-allocation-protocol-specification.pdf)
> and it mentions that each root bridge has to request resources from the host
> bridge that will then allocate it resources etc. It's from 2002 so I'm not
> sure if it is used anymore but does anyone know if this is still used, and
> in my system that has one root bridge and looks to be using subtractive
> decoding, I don't think it would be used in my system. With systems that have
> 2 or more root bridges, would this protocol still be used?

Sorry, I don't know anything about this.  That spec is talking about
firmware, and really outside the view of the kernel.

> ..and finally, regarding PCI, an ancient HP article says "The PCI 2.2
> specification (pages 202-204) dictates that root PCI bus must be allocated
> one block of MMIO addresses. This block of addresses is subdivided into the
> regions needed for each device on that PCI bus. And each of those device
> MMIO regions must be aligned on addresses that are multiples of the size of
> the region". The part that says the root PCI bus must be allocated one block
> of addresses, is this true? I have looked at the PCI 2.2 spec pages 202 -
> 204 and it says nothing about this and am I right in thinking the root
> bridges are chipset specific, so it wouldn't be in the PCI 2.2 spec anyway?
> would it be possible for a root bridge to have 2 blocks of addresses go
> through it (not that you ever would) and then have 2 _CRS entries for that
> root bridge?

Hmm.  I don't have a copy of the PCI 2.2 spec, but I don't think this
is true.  As far as I know, there is no restriction on the number of
regions that a PCI host bridge can claim.  The discovery and
programming of these regions is device-specific, of course, so this is
all outside the scope of the PCI specs.

We did a lot of work a few years ago to support an arbitrary number of
apertures, e.g.,
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2fe2abf896c1

Bjorn

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PCIe root bridge and memory ranges.
  2014-09-09 15:50     ` Bjorn Helgaas
@ 2014-09-11  0:18       ` Robert
  2014-09-11 20:56         ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Robert @ 2014-09-11  0:18 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

Thanks again Bjorn,

> I don't know much about DMI, but as far as I know, it is not visible
> in the ACPI platform description.  If the range at 0xA0000 can be used
> for a PCI device, then it needs to be in the _CRS of the host bridge.

As far as I know the DMI is transparent from an OS point of view; when the 
datasheet says something is sent to DMI, they're basically saying it is sent 
to the 'Southbridge'. I actually got the range wrong; the PAM range is from 
0xC0000 - 0xFFFFF. From what I can gather these ranges are positively 
decoded either to DRAM or to the DMI. My guess is that these ranges don't go 
through the host bridge (which I believe uses subtractive decoding), but 
they are in the _CRS for the host bridge to let the OS know they can be used 
for a PCI device (even though technically it isn't going through the host 
bridge).

> Sounds possible.  It seems like BIOSes often don't really do anything
> with the bus address space above 4GB even when the hardware supports
> it.  And of course, Linux has no idea what the hardware actually
> supports, since we only look at the ACPI PNP0A03/08 descriptions.

Thanks.

> A host bridge definitely *can* use subtractive decoding.  But at least
> on ACPI systems, that level of detail is really invisible to Linux.
> We only know about the abstract host bridge described by ACPI, which
> tells us about the positively decoded regions claimed by the bridge.

> There actually is a _DEC bit in the ACPI Extended Address Space
> Descriptor (ACPI r5.0, sec 6.4.3.5.4), that means "the bridge
> subtractively decodes this address."  But Linux doesn't look at this
> bit, and I assume it means that ACPI would have to explicitly describe
> all the address space that could be subtractively decoded anyway.

Thanks. I agree; when I look at the _CRS for the host bridge, it reads from 
TOLUD to get the bottom of the MMIO space, but the top of this space is 
hardcoded rather than read from a register. I suppose if it wasn't using 
subtractive decoding you wouldn't need to do this, and could read the 
registers in the host bridge directly to see what it decodes.
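
(That TOLUD value can also be read directly from the host bridge's config 
space to compare with what the _CRS reports. A rough sketch; the 0xBC offset 
is my reading of the datasheet for this generation, so treat it as an 
assumption and verify it, and note that the low bits of the register are not 
part of the address:)

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define TOLUD_OFFSET 0xBC   /* assumed offset; check your chipset datasheet */

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:00:00.0/config";
    uint32_t tolud;
    int fd = open(path, O_RDONLY);

    if (fd < 0 ||
        pread(fd, &tolud, sizeof(tolud), TOLUD_OFFSET) != sizeof(tolud)) {
        perror(path);   /* needs root beyond the first 64 bytes */
        return 1;
    }
    printf("TOLUD = 0x%08x (low MMIO starts around 0x%08x)\n",
           tolud, tolud & 0xFFF00000);
    close(fd);
    return 0;
}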

>> ..and finally, regarding PCI, an ancient HP article says "The PCI 2.2
>> specification (pages 202-204) dictates that root PCI bus must be 
>> allocated
>> one block of MMIO addresses. This block of addresses is subdivided into 
>> the
>> regions needed for each device on that PCI bus. And each of those device
>> MMIO regions must be aligned on addresses that are multiples of the size 
>> of
>> the region". The part that says the root PCI bus must be allocated one 
>> block
>> of addresses, is this true? I have looked at the PCI 2.2 spec pages 202 -
>> 204 and it says nothing about this and am I right in thinking the root
>> bridges are chipset specific, so it wouldn't be in the PCI 2.2 spec 
>> anyway?
>> would it be possible for a root bridge to have 2 blocks of addresses go
>> through it (not that you ever would) and then have 2 _CRS entries for 
>> that
>> root bridge?

> Hmm.  I don't have a copy of the PCI 2.2 spec, but I don't think this
> is true.  As far as I know, there is no restriction on the number of
> regions that a PCI host bridge can claim.  The discovery and
> programming of these regions is device-specific, of course, so this is
> all outside the scope of the PCI specs.

> We did a lot of work a few years ago to support an arbitrary number of
> apertures, e.g.,
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2fe2abf896c1

Thanks. Yes, I think the HP article just meant that on pages 202-204 of the 
PCI 2.2 spec each device takes some of the resources assigned to the 
root bridge, and as you say the root bridge is device specific and not in 
the spec. I've had a look at the link you sent me; it's a bit too 
complicated for me :( but in basic terms, does it allow for a root bridge to 
have 2 or more memory address windows, e.g. 0xC0000000 - 0xCFFFFFFF and 
0xD0000000 - 0xFFFFFFFF?

One last question :) Regarding PCIe, I understand that it is a packet-based 
protocol, but where are the packets created? Online, a lot of resources say 
the Root Complex generates a PCIe transaction on behalf of the processor, 
but isn't the Root Complex made up of multiple devices (host bridge(s), 
memory controller, etc.)? Do you know which specific device generates the 
PCIe packet? Is it the root bridge?

As always, thanks for taking the time to answer my questions.

Kind Regards,
Robert



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PCIe root bridge and memory ranges.
  2014-09-11  0:18       ` Robert
@ 2014-09-11 20:56         ` Bjorn Helgaas
  2014-09-14  0:12           ` Robert
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2014-09-11 20:56 UTC (permalink / raw)
  To: Robert; +Cc: linux-pci

On Wed, Sep 10, 2014 at 6:18 PM, Robert <RJSmith92@live.com> wrote:

>> We did a lot of work a few years ago to support an arbitrary number of
>> apertures, e.g.,
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2fe2abf896c1
>
> ... but in basic terms, does it allow for a root bridge to
> have 2 or more memory address windows e.g. 0xC0000000 - 0xCFFFFFFF and
> 0xD0000000 - 0xFFFFFFFF?

Yes, that's very common.  For example, my laptop reports these windows:

  pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
  pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
  pci_bus 0000:00: root bus resource [mem 0xbf200000-0xdfffffff]
  pci_bus 0000:00: root bus resource [mem 0xf0000000-0xfedfffff]
  pci_bus 0000:00: root bus resource [mem 0xfee01000-0xffffffff]
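
You can see where those windows end up on a running system by looking at the
top-level entries in /proc/iomem, e.g. with something like this (the
"PCI Bus 0000:00" label is what my kernel uses for root bus resources; treat
that as an assumption and adjust for your system):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/iomem", "r");
    char line[256];

    if (!f) {
        perror("/proc/iomem");
        return 1;
    }
    /* Top-level (unindented) entries only, for the root bus on segment 0. */
    while (fgets(line, sizeof(line), f))
        if (line[0] != ' ' && strstr(line, "PCI Bus 0000:00"))
            fputs(line, stdout);
    fclose(f);
    return 0;
}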

> One last question :) Regarding PCIe, I understand that it is a packet-based
> protocol, but where are the packets created? Online, a lot of resources say
> the Root Complex generates a PCIe transaction on behalf of the processor,
> but isn't the Root Complex made up of multiple devices (host bridge(s),
> memory controller, etc.)? Do you know which specific device generates the
> PCIe packet? Is it the root bridge?

I'm not a hardware person, and I would only be guessing here.
Obviously packets leave a Root Port, so they have to be created there
or farther inside the Root Complex.  My understanding is that a Root
Port is conceptually part of a Root Complex.

Bjorn

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PCIe root bridge and memory ranges.
  2014-09-11 20:56         ` Bjorn Helgaas
@ 2014-09-14  0:12           ` Robert
  0 siblings, 0 replies; 7+ messages in thread
From: Robert @ 2014-09-14  0:12 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

Thanks for all your help with my questions Bjorn, it really has been 
appreciated :)

Kind Regards,
Robert



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-09-14  0:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-04 14:57 PCIe root bridge and memory ranges Robert
2014-09-04 20:07 ` Bjorn Helgaas
2014-09-04 21:41   ` Robert
2014-09-09 15:50     ` Bjorn Helgaas
2014-09-11  0:18       ` Robert
2014-09-11 20:56         ` Bjorn Helgaas
2014-09-14  0:12           ` Robert
